
Cladistics 00, 387–400 (1998)
WWW http://www.apnet.com
Article No. cl980082

Fast Fitch-Parsimony Algorithms for Large Data Sets

Fredrik Ronquist
Department of Zoology, Uppsala University, Villavägen 9, SE-752 36 Uppsala, Sweden
Received for publication 3 November 1998

The speed of analytical algorithms becomes increasingly important as systematists accumulate larger data sets. In this paper I discuss several time-saving modifications to published Fitch-parsimony tree search algorithms, including shortcuts that allow rapid evaluation of tree lengths and fast reoptimization of trees after clipping or joining of subtrees, as well as search strategies that allow one to successively increase the exhaustiveness of branch swapping. I also describe how Fitch-parsimony algorithms can be restructured to take full advantage of the computing power of modern microprocessors by horizontal or vertical packing of characters, allowing simultaneous processing of many characters, and by avoidance of conditional branches that disturb instruction flow. These new multicharacter algorithms are particularly useful for large data sets of characters with a small number of states, such as nucleotide characters. As an example, the multicharacter algorithms are estimated to be 3.6–10 times faster than single-character equivalents on a PowerPC 604. The speed gain is even larger on processors using MMX, AltiVec or similar technologies allowing single instructions to be performed on multiple data simultaneously. © 1998 The Willi Hennig Society

INTRODUCTION

Parsimony analysis is widely accepted as one of the best methods of phylogenetic inference (..., Huelsenbeck and Hillis, 1993). Compared with alternative methods based on the search for the best tree(s) under an explicit optimality criterion, parsimony analysis is fast. Yet, the time consumption may be prohibitive for some data sets, forcing compartmentalization of the problem or other modifications that possibly distort the results (Donoghue, 1994; Nixon et al., 1994). As systematists accumulate larger data sets, these difficulties become a major obstacle to further progress. Thus, increased efficiency of parsimony algorithms should be an important objective in the research agenda of phylogenetic systematists.

The fundamental parsimony optimization algorithms are well known (Farris, 1970; Fitch, 1970, 1971; Swofford and Maddison, 1987; Maddison and Maddison, 1992; Goloboff, 1994), as well as general features of exact and heuristic tree search strategies implemented in current computer programs (e.g., Swofford and Maddison, 1987; Swofford, 1993; Swofford and Olsen, 1991; Kumar et al., 1994). However, specific details in the algorithms are rarely discussed, such as shortcuts and other tricks that can improve speed significantly. A notable exception is Goloboff (1993, 1994), who described a rapid bisection–reconnection algorithm as well as several shortcuts implemented in NONA, one of the fastest programs for heuristic parsimony analysis of large data sets. I concur with Goloboff (1993) that sharing ideas for better methods will eventually foster refinement of parsimony programs and phylogenetic analysis. In this vein, I describe here some algorithms for tree searches under Fitch parsimony that are faster than those published by Goloboff.

Correspondence to Fredrik Ronquist. E-mail: [email protected]

0748-3007/98/040401+10 $30.00/0
Copyright © 1998 by The Willi Hennig Society
All rights of reproduction in any form reserved

TERMINOLOGY AND ASSUMPTIONS

I will only consider tree bisection–reconnection searches but most of the techniques are applicable to other types of searches, including subtree pruning–regrafting, stepwise addition, and branch-and-bound. Tree bisection–reconnection takes an initial tree and clips it into two or more components (Goloboff, 1993, 1994; Swofford, 1993). The subtrees are reconnected at all possible positions and the length of each rearrangement is compared to that of the original tree. When a tree of the same length as the starting tree is found, the new tree is added to the tree set in memory. If a shorter tree is found, the trees in memory are deleted and a new round of swapping is initiated on the shorter tree. The search halts when all rearrangements have been tried on all trees in memory and no additional trees of the same length or shorter can be found.

The algorithms work with unrooted, dichotomous trees consisting of a number of internal and terminal nodes connected by branches. The nodes are designated with capital letters (A, B, C, etc.; Fig. 1A). A tree can be rooted by adding a root node to any one of the branches. Thus, there is a potential root node for each branch in the tree. The potential roots are designated R_AB, where A and B are the two nodes adjacent to the root (Fig. 1A). One potential root node can be chosen as calculation root for calculation purposes.

A state set is assigned to an internal node in first-pass and final-pass optimization by combining information from some or all of the three surrounding nodes. First-pass optimization results in a preliminary state set (designated P_A for a node A) whereas final-pass optimization gives the final state set (designated F_A for a node A). A state set is usually represented in the computer by a binary variable where each bit records whether a state is included in the set (bit set to 1) or not (bit set to 0; cf. Fig. 3A). The number of bits required depends on the number of states in the character. A two-state character requires a variable with two bits, a four-state character a variable with four bits, and so on.

Terminal nodes have a single state set, the observed state(s). Potential root nodes (e.g., R_AD; Fig. 1A) only have a final state set, which is calculated from the final state sets of the two adjacent internal nodes (A and D for R_AD; Fig. 1A).

When a tree is clipped into two parts, the part containing the calculation root will be called the target tree and the other part the source tree (Figs 1B, C). The two nodes closest to the clip, S and T (Fig. 1B), become potential root nodes in their respective trees after the clip, and a pair of previous potential root nodes in each tree becomes obsolete. For instance, S becomes a potential root node (R_KL) in the source tree, replacing the two potential root nodes that were adjacent to S in the initial tree (Figs 1B, C).

In reconnecting the two subtrees, a potential root node in the source tree (e.g., R_VX; Fig. 1D) is connected to a potential root node in the target tree (e.g., R_YZ; Fig. 1D), and the length of the resulting combined tree is calculated.

For maximum speed, the algorithms should be programmed in assembly but they are described here in a mixture of BASIC and plain English for clarity. The following symbols have been borrowed from C (Kernighan and Ritchie, 1978):

    &     bitwise AND operation (corresponding to an intersection)
    |     bitwise OR operation (corresponding to a union)
    ~     one's complement (bitwise NOT)
    !=    not equal
    >>    binary right shift
    <<    binary left shift

The time needed for an algorithm depends on the processor, the exact sequence of instructions used, memory organization, and a number of other factors. As an example illustrating some of the timing considerations typical for modern microprocessors, I have given the theoretical maximum throughput for the algorithms described here on a PowerPC 604 processor (cf. Table 8). The PowerPC 604 is a superscalar processor which can execute two simple integer instructions, one complex integer instruction, one load instruction and one branch instruction per clock cycle (Anonymous, 1995). A conditional branch which is predicted correctly will usually not affect throughput, whereas an incorrect prediction will typically cause a delay of three clock cycles, and these values have been used here. I have further assumed that there are no delays in fetching instructions or data from memory and that stalls caused by data dependencies are avoided, if possible, by efficient scheduling of instructions.

FIG. 1. Terminology used in the paper. (A) Capital letters are used to designate internal or terminal nodes in the tree (A, B, C, etc.). On each branch there is a potential root node designated R_AB, where A and B are the two nodes adjacent to the root. (B, C) In tree bisection–reconnection, an initial tree is clipped into two subtrees, the source tree and the target tree. The target subtree is the one containing the root node that was used for calculating the state sets of internal nodes in the initial tree. The two internal nodes adjacent to the clip, S and T, become potential root nodes in their respective trees, each replacing two potential root nodes in the initial tree. (D) A recombined tree is obtained by connecting a root node in one of the subtrees (R_VX) with a root node in the other subtree (R_YZ).

BASIC SEARCH STRATEGY

First, an initial near-minimal tree is obtained by stepwise addition or by some other means (e.g., Foulds et al., 1979; Swofford, 1993). The length of the initial tree and a preliminary set of states for each node are then calculated using first-pass optimization (Fitch, 1970). One proceeds from the terminals towards an arbitrarily chosen root node, the calculation root. Assume that the calculation root is placed below node D and we are calculating the state set of node A (Fig. 1A). By proceeding from the terminals towards the root, we have assured that P_B and P_C have already been calculated. Now, if the intersection of P_B and P_C is empty, P_A is the union of P_B and P_C and one step is added to tree length; otherwise, P_A is the intersection of P_B and P_C (Table 1). When the calculation root is reached, the tree length is known and all internal nodes, including the calculation root, have been assigned preliminary state sets.

The final state sets are calculated using final-pass optimization proceeding from the calculation root towards the terminals (Fitch, 1971). The algorithm is fairly complicated with several branches (Table 2), but most characters in near-minimal trees will only pass through steps 1–3 and 9–10.

When the initial tree is clipped, each subtree is again subjected to first-pass and final-pass optimization (but see the qsearch shortcut discussed below). A potential root node is now chosen in the source tree (R_VX; Fig. 1D), and its states are obtained as the union of the final state sets of the two adjacent nodes (V and X; Fig. 1D) (Table 3).
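The first-pass and final-pass rules described above can be sketched for a single character as follows. This is a minimal illustration in Python rather than the assembly the paper recommends; the state constants, the nested-tuple tree encoding, and the function names are assumptions made for the example, not part of the original algorithms.

```python
# One bit per state, so a state set is a small integer (cf. Fig. 3A).
# The constants and the nested-tuple tree format are illustrative assumptions.
A, C, G, T = 1, 2, 4, 8


def first_pass(p_b, p_c):
    """Preliminary set of a node from its two descendants (rule of Table 1).

    Returns (preliminary state set, steps added to the tree length).
    """
    s = p_b & p_c              # intersection
    if s != 0:
        return s, 0
    return p_b | p_c, 1        # empty intersection: union, one extra step


def final_pass(p_a, f_d, p_b, p_c):
    """Final set of node A (rule of Table 2).

    p_a: preliminary set of A, f_d: final set of its ancestor D,
    p_b, p_c: preliminary sets of the two descendants of A.
    """
    if (p_a & f_d) == f_d:     # steps 2-3: ancestor's states all present
        return f_d
    if (p_b & p_c) == 0:       # step 5: A was formed by a union
        return f_d | p_a       # step 8
    return ((p_b | p_c) & f_d) | p_a   # step 6


def tree_length(tree, states):
    """First pass over a rooted nested-tuple tree; returns (root set, length)."""
    if isinstance(tree, str):  # terminal node: observed state set, no steps
        return states[tree], 0
    left_set, left_len = tree_length(tree[0], states)
    right_set, right_len = tree_length(tree[1], states)
    s, step = first_pass(left_set, right_set)
    return s, left_len + right_len + step


if __name__ == "__main__":
    observed = {"t1": A, "t2": C, "t3": C, "t4": C}
    print(tree_length((("t1", "t2"), ("t3", "t4")), observed))
```

Running the sketch over all characters and summing the step counts gives the tree length; the final pass is then applied from the calculation root towards the terminals.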

The length of each rearrangement derivable from that rooting is calculated by combining the state set of the root node with the final state sets of two adjacent nodes in the target tree (Y and Z; Fig. 1D). No step is added if a state is shared between R_VX and Y or between R_VX and Z; otherwise, one step is added (Table 4; Goloboff, 1993, 1994). To avoid unnecessary calculations, the length is tested against the length of the initial tree each time a step is added. If the length exceeds that of the shortest tree(s), one can proceed with the next tree rearrangement.

When all the potential roots of the source tree have been tried on all possible branches of the target tree, a new clipping of the initial tree is examined. The possible rearrangements of the initial tree are exhausted when all possible clippings have been examined.

In the program documentation to PIWE/NONA, Goloboff (1993) described two important shortcuts. The qsearch shortcut concerns reoptimization of subtrees after clipping. By definition, the calculation root in the initial tree ended up in the target subtree (Fig. 1C). Now, one can see that if the preliminary and final state sets of node S are identical before the clip, then reoptimization will not affect the final state sets in the source tree. A large fraction of the characters in near-minimal trees will fulfil this condition, so the qsearch shortcut can save considerable amounts of time (Goloboff reported speed gains of 20 times or more). Unfortunately, the same shortcut cannot be used in the target tree. Goloboff (1993) suggested using various comparisons of state sets around the clipped branch to guess which characters need reoptimization in the target tree, but this may introduce errors in tree length calculations.

The qcollapse shortcut deals with tree comparison when unsupported branches are collapsed (Goloboff, 1993). Before a tree is clipped, each branch is tested against the collapse criterion. If a rearrangement resulting in a tree of the same length as the initial tree did not move the source tree across some supported branches in the target tree, the initial and rearranged trees are likely to collapse to the same polytomous tree, and the rearrangement can be discarded before time is wasted on comparing it with trees in memory.

TABLE 1
Algorithm 1: Single-Character Algorithm for First-Pass Optimization of Nonadditive Characters (based on Fitch, 1970) and Length Calculation

Step  Instruction
1     Load P_B and P_C
2     Let P_A = P_B & P_C
3     If P_A != 0 goto 6
4     Let P_A = P_B | P_C
5     Let TL = TL + 1
6     Store P_A
7     Proceed from 1 with next character

TL is tree length; for other symbols, see text and Fig. 1.

TABLE 2
Algorithm 2: Single-Character Algorithm for Final-Pass Optimization of Nonadditive Characters (based on Fitch, 1971)

Step  Instruction
1     Load P_A and F_D
2     Let F_A = P_A & F_D
3     If F_A = F_D goto 9
4     Load P_B and P_C
5     If (P_B & P_C) = 0 goto 8
6     Let F_A = ((P_B | P_C) & F_D) | P_A
7     Goto 9
8     Let F_A = F_D | P_A
9     Store F_A
10    Proceed from 1 with next character

Symbols explained in text and Fig. 1.

TABLE 3
Algorithm 3: Single-Character Algorithm for Calculating the State Set of a Potential Root Node (based on Goloboff, 1994)

Step  Instruction
1     Load F_V and F_X
2     FR_VX = F_V | F_X
3     Store FR_VX
4     Proceed with next character

Symbols explained in text and Fig. 1.

TABLE 4
Algorithm 4: Single-Character Algorithm for Evaluating the Length of a Tree Rearrangement from the State Set of a Root Node in the Source Tree and the Final State Sets of Two Adjacent Nodes in the Target Tree (based on Goloboff, 1994)

Step  Instruction
1     Load FR_XV, F_Y, and F_Z (a)
2     Let S = FR_XV & (F_Y | F_Z)
3     If S != 0 goto 6
4     Let AL = AL + 1
5     If AL > DIFF stop
6     Proceed from 1 with next character

(a) To avoid stalls because of data dependencies, load operations must be done one cycle ahead of the other instructions, i.e., load instructions (step 1) for the next character must be issued before calculating the union (step 2) of the current character.
AL is the length added by the joining of subtrees, DIFF is the difference between the length of the initial tree and the summed length of the clipped trees, and S is a temporary variable. Other symbols are explained in the text and in Fig. 1.
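As a rough illustration of how the root state set (Table 3) and the rearrangement length test (Table 4) fit together, the following Python sketch processes a list of single-character state sets. The function names and the convention of returning None when the search can move on to the next rearrangement are assumptions made for the example.

```python
def root_state_set(f_v, f_x):
    """State set of the potential root R_VX: union of the two adjacent final sets."""
    return f_v | f_x


def added_length(fr_source, f_y, f_z, diff):
    """Length added by joining a source-tree root to the target branch Y-Z.

    fr_source, f_y, f_z are per-character lists of bitmask state sets.
    Returns the number of added steps, or None as soon as the count
    exceeds diff (the rearrangement is then longer than the initial tree).
    """
    al = 0
    for r, y, z in zip(fr_source, f_y, f_z):
        if (r & (y | z)) == 0:     # no state shared with Y or with Z
            al += 1
            if al > diff:
                return None        # early exit, as in step 5 of Table 4
    return al


if __name__ == "__main__":
    fr = [root_state_set(1, 2), root_state_set(4, 4)]
    print(added_length(fr, [1, 8], [8, 2], diff=1))
```

A step is counted only when the intersection is empty, which is the behaviour the text describes for characters that change on the new branch.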

SOME IMPROVEMENTS

Most of the time during a tree bisection–reconnection search is spent examining the length of alternative rearrangements (algorithm in Table 4), so the speed of this algorithm is crucial to overall speed. If there are many optimal trees of equal length, considerable time will also be consumed by tree comparisons (this algorithm is not discussed here). The speed of subtree reoptimization (algorithms in Tables 1 and 2) is of little importance in tree bisection–reconnection searches except for the early phase when rearrangements leading to shorter trees are found often, but it is significant throughout subtree pruning–regrafting searches.

The time required to calculate the length of a rearrangement with the algorithm in Table 4 depends on the frequency of characters that change on the branch. For maximum speed, the branch instruction (step 3; Table 4) should be predicted as taken. The throughput on a PowerPC 604 will then be one character without change per three clock cycles (assuming that loads are done one cycle ahead of dependent instructions so that stalls caused by load latencies can be avoided) and one character with change per seven cycles (three additional cycles for recovering from the mispredicted branch and one cycle for the other instructions in the branch).

The speed of this algorithm can be improved by calculating and storing the state sets of all possible root nodes in the source and target trees before recombining them. Although this procedure requires twice as much memory, it reduces the required load instructions from three to two per character (Table 5) and decreases the handling time per character by 14–33% (from seven to six clock cycles for characters changing state and from three to two clock cycles for other characters). Of course, one runs the risk that a tree shorter than the ones in memory will be found before the state sets of all root nodes have been used. However, the potential root node state sets can be transferred to the new tree with only minor changes as will be described below.

TABLE 5
Algorithm 5: Single-Character Algorithm for Evaluating the Length of a Tree Rearrangement from the State Set of a Potential Root Node in the Source Tree and a Potential Root Node in the Target Tree

Step  Instruction
1     Load FR_XV, FR_YZ
2     Let S = FR_XV & FR_YZ
3     If S != 0 goto 6
4     Let AL = AL + 1
5     If AL > DIFF stop
6     Proceed with next character

Symbols as in Table 4. DIFF can be calculated using the algorithm on the two root nodes adjacent to the clip in the initial tree (R_KL and R_MN; Fig. 1C).

Goloboff (pers. comm.) uses an alternative approach that avoids precalculation of root states. The algorithm in Table 4 is modified by loading only one of the state sets in the target tree together with the root state set of the source tree. The intersection of these values is then calculated. If it is empty, the second state set in the target tree is loaded and a new intersection with the root state set is calculated. If this is also empty, one step is added to tree length. This algorithm is fast for most nonchanging characters (requiring two cycles) but is very slow for changing characters (requiring at least nine cycles) because calculation of the second intersection has to wait until the load completes. In addition, some nonchanging characters will be delayed because the first intersection is empty (requiring at least ten cycles if the second branch is predicted as not taken). The relative advantage of precalculating root states decreases or disappears if load instructions that could be scheduled ahead of dependent instructions are not (as is probably common in most existing programs).
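The 14–33% range quoted for precalculated root states can be reproduced with a simple cycle model. The sketch below assumes only the per-character cycle counts given in the text (three and seven cycles for the algorithm in Table 4, two and six cycles with precalculated root states) and treats p, the fraction of characters that change on the branch, as a free parameter; the function names are illustrative.

```python
def cycles_per_character(p, unchanged, changed):
    """Expected clock cycles per character when a fraction p of characters change."""
    return (1.0 - p) * unchanged + p * changed


def relative_saving(p):
    """Fraction of handling time saved by precalculating root state sets."""
    t_three_loads = cycles_per_character(p, unchanged=3, changed=7)
    t_two_loads = cycles_per_character(p, unchanged=2, changed=6)
    return (t_three_loads - t_two_loads) / t_three_loads


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, round(relative_saving(p), 3))
```

Under this model the saving is one cycle out of 3 + 4p, so it falls from one third when no characters change to one seventh when every character changes.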

With a new, improved qsearch shortcut, only a few state sets in the source and target trees need be recalculated after clipping. This is done by moving away from the clipped branch in both subtrees, updating node sets on the way, and stopping as soon as no additional change will occur further away from the clipped branch (Fig. 2). Assume that the node closest to the clip is used as the calculation root in the source tree (Fig. 2A). Only the final state sets need then be considered in the source tree, because the preliminary state sets will not be affected by the clip. If the final and preliminary state sets of node S (Fig. 1B) are the same, no reoptimization is necessary in the source tree (Goloboff's (1993) original qsearch shortcut). If the sets differ, one calculates the new final state sets for the two descendants of R_KL, i.e., nodes K and L (Figs 1C, 2A). The new final state sets are again compared to the final state sets in the initial tree. If the new and old sets are identical, it is unnecessary to proceed further away from the calculation root in that direction; otherwise, the next set of descendant nodes is considered. Unless a character is extremely homoplastic on the tree being examined, it is unlikely that more than a few nodes away from the clipped branch need to be reoptimized; actually, most characters in near-optimal trees will need no reoptimization at all.

A similar shortcut can be used in the reoptimization of the target tree. The clip divides the target tree into a root part (the N-part containing the calculation root) and a crown part (the M-part; Figs 1B, 2B). Assuming that the same calculation root is used, all preliminary state sets in the crown part will be unaffected by the clip. Now, one compares the preliminary set of T with the preliminary set of M (Fig. 1B). If they are identical, there will be no change in the preliminary state sets of the target tree. If not, a new preliminary state set is calculated for node N and compared with the old set. As long as the state sets are not equivalent, one continues down the tree towards the calculation root (Fig. 2B). When a node is reached for which the sets are the same, the first-pass reoptimization is completed and one returns to the previous node (the last node that had its state set changed) and recalculates the final states for that node. The final-pass reoptimization proceeds up the tree, considering all descendant branches, as long as the new set differs from the old one or the node is between the calculation root and the M-node (i.e., ancestral to the T-node). On the way, the state sets of the affected potential roots are updated. For most characters in near-optimal trees, it will be sufficient to assure that the preliminary state sets of the M-node and T-node and the final state sets of the N-node and T-node are identical; no reoptimization will be needed.

The algorithms for first-pass and final-pass reoptimization including state set comparisons (Tables 6 and 7) are considerably slower than the simple first-pass and final-pass algorithms, but this will be more than compensated for by the fact that at most a few internal and root nodes need to be reoptimised for each character. Unless there are high levels of homoplasy, which is unlikely in near-optimal trees, the qsearch shortcut will bring down reoptimization times considerably. Keeping the number of characters constant, the reoptimization time should be largely independent of tree size (a slight decrease may actually occur if the number of character changes per branch decreases with increasing tree size; cf. Goloboff, 1993).

The qsearch shortcut described here is better than that of Goloboff (1993) in two respects. First, Goloboff's shortcut can be applied exactly only to the source tree, whereas the one described here allows exact and selective reoptimization of both subtrees. Second, Goloboff's shortcut has an all-or-none response. If a possible change in optimization is detected for a character, the character will be reoptimized for the entire subtree. However, even in those cases it is unlikely that the state assignments will change for more than a few nodes close to the clipped branch. The shortcut described here only makes the necessary changes.

The improved qsearch shortcut described here was independently discovered and described under the name "incremental two-pass optimization" by Goloboff in a paper (Goloboff, 1996) that was in press when the present paper was first submitted. As noted by Goloboff (1996), the improved qsearch shortcut is considerably faster than incremental optimization in its original formulation (Gladstein, 1997).

TABLE 6
Algorithm 6: Single-Character Algorithm for First-Pass Reoptimization and State Test

Step  Instruction
1     Load *P_A, P_B (a), and P_C (a)
2     Let S = P_B & P_C
3     If S != 0 goto 5
4     Let S = P_B | P_C
5     If S = *P_A stop
6     Push *P_A and its address onto stack
7     Store S (replacing *P_A)
8     Proceed with next node (ancestor of A)

(a) If this is not the first node being reoptimised, one of the P_B or P_C loads can be replaced with a register move instruction.
A prefixed asterisk denotes a state set that has not been updated. For other symbols see text and Fig. 1.

TABLE 7
Algorithm 7: Single-Character Algorithm for Final-Pass Reoptimization and State Test, Including Necessary Updates of Potential Root Node State Sets

Step  Instruction
1     Load P_A, F_D and *F_A
2     Let S = P_A & F_D
3     If S = F_D goto 9
4     Load P_B and P_C
5     If P_B & P_C = 0 goto 8
6     Let S = ((P_B | P_C) & F_D) | P_A
7     Goto 9
8     Let S = F_D | P_A
9     If S = *F_A goto 17 (a)
10    Push *F_A and its address onto stack
11    Store S (replacing *F_A)
12    Let S = S | F_D
13    Push *FR_AD and its address onto stack
14    Store S (replacing *FR_AD)
15    Push pointer to right descendant of A onto stack
16    Proceed from 1 with left descendant of A
17    Pop pointer from stack
18    If stack empty proceed with next character
19    Proceed from 1 with the node pointed to by the pointer

(a) In the target tree, the condition A not on the path to the clipped branch should also be met for the branch to be taken.
A prefixed asterisk denotes a state set that has not been updated. For other symbols see text and Fig. 1.
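The first-pass part of this incremental reoptimization (Table 6) can be sketched as a walk from the node next to the clip towards the calculation root that stops at the first unchanged state set. The dictionary-based tree representation and the function names below are assumptions made for the example; the undo list plays the role of the stack in Table 6, so that the old sets can be restored after a rearrangement has been evaluated.

```python
def reoptimize_first_pass(start, parent, children, prelim):
    """Update preliminary sets from `start` towards the root for one character.

    parent:   dict node -> ancestor (None for the calculation root)
    children: dict node -> (left descendant, right descendant)
    prelim:   dict node -> bitmask preliminary state set (updated in place)
    Returns a list of (node, old set) pairs for later restoration.
    """
    undo = []
    node = start
    while node is not None:
        b, c = children[node]
        s = prelim[b] & prelim[c]
        if s == 0:
            s = prelim[b] | prelim[c]
        if s == prelim[node]:      # unchanged: nothing further down can change
            break
        undo.append((node, prelim[node]))
        prelim[node] = s
        node = parent[node]
    return undo


def restore(undo, prelim):
    """Put back the state sets saved by reoptimize_first_pass."""
    while undo:
        node, old = undo.pop()
        prelim[node] = old


if __name__ == "__main__":
    # Target tree after the clip: N now joins M and t3 directly (T is gone).
    parent = {"R": None, "N": "R"}
    children = {"R": ("N", "t5"), "N": ("M", "t3")}
    prelim = {"M": 1, "t3": 2, "t5": 1, "N": 2, "R": 3}  # N, R: sets before the clip
    saved = reoptimize_first_pass("N", parent, children, prelim)
    print(prelim["N"], prelim["R"], saved)
```

For most characters in near-optimal trees the loop is expected to stop at the first node, which is what makes the shortcut pay off despite the extra comparison.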

Goloboff's incremental two-pass optimization records a local cost for each node in the tree. This value is used in calculating the summed length of the clipped trees. However, the local cost values are not necessary. Total tree length need be calculated only once during an entire tree search, for instance for the first tree to be swapped upon. During the rest of the search it is sufficient to work with length differences. When a tree is clipped, the difference between the initial tree length and the sum of the source and target tree lengths is obtained by calculating the length added by combining the root nodes R_KL and R_MN after reoptimization (Fig. 1C), as if the subtrees were to be rejoined where the initial tree was clipped. This length difference is then used to determine whether a new rearrangement is successful in finding a tree of the same length as those in memory or shorter.

When a tree shorter than those in memory is found, the tree stack is cleared and the new tree is used as the starting point for clipping. The state sets for the new starting tree need not be recalculated from scratch. Instead, it is possible to update the state sets of the source and target subtrees using essentially the same procedure as in the reoptimization of the target subtree after clipping. First, preliminary state sets are updated from the point of reunion towards the root until the new and old preliminary sets agree. One then moves upwards using final-pass reoptimization until the final state sets agree. Thus, the preliminary, final, and root node state sets of the new tree are obtained with a minimum of calculations.

When a new tree of the same length as those in memory is found, it is added to the tree stack. Only the topology of the tree is saved, otherwise too much memory would be required. When the tree is recalled from the stack, the preliminary, final and root node state sets have to be recalculated from scratch. However, such full optimizations will not be needed very often, and will only take a small fraction of the total search time.

If zero-length branches are to be collapsed, it is essential that branches can be checked against the collapse criterion quickly. Two criteria are in common use: minimum length and maximum length. The minimum length of a branch is easily obtained from the final state sets of the two nodes incident to the branch (Goloboff, 1994) as the number of characters for which the intersection of the final state sets is empty. Maximum length, which is a stricter criterion (fewer branches are collapsed), is used in some parsimony-analysis programs (e.g., Swofford and Maddison, 1987; Swofford, 1993) but was not considered by Goloboff (1994). However, the first comparison in the final-pass optimization algorithm (step 3, Table 2: F_D = P_A & F_D) is equivalent to a test of whether or not the maximum length of the branch is 0. Thus, branches can be tested against the maximum-length collapse criterion during final-pass optimization without any extra calculations.

ADDITIONAL SHORTCUTS

In very large analyses, tree bisection–reconnection searches may be prohibitively time-consuming despite the shortcuts discussed above. One possibility is then to limit swapping to nearest neighbour interchanges or subtree pruning–regrafting (Swofford, 1993). However, alternative ways of restricting the swapping may be more efficient. I suggest two approaches which should be examined in more detail by empirical studies. First, tree clipping may be restricted to the l longest branches in the initial tree. These are the clippings that seem most likely to lead to shorter trees. This strategy is similar to how morphologists compartmentalize large phylogenetic analyses by treating well defined groups separately (Donoghue, 1994). One can carry the analogy one step further and do separate parsimony analyses on the clipped subtrees, but this is not normally done in tree bisection–reconnection searches. Second, tree rearrangements may be restricted to the m nodes closest to the clip. Again, these are the rearrangements which appear most likely to yield shorter trees (Goloboff, 1993). If m = 1, this neighbourhood swapping is equivalent to nearest neighbour interchanges. As the value of m increases, the search becomes more exhaustive until it converts into tree bisection–reconnection. Subtree pruning–regrafting is peculiar in that one node in the source tree is combined with all nodes in the target tree, even those far away from the clip. Assuming that far displacements of the source tree are unlikely to yield shorter trees, subtree pruning–regrafting will be less efficient than neighbourhood swapping for a given computational effort. An additional advantage of neighbourhood swapping over subtree pruning–regrafting is that the former can make better use of fast cache memory. An efficient use of neighbourhood swapping would be to start searches with a small neighbourhood and then increase the size of the neighbourhood as it becomes more and more difficult to find shorter trees.
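One possible way to enumerate the reconnection positions for neighbourhood swapping is a breadth-first walk outwards from the clipped branch. The sketch below is only an interpretation: it assumes that the distance of a target branch is the number of nodes crossed when moving away from the clip, so that m = 1 yields the branches adjacent to the clip (the nearest neighbour interchanges). The adjacency-list representation and the function name are likewise assumptions.

```python
from collections import deque


def branches_within(adj, clip, m):
    """Target-tree branches reachable from the clipped branch by crossing
    at most m nodes.

    adj:  dict node -> list of neighbouring nodes in the target tree
    clip: pair of adjacent nodes between which the source tree was removed
    The clipped branch itself is excluded, since rejoining there only
    restores the initial tree.
    """
    seen = {frozenset(clip)}
    found = []
    queue = deque((node, 1) for node in clip)
    while queue:
        node, depth = queue.popleft()
        if depth > m:
            continue
        for neighbour in adj[node]:
            branch = frozenset((node, neighbour))
            if branch in seen:
                continue
            seen.add(branch)
            found.append((node, neighbour))
            queue.append((neighbour, depth + 1))
    return found


if __name__ == "__main__":
    adj = {
        "N": ["a", "b", "M"], "M": ["N", "c", "d"], "d": ["M", "e", "f"],
        "a": ["N"], "b": ["N"], "c": ["M"], "e": ["d"], "f": ["d"],
    }
    for m in (1, 2):
        print(m, branches_within(adj, ("N", "M"), m))
```

Starting with a small m and increasing it when no shorter trees are found would follow the successive widening of the neighbourhood suggested above.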

MULTICHARACTER ALGORITHMS

It is possible to further increase the speed of the search algorithms by taking advantage of two features of modern microprocessors such as the Pentium and PowerPC processors (Anonymous, 1998a, 1998b). First, these processors handle large units (32 or 64 bits, with AltiVec technology 128 bits) in each clock cycle. Because characters rarely have more than four or five different states, the processor can optimize many characters simultaneously given suitable algorithms. Second, these processors are superscalar (several independent execution units operate in parallel) and pipelined (each instruction goes through several successive stages before being completed) for maximum throughput. A conditional branch instruction will significantly degrade the performance of such a system unless it almost always goes the same way so that the outcome can be predicted correctly most of the time.

There are two options for packing characters into larger units: horizontal packing, in which several single-character variables are concatenated to form a larger unit (Fig. 3B), and vertical packing, in which one variable is used for each character state (Fig. 3C). Character packing is most efficient for characters that have a constant and small number of states, such as nucleotide characters.

FIG. 3. Different forms of character packing for a four-state nucleotide character on an 8-bit machine. (A) No packing. One variable is used to hold the state set of a single character. (B) Horizontal packing. The state sets of several characters are concatenated to form a single variable. (C) Vertical packing. One variable is used for each state and records information from many characters.

Multicharacter algorithms have to be formulated for a specific number of states determining the maximum number of different states that the algorithms can handle. Nucleotide characters can be analysed with algorithms designed for characters with four states (gaps coded as state unknown) or for characters with five states (gaps coded as an additional state). In the Appendix, I present Fitch-parsimony algorithms for both horizontally packed and vertically packed characters with a maximum of four states.

In principle, the length-of-recombination algorithm (Table 4) could be used with only slight modifications for horizontally packed characters by feeding the first three operations character sets rather than characters, and then extracting the result of the AND-operation in step 3 one character at a time and testing it against zero. However, to avoid the branch instructions, which cannot be predicted efficiently, I suggest using a mask with every fourth bit set (for four-state characters) together with shift instructions (steps 3–6; Algorithm 8 in Appendix) to obtain a binary number F in which every fourth bit records for the corresponding character whether a change occurred (bit set to 1) or not (bit set to 0) on the branch being considered. All bits in F can be filled by completing three additional character sets before updating the tree length. A looping bit-counter with two branches is used (steps 8–11) to update the tree length. Both branches can be predicted efficiently: the first as not-taken and the second as taken. However, the looping bit-counter cannot take full advantage of the parallel execution units. When many characters in the set or sets being considered change state on the branch, an explicit bit-counter which extracts all bits in F one at a time and adds them to the tree length is faster (cf. steps 9–13; Algorithm 9 in Appendix). To increase speed when F equals zero, F is first tested against zero (step 8).
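The mask-and-shift idea for horizontally packed four-state characters can be illustrated as follows. This is a sketch of the general technique, not a transcription of Algorithm 8 (the Appendix is not reproduced here); the 32-bit word size, the padding of unused fields with all four states, and the function names are assumptions.

```python
MASK32 = 0x11111111  # every fourth bit set: one flag bit per four-state character


def pack_horizontal(state_sets, per_word=8):
    """Concatenate up to eight 4-bit state sets into one 32-bit word.

    Unused fields are padded with all four states so that they never
    register as a change.
    """
    padded = list(state_sets) + [0xF] * (per_word - len(state_sets))
    word = 0
    for i, s in enumerate(padded):
        word |= (s & 0xF) << (4 * i)
    return word


def changed_flags(fr_source, f_y, f_z):
    """Word F in which bit 4k is 1 if character k changes on the new branch."""
    s = fr_source & (f_y | f_z)   # shared states, all characters at once
    s |= s >> 1
    s |= s >> 2                   # bit 4k now holds the OR of field k's four bits
    return ~s & MASK32            # flag the fields with an empty intersection


def count_changes(flags):
    """Number of set flag bits, i.e. steps added for this character set."""
    return bin(flags).count("1")


if __name__ == "__main__":
    src = pack_horizontal([1, 2, 4, 8, 1, 1, 1, 1])
    y = pack_horizontal([1, 1, 4, 1, 2, 2, 1, 1])
    z = pack_horizontal([2, 4, 8, 2, 4, 1, 1, 1])
    f = changed_flags(src, y, z)
    print(hex(f), count_changes(f))
```

No conditional branch is needed per character; the only data-dependent work left is counting the set bits of F.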

The corresponding algorithm for vertically packed characters (Algorithm 9) uses a few simple operations to obtain the binary number F, in which every bit records whether or not a character changed state. Both a looping bit-counter (cf. Algorithm 8) and an explicit bit-counter (Algorithm 9) can be used to update the tree length.

The relative speed of the single-character and multicharacter length-of-recombination algorithms on the PowerPC 604 varies depending on the type of bit-counter used and the frequency of characters that change state (Fig. 4). The vertical-packing algorithms are 2.2–5.8 times faster than the single-character algorithm, while the algorithms for horizontally packed characters are slightly slower (data not shown here). The looping bit-counter is faster than the explicit bit-counter when few characters change state and vice versa. During tree bisection–reconnection searches of large data sets, one might expect most rearrangements to be relatively poor fits, so that the mean proportion of characters changing on the branches being evaluated would be fairly high. Therefore, it is likely that overall performance would be better with an explicit bit counter.

FIG. 4. Comparison of the speed of single-character and multicharacter algorithms during calculation of the length of a tree recombination on a PowerPC 604 processor (a 32-bit processor). Among the multicharacter algorithms, the speed is given for a vertical-packing algorithm with a looping bit-counter (cf. Table 8) and a vertical-packing algorithm with an explicit bit counter including two tests against zero, one after the load operations and one after 16 bits have been counted (cf. Table 9). (A) Time consumption in number of cycles per 32 characters. (B) Relative speed gain of the multicharacter algorithms. The explicit bit counter is superior to the looping bit counter when many characters change state.

The multicharacter first-pass and final-pass reoptimization algorithms (Algorithms 10–13) use integer logical operations instead of conditional branches (compare with the corresponding single-character algorithms in Tables 6 and 7). They are considerably faster than their single-character equivalents (Table 8) but the raw speed does not tell the whole truth. The qsearch shortcut is less efficient with multicharacter algorithms because a full set of characters will have to be reoptimized even when reoptimization is truly needed for only a few or even a single character in the set. Horizontal-packing algorithms are less susceptible to this problem since they combine fewer characters in the same set. The problem can be reduced considerably by combining characters such that characters with highly congruent state distributions end up in the same character sets. Such combination will also speed up the length-of-recombination algorithms (Algorithms 8 and 9) considerably by increasing the frequency of character sets with either no changing characters or many changing characters (cf. Fig. 4B). Even in the worst case, the qsearch shortcut would not be slower with multicharacter algorithms than with single-character algorithms.
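For vertically packed characters, the change word F and the two kinds of bit-counter can be sketched as below. This is an illustration of the idea rather than Algorithm 9 itself: the packing helper, the use of a "clear the lowest set bit" loop as the looping counter, and the function names are assumptions.

```python
def pack_vertical(state_sets, n_states=4):
    """One word per state; bit k of word j is set if character k includes state j."""
    words = [0] * n_states
    for k, s in enumerate(state_sets):
        for j in range(n_states):
            if s & (1 << j):
                words[j] |= 1 << k
    return words


def changed_flags_vertical(src, y, z, n_chars):
    """Word F in which bit k is 1 if character k changes on the new branch.

    src, y, z are lists of per-state words for the source root and the two
    target nodes.
    """
    shared = 0
    for a, b, c in zip(src, y, z):
        shared |= a & (b | c)              # characters sharing this state
    return ~shared & ((1 << n_chars) - 1)  # keep only the bits in use


def count_bits_looping(f):
    """Looping counter: one iteration per set bit, cheap when few bits are set."""
    n = 0
    while f:
        f &= f - 1                         # clear the lowest set bit
        n += 1
    return n


def count_bits_explicit(f, width=32):
    """Explicit counter: extracts every bit in turn, no data-dependent branch."""
    return sum((f >> i) & 1 for i in range(width))


if __name__ == "__main__":
    src = pack_vertical([1, 2, 4, 8, 1, 1, 1, 1])
    y = pack_vertical([1, 1, 4, 1, 2, 2, 1, 1])
    z = pack_vertical([2, 4, 8, 2, 4, 1, 1, 1])
    f = changed_flags_vertical(src, y, z, n_chars=8)
    print(bin(f), count_bits_looping(f), count_bits_explicit(f))
```

The two counters return the same value; they differ only in how their cost depends on the number of changing characters, which is the trade-off discussed above.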

Copyright © 1998 by The Willi Hennig Society. All rights of reproduction in any form reserved.

TABLE 8
Timing Characteristics of Single-Character and Vertical-Packing Multicharacter Algorithms on a PowerPC 604 Processor. The Throughput is Given as the Number of Clock Cycles Required for 32 Four-State Characters. The Vertical-Packing Length-of-Recombination Algorithm is Assumed to Use an Explicit Bit-Counter

                                                     Single-character     Vertical packing
Algorithm                            Neededᵃ         Table     Cycles     Table     Cycles     Speed gainᵇ

First-pass with length calculation   Once            1         96         (12)      13         7.4
Simple first-pass                    Rarely          (1)       96         (12)      13         7.4
Final-pass incl. potential roots     Rarely          (2–3)     128        (13)      25         5.1
Reoptimization test                  Often           -         64         -         8          8.0
First-pass reoptimization            Intermediate    6         192        12        26         7.4ᶜ
Final-pass reoptimization            Intermediate    7         384        13        41         9.4ᶜ
Length of recombination              Very often      5         128        9         36ᵈ        3.6
Restore state sets                   Intermediate    -         96         -         9          10.7ᶜ

ᵃ Approximate frequency with which the algorithm is used during a tree bisection–reconnection search.
ᵇ Raw speed gain of the vertical-packing algorithm over its single-character equivalent, assuming that 50% of characters change state (length-of-recombination algorithm) or that no characters change state (other algorithms).
ᶜ Net speed gain will be considerably smaller for these algorithms for reasons explained in the text.
ᵈ Assuming the use of an explicit bit counter.

Fig. 4B). Even in the worst case, the qsearch shortcut would not be slower with multicharacter algorithms than with single-character algorithms.

PROSPECTS

The recent development of technologies such as MMX (Anonymous, 1998b) and AltiVec (Anonymous, 1998a) further accentuates the advantages of multicharacter parsimony algorithms. Both MMX and AltiVec allow processors to simultaneously perform the same instruction on larger units than normally handled by the processor (64 bits for MMX and 128 for AltiVec), accelerating multicharacter algorithms. For instance, a vertical-packing multicharacter algorithm for calculating the length of a rearrangement (Algorithm 9) is likely to be about an order of magnitude faster than a corresponding traditional (but processor-optimized) single-character algorithm. These enormous speed gains should help overcome the disadvantage of having to adapt core algorithms in parsimony analysis programs to the type of processor used in the machines they are running on.

ACKNOWLEDGEMENTS

I am grateful to Pablo Goloboff for comments and suggestions.

REFERENCES

Anonymous. (1995). “PowerPC 604 RISC Microprocessor User’s Manual”. Program and documentation available on the Internet at http://www.mot.com/SPS/PowerPC/teksupport/teklibrary/index.html.

Anonymous. (1998a). “AltiVec Technology Programming Environments Manual. Preliminary Version”. Program and documentation available on the Internet at http://www.mot.com/SPS/PowerPC/teksupport/teklibrary/index.html.

Anonymous. (1998b). “MMX technology developers guide”. Program and documentation available on the Internet at http://developer.intel.com/drg/mmx/dg/.

Donoghue, M. J. (1994). Progress and prospects in reconstructing plant phylogeny. Ann. Missouri Bot. Gard. 81, 405–418.

Farris, J. S. (1970). Methods for computing Wagner trees. Syst. Zool. 19, 83–92.

Farris, J. S. (1988). “Hennig86, Version 1.5”. Program and documentation distributed by D. Lipscomb, George Washington University, Washington, D.C.

Fitch, W. M. (1970). Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–113.

Fitch, W. M. (1971). Toward defining the course of evolution: Minimum change for a specific tree topology. Syst. Zool. 20, 406–416.

Goloboff, P. A. (1993). “NONA, Version 1.1”. Computer program and documentation distributed by J. M. Carpenter, American Museum of Natural History, New York.

Goloboff, P. A. (1994). Character optimization and calculation of tree lengths. Cladistics 9, 433–436.

Goloboff, P. A. (1996). Methods for faster parsimony analysis. Cladistics 12, 199–220.


Gladstein, D. S. (1997). Efficient incremental character optimization. Cladistics 13, 21–26.

Huelsenbeck, J. P., and Hillis, D. M. (1993). Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42, 247–264.

Kernighan, B. W., and Ritchie, D. M. (1978). “The C Programming Language”, 2nd ed. Prentice-Hall, London.

Kumar, S., Tamura, K., and Nei, M. (1994). MEGA: Molecular evolutionary genetics analysis software for microcomputers. Computer Appl. Biosci. 10, 189–191.

Maddison, W. P., and Maddison, D. R. (1992). “MacClade: Analysis of Phylogeny and Character Evolution”. Sinauer, Sunderland, MA.

Nixon, K. C., Crepet, W. L., Stevenson, D., and Friis, E. M. (1994). A reevaluation of seed plant phylogeny. Ann. Missouri Bot. Gard. 81, 484–533.

Swofford, D. L. (1993). “PAUP: Phylogenetic Analysis Using Parsimony, Version 3.1”. Program and Documentation, Laboratory of Molecular Systematics, Smithsonian Institution, Washington, D.C.

Swofford, D. L., and Maddison, W. P. (1987). Reconstructing ancestral character states under Wagner parsimony. Math. Biosci. 87, 199–229.

Swofford, D. L., and Olsen, G. J. (1990). Phylogeny reconstruction. In “Molecular Systematics” (D. M. Hillis, and C. Moritz, Eds.), pp. 411–501. Sinauer, Sunderland.

APPENDIX

Multicharacter Fitch-parsimony algorithms

Algorithm 8

Four-state horizontal-packing algorithm for calculating whether the length added by combining the root nodes RVX and RYZ (Fig. 1D) is smaller than DIFF, the difference between the length of the initial tree and the summed length of the source and target trees. MASK is a binary mask with every fourth bit set (000100010001...). A looping bit counter is used to update the added length (AL) based on the number of bits set in F (steps 8–11).

1 Load PRVX and PRYZ
2 Let S = ~(PRVX & PRYZ)
3 Let F = S & MASK
4 For I = 1 to 3
5 Let F = F & (S >> I)
6 Next I
7 If F = 0 goto 12
8 Let F = F & (F - 1)
9 Let AL = AL + 1
10 If AL > DIFF proceed with next rearrangement
11 If F ≠ 0 goto 8
12 Proceed with next character set

Algorithm 9

Four-state vertical-packing algorithm for calculating whether the length added by combining the root nodes RVX and RYZ (Fig. 1) is smaller than DIFF, the difference between the length of the initial tree and the summed length of the source and target trees. The variable PRVX0 contains information about state 0 for potential root node RVX, PRVX1 information about state 1, and so on. An explicit bit counter is used to update the added length based on the number of bits set in F (steps 9–14).

1 Load PRVX0, PRVX1, PRVX2 and PRVX3
2 Load PRYZ0, PRYZ1, PRYZ2 and PRYZ3
3 Let S0 = PRVX0 & PRYZ0
4 Let S1 = PRVX1 & PRYZ1
5 Let S2 = PRVX2 & PRYZ2
6 Let S3 = PRVX3 & PRYZ3
7 Let F = ~(S0 | S1 | S2 | S3)
8 If F = 0 goto 14
9ᵃ For I = 1 to number of bits in F
10 Let AL = AL + (F & 1)
11 If AL > DIFF proceed with next rearrangement
12 Let F = F >> 1
13 Next I
14 Proceed with next character set

ᵃ The loop 9–13 is replaced by repetition of steps 10 to 12 in the actual machine coding of the algorithm. Step 11 is optional (need not be repeated for every step).

Algorithm 10

Four-state horizontal-packing first-pass reoptimization algorithm with state test.

ᵇ If this is not the first node being reoptimized, one of the load operations may be replaced with register moves.

1 Load PB,ᵇ PC,ᵇ *PA, *F
2 Push *F and its address onto stack
3 Let S = ~(PB & PC)
4 Let U = PB | PC
5 Let F = S & MASK
6 For I = 1 to 3


7 Let F = F & (S >> I)
8 Next I
9 For I = 1 to 3
10 Let F = F | (F << 1)
11 Next I
12 Store F (replacing *F)
13 Let S = (~S) | (F & U)
14 If S = *PA stop
15 Push *PA and its address onto stack
16 Store S (replacing *PA)
17 Proceed with ancestor of A

Algorithm 11

Four-state horizontal-packing final-pass reoptimization algorithm with state test and updating of potential root node state sets.

1 Load PA, PB, PC, FD, *FA, and F
2 Let X = FD & (~PA)
3 Let G = (~X) & MASK
4 For I = 1 to 3
5 Let G = G & ((~X) >> I)
6 Next I
7 For I = 1 to 3
8 Let G = G | (G << 1)
9 Next I
10 Let S = (FD | (~G)) & PA
11 Let S = S | (FD & F)
12 Let S = (((PB | PC) & FD) & ((~F) & (~G))) | S
13 If S = *FA go to 21ᶜ
14 Push *FA and its address onto stack
15 Store S (replacing *FA)
16 Let S = S | FD
17 Push *FRAD and its address onto stack
18 Store S (replacing *FRAD)
19 Push pointer to right descendant of A onto stack
20 Proceed from 1 with left descendant of A
21 Pop pointer from stack
22 If stack empty proceed with next character set
23 Proceed from 1 with the node pointed to by the pointer

ᶜ In the target tree, the condition A not on the path to the clipped branch should also be met for the branch to be taken.

Algorithm 12

Four-state vertical-packing first-pass reoptimization algorithm with state test.

1 Load PB0, PB1, PB2, and PB3ᵈ
2 Load PC0, PC1, PC2, and PC3ᵈ
3 Load *PA0, *PA1, *PA2, *PA3
4 Load *F
5 Push *F and its address onto stack
6 Let S0 = PB0 & PC0
7 Let S1 = PB1 & PC1
8 Let S2 = PB2 & PC2
9 Let S3 = PB3 & PC3
10 Let U0 = PB0 | PC0
11 Let U1 = PB1 | PC1
12 Let U2 = PB2 | PC2
13 Let U3 = PB3 | PC3
14 Let F = ~(S0 | S1 | S2 | S3)
15 Store F (replacing *F)
16 Let S0 = S0 | (F & U0)
17 Let S1 = S1 | (F & U1)
18 Let S2 = S2 | (F & U2)
19 Let S3 = S3 | (F & U3)
20 If S0 = *PA0 and S1 = *PA1 and S2 = *PA2 and S3 = *PA3 stop
21 Push *PA0, *PA1, *PA2, and *PA3 and their address onto stack
22 Store S0, S1, S2, and S3 (replacing *PA0, *PA1, *PA2, and *PA3)
23 Proceed with ancestor of A

ᵈ If this is not the first node being reoptimized, one of the set of load operations may be replaced with register moves.

Algorithm 13

Four-state vertical-packing final-pass reoptimization algorithm with state test and updating of potential root node state sets.

1 Load PA0, PA1, PA2, PA3
2 Load PB0, PB1, PB2, PB3
3 Load PC0, PC1, PC2, PC3
4 Load FD0, FD1, FD2, FD3
5 Load *FA0, *FA1, *FA2, *FA3
6 Load F
7 Let X0 = FD0 & (~PA0)
8 Let X1 = FD1 & (~PA1)
9 Let X2 = FD2 & (~PA2)
10 Let X3 = FD3 & (~PA3)
11 Let G = X0 | X1 | X2 | X3
12 Let S0 = (FD0 & F) | ((G | FD0) & PA0)
13 Let S1 = (FD1 & F) | ((G | FD1) & PA1)
14 Let S2 = (FD2 & F) | ((G | FD2) & PA2)
15 Let S3 = (FD3 & F) | ((G | FD3) & PA3)
16 Let S0 = (((PB0 | PC0) & FD0) & ((~F) & G)) | S0
17 Let S1 = (((PB1 | PC1) & FD1) & ((~F) & G)) | S1
18 Let S2 = (((PB2 | PC2) & FD2) & ((~F) & G)) | S2
19 Let S3 = (((PB3 | PC3) & FD3) & ((~F) & G)) | S3
20 If S0 = *FA0 and S1 = *FA1 and S2 = *FA2 and S3 = *FA3 go to 31ᵉ
21 Push *FA0, *FA1, *FA2, and *FA3 and their address onto stack
22 Store S0, S1, S2, and S3 (replacing *FA0, *FA1, *FA2, and *FA3)
23 Let S0 = S0 | FD0
24 Let S1 = S1 | FD1
25 Let S2 = S2 | FD2
26 Let S3 = S3 | FD3
27 Push *FRAD0–*FRAD3 and their address onto stack
28 Store S0, S1, S2, and S3 (replacing *FRAD0–*FRAD3)
29 Push pointer to right descendant of A onto stack
30 Proceed from 1 with left descendant of A
31 Pop pointer from stack
32 If stack empty proceed with next character set
33 Proceed from 1 with the node pointed to by the pointer

ᵉ In the target tree, the condition A not on the path to the clipped branch should also be met for the branch to be taken.
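As an illustration only, the vertical-packing length-of-recombination calculation (Algorithm 9) and the two bit counters discussed in the text might be sketched in Python as follows. This is not the machine coding described above: the function names (`pack_vertical`, `changed_mask`, `count_looping`, `count_explicit`, `added_length`) and the `n_chars` argument are hypothetical, and because Python integers are unbounded the complement in step 7 is masked to the number of characters actually packed, which is an assumption about how unused bits are handled.

```python
WORD_BITS = 32  # word size of the 32-bit processor discussed in the text


def pack_vertical(state_sets):
    """Pack four-state characters into four words, one word per state.

    Bit i of word s is set when character i may have state s.
    """
    words = [0, 0, 0, 0]
    for i, states in enumerate(state_sets):
        for s in states:
            words[s] |= 1 << i
    return words


def changed_mask(prvx, pryz, n_chars=WORD_BITS):
    """Steps 3-7 of Algorithm 9: bit i of F is set when character i changes."""
    shared = 0
    for s in range(4):
        shared |= prvx[s] & pryz[s]
    # Assumption: bits beyond n_chars are unused and must not be counted.
    return ~shared & ((1 << n_chars) - 1)


def count_looping(f):
    """Looping bit counter (cf. Algorithm 8, steps 8-11): one pass per set bit."""
    count = 0
    while f:
        f &= f - 1  # clears the lowest set bit
        count += 1
    return count


def count_explicit(f, n_bits=WORD_BITS):
    """Explicit bit counter (cf. Algorithm 9, steps 9-13): one step per bit."""
    count = 0
    for _ in range(n_bits):
        count += f & 1
        f >>= 1
    return count


def added_length(char_sets, diff, counter=count_explicit):
    """Sum the changes over character sets; return None once AL exceeds DIFF."""
    al = 0
    for prvx, pryz, n_chars in char_sets:
        al += counter(changed_mask(prvx, pryz, n_chars))
        if al > diff:
            return None  # corresponds to "proceed with next rearrangement"
    return al


# Four characters: the second and fourth share no state between the two nodes.
vx = pack_vertical([{0}, {1}, {0, 1}, {2}])
yz = pack_vertical([{0}, {2}, {1}, {3}])
f = changed_mask(vx, yz, n_chars=4)
print(bin(f), count_looping(f), count_explicit(f))
```

Both counters return the same number; they differ only in how much work they do, which is the trade-off between few and many changing characters described in the text. Unlike step 11 of Algorithm 9, this sketch tests AL against DIFF once per character set rather than after every bit.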

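The replacement of conditional branches by integer logical operations in the first-pass reoptimization (Algorithm 12, steps 6–19) can be checked against a conventional single-character Fitch first pass. The following Python sketch makes that comparison under stated assumptions: all names (`pack_vertical`, `unpack_vertical`, `first_pass_single`, `first_pass_vertical`) are hypothetical, the complement is masked to `n_chars` because Python integers are unbounded, and the state test and stack handling (steps 1–5 and 20–23) are omitted.

```python
def pack_vertical(state_sets):
    """Pack four-state characters into four words, one word per state."""
    words = [0, 0, 0, 0]
    for i, states in enumerate(state_sets):
        for s in states:
            words[s] |= 1 << i
    return words


def unpack_vertical(words, n_chars):
    """Inverse of pack_vertical: recover one state set per character."""
    return [{s for s in range(4) if (words[s] >> i) & 1} for i in range(n_chars)]


def first_pass_single(pb, pc):
    """Branching Fitch first pass for one character held as Python sets."""
    common = pb & pc
    if common:
        return common, False  # intersection non-empty: no change
    return pb | pc, True      # empty intersection: take the union, one change


def first_pass_vertical(pb, pc, n_chars):
    """Branch-free first pass on packed words (cf. Algorithm 12, steps 6-19)."""
    mask = (1 << n_chars) - 1
    s = [pb[k] & pc[k] for k in range(4)]       # per-state intersections
    u = [pb[k] | pc[k] for k in range(4)]       # per-state unions
    f = ~(s[0] | s[1] | s[2] | s[3]) & mask     # 1 where the intersection is empty
    pa = [s[k] | (f & u[k]) for k in range(4)]  # union only where f is set
    return pa, f


# Descendant state sets for five characters (illustrative data only).
b_sets = [{0}, {1}, {0, 1}, {2}, {0, 1, 2, 3}]
c_sets = [{0}, {2}, {1}, {3}, {3}]

pa_words, f = first_pass_vertical(
    pack_vertical(b_sets), pack_vertical(c_sets), n_chars=len(b_sets)
)
packed_result = unpack_vertical(pa_words, len(b_sets))
single_result = [first_pass_single(b, c) for b, c in zip(b_sets, c_sets)]
print(packed_result == [s for s, _ in single_result])
```

The packed version processes all characters in the word with the same fixed sequence of AND, OR and NOT operations, whereas the single-character version takes a data-dependent branch for every character.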