<<

Society of Systematic Biologists

A Phylogenetic Analysis of the Caminalcules. I. The Data Base
Author(s): Robert R. Sokal
Source: Systematic Zoology, Vol. 32, No. 2 (Jun., 1983), pp. 159-184
Published by: Taylor & Francis, Ltd. for the Society of Systematic Biologists
Stable URL: http://www.jstor.org/stable/2413279
Accessed: 02/04/2012 22:07


Taylor & Francis, Ltd. and Society of Systematic Biologists are collaborating with JSTOR to digitize, preserve and extend access to Systematic Zoology.

Syst. Zool., 32(2):159-184, 1983

A PHYLOGENETIC ANALYSIS OF THE CAMINALCULES. I. THE DATA BASE

ROBERT R. SOKAL
Department of Ecology and Evolution, State University of New York at Stony Brook, Stony Brook, New York 11794

Abstract.-The Caminalcules are a group of "organisms" generated artificially according to principles believed to resemble those operating in real organisms. A reanalysis of an earlier data matrix of the Caminalcules revealed some inconsistencies and errors which necessitated recoding of some characters. The resulting differences with earlier results are minor. The images of all 77 Caminalcules are featured, those of the 48 fossil species for the first time. The characters of the Caminalcules are defined and a data matrix is furnished for all Recent and fossil species. A new phenetic standard is proposed for the Caminalcules which divides them into five "genera." The true cladogram is revealed for the first time. Recent Caminalcules have evolved over 19 time periods. Five branches correspond to the phenetic genera but originate at greatly differing time periods. Four lines terminate in fossils. A series of measures for quantifying evolutionary change is defined, including measures for homoplasy, parallelism, and reversal. A survey is made of these measures and of other statistics of relevance to systematics for 19 data sets from the numerical taxonomic literature. The Caminalcules turn out to be compatible to data sets on real organisms with respect to all these measures, as well as with respect to evolutionary rates and species longevities. Thus, questions raised by an analysis of the Caminalcules should be of interest to systematists concerned with the analysis of data sets on real organisms. [Phenetic classifications; cladistic classifications; estimated cladograms; homoplasy; Wagner trees; Caminalcules; numerical taxonomy.]

This paper, and others to follow, takes advantage of the opportunity afforded by a group of artificial organisms with a known phylogeny, the Caminalcules, to throw light on some of the questions concerning principles and procedures that currently engage the attention of taxonomists. There is considerable disagreement on the relative merits of phenetic and cladistic classifications (Sneath and Sokal, 1973; Eldredge and Cracraft, 1980; Wiley, 1981). Some workers contend that classifications based on phylogenetic principles are empirically better by various criteria of optimality (Farris, 1977, 1979a,b; Mickevich, 1978a, 1980; Schuh and Polhemus, 1980; Schuh and Farris, 1981). These claims have been questioned by others who find the evidence and methodology presented to be flawed (Colless, 1980; Rohlf and Sokal, 1980, 1981; Sokal and Rohlf, 1981a; Rohlf et al., 1983a,b). The empirical studies, and the arguments pro and con phylogenetic classifications derived in such investigations, suffer from a major impediment. All of the phylogenetic classifications reported in the literature are only estimates of the true phylogeny, which is unknown for all real organisms. By contrast the various results of this study merit serious attention because they can be examined against the benchmark of the true phylogeny of the group.

The Caminalcules are artifacts created by the late Professor Joseph H. Camin of the University of Kansas and in effect represent a single simulation of the evolutionary process by rules that have not been made explicit. However, readers will find that these organisms, which have the advantage over other simulations in presenting a visual record to the investigator, illustrate a variety of evolutionary phenomena and are therefore of considerable pedagogical and heuristic value. The relevance of this data set to currently active issues in systematics will readily become evident to the reader of this series. I shall show that with respect to a substantial array of measurable properties, the Caminalcules are well within the range of empirically observed values for real taxonomic groups and that, conversely, for no property of consequence in numerical taxonomy are the Caminalcules beyond the range of observed values in real organisms.

At the suggestion of the Editor and some reviewers, this series of publications is initiated in this paper with the presentation of the data on which previous and succeeding studies have been based. I furnish the images of the previously published 29 Recent species and for the first time the 48 "fossil" species. I also present for the first time the true phylogeny of the Caminalcules as generated by Professor Camin. With these illustrations I provide a list of the descriptions of characters as adopted in my laboratory as well as a data matrix giving the character states for all 106 characters for each of the 77 Recent and fossil Caminalcules. In addition to presenting a new standard phenetic classification, the paper describes a number of measures for taxonomic and evolutionary properties and compares the Caminalcules with data sets on real organisms with respect to these measures. Subsequent studies in this series will treat estimates of the true cladogram, the inclusion of fossils in phenetic and cladistic classifications, congruence and character stability, and OTU stability.

ORIGIN OF THE DATA BASE

The original intention in generating the Caminalcules was to study the nature of taxonomic judgment (eventually published by Sokal and Rohlf, 1980), but work with these animals has led to other developments in numerical taxonomic methodology, such as an early method of numerical cladistics (Camin and Sokal, 1965) and a method for obtaining taxonomic structure by random and systematic scanning of biological images (Sokal and Rohlf, 1966; Rohlf and Sokal, 1967). Other experiments on taxonomic judgment have also been based on the Caminalcules (Moss, 1971; Sokal, 1974; Moss and Hansell, 1980).

Camin drew the Caminalcules using master stencils for ditto machines. The genetic continuity of the Caminalcules was achieved by Camin by tracing successive drawings of the animals, permitting the preservation of all characters except for such modifications as were desired. Xerox copies of the images of the Recent OTUs were made available in the early 1960s, those of the fossil OTUs some years later. Independent xerox copies of all OTUs are in the possession of Dr. Paul A. Ehrlich of Stanford University and Dr. W. Wayne Moss of the Philadelphia Academy of Sciences in addition to myself. The originals drawn on the ditto masters appear to have been lost following the death of Professor Camin in 1979.

All examinations of the Caminalcules for numerical taxonomic studies have been carried out on the xeroxes of the images. Illustrations of all 29 Recent OTUs have been published three times previously (Sokal, 1966; Rohlf and Sokal, 1967; Sokal and Rohlf, 1980). For this purpose, inked copies of the xeroxed images were photographed. Although the artist's copies of the original xeroxes are quite faithful, inevitably some fine detail has been altered or lost. Thus, not every character state described below can be unequivocally recognized in the featured illustrations. The version of the 29 Recent OTUs published in Sokal (1966) was "beautified" by the publisher's artist and cannot be relied upon for detail. The images of the 48 fossils were newly inked for this study and all differentiating characteristics can be observed in them.

The Recent OTUs, numbered 1 to 29, are shown in Figure 1. The fossil OTUs, given different code names by Camin, were randomly assigned numbers 30 to 77 by me. They are shown in Figure 2.

[Figure 1: images of the 29 Recent Caminalcules (OTUs 1 to 29); not reproduced.]

[Figure 2: images of the 48 fossil Caminalcules (OTUs 30 to 77); not reproduced.]

The true cladogram of the group was communicated to me by Camin in 1970. But although this information was employed in the computations leading to the analysis of taxonomic judgment (Sokal and Rohlf, 1980), access to it was restricted even for workers on this project. I did not become intimately familiar with the phylogenetic tree until 1981 during the final analyses leading to this manuscript.

Readers of this paper and of subsequent ones in this series should note that I use the term cladogram in the sense in which I originally coined it (Camin and Sokal, 1965; also independently coined with the same meaning by Mayr, 1965). One definition of this meaning (Sneath and Sokal, 1973:29) is as "A branching ... network of ancestor-descendant relationships." This definition differs from the several meanings attached to the term cladogram by various cladists (e.g., Eldredge and Cracraft, 1980; Nelson and Platnick, 1981; Wiley, 1981; see also Sneath, 1982). Thus, cladogram as used in this and succeeding papers refers to a branching sequence depicting the actual or hypothesized genealogy of the OTUs. It does not include length of branches of the tree and is not a statement about the evolution or pattern of character states akin to "nested synapomorphy schemes."

Camin did not keep a written record of character changes. In the 1960s, A. J. Boyce and I. Huber prepared a data matrix for a numerical phenetic study of the 29 Recent OTUs, describing 86 characters. This data matrix was the basis of various phenetic analyses of the Caminalcules (Sokal, 1966; Sokal and Rohlf, 1966, 1980; Rohlf and Sokal, 1967). For the present study it became necessary to map the coded characters onto the now known cladogram and during this process each character was re-evaluated for the 29 Recent OTUs. This re-examination of the images led to the discovery of several inconsistencies in the Boyce and Huber data matrix. Altogether 13 of the 2,494 entries in that 29 X 86 matrix seemed to be in error and were corrected. Original character 58 was deleted because it was discovered to be invariant. All measurement characters were remeasured and small differences with earlier measurements were recorded. Because the images available during my current studies are poorer quality xerox copies than those available to Boyce and Huber, the resolution of some of the more minute morphological features was more difficult and in a few cases characters were recoded into fewer, coarser classes.

Remeasurements, changes in logic, and revision of character state coding accounted for changes in 33 of the 85 characters. To account for the features of the fossil OTUs, which had not been coded previously, another 12 characters were redefined and 21 new characters were added. This augmented data matrix comprises 77 OTUs and 106 characters of which 65 are binary, 31 are ordered integer characters, and 10 are measurement characters. From this matrix an 85 character X 29 OTU subset was extracted for the Recent OTUs. This matrix has 48 binary characters, 27 ordered integer characters and 10 measurement characters. Both the 106 character X 77 OTU matrix and the 85 character X 29 OTU matrix contain characters with NCs (no comparison codes). All NCs in this study are logical (i.e., presence of a given state for character h making it impossible to define a state for character i) rather than due to missing information. In the larger matrix, 79 of the 106 characters contain NC codes; in the smaller, 59 of the 85 characters have NCs.

The measurement characters are available in two versions. One is as direct measurements in millimeters, the other is coded as ordered integer states but with gaps omitted. The range of measurements for one character was divided into 10 equally wide classes coded 0 to 9. Each character state was then assigned to one of these classes, but classes lacking observed character states were ultimately omitted. Thus, if there were no OTUs with measurements in class 3, the subsequent class 4 was renumbered 3. The number of states in the 10 characters resulting from this procedure ranges from four to nine. A list defining the states of each character is furnished in the Appendix. The actual data matrix (integer coded version) is shown in Table 1.

The consequences of recoding the measurement characters in integer scale were minimal. Matrix correlations for the 29-OTU study between the resemblance matrices based on these two methods of character coding are very high (0.998 and 0.997, respectively, for r and d, the correlation and taxonomic distance coefficients). The classifications resulting from these two coding methods are identical. In this and subsequent papers, only the data matrix using integer scale coding of measurement character states is featured, since this is the simpler coding preferred for the several cladistic methods employed.

The effects of correcting and recoding the original Boyce and Huber data matrix for the 29 OTUs were minor. Matrix correlations between the two versions of the resemblance matrices are high (0.980 for r, 0.969 for d). UPGMA phenograms based on these resemblance matrices are close; the strict consensus index CIc for the classifications based on the two correlation matrices is 0.889, for those based on the two distance matrices 0.815. This index (Rohlf, 1982) ranges between 0 for no consensus and 1 for perfect consensus. It is identical to the consensus fork index of Colless (1980) and the proportional consensus index of Sokal and Rohlf (1981a).

The differences between the classifications based on correlations and distances computed from the original Boyce and Huber data matrix (see fig. 3 of Rohlf and Sokal, 1967) and those based on the updated, integer-coded version all involve taxonomic affinities at the "species" level. The "generic" classification is the same for both versions. When the new classifications were compared to 49 subjective classifications of the Caminalcules produced by 22 experimental subjects (Sokal and Rohlf, 1980), the new phenograms of the Caminalcules were found to be no more (and no less) similar to these intuitive classifications than the earlier Boyce and Huber classifications.

A NUMERICAL PHENETIC CLASSIFICATION

For phenetic analyses the data were subjected to standard taxometric procedures using the NTSYS system of numerical taxonomic computer programs (Rohlf et al., 1980). Characters were standardized before computation of correlation and taxonomic distance coefficients between OTUs.

Rohlf and Sokal (1967) noted that phenograms based on distances and correlations of the Caminalcules differ substantially in the specific as well as generic affinities indicated. Previous work by Rohlf and Sokal (1967) and by Sokal and Rohlf (1980) showed that the classification based on the correlation matrix corresponded more closely than that based on the distance matrix to the taxonomic structure noted by individual taxonomists grouping the Caminalcules by conventional, intuitive methods. This undoubtedly occurs because correlation coefficients are more sensitive to shape than are distance coefficients (Rohlf and Sokal, 1965), and it is shape rather than size on which taxonomists tend to base their judgments on taxonomic affinity. For this reason, the UPGMA clustering (shown in Fig. 3) of the correlation matrix based on the updated, integer-coded data matrix was the logical choice as a new standard for a phenetic classification of the 29 Recent OTUs to be compared with the true cladogram. This phenogram also has the highest cophenetic correlation coefficient of any phenetic classification of the Caminalcules computed so far (0.965).

Five major clusters, the "genera" of the Caminalcules, can be seen in the phenogram in Figure 3. These fall naturally into two major groups, that containing genera A, B, and F, and a second group containing C and DE. The status of D, consisting of OTUs 3 and 4, is problematical. Only in one previous study, based on a mechanical method of scanning the images (Rohlf and Sokal, 1967: fig. 3A), was D unequivocally separated from E at a level of phenetic similarity equal to that of the other genera: A, B, C, and F. The Boyce and Huber correlation phenogram (Rohlf and Sokal, 1967: fig. 3C) shows D joining E at a level closer than that of at least one member of A joining its genus. Intuitive classifications by various taxonomists (Sokal and Rohlf, 1980) had frequently assigned species 3 and 4 to genus E, although some had singled out the divergence of these two OTUs which are unique in having rudimentary posterior appendages as well as an elongated body obliterating the neck. Since the phenogram in Figure 3 clearly affiliates D with E, it seems appropriate to join the two together into a single genus which I have called DE to preserve continuity with earlier publications.

THE TRUE CLADOGRAM

The true cladogram as furnished by J. H. Camin is shown in Figure 4. The diversity of the taxon was generated over 19 time periods. Lines undergoing evolutionary change are indicated in Figure 4 by slanted (as distinct from vertical) lines. Species that underwent evolutionary change during a given time period are shown as solid circles at the end of that period. Ancestral species that are not continued into the next time period as vertical lines are considered to have become extinct and are indicated by small hollow circles. Five branches leading to Recent forms, corresponding to the five phenetic genera, can be recognized but these originate at

TABLE 1. Data matrix of 77 Recent and fossil Caminalcules.a

      1-5   6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50 51-55 56-60 61-65 66-70 71-75 76-77

  1  11111 11111 11111 11111 11110 11111 11111 11101 11111 11111 11111 11111 11111 11111 11111 11
  2  11111 01110 01111 11111 0111X 11111 11111 111X1 11111 11111 11111 11111 11111 11111 11111 11
  3  XXXXX 1XXX0 0XXXX XXXXX 1XXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XX
  4  00110 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00010 00000 00001 00
  5  XX10X XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXX0X XXXXX XXXX1 XX

  6  10000 12200 00220 10011 00004 10310 10000 00050 00000 00001 00000 01002 00001 00010 21000 00
  7  X1111 10111 11112 X1122 11110 2X121 21111 11101 11111 1X112 11111 11111 12111 111X1 02111 11
  8  XXXXX XXXXX XXXX1 XXX01 XXXXX 0XX0X 0XXXX XXXXX XXXXX XXXX0 XXXXX XXXXX X0XXX XXXXX X0XXX XX
  9  00000 01000 00000 00000 00001 00000 00000 00010 00000 00000 00000 00000 00000 00000 10000 00
 10  11111 10111 11010 11111 11110 11011 11111 11101 11111 11111 11111 11101 10111 11111 00111 11

 11  20000 0X000 00X0X 30000 0000X 01X00 00000 000X0 00000 01000 00000 000X0 0X000 00010 XX000 00
 12  00000 00000 00000 00011 00000 10010 10000 01100 00000 00101 00000 00000 00000 00001 01000 00
 13  XXXXX XXXXX XXXXX XXX55 XXXXX 5XX3X 5XXXX X01XX XXXXX XX3X3 XXXXX XXXXX XXXXX XXXX2 X4XXX XX
 14  XXXXX XXXXX XXXXX XXX1X XXXXX 1XX0X 2XXXX X00XX XXXXX XX0X0 XXXXX XXXXX XXXXX XXXX0 X0XXX XX
 15  12001 10000 02000 01011 01001 10011 10010 10110 11000 00101 10201 01000 10000 11011 01000 00

 16  X1XXX XXXXX X0XXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XX0XX XXXXX XXXXX XXXXX XXXXX XX
 17  01000 00000 01000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00
 18  11111 10111 11110 11111 11111 11111 11111 11111 11111 11111 11111 11111 11111 11111 11111 11
 19  54444 4X034 4400X 56378 53350 75063 71334 21202 32322 35415 20412 23000 20434 44553 06013 32
 20  01001 0X000 0100X 00000 01000 00000 00000 00000 10000 00000 00100 00000 00000 10000 00000 00

 21  01111 0X000 0100X 00100 01100 00001 00001 10000 10100 10010 00101 11000 00011 11000 00001 10
 22  1XXXX 1X221 1X22X 21X22 0XX12 2222X 2213X X2222 X3X33 X22X2 32X2X XX222 321XX XX122 2222X X3
 23  X1111 XXXXX X1XXX XX1XX X01XX XXXX0 XXXX1 0XXXX 1X1XX 1XX0X XX1X0 10XXX XXX11 10XXX XXXX1 1X
 24  X0100 XXXXX X1XXX XX0XX X0XXX XXXXX XXXXX XXXXX 0XXXX 0XXXX XX0XX XXXXX XXX0X XXXXX XXXX0 XX
 25  X1010 XXXXX X0XXX XX0XX X0XXX XXXXX XXXXX XXXXX 0XXXX 0XXXX XX1XX XXXXX XXX0X XXXXX XXXX1 XX

 26  0XXXX 0X000 0X00X 00X11 0XX00 1001X 1000X X0100 X0X00 XOXl 00X0X XX000 000XX XX001 0100X X0
 27  XXXXX XXXXX XXXXX XXX34 XXXXX 5XX5X 3XXXX XX0XX XXXXX XX2X3 XXXXX XXXXX XXXXX XXXX1 X5XXX XX
 28  XXXXX XXXXX XXXXX XXX23 XXXXX 6XX4X 3XXXX XX0XX XXXXX XX2X2 XXXXX XXXXX XXXXX XXXX1 X5XXX XX
 29  00000 1X011 1100X 00000 10000 00000 00100 00000 00000 00000 00000 00000 00000 00000 00000 00
 30  XXXXX 1XX00 1XXXX XXXXX 2XXXX XXXXX XXNXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XX

 31  12222 00000 02000 11200 02210 02000 00001 00000 20100 21000 00200 00000 00121 20110 00002 10
 32  XXXXX 10011 1X000 XXX00 1XXX0 OX002OQOlOX 20001 X0X00 XX000 00X02 02000 000XX X2XX0 0000X X0
 33  XXXXX X11XX XX011 XXX44 XXXX2 4X04X 46X7X X432X X6X76 XX464 61X6X 7X111 61XXX XXXX3 1524X X7
 34  XXXXX X11XX XX112 XXX00 XXXX1 0X01X 02X2X X221X X2X22 XX121 22X2X 2X221 22XXX XXXX2 1112X X2
 35  XXXXX 2XX00 1XXXX XXXXX 1XXXX XXXXX XX0XX XXXXN XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XX

 36  X2012 XXXXX X2XXX XX2XX X22XX X2XXX XXXXX XXXXX 2XXXX 2XXXX XX2XX XXXXX XXX2X 2XXXX XXXX1 XX
 37  X1XX0 XXXXX X1XXX XX0XX X10XX X0XXX XXXXX XXXXX 0XXXX 0XXXX XX1XX XXXXX XXX0X 1XXXX XXXXX XX
 38  X1XXX XXXXX X1XXX XXXXX X0XXX XXXXX XXXXX XXXXX XXXXX XXXXX XX1XX XXXXX XXXXX 0XXXX XXXXX XX
 39  01001 22222 21222 10100 21112 01202 02211 22222 10111 11222 02122 12222 02101 12102 20220 11
 40  00000 00000 00000 00001 00000 00000 10000 00000 00000 00000 00000 00000 00000 00000 00000 00

 41  00000 11111 10111 00000 10001 00101 01100 11011 00000 00010 01011 01111 01000 01000 10110 00
 42  01111 00000 01000 00100 01100 00000 00000 00000 10100 10000 00100 10000 00011 10000 00001 10
 43  X1111 XXXXX X1XXX XX0XX X21XX XXXXX XXXXX XXXXX 1X0XX 1XXXX XX1XX 0XXXX XXX1N 2XXXX XXXX1 0X
 44  10000 00000 00000 11000 00010 01000 00011 00000 00010 01000 00000 00000 00100 00110 00000 01
 45  2XXXX XXXXX XXXXX 20XXX XXX2X X1XXX XXX12 XXXXX XXX0X X2XXX XXXXX XXXXX XX2XX XX22X XXXXX X0

 46  11111 00000 01000 11100 01110 01000 00011 00000 10110 11000 00100 10000 00111 10110 00001 11
 47  20000 XXXXX X0XXX 120XX X002X X1XXX XXX03 XXXXX 0X02X 01XXX XX0XX 0XXXX XX202 0X22X XXXX0 02
 48  2XXXX XXXXX XXXXX 01XXX XXX0X X0XXX XXXX1 XXXXX XXX1X X0XXX XXXXX XXXXX XX1X1 XX12X XXXXX 1X
 49  00000 00000 00000 00000 10000 00000 00010 00000 00000 00000 00000 00000 00000 00000 00000 00
 50  00000 00000 00000 00011 00000 10010 10000 00000 00000 00101 00000 00000 00000 00000 01000 00

 51  XXXXX XXXXX XXXXX XXX13 XXXXX 1XX4X 2XXXX XXXXX XXXXX XX0X1 XXXXX XXXXX XXXXX XXXXX X4XXX XX
 52  XXXXX XXXXX XXXXX XXX10 XXXXX 2XX1X 1XXXX XXXXX XXXXX XX1X1 XXXXX XXXXX XXXXX XXXXX X2XXX XX
 53  10000 00000 00000 11000 00010 01000 00001 00000 00000 01000 00000 00000 00100 00110 00000 00
 54  01221 00000 02000 00200 02200 00000 00000 00000 20200 20000 00200 20000 00020 20000 00002 00
 55  XX02X XXXXX X0XXX XX1XX X00XX XXXXX XXXXX XXXXX 1X0XX 0XXXX XX0XX 0XXXX XXX3X 0XXXX XXXX2 XX

 56  XX20X XXXXX X3XXX XX1XX X11XX XXXXX XXXXX XXXXX 1X1XX 1XXXX XX1XX XXXXX XXX0X 1XXXX XXXX0 XX
 57  X1XX0 XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XX
 58  XXXXX XXXXX XXXXX XXX01 XXXXX 0XX0X 0XXXX XObXX XXXXX XX0X0 XXXXX XXXXX XXXXX XXXX0 X0XXX XX
 59  XX10X XXXXX X0XXX XX0XX X00XX XXXXX XXXXX XXXXX 0X0XX 0XXXX XX0XX 0XXXX XXX0X 0XXXX XXXX0 XX
 60  22023 XXXXX X1XXX 323XX X133X X2XXX XXXN6 XXXXX 3X44X 33XXX XX2XX 3XXXX XX536 1X32X XXXX3 6N

 61  X0X10 XXXXX XXXXX XX0XX XX0XX XXXXX XXXXX XXXXX 0XXXX 0XXXX XX0XX 0XXXX XXX1X XXXXX XXXX1 XX
 62  X1212 XXXXX X1XXX XX2XX X02XX XXXXX XXX1X XXXXX 2X2XX 2XXXX XX1XX 2XXXX XXX2X 0XXXX XXXX2 2X
 63  43233 33333 33332 74310 33363 17313 13324 33333 33333 37131 33323 33333 33533 33542 31332 32
 64  11111 00000 01000 11100 01110 01000 00011 00000 11111 11000 10100 10000 10111 10110 00001 11
 65  01111 11111 11111 10111 11110 11111 11111 11111 11111 11111 11111 11111 N1111 11101 11111 11

 66  X0001 00000 00000 0X011 0010X 10010 10000 00000 00000 00101 00000 00000 X0000 000X1 01000 00
 67  XXXX1 XXXXX XXXXX XXX00 XX1XX 0XX0X 0XXXX XXXXX XXXXX XX0X0 XXXXX XXXXX XXXXX XXXX1 X0XXX XX
 68  10000 33333 30333 11133 30013 31333 33331 33333 03233 11333 33033 33333 23101 03113 33330 13
 69  0XXXX XXXXX XXXXX 001XX XXX0X X0XXX XXXX0 XXXXX XX1XX 10XXX XXXXX XXXXX 0X0X0 XX00X XXXXX 0X
 70  XXXXX 00000 0X000 XXX11 0XXX0 1X010 1000X 00000 X0X00 XX001 00X00 00000 X0XXX X0XX0 0100X X0

 71  XXXXX XXXXX XXXXX XXX10 XXXXX 1XX0X 0XXXX XXXXX XXXXX XXXX1 XXXXX XXXXX XXXXX XXXXX X0XXX XX
 72  XXXXX XXXXX XXXXX XXXX0 XXXXX XXX1X 1XXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX X0XXX XX
 73  XXXXX XXXXX XXXXX XXX0X XXXXX 1XXXX XXXXX XXXXX XXXXX XXXX0 XXXXX XXXXX XXXXX XXXXX XXXXX XX
 74  13333 33333 33323 01333 33311 30233 33332 33313 33333 30333 33333 33333 23233 33113 33333 33
 75  0XXXX XXXXX XXX0X X0XXX XXX10 XX0XX XXXX0 XXX0X XXXXX XXXXX XXXXX XXXXX 0X1XX XX00X XXXXX XX

 76  X1111 01100 011X0 XX011 010XX 1XX10 1000X 000X0 10000 1X001 10100 00001 X0X11 10XX0 11001 00
 77  X0110 X10XX X00XX XXX11 X1XXX 1XX1X 1XXXX XXXXX 0XXXX 0XXX1 0X0XX XXXX0 XXX10 1XXXX 11XX1 XX
 78  XX0XX XXXXX XXXXX XXXXX X1XXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXX0X XXXXX XXXX0 XX
 79  XX11X X0XXX XXXXX XXX00 X1XXX 0XX0X 0XXXX XXXXX XXXXX XXXX0 XXXXX XXXXX XXX1X 1XXXX 00XX1 XX
 80  XX02X X2XXX XXXXX XXX10 X0XXX 1XX0X 0XXXX XXXXX XXXXX XXXX1 XXXXX XXXXX XXX0X 1XXXX 20XX0 XX

 81  X1XX1 XX0XX X10XX XXXXX XXXXX XXXXX XXXXX XXXXX 1XXXX 1XXXX 0X1XX XXXX0 XXXX0 XXXXX XXXXX XX
 82  X0XX1 XXXXX X0XXX XXXXX XXXXX XXXXX XXXXX XXXXX 1XXXX 1XXXX XX0XX XXXXX XXXXX XXXXX XXXXX XX
 83  XXXXX XX2XX XX0XX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX 3XXXX XXXX0 XXXX1 XXXXX XXXXX XX
 84  X2222 XXXXX X2XXX XX2XX X20XX XXXX0 XXXX0 0XXXX 2X0XX 2XX0X XX2X0 00XXX XXX20 10XXX XXXX2 0X
 85  00330 00000 00000 00056 00000 50040 50000 00100 00000 01304 00000 00000 00020 00002 04003 00

 86  XXXXX XXXXX XXXXX XXXXX X0XXX XXXX2 XXXXX 1XXXX XXXXX XXX1X XXXX1 X2XXX XXXXX X3XXX XXXXX XX
 87  XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX 1XXXX XXXXX XXX0X XXXX1 XXXXX XXXXX XXXXX XXXXX XX
 88  XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX 0XXXX XXXXX XXXXX XXXX1 XXXXX XXXXX XXXXX XXXXX XX
 89  XXXXX XXXXX XXXXX XXXXX XXXXX XXXX0 XXXXX XXXXX XXXXX XXXXX XXXXX X1XXX XXXXX XXXXX XXXXX XX
 90  X0000 XXXXX X0XXX XX0XX XX0XX XXXXX XXXX2 XXXXX 0X0XX 0XXXX XX0XX 0XXXX XXX02 0XXXX XXXX0 1X

 91  XXXXX X00XX XX000 XXX00 XXXX0 0X00X 01X1X X110X X1X11 XX110 10X1X 1X000 10XXX XXXX1 0001X X1
 92  XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX X0X0X X00XX X0X00 XX01X 0XX0X 0XXXX 0XXXX XXXX0 XXX0X X0
 93  XXXXX XXXXX XXXXX XXXXX XXXXX XXXX0 XXXXX 0XXXX XXXXX XXXXX XXXX0 X0XXX XXXXX X1XXX XXXXX XX
 94  XXXXX XXXXX XXXXX XXXXX XXXXX XXXX1 XXXXX 0XXXX XXXXX XXXXX XXXX1 X1XXX XXXXX X0XXX XXXXX XX
 95  XXXXX XXXXX XXXXX XXXXX XXXXX XXXX1 XXXXX XXXXX XXXXX XXXXX XXXX0 X1XXX XXXXX XXXXX XXXXX XX

 96  XXXXX XXXXX XXXXX X0XXX XXXXX XXXXX XXXXX XXXXX XXX1X XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX X1
 97  10000 XXXXX X0XXX 110XX X001X X1XXX XXX01 XXXXX 0X01X 01XXX XX0XX 0XXXX XX101 0X11X XXXX0 10
 98  0XXXX XXXXX XXXXX 00XXX XXX0X X0XXX XXX10 XXXXX XXX0X X0XXX XXXXX XXXXX XX0XX XX00X XXXXX X1
 99  XX000 XXXXX X0XXX XX0XX X00XX XXXXX XXXXX XXXXX 0X0XX 0XXXX XX0XX 1XXXX XXX0X 0XXXX XXXX0 XX
100  00000 00000 00000 00000 00000 00000 00000 00000 01000 00000 10000 00000 10000 00000 00000 00

101  XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX X0XXX XXXXX 1XXXX XXXXX 2XXXX XXXXX XXXXX XX
102  X000X 00000 00000 0X0XX 00X0X X00X0 X0000 00010 00000 00X0X 20000 00000 X0000 001XX 0X000 00
103  XXXXX 00000 0X000 XXX00 0XXX0 0X000 0000X 00000 X0X00 XX000 10X00 10000 X0XXX X0XX0 0000X X0
104  X0000 00000 00000 XX000 000X0 0X001 00010 00000 01000 0X000 10000 01000 10000 00XX0 00000 01
105  XXXXX XXXXX XXXXX XXXXX XXXXX XXXX1 XXX0X XXXXX X0XXX XXXXX 0XXXX X1XXX XXXXX XXXXX XXXXX X0

106  XXXXX XXXXX XXXXX XXXXX XXXXX XXXX0 XXXXX XXXXX XXXXX XXXXX XXXXX X1XXX XXXXX XXXXX XXXXX XX

a Columns are species code numbers. Recent OTUs are species 1-29. Rows are the 106 characters described in the Appendix. The first 85 characters are the integer-coded data for the Recent OTUs. The rest are those necessary for describing the fossils. The character states, ranging numerically from 0 to 8, are in the body of the table. The following two states have special symbols: X, no comparison (NC) code; N, negative one (-1).
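The symbols explained in the footnote to Table 1 suggest a simple way to read a row of the matrix by machine. The following is a minimal sketch in Python, not part of the original study: the function name `parse_row` is hypothetical, NC codes are mapped to `None`, and the sample row for character 2 assumes that the scan's letter-for-digit confusions (l and i for 1, O for 0) have been read as digits.

```python
from typing import List, Optional, Tuple

NC = None  # "no comparison" code, printed as X in Table 1


def parse_row(line: str) -> Tuple[int, List[Optional[int]]]:
    """Split a Table 1 row into (character number, one state per species)."""
    label, *groups = line.split()
    states: List[Optional[int]] = []
    for symbol in "".join(groups):
        if symbol == "X":
            states.append(NC)          # logical no-comparison code
        elif symbol == "N":
            states.append(-1)          # N stands for negative one
        elif symbol.isdigit():
            states.append(int(symbol))
        else:
            # Anything else is scan damage and is reported, not guessed.
            raise ValueError(f"unexpected symbol {symbol!r} in character {label}")
    return int(label), states


# Row for character 2, as transcribed under the assumption stated above.
row2 = ("2 11111 01110 01111 11111 0111X 11111 11111 111X1 "
        "11111 11111 11111 11111 11111 11111 11111 11")
char_no, states = parse_row(row2)
print(char_no, len(states), states[24], states[38])
```

Raising an error on unexpected symbols, rather than silently coercing them, keeps damaged entries visible so that they can be checked against the printed table.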

greatly differing time periods. There are also four lineages that underwent evolutionary change before becoming extinct. The extinct terminal species of these fossil lines are shown as large hollow circles. I have indicated the amount of evolutionary change (path length of the internode) based on 85 characters by the length of the thickened bars along the slanted lines. Note that path lengths for lines leading to fossils are based on only 85 characters to make them comparable to path lengths of Recent forms. Path lengths based on all 106 characters will be illustrated in paper III of this series.

MEASURES OF PHENETIC AND EVOLUTIONARY CHANGE

To describe evolutionary changes in this and succeeding papers, various statistics based on the characters and the true cladogram are needed. They are summarized in Table 2. Some terms need to be defined. An entire taxon consists of $t$ OTUs and is described by $n$ characters. OTUs are labelled as $1, \ldots, j, k, \ldots, t$ and characters as $1, \ldots, h, i, \ldots, n$. The most recent common ancestor of all of the OTUs is $o$. $\sum_j$ is summation over all OTUs indexed by $j$, usually $t$ OTUs.

[Figure 3: phenogram not reproduced. OTU labels as they appear across the phenogram: 4, 3, 22, 12, 2, 5, 18, 23, 16, 27, 24, 17, 1, 6, 10, 11, 9, 21, 7, 15, 8, 14, 13, 28, 25, 26, 19, 29, 20. Ordinate: correlation coefficient, from 1.00 down to -0.40.]

FIG. 3. Standard phenogram of the 29 Recent Caminalcules based on the updated data set of this study. It is based on the data matrix with integer codes substituted for the measurements and was obtained by standardization of characters followed by computation of product-moment correlation coefficients and UPGMA clustering. The ordinate is in correlation coefficient scale. The phenetic genera are labeled with capital letters.
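The three steps named in the caption (standardization of characters, product-moment correlations between OTUs, UPGMA clustering) can be illustrated on a toy example. The sketch below is an assumption-laden illustration in plain Python and is not the NTSYS implementation used in the study: the 4 X 4 matrix is invented, it contains no NC codes (which a real analysis would have to handle), and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def standardize(matrix):
    """Scale each character (row) to mean 0 and unit standard deviation."""
    result = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        sd = sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
        result.append([(x - mean) / sd for x in row])
    return result


def correlation(a, b):
    """Product-moment correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)


def upgma(labels, similarity):
    """Repeatedly join the two clusters with the highest average similarity."""
    clusters = [(label,) for label in labels]
    merges = []

    def average(c1, c2):
        # Unweighted average over all pairs of original OTUs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(similarity[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Hypothetical toy matrix: rows are characters, columns are OTUs A to D.
labels = ["A", "B", "C", "D"]
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [3, 3, 1, 2],
    [1, 1, 4, 5],
]
z = standardize(toy)
columns = {label: [row[k] for row in z] for k, label in enumerate(labels)}
similarity = {
    frozenset((a, b)): correlation(columns[a], columns[b])
    for a, b in combinations(labels, 2)
}
merges = upgma(labels, similarity)
for c1, c2, level in merges:
    print(c1, c2, round(level, 3))
```

Each entry of `merges` records the two clusters joined and the correlation level of the join, which is the quantity plotted on the ordinate of a phenogram such as Figure 3.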

Let us assume that we know the true cladogram of the taxon, as is the case in the Caminalcules. The nodes of the tree will be the Recent OTUs and their ancestors, and the internodes will be the lines of descent connecting these taxa. Throughout I shall assume that evolutionary (character state) change occurs only along internodes. Let $l_{jo}$ be the sum of the lengths in character state changes over all internodes along the directed path from OTU $j$ to $o$, the most recent common ancestor (root) of the taxon, summed over all characters. Quantity $l_{jo}$ has been called the patristic length of OTU $j$ by Farris (1969) and others. I shall refer to it by the neutral term "path length," since it measures homoplastic as well as patristic resemblance (Sneath and Sokal, 1973). Next, let us define $L_{\max(u)} = \sum_j l_{jo}$ as the sum of such lengths over all $t$ OTUs. The quantity $L_{\max(u)}$ is the maximal length which the true cladogram could assume if it were rearranged so that every OTU evolves singly and independently from the ancestor $o$, with the OTUs diverging from that ancestor in bushlike fashion and repeating separately for each OTU the changes in character states that actually occurred along the common evolutionary stems in the true cladogram. Thus $L_{\max(u)}$ is a measure which indicates the upper possible bound of parallelism and reversals

FIG. 4. True cladogram of the Caminalcules as furnished by J. H. Camin. Morphological change occurred during 19 time periods from time 1 to time 20 along the ordinate. Vertical lines indicate periods without morphological change, slanted lines indicate such change. The amount of evolutionary change (path length of the internode) based on 85 characters is shown by the length of the thickened bars along the slanted lines. To furnish an indication of scale, the length of the internode subtending OTU 54 is one Manhattan distance unit, that subtending OTU 76 is 10 such units. Path lengths based on 106 characters, necessary for differentiating fossil species, are shown in a subsequent publication (Sokal 1983b: fig. 2). Squares identify Recent species, black circles fossil species. However, note that some fossil species extend into the Recent (e.g., species 8). Hollow circles indicate extinct species. Large hollow circles terminate extinct lineages whereas the tiny ones symbolize extinct species whose lineages continue with evolutionary change. The numbers next to squares and circles identify the species. Recent species have been given the numbers 1 through 29 familiar from the literature. Fossil species were assigned numbers 30 through 77 at random. Note that, although Camin indicated evolution between species 58 and 52, my coding shows no morphological change between these two forms. The internode between species 36 and 55 appears to be a similar case, but it exhibits morphological change for characters numbered 86-106. The phenetic genera of the Caminalcules are identified by brackets across the top of the tree.
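Path lengths such as those drawn in Figure 4 depend on how NC states are counted. The text describes two conventions, "transparent" and "opaque" measure, and works one example for each: the path 0 - NC - 3 - 4 is four units long in transparent measure and three units long in opaque measure. The sketch below reproduces those two rules for a single character. The function names and the use of `None` as the NC sentinel are assumptions made for illustration.

```python
NC = None  # hypothetical sentinel standing in for an NC state


def transparent_path_length(states):
    """Transparent measure: NC states are passed over along the path."""
    expressed = [s for s in states if s is not NC]
    return sum(abs(b - a) for a, b in zip(expressed, expressed[1:]))


def opaque_path_length(states):
    """Opaque measure: NC is treated as a distinct character state."""
    total = 0
    for a, b in zip(states, states[1:]):
        if a is NC and b is NC:
            continue              # successive NC states count as zero steps
        if a is NC or b is NC:
            total += 1            # entering or leaving NC is a single step
        else:
            total += abs(b - a)   # ordinary Manhattan step
    return total


def opaque_distance(x, y):
    """Opaque distance between two terminal OTUs over all characters."""
    total = 0
    for a, b in zip(x, y):
        if a is NC and b is NC:
            continue              # two NCs differ by zero
        total += 1 if (a is NC or b is NC) else abs(a - b)
    return total


def transparent_distance(x, y):
    """Transparent distance: characters with an NC in either OTU are ignored."""
    return sum(abs(a - b) for a, b in zip(x, y) if a is not NC and b is not NC)
```

With the path from the text, `transparent_path_length([0, NC, 3, 4])` gives 4 and `opaque_path_length([0, NC, 3, 4])` gives 3.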

TABLE 2. Summary of formulas of phenetic and evolutionary change.^a

$l_{jo}$ = Sum of lengths in character state changes over all internodes along the directed path from OTU j to most recent common ancestor o

$L_{\max(u)} = \sum_j l_{jo}$, where $\sum_j$ is summation over all OTUs j

$MD_{jo} = \sum_i |X_{ij} - X_{io}|$, where $\sum_i$ is summation over all characters and $X_{ij}$ is the character state for character i and OTU j. OTU o is the most recent common ancestor

$L_{\max(l)} = \sum_j MD_{jo}$

$R_1 = L_{\max(u)} / L_{\max(l)}$

$r_i(t_o)$ = Range in character states of character i for the assemblage of t OTUs plus the most recent common ancestor o

$L_{\min(l)} = \sum_i r_i(t_o)$

$L_{\min(u)}$ = Minimum length for the taxon, given any tree structure originating from the common ancestor o, and an evolutionary model for allowable types of character state changes

$L_{act}$ = Sum of path lengths over all internodes, given knowledge of the true cladogram

$L_{est}$ = Sum of path lengths over all internodes of an estimated cladogram

$DI = (L_{\max(u)} - L_{act}) / (L_{\max(u)} - L_{\min(l)})$ (dendritic index)

$H = L_{act} / L_{\min(l)}$

$H^{*} = L_{est} / L_{\min(l)}^{*} = 1/C$, where C is the consistency index of Kluge and Farris (1969)

$DR = \sum_{jk} (l_{jk} - MD_{jk}) / \sum_{jk} MD_{jk}$, where $\sum_{jk}$ is summation of all OTU pairs except for pairs jj and kk (deviation ratio of J. S. Farris)

$L_{br,avg} = L_{act,M} / (2t_M - 2)$, where the subscript M refers to the subtaxon M

$P_{brl} = L_{br,avg} / (l_{pp'} + L_{br,avg})$, where p is the most recent common ancestor of subtaxon M and p' is the most recent common ancestor of that subtaxon that is also an ancestor of a Recent nonmember of M

^a Presented in order of appearance in text. Quantities with asterisks are based on estimated rather than true cladograms.
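Several of the formulas in Table 2 can be sketched on a toy cladogram whose ancestral states are known. Everything concrete below is hypothetical: the three OTUs A, B, C, the internal node x, the root o, and their states for two characters. Two assumptions are built in: each internode length is the Manhattan distance between a node and its parent (no NC states), and $l_{jk}$ is taken to be the path length along the tree between OTUs j and k. $L_{\min(u)}$ is left out because it requires a search over tree structures.

```python
from itertools import combinations

# Hypothetical toy cladogram: child -> parent, with "o" as the root ancestor.
PARENT = {"A": "x", "B": "x", "C": "o", "x": "o"}
# Hypothetical states of two characters for every node, ancestors included.
STATES = {"o": [0, 0], "x": [1, 1], "A": [0, 2], "B": [1, 1], "C": [0, 1]}
OTUS = ["A", "B", "C"]


def manhattan(a, b):
    """Manhattan distance between the character states of two nodes."""
    return sum(abs(p - q) for p, q in zip(STATES[a], STATES[b]))


def path_to_root(node):
    """Nodes visited when walking from `node` up to the root o."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_length(nodes):
    """Sum of internode lengths along a chain of adjacent nodes."""
    return sum(manhattan(a, b) for a, b in zip(nodes, nodes[1:]))


def tree_path(j, k):
    """Chain of nodes from OTU j to OTU k through their common ancestor."""
    up_j, up_k = path_to_root(j), path_to_root(k)
    common = next(n for n in up_j if n in up_k)
    return up_j[: up_j.index(common) + 1] + up_k[: up_k.index(common)][::-1]


def tree_statistics():
    group = OTUS + ["o"]
    l_max_u = sum(path_length(path_to_root(j)) for j in OTUS)  # sum of l_jo
    l_max_l = sum(manhattan(j, "o") for j in OTUS)             # sum of MD_jo
    l_min_l = sum(                                             # sum of ranges r_i(t_o)
        max(STATES[v][i] for v in group) - min(STATES[v][i] for v in group)
        for i in range(len(STATES["o"]))
    )
    l_act = sum(manhattan(c, p) for c, p in PARENT.items())    # all internodes
    pairs = list(combinations(OTUS, 2))
    homoplastic = sum(path_length(tree_path(j, k)) - manhattan(j, k)
                      for j, k in pairs)
    return {
        "L_max_u": l_max_u,
        "L_max_l": l_max_l,
        "L_min_l": l_min_l,
        "L_act": l_act,
        "R_1": l_max_u / l_max_l,
        "DI": (l_max_u - l_act) / (l_max_u - l_min_l),
        "H": l_act / l_min_l,
        "DR": homoplastic / sum(manhattan(j, k) for j, k in pairs),
    }
```

For this toy tree, character 1 reverses on the internode leading to A and character 2 reaches state 1 independently in x and in C, so the statistics come out as $L_{\max(u)} = 7$, $L_{\max(l)} = 5$, $L_{\min(l)} = 3$, $L_{act} = 5$, $R_1 = 1.4$, $DI = 0.5$, $H \approx 1.67$, and $DR = 1.5$.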

in the data, given the distribution of character states over Recent OTUs and the character states of the fossil OTUs.

A generally shorter length for each OTU j results if one computes

$$MD_{jo} = \sum_i |X_{ij} - X_{io}| \qquad (1)$$

where $X_{ij}$ is the character state of OTU j for character i and $\sum_i$ is summation over the n characters. Note that $MD_{jo}$ is the Manhattan distance between OTU j and ancestor o. It is the minimum possible evolutionary length for each OTU j from the ancestor o. We can assemble these lengths to form a new upper bound length of the entire tree $L_{\max(l)} = \sum_j MD_{jo}$ given a minimal length bush from the ancestor. Note that since $MD_{jo} \le l_{jo}$, $L_{\max(l)} \le L_{\max(u)}$. For any OTU j other than the ancestor o the ratio $R_{jo} = l_{jo}/MD_{jo} \ge 1.0$ is a measure of the ratio of unnecessary changes (reversals and repeated forward changes) among the evolutionary steps for that OTU. Consequently,

$$R_1 = L_{\max(u)} / L_{\max(l)} \ge 1.0 \qquad (2)$$

is a measure of the amount of reversals and repeats in character state changes for the entire taxon considered as a bush. Note that reversals near the base will be repeated in various OTUs and, hence, weighted more heavily.

Two measures of minimum length of the tree were considered. Define $L_{\min(l)} = \sum_i r_i(t_o)$, where $r_i(t_o)$ stands for the range in character states of character i for the assemblage of the t OTUs in the taxon plus its most recent common ancestor o. Length $L_{\min(l)}$ is the minimum amount of evolution necessary for producing the taxonomic diversity of the t OTUs in the taxon. It is a theoretical quantity, rarely, if ever, obtained in real data because of the actual distribution of character states over the OTUs. For $L_{\min(l)}$ to be realized, there would have to exist a cladogram for the taxon for which all characters would have to be compatible. In such a case there could be no reversals or parallelisms. Since this is a hypothetical, largely unattainable minimum length, a second minimum length $L_{\min(u)}$ needs to be defined as the minimum length for the taxon with t OTUs, given any tree structure originating from the common ancestor o and an evolutionary model for the allowable types of character state changes. Necessarily $L_{\min(u)} \ge L_{\min(l)}$ since some homoplasy is usually present.

Given knowledge of the true cladogram, the actual length of the tree, $L_{act}$, is the sum of the lengths over all the internodes. Thus it represents all the evolutionary changes over all characters that have taken place on this tree. Clearly, $L_{\min(u)} \le L_{act} \le L_{\max(u)}$ but $L_{act}$ could be greater or smaller than $L_{\max(l)}$. The relations among the various quantities are illustrated in Figure 5.

FIG. 5. Schematic to show the relations among the measures of phenetic and evolutionary change discussed in the text. Note that the various statistics have been placed along a line representing the range of possible lengths of an evolutionary tree in positions that they might assume in a "typical" tree. In any actual case, the length of any segments of this line may differ from that shown in the figure and may indeed be zero. The double-arrowed bracket above the line indicates that $L_{act}$ may vary between $L_{\min(u)}$ and $L_{\max(u)}$ and is undefined with respect to its position vis-a-vis $L_{\max(l)}$. The double headed arrows at the bottom of the figure indicate the proportions of the length due to homoplasy and to its dendritic structure. The location of the boundary between these two (the opposed arrowheads) is at the value for $L_{act}$.

Quantities $L_{act}$ and $L_{\max(u)}$ are computable only in those extremely rare cases, such as the present one, in which the true evolutionary sequence is known; quantities $L_{\min(l)}$ and $L_{\max(l)}$ require at least knowledge of the most recent common ancestor o. Quantity $L_{\min(u)}$, even with knowledge of the common ancestor, can be computed only by enumeration, which is practical for small numbers of OTUs only. For most real data sets, information is available only for Recent OTUs. In such cases an estimated cladogram is produced by a numerical cladistic algorithm using an outgroup OTU believed to be close to the most recent common ancestor or a vector of character states believed to be primitive. As a result of the cladistic estimation process, the HTU ultimately specified as the most recent common ancestor of the group may be different from the outgroup first furnished. In these cases, the statistics $L_{\max(l)}$, $L_{\max(u)}$, $L_{\min(l)}$, and $L_{\min(u)}$ must be defined with respect to the estimated cladogram and the HTU representing the hypothesized most recent common ancestor. To distinguish these statistics based on estimated rather than true cladograms, I have added asterisks to their symbol. The equivalent of the quantity $L_{act}$ for the estimated cladogram obtained by a numerical cladistic procedure such as a Wagner tree algorithm is the length $L_{est}$. For any given data set and evolutionary model, given a specified root o, $L_{est} \ge L_{\min(u)}^{*}$. This estimate by various computational algorithms will frequently not produce the minimum length $L_{\min(u)}^{*}$. Note that $L_{est}$ can be less than or greater than $L_{act}$.

These quantities permit computation of the following statistics. The saving in evolutionary length by the known tree structure can be defined as $L_{\max(u)} - L_{act}$. Reversals and parallelisms are in both coefficients but they are more numerous in $L_{\max(u)}$ since this is a bushlike structure repeating the path length of common shared stems separately for each OTU. A dendritic index can be defined

$$DI = (L_{\max(u)} - L_{act}) / (L_{\max(u)} - L_{\min(l)}), \qquad (3)$$

which expresses the savings in length due to the tree's dendritic structure departing from that of a bush as a proportion of the tree's maximum possible range in length. If DI = 0, then the taxon is a bush and there is no common evolution. If DI = 1, then the character states are fully compatible on the cladogram and there are no parallelisms or reversals. Because basal internodes are more often involved in determining $l_{jo}$ than are internodes near the tips of the tree, DI is more heavily affected by savings in length in the dendritic structure near the base. For data sets with unknown cladogenies one can define

$$DI^{*} = (L_{\max(u)}^{*} - L_{est}) / (L_{\max(u)}^{*} - L_{\min(l)}^{*}). \qquad (4)$$

A second quantity, $L_{act} - L_{\min(l)}$, describes the excess in evolutionary length over the absolute hypothetical minimum. It may be useful to partition this excess into two parts as follows:

$$L_{act} - L_{\min(l)} = (L_{\min(u)} - L_{\min(l)}) + (L_{act} - L_{\min(u)}). \qquad (5)$$

The first quantifies the necessary parallelisms to allow for the departure from a fully compatible distribution of character states over OTUs and the second term describes the extra parallelisms and reversals that occurred in the actual evolutionary history of the group over the minimum amount necessary to account for the observed distribution of character states over OTUs. Biologically speaking, however, the two terms may be difficult to distinguish since both describe departures from perfect consistency of character states with the cladogeny. Clearly, $L_{act} - L_{\min(l)}$ is a measure of homoplasy as it is conventionally understood and when expressed as a proportion of the maximum possible range of length of the tree $L_{\max(u)} - L_{\min(l)}$ is simply 1 - DI (see Fig. 5).

An alternative statistic, adopted in this paper, is

$$H = L_{act} / L_{\min(l)} \qquad (6)$$

which expresses the homoplasy as a ratio, necessarily greater than 1. The amount by which H is greater than unity is the extra length of the actual tree, expressed as a proportion of the minimum tree length necessary for evolution of the character states. For data sets with unknown cladogenies

$$H^{*} = L_{est} / L_{\min(l)}^{*}. \qquad (7)$$

Note that H* = 1/C where C is the consistency index of Kluge and Farris (1969), also employed by Kluge (1976) and Mickevich (1978a).

A third measure of homoplasy is DR, the deviation ratio featured in Farris' WAGNER 78 program, which is the sum of the pairwise homoplastic distances divided by the sum of the Manhattan distances among all pairs of OTUs. The pairwise homoplastic distances for OTUs j and k are found as $l_{jk} - MD_{jk}$, whereas the Manhattan distances are $MD_{jk}$. Therefore,

$$DR = \sum_{jk} (l_{jk} - MD_{jk}) / \sum_{jk} MD_{jk}, \qquad (8)$$

where $\sum_{jk}$ is summation of all OTU pairs jk except for pairs jj and kk. This ratio is affected more by homoplasy at the base of the tree, even though the denominator also counts basal lengths repeatedly, because any excess length due to homoplasy will be counted more often for basal than for terminal internodes.

A second measure of character reversal $R_2$ is analogous to the DR ratio. It is defined as the sum of the pairwise distances due to reversals and nonparallel repeated forward changes divided by the sum of the homoplastic distances among all pairs of OTUs, which is the numerator of DR. It indicates the proportion of homoplasy due to reversals. Its 1-complement is the proportion due to parallelisms.

In studying subtaxa within a larger taxon, as, for example, the genera in the present study, it is useful to define the above statistics also for subtaxa. In such a case when working with subtaxon M, I employ the most recent common ancestor $p_M$, or simply p, of that taxon, replacing o by p in the above formulas. Summations over OTUs will then be carried out not over the t OTUs of the entire study, but only over the $t_M$ OTUs of subtaxon M.

It is useful to record the path length of the stem of each of the subtaxa. Following the earlier symbolism, this can be defined as $l_{pp'}$, where p' is the most recent common ancestor that is also an ancestor of a Recent nonmember of subtaxon M. An average branch length $L_{br,avg}$ of the internodes subtended by the most recent common ancestor p is also useful. It can be computed by dividing $L_{act,M}$, the observed length of the tree representing the subtaxon, by $2t_M - 2$, the number of its internodes. Comparing $l_{pp'}$ with $L_{br,avg}$ contrasts the evolution on the stem preceding the most recent common ancestor of the genus with the subsequent evolution within the genus. The average branch length can be expressed as a branch length proportion,

$$P_{brl} = L_{br,avg} / (l_{pp'} + L_{br,avg}). \qquad (9)$$

An additional complication arises whenever NCs are present as in this study. I have adopted two conventions, which are modifications of the Manhattan distance. In "transparent" measure the NC state is thought of as an unknown. For computation of distances between terminal OTUs, character states for OTU pairs, one or both of which had NCs for a given character, were ignored during the computation of the distance and NCs were passed over during computation of path lengths. Thus a 0 - NC - 3 - 4 path is four units long. In "opaque" measure NCs are considered a distinct character state and changes along the tree from an expressed character state to an unexpressed one (NC) are considered a single step. Successive NC states count as zero steps and a change from an NC to an expressed state is again a single step, regardless of the magnitude of the expressed character state. Thus the path 0 - NC - 3 - 4 is of length 3. Computing distances between terminal OTUs, the difference between two NCs was considered to be zero, that between any numerical state and an NC was considered to be one. Opaque distance was employed mainly because it made it simple to express path length for any internode along the tree and to partition the path length into homoplastic and reversal distances as needed in the next section of this paper. In transparent measure it is not always possible to uniquely estimate path length for each internode.

RELEVANCE OF CAMINALCULES FOR SYSTEMATIC INQUIRY

It is of interest to examine in how many ways the Caminalcules resemble data sets on real organisms, since this will strengthen the relevance of results obtained from them for systematic inquiry. Below I inspect as many separate aspects of the Caminalcules as seem to me relevant to systematic methods and principles. I also examine three of these aspects in 19 zoological data sets, ranging from 8 to 97 OTUs and based on from 20 to 139 characters lacking NCs. None of these data sets was subjected to the exhaustive analysis to which the Caminalcules were treated (recounted in subsequent papers of this series) for the obvious reasons that such a project would have taken several additional years, that finding outgroups for them would have taken much specialized knowledge, and that the true cladogenies of the data are of course unknown. As will be seen below, the results for almost all parameters bracket those obtained for the Caminalcules. It is not expected that the differences in tree topology that might be accomplished by repeated application of a cladistic algorithm to the same data set in the hope of finding a shorter tree would alter these relations with respect to homoplasy and other measures. I have summarized my conclusions in Table 3 for homoplasy, tree symmetry and adequacy of characters.

Homoplasy.-Three indices of homoplasy have been mentioned earlier and in the literature. These are the 1-complement of the dendritic index (1 - DI), the homoplasy ratio H, which is the reciprocal of C, the consistency index of Kluge and Farris (1969), and Farris' deviation ratio DR. In the Caminalcules, two values can be obtained for each of these indices. One is based on the true cladogram and should be the correct measure of homoplasy in the group for the given index; the other, comparable to those furnished in the literature on real organisms, is based on the estimated cladogram. As an estimate I employed the approximate Wagner tree obtained with the WAGNER 78 program developed by J. S. Farris, using the distance Wagner procedure and midpoint rooting. Opaque distances were analyzed to permit computation of all homoplasy statistics. However, for the two statistics that can be computed in transparent measure (H and DR), these values are reported as well. Statistics based on estimated cladograms are distinguished by affixing an asterisk to their symbols. Approximate Wagner trees were computed for the 19 data sets, again using the WAGNER 78 program and rooted by the midpoint method. Because these data sets lack

TABLE 3. Minima and maxima of tree statistics for 19 zoological data sets and the Caminalcules.^a

                                    Caminalcules
Tree statistics   Low value     Estimated    True        High value

Homoplasy
  1 - DI*         0.0559^b      0.1326       0.1745      0.4646^c
  H*              1.160^b       1.417        2.327       6.690^d
  C               0.1495^d      0.7057       0.4298      0.8621^b
  DR*             0.1124^b      0.1795       1.3591      1.7395^d
Symmetry
  BSUM2           0.2597^d      0.4900       0.3779      0.7714^e
  BSUM3           -0.0976^f     0.2420       0.0753      0.3462^g
  SHAO2           0.4483^h      0.6186       0.7720      0.8021^c
  SHAO3           0.3421^h      0.5225       0.7146      0.6840^c
  COLLESS2        0.1061^f      0.3331       0.1693      0.4889^g
Adequacy
  n/t             1.16^h        2.93         2.93^i      6.62^j

^a High and low values refer to numerical values, not necessarily to the property described. Thus, high values of C indicate low homoplasy, and high values of BSUM2, BSUM3, and COLLESS2 indicate low symmetry (high asymmetry). The values shown are based on means of three runs of the WAGNER 78 program with midpoint rooting for the symmetry measures, and on single estimates for the other measures. Other cladistic estimates were run as well; for details see text. The results conform to those shown in this table. The data sets exhibiting extreme values are identified in footnotes. For explanations of tree statistics see text.
^b Leptopodomorpha (Schuh and Polhemus, 1980).
^c Orthopteroid insects (Blackith and Blackith, 1968).
^d Hoplitis complex (Michener and Sokal, 1957).
^e Western Bufo (Feder, 1979).
^f Drosophila (real set; Throckmorton, 1968).
^g Hemoglobin (Dayhoff, 1969).
^h Dasyuridae (Archer, 1976).
^i Is 5.34 if binary coded data are considered.
^j Pygopodidae (binary coding; Kluge, 1976).
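Two relations tie rows of Table 3 together: the consistency index is the reciprocal of the homoplasy ratio (C = 1/H*, or 1/H for the true cladogram), and the adequacy row is simply the number of characters divided by the number of OTUs. The short check below recomputes both from values printed in the table and the text (85 multistate or 155 binary characters for 29 Recent OTUs). The dictionary layout and function names are illustrative assumptions, and the tolerance only allows for rounding of the printed figures.

```python
# (homoplasy ratio, consistency index as printed in Table 3)
TABLE3_PAIRS = {
    "low H* (Leptopodomorpha)": (1.160, 0.8621),
    "Caminalcules, estimated": (1.417, 0.7057),
    "Caminalcules, true": (2.327, 0.4298),
    "high H* (Hoplitis complex)": (6.690, 0.1495),
}


def consistency_index(h):
    """Consistency index as the reciprocal of the homoplasy ratio."""
    return 1.0 / h


def largest_discrepancy(pairs=TABLE3_PAIRS):
    """Largest gap between 1/H and the printed C over all table entries."""
    return max(abs(consistency_index(h) - c) for h, c in pairs.values())


def characters_per_otu(n_characters, n_otus):
    """Adequacy ratio n/t, rounded to two decimals as in Table 3."""
    return round(n_characters / n_otus, 2)
```

All four printed pairs agree to within rounding, and `characters_per_otu(85, 29)` and `characters_per_otu(155, 29)` give the 2.93 and 5.34 quoted for multistate and binary coded Caminalcules.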

NCs, the distinction between opaque and transparent measure does not apply to them. As an experiment, I also tried trees with an outgroup rooting using OTU 1 as the outgroup (clearly not a recommended procedure, although it has been used by some numerical cladists; e.g., Mickevich, 1978b; Farris, 1979a). Only midpoint rooting is reported in Table 3. Results for the OTU 1 rooting procedure were similar.

It is worth emphasizing again that, both for the Caminalcules and the 19 real data sets, the above method provides only approximate Wagner trees. Better estimates could have been obtained with further work and were indeed obtained for the Caminalcules (see Sokal, 1983a). I am employing the "cruder" estimate for the Caminalcules, since it is comparable to the estimates in the 19 data sets. Presumably homoplasy would decrease if the length of the trees could be shortened algorithmically. But, on the average, this would occur proportionately for all of the data sets and, since the Caminalcules are currently bracketed by the data sets on real organisms, they presumably would still be so bracketed after the length of all trees had been reduced.

In the Caminalcules 1 - DI = 0.1745 and 1 - DI* = 0.1326. In the 19 data sets 1 - DI* ranges from 0.0559 in the Leptopodomorpha (Schuh and Polhemus, 1980) to 0.4646 in 12 orthopteroid insects (Blackith and Blackith, 1968). Because the true trees for these real organisms are not known, the proper comparison with the Caminalcules is with 1 - DI* from the approximate Wagner estimate.

For the true cladogram H = 2.327, and for the Wagner estimate H* = 1.417 and 1.261 for opaque and transparent estimates, respectively, yielding corresponding consistency indices of 0.4298, 0.7057, and 0.8048. Mickevich (1978a) reported a range of consistency indices from 0.33 for Aedes and papilionids to 0.86 for cytochrome C and globin. In the 19 data sets, H* ranges from 1.160 in the Leptopodomorpha to 6.690 in bees of the Hoplitis complex (Michener and Sokal, 1957), corresponding to consistency indices of 0.8621 and 0.1495, respectively.

The deviation ratio (DR) for the true cladogram is 1.3591, a very high value, but DR* is 0.1795 and 0.3852 for opaque and transparent Wagner estimates, respectively. This great discrepancy can be accounted for by the numerous reversals in the true cladogram, especially along the stems defining the genera. Remember that the estimate optimizes the position of the HTUs and, in consequence, attempts to minimize homoplasy. The formula for computing the deviation ratio involves all pairwise distances between OTUs and, therefore, repeatedly counts these basal relations. Actually the parallelism component of the homoplasy is relatively low in the Caminalcules. For the lower triangular matrix in opaque measure, the sum of the homoplastic distances is 21,225, of which 17,093 is due to reversals and 4,132 is due to parallelisms. The Wagner tree algorithm in trying to obtain a shortest length tree cancels many of the reversals, so that the deviation ratio actually observed is relatively low, 0.1795, whereas the deviation ratio of the true cladogram is much higher. The range of observed values of DR* in the 19 data sets is from 0.1124 in the Leptopodomorpha to 1.7395 in the Hoplitis complex. If reversals are as common in real organisms as in the Caminalcules and if we knew their true cladograms, real organisms might well exhibit generally higher deviation ratios. Real and apparent homoplasy in the Caminalcules is well within the reported range for real organisms.

Symmetry.-The true cladogram of the Caminalcules is fairly symmetrical. No universally accepted criterion of symmetry has yet been established, but K. T. Shao (pers. comm.), who is investigating this problem, has used five separate indices to describe different aspects of symmetry. These are BSUM2 and BSUM3 (modified from Sackin, 1972), SHAO2 and SHAO3 (Shao, pers. comm.), and COLLESS2 (Colless, 1982). These coefficients were computed for the true cladogram of the Caminalcules as well as for several estimated cladograms of the Caminalcules. The estimated cladograms include one obtained by means of Joseph Felsenstein's PHYLIP WAGNER program, rooted at the true ancestor, and an estimated Wagner tree obtained with the WAGNER 78 program with midpoint rooting. The latter was replicated three times with randomly permuted input orders of OTUs.

These same statistics were also calculated for similarly estimated cladograms (except for the impossible true ancestor rooting) for the 19 data sets. Again, rooting using OTU 1 was employed additionally on an experimental basis for trees obtained by both programs. Not all data sets were suitable for each estimation method and statistic, but no reported range of values is based on fewer than 14 data sets. In Table 3 results are reported only for estimated cladograms based on the WAGNER 78 program and midpoint rooting, but below the other results are summarized as well. For WAGNER 78 with midpoint rooting, the true cladogram is contained within the range of observed symmetries for all coefficients except SHAO3. But the true cladogram shows more symmetry than the observed range of symmetry values for estimated cladograms based on the PHYLIP WAGNER program and on WAGNER 78 with OTU 1 as an outgroup. Yet, when estimated cladograms are computed for the Caminalcules, these are almost always less symmetrical than the true cladogram. In Table 3 for midpoint rooted Wagner trees the symmetries of estimated cladograms of the Caminalcules are contained within the range of observed values and, when other estimated cladograms are considered as well, this relation holds in 17 out of 20 comparisons. Since the other 19 classifications are all based on estimated cladograms, it is the estimated cladograms of the Caminalcules that must be compared with these data sets rather than the true cladogram. Thus on this basis, also, it appears that the Caminalcules are not atypical.

Adequacy of the characters for resolving the cladogram.-The 29 Recent OTUs in a fully bifurcating tree would derive from 27 bifurcations, not counting the basal bifurcation at the root. Since one of the branching points in the true cladogram is a trifurcation, 26 synapomorphies are minimally needed to resolve the tree. How can we measure the adequacy of the data set for this task? At the simplest level, one can count characters. Fewer than 26 binary characters cannot resolve the tree. Subjected to additive binary coding, the 85 × 29 data yield 155 binary characters, superficially a more than adequate number. However, such an approach is too crude since it does not allow for character correlations. In fact, we know that in addition to the single trifurcation, three bifurcations in the true cladogram (19-26, 11-21, 18-23) are not supported by any evident evolutionary change in the stems subtending them. Thus, even though there are more than five times as many binary characters in this data set than are necessary to define the tree, they are distributed across the tree in such a way as to make it impossible to define more than 23 branching points. Most numerical cladistic studies are carried out on far fewer characters and because the true cladograms of real organisms are unknown, it is generally not known whether the data matrix is adequate for resolving the true tree. In the 19 data sets the ratio n/t, where n is the number of characters and t the number of OTUs, ranges from 1.16 in Dasyurus (Archer, 1976) to 6.62 for binary coded members of the Pygopodidae (Kluge, 1976). These figures compare with 2.93 and 5.34 for multistate and binary coded Caminalcules, respectively (see Table 3). Since the values for the 19 data sets reflect differences in character coding, they must be interpreted with caution. Nevertheless, it is clear that the Caminalcules fall well within the bounds of data from the literature.

One can examine the true cladogram to determine the number of OTUs subtended by any given furcation. It ranges from 2 to 22 OTUs. These figures should minimally be matched by the numbers of OTUs (greater than one) sharing any one character state to provide synapomorphies for the recognition of these furcations. When the required distribution of OTU numbers is compared to the actual distributions of such numbers, one finds at least as many observed frequencies as are required. However, this is insufficient evidence for the adequacy of a data set because we already know that three bifurcations in the Caminalcules are not resolved in terms of the characters available, in addition to the known trifurcation, which goes counter to the assumptions of most cladistic methods. Correlated characters undoubtedly define furcations redundantly, and parallelism allows a single character to define furcations in different parts of the tree. Unfortunately, knowledge of this type is virtually impossible to obtain from real data sets, because the true cladograms are unknown. Using estimated cladograms for such inferences would tend to make the argument circular. In any case, it appears that the Caminalcules are unlikely to be less capable of having their cladogram resolved than most data sets in systematics.

Because of the incompleteness of the fossil record, less is known from real organisms of the following three aspects. However, it may be of some interest to treat these topics at least briefly in an effort to describe where the Caminalcules are located with respect to evolutionary parameters (evolutionary rates, species longevities, and speciation-extinction rates) that might be employed in simulation studies. It will be seen that the Caminalcules are not in contradiction with such findings as are reported in the paleobiological literature.

Evolutionary rates.-A frequency distribution of the path lengths (amount of evolutionary change) of each internode for each time period shows extreme clumping. When examined against Poisson expectations (Fig. 6), the overdispersion is highly significant (P < 0.001). Thus, there are many more periods of no evolution as well as more periods with substantial amounts of evolution than expected. Two explanations can be advanced for this phenomenon. (1) The Caminalcules are similar to real organisms in that their evolution is of organic form as a whole rather than independent for each character. Changes in form in the Caminalcules involve various correlated characters and thus those segments of the cladogram during which extensive morphological evolution occurred will exhibit greater amounts of change. (2) There also may be local clumping of evolutionary changes on the tree for nonbiological reasons which have an analog in real phylogenetic processes. We must assume that Camin in constructing his tree was at least somewhat goal oriented so that as he evolved any one line he carried it through to some major morphological change rather than abandon the change in midcourse. This process also tends to induce inhomogeneity of rates among the lines which results in clumping of the distribution of evolutionary changes when considered overall. The biological analog is to assume that major changes in environment or in genetic architecture leading to adaptive radiation are ongoing processes with a momentum of their own. The data in Figure 6 are based on only those lines leading to Recent OTUs. When all internodes including those leading to extinct terminal species are examined, the results are very similar to those already reported and illustrated in Figure 6. Similar data are hard to obtain in real organisms because of the incompleteness of the fossil record, but the observed pattern of evolutionary change is at least biologically plausible.

FIG. 6. Observed and expected frequencies of evolutionary rates (path length per evolutionary time period) in the true cladogram of the Caminalcules. These data are based only on those lines leading to Recent OTUs. The bar diagram of the "skyline" (bars above the abscissa) indicates the expected frequencies of a Poisson distribution; the "inverted skyline" (shaded portion of the bars) indicates the observed frequencies. Both frequencies are given in square roots of actual values to emphasize the deviations. Where the inverted skyline does not reach the abscissa, there are fewer observed than expected frequencies. Wherever it reaches below the abscissa, there is an excess of observed frequencies over expected frequencies. The ordinate shows square rooted frequencies, while the abscissa expresses evolutionary rates.

FIG. 7. Average evolutionary rate plotted against time period. The rate is expressed along the ordinate as average path length in the (unit) time interval preceding the point in time shown along the abscissa. The path lengths are computed on the basis of 106 characters. The means of this figure are based on lines leading to Recent OTUs only.

In Figure 7, I show a graph of the average evolutionary rates over all evolutionary lines for each of the 19 time periods (again, only for lines leading to Recent OTUs; the results including lines leading to extinct species are quite similar). The rates are computed from the 106 character × 77 OTU data base. There is clear evidence of differential evolutionary rates through time. Between times 7 and 11, rates were generally lower than at other times. By time 7, all genera except for DE and C had already been defined but major within-genus diversity had not yet begun except in genus A. A runs-up-and-down test (Sokal and Rohlf, 1981b), significant at P <

0.01, demonstratesthe alternatingnature of 100 - A the rate changes. Periods of higher change alternatedwith periods of lower change more - IL 80 RECENT frequentlythan could be expectedby chance > 60 - ---- RECENT+FOSSIL alone. Relevant comparisonswith real or- ganisms are hard to come by. That evolu- - 40 '\ tionaryrates differ over the fossilrecord has D 20 _ s _ -_^s been well established since the work of Simpson (1944, 1949,1953). Yet, because pa- 2 4 6 8 10 12 14 16 leontologicalseries are usually based on few LONGEVITIESOF SPECIES IN TIME PERIODS charactersand because the fossillineages are not well known, estimatesof evolutionary Z Z ratesbased on multiplecharacters as in Fig- I: 100 B X
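The runs-up-and-down test mentioned above can be sketched as follows. This is a minimal illustration, not the computation used in the paper: it relies on the usual large-sample normal approximation (expected number of runs (2n - 1)/3, variance (16n - 29)/90), it assumes no tied successive values, and the 19-value series is invented for illustration only (it is not the Figure 7 data). The function name `runs_up_and_down` is hypothetical.

```python
import math


def runs_up_and_down(series):
    """Return (observed runs, expected runs, z) for a runs-up-and-down test.

    Assumes no two successive values are equal (ties are not handled here).
    """
    n = len(series)
    # Sign of each successive difference: +1 for a rise, -1 for a fall.
    signs = []
    for a, b in zip(series, series[1:]):
        if a == b:
            raise ValueError("tied successive values are not handled in this sketch")
        signs.append(1 if b > a else -1)
    # A new run starts whenever the sign of the difference changes.
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


# Hypothetical rates for 19 time periods, invented so that rises and falls alternate.
hypothetical_rates = [4, 9, 3, 8, 2, 7, 3, 6, 1, 5, 2, 6, 3, 9, 4, 8, 2, 7, 3]
runs, expected, z = runs_up_and_down(hypothetical_rates)
print(runs, round(expected, 2), round(z, 2))
```

A large positive z indicates more runs than chance would produce, that is, rises and falls alternating more often than expected, which is the pattern described in the text. With only 19 periods the normal approximation is rough, so an exact table may be preferable.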

logical, and biogeographic data sets for the Leptopodomorpha (Hemiptera). Syst. Zool., 29:1-26.
SIMPSON, G. G. 1944. Tempo and mode in evolution. Columbia Univ. Press, New York.
SIMPSON, G. G. 1949. The meaning of evolution. Yale Univ. Press, New Haven, Connecticut.
SIMPSON, G. G. 1953. The major features of evolution. Columbia Univ. Press, New York.
SNEATH, P. H. A. 1982. [Review of] Systematics and biogeography: Cladistics and vicariance. Syst. Zool., 31:208-217.
SNEATH, P. H. A., AND R. R. SOKAL. 1973. Numerical taxonomy. W. H. Freeman, San Francisco.
SOKAL, R. R. 1966. Numerical taxonomy. Sci. Am., 215:106-116.
SOKAL, R. R. 1974. Classification: Purposes, principles, progress, prospects. Science, 185:1115-1123.
SOKAL, R. R. 1983a. A phylogenetic analysis of the Caminalcules. II. Estimating the true cladogram. Syst. Zool., 32:185-201.
SOKAL, R. R. 1983b. A phylogenetic analysis of the Caminalcules. III. Fossils and classification. Syst. Zool. (in press).
SOKAL, R. R., AND F. J. ROHLF. 1966. Random scanning of taxonomic characters. Nature, 210:461-462.
SOKAL, R. R., AND F. J. ROHLF. 1980. An experiment in taxonomic judgment. Syst. Bot., 5:341-365.
SOKAL, R. R., AND F. J. ROHLF. 1981a. Taxonomic congruence in the Leptopodomorpha re-examined. Syst. Zool., 30:309-325.
SOKAL, R. R., AND F. J. ROHLF. 1981b. Biometry. 2nd ed. W. H. Freeman, San Francisco.
STANLEY, S. M. 1979. Macroevolution, pattern and process. W. H. Freeman, San Francisco.
THROCKMORTON, L. H. 1968. Concordance and discordance of taxonomic characters in Drosophila classification. Syst. Zool., 17:355-387.
WILEY, E. O. 1981. Phylogenetics. John Wiley, New York.

Received 14 January 1983; accepted 11 April 1983.

APPENDIX

List of Characters for 77 Fossil and Recent Caminalcules^a,b

Head and neck
1. Head junction complex (folded) (1) or not (0). [73, 39]
2. If complex, degree of folding complete (1) or partial (0). [73, 10]
3. If partial, secondary junction narrow (0) or broad (1). [10, 21]
4. P (1) or A (0) of horn. [75, 73]
5. If horn present, pointed (1) or flattened (0). [75, 64]
6. Length of head in mm from rear end of folded section to front, excluding horn and anterior projections. If head not complex, measure from collar. Recoded as integer in the following manner: (0) 9-10.9; (1) 10.9-12.8; (2) 12.8-14.7; (3) 20.4-22.3; (4) 24.2-26.1; (5) 26.1-28. [73, 65, 60, 28, 25, 39]
7. Anterior end of head concave (0), flat (1) or convex (2). [71, 73, 62]
8. If convex, rounded (0) or sharply pointed (1). [62, 15]
9. P (1) or A (0) of anterior projections. [71, 73]
10. P (1) or A (0) of eyes. [73, 59]
11. If present, states of fusion of eyes: (0) two separate eyes; (1) grown together but two eyes still discernible; (2) grown together into oblong approximately the size of two eyes; (3) grown together into circle approximately the size of one eye. [73, 47, 1, 16]
12. P (1) or A (0) of eye stalks. [37, 73]
13. If stalked, length of stalk (excluding eye) in mm. Recoded as integer in the following manner: (0) 3-4.5; (1) 4.5-5.9; (2) 6-7.5; (3) 10.5-12; (4) 13.5-15; (5) 16.5-18. [37, 38, 70, 48, 72, 19]

^a These characters were originally defined by A. J. Boyce and I. Huber and were revised by B. Thomson and R. R. Sokal.

^b Abbreviations and conventions: P = presence; A = absence. Numbers in parentheses are character state codes. Numbers in square brackets following definition of each character are the OTU numbers of representative OTUs that exhibit the character states in the order described in the list. Thus for character 1 [73, 39] means that OTU 73 is an example of a complex (folded) head junction and OTU 39 is an example of a not complex head junction. Whenever an "If" statement is denied, as in character 2 for OTUs whose state for character 1 is 0, an NC is recorded for the OTU for that character. Since all measurement characters were recorded to the nearest mm, there is no ambiguity created by shared, more refined class limits of adjacent classes as listed for some characters (e.g., for character 6, a measurement of 10 mm was coded 0, while one of 11 mm was coded 1).

The character numbers 1-85 employed in this list correspond to those employed in earlier studies and originally defined by Boyce and Huber. Characters numbered 86-106 were needed to describe additional observed differences in fossil species. In order to preserve both the old character numbering system and at the same time the logical order of characters in the list, it became necessary to list some characters out of numerical order. The following paragraph gives the location of all characters that are not in strict numerical order in the list.

Characters 26-28 in this list follow upon character 22; 58 follows 13; 59 follows 56; 60 and 61 follow 48; 62 follows 47; 78 follows 80; 84 follows 90 (23); 85 follows 63; 86-90 follow 23; 91 and 92 follow 34; 93-95 follow 35; 96 follows 45; 97 follows 62 (47); 98 follows 61 (48); 99 follows 55; 100 and 101 follow 85 (63); 102 follows 67; 103 follows 69; and 104-106 follow 75.

The following lists of characters comprise the several body regions: head and neck, 1-17, 58 (total of 18); anterior appendages, 18-30, 84, 86-90 (total of 19); posterior appendages, 31-38, 91-95 (total of 13); collar and abdomen, 39-57, 59-63, 85, 96-101 (total of 31); abdominal pores, 64-83, 102-106 (total of 25). The overall total is 106 characters.

The images shown in Figure 1 (OTUs 1-29) were traced from the originals, whereas those in Figure 2 were traced from xeroxes of the originals. In the process and in the reduction necessary for illustration here some detail visible to the data coders was necessarily lost. The states of at least one character (17) are quite indetectable in the published figures. Lengths in millimeters are given in terms of the dimensions of the original images. As reduced in Figures 1 and 2, these measurements must be multiplied by 0.387.
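The body-region tallies in the footnote can be checked mechanically. The sketch below expands the character ranges exactly as listed there and confirms the per-region totals and the overall total of 106; the helper name `expand` and the dictionary layout are illustrative choices, not part of the original data base.

```python
def expand(spec):
    """Expand a range string such as '1-17, 58' into a list of character numbers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            numbers.extend(range(int(lo), int(hi) + 1))
        else:
            numbers.append(int(part))
    return numbers


# Character ranges per body region, copied from the footnote.
regions = {
    "head and neck": "1-17, 58",
    "anterior appendages": "18-30, 84, 86-90",
    "posterior appendages": "31-38, 91-95",
    "collar and abdomen": "39-57, 59-63, 85, 96-101",
    "abdominal pores": "64-83, 102-106",
}

counts = {name: len(expand(spec)) for name, spec in regions.items()}
all_chars = sorted(n for spec in regions.values() for n in expand(spec))

print(counts)
print(sum(counts.values()), all_chars == list(range(1, 107)))
```

The per-region counts agree with the stated totals (18, 19, 13, 31, 25), and every character number from 1 to 106 is assigned to exactly one region.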

58. If stalked, stalks fused (1) or not (0). [20, 37]
14. If stalks not fused, tips divergent (0), parallel (1) or convergent (2). [37, 19, 31]
15. Top of head depressed (0), flat (1) or crested (2). [73, 41, 53]
16. If crested, single (0) or lobate (1). [53, 2]
17. P (1) or A (0) of groove in neck. [12, 73]
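The two coding conventions used in this list, integer recoding of measurements and "NC" for a denied "If" statement, can be sketched as below. The class limits are those given for character 6; the function names `recode` and `code_character_2` are hypothetical, and returning `None` for a measurement that falls in none of the listed classes is an assumption of this sketch (the list does not say how such a value would be handled).

```python
# Class limits for character 6 (head length in mm), as listed in the appendix.
CHAR6_CLASSES = [(9, 10.9), (10.9, 12.8), (12.8, 14.7),
                 (20.4, 22.3), (24.2, 26.1), (26.1, 28)]


def recode(measurement_mm, classes):
    """Map a whole-millimetre measurement to its integer class code.

    Measurements were recorded to the nearest mm, so a shared class limit
    such as 10.9 never creates an ambiguity. Returns None if no class fits
    (an assumption of this sketch).
    """
    for code, (lo, hi) in enumerate(classes):
        if lo <= measurement_mm <= hi:
            return code
    return None


def code_character_2(char1_state, folding_complete):
    """Character 2 applies only when character 1 (complex head junction) is 1."""
    if char1_state == 0:
        return "NC"  # the "If complex" statement is denied
    return 1 if folding_complete else 0


print(recode(10, CHAR6_CLASSES), recode(11, CHAR6_CLASSES))
print(code_character_2(0, True), code_character_2(1, False))
```

The first printed pair reproduces the footnote's own example for character 6: a measurement of 10 mm is coded 0 and one of 11 mm is coded 1.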

Anterior appendages
18. P (1) or A (0) of anterior appendages. [73, 15]
19. If appendages present, length in mm from tangent to posterior end of abdomen (excluding posterior appendages) up to the anterior end of the longer appendage. Recoded as integer in the following manner: (0) 40-46.3; (1) 46.3-52.6; (2) 52.6-58.9; (3) 58.9-65.2; (4) 65.2-71.5; (5) 71.5-77.8; (6) 77.8-84.1; (7) 84.1-90.4; (8) 96.7-103. [73, 74, 45, 43, 66, 24, 17, 19, 20]
20. If appendages present, P (1) or A (0) of flexion (elbow) in appendages. [41, 73]
21. If appendages present, end of appendage divided (1) or not (0). [56, 73]
22. If not divided, end of appendage is tendril (0), sharply pointed (1), tapered or rounded (2) or knobbed (3). [21, 68, 73, 45]
26. If not divided, P (1) or A (0) of flange. [38, 73]
27. If flange present, length expressed as sum of right and left sides in mm. Recoded as integer in the following manner: (0) 17-20; (1) 23-26; (2) 32-34.9; (3) 35-37.9; (4) 38-41; (5) 44-47. [38, 70, 48, 50, 20, 29]
28. If flange present, width expressed as sum of right and left sides in mm. Recoded as integer in the following manner: (0) 5-6.9; (1) 10.7-12.6; (2) 14.5-16.4; (3) 16.4-18.3; (4) 18.3-20.2; (5) 20.2-22.1; (6) 22.1-24. [38, 70, 48, 31, 29, 72, 26]
23. If divided, two divisions (0) or three (1). [22, 56]
86. If two divisions, ending as fingers (0), clamp (pincers without joint) (1), pincers with joint (2) or two wavy spikes (3). [22, 49, 30, 67]
87. If ending as clamp, closed (0) or open (1). [49, 36]
88. If open, clamp is small (0) or large (1). [36, 55]
89. If ending as pincers with joint, serrated edges P (1) or A (0). [57, 30]
90. If three divisions, one "finger" much longer than the other two (2), one finger slightly longer than the other two (1) or all fingers the same size (0). [65, 76, 56]
84. If divided, bulbs on ends of divisions present on all divisions (2), present on some divisions (1) or absent (0). [18, 66, 56]
24. If bulbs present on all divisions, P (1) or A (0) of claws. [12, 46]
25. If bulbs present on all divisions, P (1) or A (0) of pads. [75, 46]
29. If appendages present, P (1) or A (0) of pigment. [9, 73]
30. If pigment present, distributed in small dots (-1), small circles (0), large circles (1) or very broad areas (2). [33, 9, 11, 21]

Posterior appendages
31. Appendage single (0), partially double (1) or completely double (2). (Completely double means that appendage originates from 2 stalks rather than 1; it does not necessarily mean that the 2 halves are separated.) [73, 43, 18]
32. If single, disclike (pedestal like) (0), platelike (1) or propellerlike (2). [73, 40, 36]
33. If disclike, width of disc in mm. Recoded as integer in the following manner: (0) 5-5.7; (1) 5.7-6.4; (2) 6.4-7.1; (3) 7.8-8.5; (4) 8.5-9.2; (5) 9.9-10.6; (6) 10.6-11.3; (7) 11.3-12. [13, 58, 39, 38, 37, 72, 42, 77]
34. If disclike, stalk long (>3 mm) (2), short (<3 mm) (1) or absent (0). [74, 73, 28]
91. If disclike, round (0) or square (1). [73, 74]
92. If square, P (1) or A (0) of division lines. [49, 74]
35. If platelike, no indentation (-1), weakly emarginate (curved indentation) (0), cleft (V-indentation) (1), or deeply cleft (V-indentation continued into division line) (2). [40, 33, 11, 6]
93. If propellerlike, blades rounded (0) or pointed (1). [36, 67]
94. If propellerlike, stalk short (<5 mm) (0) or long (>5 mm) (1). [36, 55]
95. If stalk long, straight (0) or twisted (1). [55, 30]
36. If completely double, ends pointed (0), clubbed (1) or platelike (2). [3, 75, 18]
37. If platelike, divided (1) or not (0). [66, 18]
38. If divided, two divisions (0) or three (1). [66, 53]

Collar and abdomen
39. Rim of abdomen plain (0), narrowly raised (1) or broadly raised (2). [64, 45, 73]
40. Anterior margin of abdomen well delineated (0) or imperfectly delineated (1). [73, 31]
41. P (1) or A (0) of abdominal ridge (raised area on central posterior abdomen). [73, 45]

42. P (1) or A (0) of large areas of color on anterior abdomen. [56, 73]
43. If present, whole (2), bisected (1), quadrisected and square (0) or quadrisected and irregularly shaped (-1). [66, 46, 56, 65]
44. P (1) or A (0) of spots on anterior abdomen. [44, 73]
45. If present, small (0), large (1) or large and cross-shaped (2). [44, 27, 35]
96. If small, 2 (0) or 4 (1) spots. [17, 44]
46. P (1) or A (0) of postabdominal spots. [44, 73]
47. If postabdominal spots present, large (open circles) (0), medium (>1.5 and ≤2.0 mm avg.) (1), small (>1.0 and <1.5 mm avg.) (2) or dots (<1.0 mm) (3). [56, 47, 44, 35]
62. If large, anterior lateral spots medially fused (0), posteriorly fused (1) or free (2). [66, 53, 56]
97. If postabdominal spots present, P (1) or A (0) of a group of four free posterior spots. [44, 56]
48. If group present, posterior group of four is strongly concave (0), weakly concave (1), or convex (2). [24, 44, 69]
60. If postabdominal spots present, number of free (not fused) spots on postabdomen: zero (-1), two (0), four (1), six (2), eight (3), ten (4), twelve (5) or fourteen (6). [77, 3, 66, 53, 56, 44, 63, 76]
61. If six or eight large (char. 45), free spots, P (1) or A (0) of gap between second and third rows of spots. [64, 18]
98. If postabdominal spots present, and anterior abdominal spots (char. 44) also present, spots connected into racquet design (either by lines or by fusion) (1) or not (0). [77, 44]
49. Posterior end rounded (0) or angular (1). [73, 21]
50. P (1) or A (0) of dorsal fin. [50, 73]
51. If present, length of attachment in mm. Recoded as integer in the following manner: (0) 7-7.7; (1) 7.7-8.4; (2) 8.4-9.1; (3) 9.8-10.5; (4) 13.3-14. [48, 50, 31, 20, 29]
52. If present, posterior apex projecting beyond rear of attachment (2), not projecting (1) or projecting anterior to rear of attachment (0). [72, 50, 20]
53. Posterior abdomen swollen (1) or not (0). [35, 73]
54. Posterior bars (apparently fused spots) absent (0), single (1) or double (2). [73, 2, 43]
55. If double, shortest distance between bars in mm. Recoded as integer in the following manner: (0) 1-1.8; (1) 1.8-2.6; (2) 6.6-7.4; (3) 8.2-9. [43, 18, 75, 64]
99. If double, elongate (0) or fat and roughly triangular (1). [43, 56]
56. If elongate, straight (0), curved (1), weakly J-shaped (posterior ends thickened) (2), or strongly J-shaped (3). [64, 43, 3, 12]
59. If double, P (1) or A (0) of large cross-shaped spot in postabdomen median to anterior tips of bars. [3, 56]
57. If single, semicircular (0) or U-shaped (1). [5, 2]
63. Greatest width of abdomen in mm. Recoded as integer in the following manner: (0) 15.95-17.95; (1) 19.95-21.95; (2) 21.95-23.95; (3) 23.95-25.95; (4) 27.95-29.95; (5) 29.95-31.95; (6) 33.95-35.95; (7) 36. [20, 48, 54, 73, 35, 63, 24, 47]
85. Length of abdomen in mm from base to tip of collar. Recoded as integer in the following manner: (0) 38-42.9; (1) 42.9-47.8; (2) 47.8-52.7; (3) 52.7-57.6; (4) 57.6-62.5; (5) 72.3-77.2; (6) 82.1-87. [73, 47, 64, 75, 50, 19, 20]
100. P (1) or A (0) of lateral flippers. [42, 73]
101. If present, small (0), medium (1) or large (2). [42, 51, 61]

Abdominal pores
64. P (1) or A (0) of large pore in column 10 (most anterior position). [45, 73]
65. Group II (columns 4 and 5) has zero pores (-1), one pore (0) or two pores (1). [61, 69, 73]
66. If two pores, fused (1) or not (0). [5, 73]
67. If fused, fusion complete (0) or incomplete (1). [48, 5]
102. If not fused, number of pores looking like slits: zero (0), one (1) or two (2). [73, 68, 51]
68. Group III (columns 6-10) has one (0), two (1), three (2) or five (3) pores. [64, 18, 43, 73]
69. If two or three pores, one pore looks like a slit (1) or not (0). [43, 76]
103. If five pores, the two pores closest to group II look like slits (1) or not (0). [56, 73]
70. If five pores, P (1) or A (0) of fusion. [50, 73]
71. If fusion present, complete (0) or incomplete (1). [29, 50]
72. If complete, one (0) or two (1) groups of fused pores. [72, 29]
73. If incomplete, P (1) or A (0) of a group of three fused pores. [26, 50]
74. Group I (columns 1-3) has zero (0), one (1), two (2) or three (3) pores. [47, 24, 35, 73]
75. If one or two pores, one pore looks like a slit (1) or not (0). [63, 35]
104. If two or three pores, pore nearest posterior enlarged (1) or not (0). [42, 73]
105. If three pores present and posterior one enlarged, enlarged pore apparently see-through (1) or not (0). [30, 42]
106. If see-through, fused with posterior pore from other side's Group I (1) or not (0). [57, 30]

76. If three pores, separated as three open pores of any size (0) or in some way modified (1). [73, 46]
77. If modified, modified by fusion (1) or some other way (0). [64, 46]
79. If fusion, two (0) or three (1) pores fused. [50, 64]
80. If fusion, broad and complete (0), broad and incomplete (1), or slit-like (2). [64, 66, 71]
78. If broad and complete and three pores fused (char. 79), both lateral pores reduced and joining median one (1) or not (0). [22, 64]
81. If modified in some other way, P (1) or A (0) of line (or slit) connecting pores. [46, 65]
82. If present, two (0) or three (1) interconnected pores. [53, 46]
83. If absent, slits in column 1 (0), column 2 (1), columns 1 and 2 (2), or column 3 (3). [60, 65, 8, 51]