ARTICLE ANOPA: ‘Statistical’ for Young-Earth Creationists

Dan Bolnick, University of Texas, Austin

REPORTS, VOL 26, NR 4 2006 (JUL-AUG 2006)

Creationists have been working hard to make their views appear as legitimate science. Part of this strategy has been to get papers published in peer-reviewed scientific journals. Creationists have used two strategies to achieve this goal. First, the recent review paper on “intelligent design” (Meyer 2004) in the Proceedings of the Biological Society of Washington was published by bypassing the normal peer review process. According to a recent statement from the Council that publishes the journal, the editor Richard Sternberg handled the paper in a manner “contrary to typical editorial practices”.

The second approach is to sanitize the content of the paper. Two other papers written by creationists have been published in peer-reviewed journals in 2004 (Behe and Snoke 2004; Cavanaugh and Sternberg 2004). Unlike Meyer’s paper, which was handled by a friendly editor who retired soon after (and also co-authored the Cavanaugh and Sternberg paper), these have actually passed through the peer review process. This was accomplished by removing any overt reference to creationism or “intelligent design”, a strategy clearly outlined in a recent paper by a group of young-earth creationists:

    To make a creationist theory ‘theory neutral’ (that is palatable to non-creationists), much of what makes it distinctly creationist must be removed. This may be useful, for example, in order to get a controversial scientific study reviewed by competent evolutionary scholars in a secular journal, but the elimination of creationist content as a general practice generates more work for creationists. To integrate ‘theory-neutral’ research back into the creation model from which it came, all that was excised must be replaced (Wood and others 2003).

Such excision is a clever, if intellectually dishonest, ploy that underlies the recent publication of a paper by the young-earth creationist David Cavanaugh and the biologist Richard Sternberg in the peer-reviewed Journal of Biological Systems. Both authors have been active members in the Baraminology Study Group, whose website proclaims its “ultimate goal is to develop origin models that accommodate empirical data in a biblical framework of earth history through scientifically sound analysis of biological data and scholarly analysis of biblical texts”.

To this end, Cavanaugh has been developing a quantitative method for identifying whether a collection of species represents a single “created kind” or whether its members are sufficiently distinct to qualify as members of different holobaramins. This method, called ‘Analysis of Pattern’ (ANOPA), is touted as a method to “reduce the dimensionality of multi-dimensional data with minimal loss of information and no assumptions about the data’s distribution” (Wood and Cavanaugh 2003: 2). By reducing complex multivariate data down to 3 dimensions, ANOPA allows the user to visualize patterns of similarity and difference among species to see if they fall into discrete clusters. This purely descriptive method could then be combined with statistical inference (confidence intervals around the clusters) to infer whether these clusters might overlap. The implication is that overlap would imply membership in a common holobaramin, and disjunct groups imply separate baramins.

As of late 2004, three papers had been published using ANOPA. The first two were published in creationist journals (Origins, and Occasional Papers of the Baraminology Study Group) and clearly apply ANOPA as a criterion for evaluating whether groups of species (subtribe Flaveriinae and tribe Heliantheae, respectively) represent one or more of the Bible’s created kinds (Cavanaugh and Wood 2002; Wood and Cavanaugh 2001). Both papers provide only the sketchiest description of ANOPA as a method, citing an unpublished paper. Without a full description of the method, it was impossible to evaluate their claims about its value, such as: “Because these calculations require no assumptions about the distribution of the data and retain more information regarding dataset variation, ANOPA can reveal patterns obscured by other variance-analysis methods such as Principle Component Analysis. Consequently, ANOPA is the best available method to display biological character space and reveal taxonomic patterns.” (Wood and Cavanaugh 2003, emphasis added).

A complete description of ANOPA has finally become available, in a peer-reviewed journal. Taking the strategy outlined in Wood and others (2003) to get through peer review, Cavanaugh and Sternberg wrote a paper describing ANOPA (a method developed for young-earth creationist purposes and used as such in other venues) but presented it without any reference to creationism. Cavanaugh and Sternberg (2004) also apply ANOPA to a group of well-known North American freshwater fishes, the Centrarchidae (sunfish, bass, crappies, fliers), for which a large data set of morphological traits was available.

This publication raises an interesting strategic question: how should the scientific community handle papers written by creationists, describing methods or ideas designed to support creationist research, that make no overt reference to creationism? One option is to treat such papers the way we treat any scientific work: assess the strength of their methods and the rigor of their interpretation. Therefore, the remainder of this essay will be spent describing and critiquing ANOPA, and its particular application by Cavanaugh and Sternberg (C&S).

HOW IS ANOPA SUPPOSED TO WORK?

Multivariate data are a common feature of studies that try to classify species based on morphological similarities. Such data are difficult to visualize because we cannot simultaneously view relationships among more than three variables at a time. Statisticians have gotten around this problem with a variety of methods, such as Principal Component Analysis (PCA), that try to reduce the variation in many different variables down into a smaller number of important variables that capture most of the action. This is best illustrated in Figure 1, where we have hypothetical data on the number of vertebrae and gill rakers for each of 20 species. There is clearly important variation in both traits, so we do not want to ignore one of them. We can get around this problem by ‘rotating’ the axis of the graph so that one axis (the ‘first principal component’) captures most of the action. We could then either ignore the remaining variation, or use a second principal component axis that is perpendicular to the first to “explain” the remaining variation, by showing us how strong a relationship there is between individual variables and these axes. If we have 10 variables instead of the 2 illustrated in Figure 1, there will be 10 possible Principal Components (PCs), but in general we find that the first couple of PCs capture most of the meaningful information.

FIGURE 1. A) Rotation of axes by Principal Component Analysis. B) Rotation via ANOPA, connecting the centroid to the most distant data point to form the primary axis. Location/choice of outgroup will have a major effect on results.

ANOPA tries to do something very similar, reducing some large number of variables down to three axes of variation that are meant to capture any meaningful patterns in the data (Cavanaugh and Sternberg 2004). The algorithm for ANOPA is simple, and is summarized in Figure 2. The method assumes we have a standard data set for cladistic analysis. That is, each species is given some value for each of a number of different traits. These traits can be continuous (for example, body size), ordinal (for example, number of vertebrae), or categorical (for example, presence or absence of a trait; or a number of different character states). Each trait represents a different (but not necessarily independent) dimension of variation among species. In the visual illustration of ANOPA, we will assume just two such dimensions for clarity (Figure 2A). There are two conceptually distinct versions of ANOPA: 1-dimensional ANOPA draws a histogram of the distances from each point to the centroid of the data, while 2- or 3-dimensional ANOPA reduces the data down to three variables and produces a scatter plot of the data points to look for disjunct groups.

FIGURE 2.
A) Each species has values for each of m different characters. The goal is to reduce these m dimensions to 2 or 3 dimensions to visualize the distribution of points to look for separate clouds of points. While this example uses just 2 dimensions, ANOPA works for more dimensions.
B) Calculate centroid of data (X0), which is the average for each variable.
C) Calculate distance (a0) from each point to the centroid.
D) Draw histogram of these distances (a0), and look for distinct groups. The group furthest from the centroid is designated the ‘outgroup’, X3.
E) Draw a vector X2 from the centroid to the outgroup mean.
F) Each data point can be represented in 2 dimensions by determining: the distance along the vector, t0, and the perpendicular distance to the vector, d2.
G) A third dimension can be added by determining the rotational angle (ar) of vector d2 around the vector t0. This angle is calculated in reference to the vector from t0 to an arbitrarily chosen point.
H) Now multi-dimensional data can be summarized as a 2- or 3-dimensional scatterplot (t0, d2 and ar). Standard deviations of each variable are used to draw confidence intervals around each group identified by this scatterplot or at step 4.

For 1-D ANOPA, we first identify the central point (‘centroid’) in the cloud of data, the average of each trait across all species (Figure 2B). We then calculate the Euclidean length of the vector connecting each species’ point to that centroid (Figure 2C). Looking at a histogram of these distances (Figure 2D), C&S suggest that different peaks in the histogram correspond to different subgroups. In fact, these results can arise from random chance: the data in Figure 2D appear to have 3 groups, but this is an artifact of small sample sizes drawn from a single normal distribution. One’s choice of how wide the histogram bars are will affect resolution and may create statistically non-significant groups, or can obscure real groups. The 1-D test can also fail to identify distinct groups (Figure 3), indicating that it is likely to be very sensitive to both overall sample sizes and the evenness of sample sizes among groups.

FIGURE 3. The 1-D ANOPA will fail to identify two distinct groups with equal sample sizes. Because the centroid falls equidistant between the groups, the distance measure a0 will have a single mode.

Part of the problem with the 1-D ANOPA is, of course, that one dimension is insufficient to capture variation in 2 or more dimensions (Figure 3). The other problem is that distance from a centroid may not be the best single dimension to use. Consequently, C&S propose an extension to the 1-D method: 3-D ANOPA. This is more akin to principal components analysis, reducing numerous dimensions in the data down to three dimensions that can be plotted on a graph to search for visually distinct clusters. The first step is to define an ‘out-group’. C&S advocate using the histogram of disparity (Figure 2D) to pick out the most divergent species or clump of species that can be designated as a morphological out-group. Note that this use of ‘out-group’ is quite distinct from what is meant in phylogenetic analysis. Rather than representing an ancestral character state to root character state transitions along a phylogeny, C&S use the out-group to define the first major axis of variation in the multivariate cloud of data.

Next, C&S draw a “relation vector” from the centroid to the point representing the out-group (Figure 2E). This is analogous to the first principal component axis in PCA, with a critical distinction. Whereas the first principal component axis seeks to explain the most variation in the data, the “relation vector” merely connects the centroid to the most distant point, or ‘out-group’. This distinction is illustrated in Figure 1B.

Next, each data point is characterized in relation to this vector. Dropping a perpendicular line from each species down to the vector (Figure 2F), the point can be defined by 3 dimensions: 1) the distance along the relation vector from the centroid to the perpendicular, t0; 2) the distance along the perpendicular line from the vector to the point, d2; and 3) the angle at which this line is rotated around the vector, ar (Figure 2G).

One can then create a scatter plot of any two of these axes, or a 3-D plot, and look for apparently distinct groups. If multiple groups are identified by the 1-D ANOPA or 2-D scatterplots, C&S outline a way to build confidence intervals around these plots (Figure 2H). This involves calculating the standard deviation for each morphological variable, building an ellipse with length equal to 2 standard deviations along each variable, then rotating this ellipse to fit the 3 ANOPA dimensions. C&S provide no quantitative criteria with which to evaluate the statistical significance of non-overlapping or overlapping confidence intervals.

IS ANOPA A USEFUL METHOD?

Is ANOPA really a valid and useful method for analyzing patterns in morphological data? Wood and Cavanaugh (2003) claim that “ANOPA is the best available method to display biological character space and reveal taxonomic patterns”, in part because the method “requires no assumptions about the distribution of the data and retain[s] more information regarding dataset variation” than Principal Component Analysis (Wood and Cavanaugh 2003). In fact, ANOPA requires a number of unstated assumptions and retains less information than PCA. There are five major flaws with ANOPA:

1) Lack of objective criterion for identifying discrete groups within the data. Judging by its development by baraminologists, the implicit goal of ANOPA is to identify groups or species that are morphologically distinct from other groups (Cavanaugh and Wood 2002; Wood and Cavanaugh 2001, 2003; Wood and others 2003). Yet what ANOPA actually does is to reduce multi-dimensional data down to three dimensions. It does not provide an objective, quantitative method for identifying discrete groups. Instead, Cavanaugh and Sternberg state that “the practiced eye of an experience [sic] analyst could discern the patterns of relationships within a given data set” (p 158). Contrary to what their title implies, there is no part of the algorithm that statistically recognizes patterns in the data. It merely rotates the data, and patterns are evaluated by eye.

2) Use of unordered data. Given continuous data (for example, body mass), the rotations C&S use to calculate t0, d2, and ar are mathematically sound. However, such data are rare in morphological cladistic data sets, which are often composed of categorical data that are arbitrarily coded with numbers, but could just as easily have been represented by letters or any other symbols. For instance, the centrarchid data set used by C&S includes a variable describing the basihyal teeth in each species, coded as 0 (medial patch of teeth), 1 (no dentition), or 2 (2 bilateral patches). Because the three character states have no meaningful order, it is meaningless to calculate an average value, or distances from that average. Since ANOPA depends on such distances, the method is inappropriate for unordered or categorical traits. C&S nevertheless apply the method to the centrarchid data set, which contains many such arbitrarily ordered categorical traits.

To get around this problem, C&S point out that many unordered characters can be assigned an order. For instance, a particular codon in a gene can call for any of 20 common amino acids. C&S suggest assigning an order to amino acids based on traits such as how hydrophobic they are. However, there are, in fact, a number of traits that might be used to order amino acids, which may give very different orders. Arbitrarily selecting one of these will potentially bias results, so there must be a clear criterion for choosing orders for categorical traits. C&S offer no such criterion, instead making an implicit assumption that the chosen criterion is biologically the most meaningful one.

3) Choice of out-group. One’s choice of out-group has a major effect on ANOPA results because it determines the primary axis of variation in the resulting data rotation. C&S suggest using the species furthest from the centroid as the out-group. This has the advantage of providing a semi-objective way of choosing the out-group, though the choice of whether to use one or more species is left to the user’s discretion. Unfortunately, there is no guarantee that a vector from the centroid to the most distant point will: 1) maximize the amount of variation explained by the axis (as PCA does); or 2) maximize the discriminating power between groups (as discriminant function analysis does). C&S provide no justification for why the ‘relation vector’ is the best rotation for identifying morphologically distinct groups, nor do they discuss how sensitive their results are to one’s choice of out-group. This is made all the more problematic by their decision to ignore their own objective criterion for choosing an out-group. The out-group for their analysis of centrarchid fishes was a hypothetical morphology based on questionable assumptions about ontogenetic trajectories.

4) Lack of statistical rigor. Cavanaugh and Sternberg claim to find “statistically significant groupings” within the centrarchids (Cavanaugh and Sternberg 2004: p 154). It is unclear how they reached this conclusion, since at no point in their description of ANOPA is there any method for assessing statistical significance. In 1-D ANOPA, groups are identified by visual inspection of a histogram (Figure 2D), which has two problems. First, the number of peaks and distinct groups in histograms depends on one’s choice of how wide to make the bars. C&S advocate looking through a range of bar widths to search for distinct groups, an approach that will often yield false positives. Second, low sample sizes (as in the 22 species of centrarchids used in their paper) will almost inevitably produce distinct peaks in any histogram merely as a result of random sampling effects. They provide no statistical method for testing the null hypothesis that different peaks in the histogram could be generated by random sampling from a single distribution.

There is a more serious attempt at statistical rigor in 2-D ANOPA, in which one can draw confidence intervals around different sub-groups, and see whether these intervals overlap. Two problems emerge here. First, C&S calculate the intervals by forming an ellipse based on the standard deviations of the two major ANOPA axes. They claim “Due to the way that t0 and d2 are calculated they are orthogonal, independent distances, thus have independent variances” (Cavanaugh and Sternberg 2004: p 164). This is not necessarily true, as independent axes can still have dependent variances, so their confidence intervals may be incorrect. Second, ANOPA is supposed to “make no assumptions about the distribution of the data” (Wood and Cavanaugh 2003). The very concept of confidence intervals is predicated on either an assumption of some parametric distribution, or is derived from a resampling routine such as bootstrapping. Without these, ANOPA’s confidence intervals are meaningless and formal significance tests impossible. C&S therefore fail to outline any quantitative way to assess statistical significance. Note that they also ignore any effects of error or variance within species’ datasets. ANOPA’s statistical properties (that is, consistency) have not been documented; its performance has not been tested. Yet even if the confidence intervals could be trusted, one must be wary of any method that proposes to identify groups by visual examination of the data and then carry out post-hoc statistical tests on those groups.

5) Loss of information. According to Wood and Cavanaugh (2003), one of the selling points of ANOPA is that it “retain[s] more information regarding dataset variation” (p 3) than Principal Components Analysis. They have provided no formal mathematical proof that this is true, nor simulations or examples showing that ANOPA can succeed where PCA fails. In fact, it is unlikely that such a proof is possible, for two reasons. First, there is no way to reduce highly multi-dimensional data down to three variables without losing substantial amounts of information. Second, Principal Components Analysis does not need to lose any information at all, because one is not limited to 3 dimensions as is the case for ANOPA. In the example of PCA provided in Figure 1, there were two variables to begin with, yielding two principal component (PC) axes. Every data point can be precisely described in terms of those two PC axes; there is no loss of data unless one discards the PC axes that explain less variation (PC2 in the case of Figure 1). Generalizing to multi-dimensional data, if there are M different variables describing each point, PCA can provide up to M different PC axes. These M axes can still precisely locate each data point, but are guaranteed to be independent of each other (‘orthogonal’), unlike the original M variables. Data are only lost if one discards low-variance PC axes. Admittedly, discarding low-variance axes is common practice, but it is neither necessary, nor likely to eliminate one’s ability to distinguish distinctive clusters in morphospace. C&S could achieve their same baraminological aims by examining each PC axis for bi- or multi-modal distributions, if they deployed a suitable statistical test.

INTERPRETATION OF ANOPA RESULTS

When applied to the right kind of data, ANOPA is a mathematically sound way of rotating the data to reduce it to three dimensions. However, data rotation alone does not identify discrete groups or patterns in morphospace. Instead, Cavanaugh and Sternberg (2004) rely on “the practiced eye of an experienced analyst” (p 158), making clear that ANOPA does not provide an objective statistical tool for identifying morphologically distinct groups. Nonetheless, the authors have applied it to several datasets to search for evidence of such discrete groups (Cavanaugh and Sternberg 2004; Cavanaugh and Wood 2002; Wood and Cavanaugh 2001).

The first published application of ANOPA was on morphological data for the flowering plant subtribe Flaveriinae (Wood and Cavanaugh 2001) and later the larger tribe Heliantheae as a whole (Cavanaugh and Wood 2002). While both cases are clearly written from a biblical and young-earth creationist perspective, in neither case did the authors find strong evidence of distinct groups that might be identified as different holobaramins. This led to the curious conclusion that all 20 000 species of the family Asteraceae may be a single created kind that diversified after the Flood (Cavanaugh and Wood 2002). There is no clear discussion of the speciation or population genetic process that could lead to this diversity in the short time following the Flood.

In contrast, the Journal of Biological Systems paper (Cavanaugh and Sternberg 2004) was obviously sanitized for a scientific review process, and appears with no discussion of any evolutionary or creative processes. Instead, the authors focus purely on the existing pattern of morphological differences among species of centrarchids. Without any discussion of the mechanisms that produce this pattern or its implications, one wonders what the point of the exercise is. Yet there are hints at a creationist interpretation in the text, as in the statements that “the shape of taxic groupings simply transcends a branching pattern” (p 155), and “that centrarchids anatomical data have an overall poor fit to tree structures”. Given the authors’ creationist viewpoint, it is likely that they interpret this lack of fit to reflect a non-evolutionary history of centrarchids.

It is true that centrarchids’ morphological variation is complex, with many homoplasies that complicate efforts to construct a robust phylogenetic tree. In a recent paper, Near and others (2004) count 27 distinct phylogenetic hypotheses that have been proposed at various times. This systematic inconsistency indeed suggests that anatomical data do not fit a tree structure very well. However, there are two possible explanations of this pattern. First, centrarchids may not have arisen by a branching pattern of sequential speciation events. While Cavanaugh and Sternberg do not explicitly endorse this point of view, their discussion points in that direction. However, it is important to note that a lack of sequential branching does not disprove evolution, as rapid bursts of speciation and diversification are well known and can produce unresolvable phylogenies (‘hard polytomies’). The alternative explanation is that centrarchids really did follow a regular pattern of branching evolution, but that morphological traits have evolved so quickly that distantly related species gain similar trait values and so appear similar for some traits, and dissimilar for others. Cavanaugh and Sternberg ignore this possibility.

Recent analyses of DNA sequence data from 7 different genes for all 32 species of Centrarchidae provide a very well-resolved phylogeny (Near and others 2004, 2005). Not only do individual genes provide strong support for a regular evolutionary branching, but all seven genes analyzed by Near and others provide similar topologies. These molecular data make it clear that branching evolution can nevertheless produce species that are morphologically poorly resolved, consistent with the second explanation listed above. It is also worth noting that one of the early morphological phylogenies, based on osteological characters that are ecologically less functional and so likely to show less homoplasy, provided a branching pattern identical to that found in Near and others (Branson and Moore 1962).

CONCLUSIONS

If ANOPA were a sound method, it could be quite useful for biologists interested in identifying disjoint sets of data points. For example, the technique might be useful in distinguishing the ecological niches of different species. However, Cavanaugh and Sternberg have given no convincing evidence that the technique is preferable to existing statistical methods (PCA, discriminant analysis, cluster analysis). Still more troubling, ANOPA provides no objective method for identifying discrete groups, uses no statistical framework for hypothesis testing, and is inappropriate for the categorical data to which Cavanaugh and Sternberg apply it. Finally, Cavanaugh and Sternberg interpret their results without careful consideration of alternative hypotheses. Integrating recent molecular results, it is clear that the patterns identified by Cavanaugh and Sternberg are consistent with traditional evolutionary explanations, and centrarchids do not really “transcend a branching pattern of evolution”.

ACKNOWLEDGMENTS

The author thanks A Gishlick, B O’Meara, M Ryan, and B Spitzer for comments.

REFERENCES

Behe MJ, Snoke DW. 2004. Simulating evolution by gene duplication of protein features that require multiple amino acid residues. Protein Science 13:2651–64.

Branson BA, Moore GA. 1962. The lateralis components of the acustico-lateralis system in the sunfish family, Centrarchidae. Copeia 1962:1–108.

Cavanaugh DP, Sternberg RV. 2004. Analysis of morphological groupings using ANOPA, a pattern recognition and multivariate statistical method: a case study involving centrarchid fishes. Journal of Biological Systems 12:137–67.

Cavanaugh DP, Wood TC. 2002. A baraminological analysis of the tribe Heliantheae sensu lato (Asteraceae) using analysis of pattern (ANOPA). Occasional Papers of the Baraminology Study Group 1:1–11.

Meyer SC. 2004. The origin of biological information and the higher taxonomic categories. Proceedings of the Biological Society of Washington 117:213–39.

Near TJ, Bolnick DI, Wainwright PC. 2004. Investigating phylogenetic relationships of the Centrarchidae (Actinopterygii: Perciformes) using DNA sequences from mitochondrial and nuclear genes. Molecular Phylogenetics and Evolution 32:344–57.

Near TJ, Bolnick DI, Wainwright PC. 2005. Fossil calibrations and molecular divergence time estimates in centrarchid fishes (Teleostei: Centrarchidae). Evolution 59 (8):1768–82.

Wood TC, Cavanaugh DP. 2001. A baraminological analysis of subtribe Flaveriinae (Asteraceae: Helenieae) and the origin of biological complexity. Origins 52:7–27.

Wood TC, Cavanaugh DP. 2003. An evaluation of lineages and trajectories as baraminological membership criteria. Occasional Papers of the Baraminology Study Group 2:1–6.

Wood TC, Wise KP, Sanders R, Doran N. 2003. A refined baramin concept. Occasional Papers of the Baraminology Study Group 3:1–14.

AUTHOR’S ADDRESS

Dan Bolnick
Section of Integrative Biology
One University Station, C0930
University of Texas at Austin
Austin TX 78712
[email protected]
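
SUPPLEMENTARY SKETCH

The geometric steps summarized in Figure 2 (centroid, distance a0, relation vector, t0 and d2) can be illustrated with a few lines of Python. This is only a minimal sketch based on the description in this essay, not Cavanaugh and Sternberg's own implementation. The function names and the six made-up data points are assumptions chosen for illustration, the out-group is taken to be the single most distant point, and the rotational angle ar is omitted. The two mirror-image groups mimic the situation in Figure 3.

```python
import math


def centroid(points):
    """Average of each trait across all species (X0 in Figure 2B)."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(dims)]


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def distances_to_centroid(points):
    """Euclidean distance a0 from each point to the centroid (Figure 2C)."""
    c = centroid(points)
    return [_norm(_sub(p, c)) for p in points]


def relation_coordinates(points):
    """Return (t0, d2) for each point (Figure 2E-F).

    Assumption: the out-group is the single point furthest from the
    centroid, and not every point coincides with the centroid.
    """
    c = centroid(points)
    a0 = distances_to_centroid(points)
    outgroup = points[a0.index(max(a0))]
    relation = _sub(outgroup, c)             # the "relation vector"
    length = _norm(relation)
    unit = [x / length for x in relation]
    coords = []
    for p in points:
        v = _sub(p, c)
        t0 = _dot(v, unit)                   # distance along the vector
        perpendicular = _sub(v, [t0 * u for u in unit])
        coords.append((t0, _norm(perpendicular)))  # d2 is its length
    return coords


# Hypothetical data: two equal-sized groups, mirror images about the origin.
group_a = [(-5.0, 0.0), (-5.0, 1.0), (-4.0, 0.0)]
group_b = [(5.0, 0.0), (5.0, -1.0), (4.0, 0.0)]
points = group_a + group_b

a0 = distances_to_centroid(points)
coords = relation_coordinates(points)
```

With these invented points the two groups produce the same set of a0 values, so a histogram of a0 cannot tell them apart, whereas t0 happens to take opposite signs in the two groups. That outcome depends entirely on which point is taken as the out-group, which is the sensitivity discussed under flaw 3.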