phylogeny: false positives detected in a gene sequencing study

David Peters*

Traditionally a matrix of taxa and physical traits provides data for phylogenetic analysis. In recent years gene sequencing has taken on a dominant role. Ideally both methods should recover identical family trees that model evolutionary events. Too often they do not. While DNA analysis has proven its validity within genera (e.g. criminal identification), here a competing morphological analysis (the only method that can include ) finds several false positives in the results of a recent gene sequencing study of crown clade . Unfortunately gene studies have to rely on the hope that they will recover a series of taxa with a gradual accumulation of physical traits that model evolutionary events—without using those physical traits. Based on this benchmark and the present results, it is inappropriate to circumvent direct observation with gene sequencing in bird studies, at least until gene sequencing study results mirror those based on morphology.

The phylogenetic interrelationships of crown clade birds have proven to be so frustrating that molecular methods have been employed to circumvent the need for direct observation. Several recent bird phylogeny studies have constructed family trees based on the results of using gene sequencing [1–5]. Unfortunately, results too often nest genera together that share too few physical traits (see below), and similar genera nest apart from each other. That leaves workers unable to model evolutionary events with any sense of logic and continuity. Ideally we’d like to see a gradual accumulation of derived traits attend each taxon in a cladogram, and we’d like to include taxa. That only becomes possible with a physical analysis of bone, shell and other preserved traits. At present a certain amount of faith attends gene sequencing, a hope that similar genes will translate to the appearance of similar body parts and proportions. Too often, they do not. While DNA testing has proven its validity within genera (in crime labs and ancestry searches),

“inaccuracies in the genome sequence at even a tiny fraction of genes can produce false-positive signals, which make it difficult to identify loci that have genuinely been targets of selection” [6].

Another problem arises from taxon exclusion [7]. Here fossils come into play as outgroup taxa can be extinct and therefore unable to provide data for genetic studies. Without the phylogenetic framework provided by extinct taxa, workers are forced to ‘choose’ an extant outgroup taxon, sometimes one so distant that it shares few traits with ingroup taxa.

Here I test a recent study of bird interrelationships based on genomic sequencing [1] against a competing analysis (described below) based on physical traits. Similarities and differences in the two tree topologies are reported. The method of using physical traits in cladistic analysis is found to better nest physically similar taxa with one another and so illuminate and illustrate evolutionary relationships in every tested detail. All tested taxa document a gradual accumulation of derived traits back to stem tetrapods.

The competing gene sequencing study of birds [1] was not able to match or approximate the topology of the present study based on morphological traits and employing fossil taxa, which uses an unbiased algorithm to nest physically similar taxa together throughout the dataset.

The tested gene analysis of bird interrelations [1] employed a wide variety of 198 in-group (bird) taxa and two out-group (crocodilian) taxa. According to the authors, virtually all clades were supported with a posterior probability of 1.0.

To test if any interrelations were false positives, a competing morphological analysis, slowly building online at www.ReptileEvolution.com, currently employs 100 in-group (Neornithes) taxa and 1065 out- group taxa. These extend back to Devonian stem tetrapods. Such a large taxon list minimizes the possibility of bias regarding taxon exclusion [7] or inappropriate taxon inclusion. Commonly known as the large reptile tree (LRT), this wide-gamut study is nearly fully resolved with high Bootstrap scores at virtually every node (when tested in overlapping subsets due to limitations in computational power; Fig. 1). Only generally visible skeletal data are employed as character traits (i.e. none specific to birds). Despite this appraoch and the high level of homoplasy (convergent traits) present in the 1165 taxa, a gradual accumulation of traits is documented at every node. Examining these small changes leads to an understanding of the patterns of evolution that produced all derived taxa. Distinct from the genomic study [1] that chose crocodilians as outgroup taxa, here the software recovered extinct avian out-group taxa for the crown clade of extant birds, Neornithes. A few extinct taxa happened to nest within this clade, some as far back as the Early .

Starting off with several points of agreement, both competing studies nest closely related taxa together.

Ducks nest with ducks, nest with hummingbirds, etc. Palaeognaths nest with one another in both analyses, but the order of palaeognaths is reversed in the gene study [1].

Unreasonably, the gene sequencing study [1] recovers the giant, flightless, two-toed ostrich (Struthio) at the base of all extant birds and at the base of all extant paleognaths. More reasonably, the LRT recovers the sparrow-sized, but long-legged and four-toed Pseudocrypturus (Paleocene/Eocene) as the last common ancestor of all tested Neornithes. According to the results of the LRT, Pseudocrypturus was a late survivor of an Early Cretaceous radiation [12–14], as demonstrated by the nesting of small, toothed, and four-toed

Hongshanornis and Longicrusavis (both Early Cretaceous [9]) within the clade of crown clade of extant birds (Fig. 1).

Unreasonably the gene study [1] nests chickens (Gallus), ducks (Anas), screamers (Chauna) and their allies

(including the scrubfowl, Megapodius), at the base of their . Given such distinct anatomies, ducks and chickens cannot be closely related and neither should arise from an ostrich relative, as recovered in the gene study. More reasonably, the LRT study nests the -like, long-legged scrubfowl

(Megapodius), at the base of all toothless neognaths. Skipping one clade, screamers do indeed nest at the base of the chicken/sparrow/ clade (see below). Much more distantly, extant ducks nest with the extinct stilt-legged, duckbill, Presbyornis (Palaeocene–Eocene); the extant, stilt-legged, wading spoonbill (Platalea); and the extant stilt-legged, wading ibis (Threskiornis) in order of increasing distance.

These last two extant taxa were missing from the gene study [1], and, of course, so was the extinct

Presbyornis.

Unreasonably the gene study [1] nests the (Phoenicopterus) with the dissimilar (Rollandia), and nests the grebe far from the similar (Gavia). More reasonably the LRT nests the flamingo with the similar, stilt-legged, hook- seriema (Cariama) and the LRT nests the splay-legged, diving grebe

(Aechmorophus) with the splay-legged, diving loon. A recent author [10] attempted to defend the dissimilar flamingo/grebe pairing by noting that only and have 23 presacral vertebrae and ten other shared traits. In the LRT, no other tetrapod shares more of its 231 tested traits with grebes than do.

No other taxa share more traits with flamingos than seriemas do. Evidently flamingos evolved 23 presacral vertebrae with grebes by convergence. Later the same author [11] unreasonably considered the stilt-legged, extinct Palaelodus ( to Pleistocene) a transitional taxon between grebes and flamingos. More reasonably, the LRT nests Palaelodus at the base of the ostrich (Struthio) clade, close to the stilt-legged seriema and flamingo clade (Fig. 1) and far from diving loons and grebes.

Unreasonably the gene study [1] nests the hook-beak, but herbivorous parrot (Nestor) with the hook-beak, but predatory falcon (Falco), pulling the falcon away from other predatory birds. More reasonably the LRT study nests the parrot (Ara) close to the similar, thick-billed, herbivorous hoatzin (Opisthocomus) and the falcon nests with the vulture (Torgos), the osprey (Pandion) and the hawk (Buteo).

Unreasonably the gene study [1] nests a straight-beaked (Archilochus) with a hook-beaked swift (Hemiprocne). More reasonably the LRT nests this hummingbird with a pre-hummingbird from the

Eocene (Eocypselus). The LRT nests another large-eyed, hook-beak swift (Apus) with a similar large-eyed, hook-beak (Tyto).

Unreasonably the gene study [1] recovers the sparrow (Passer) with the lyrebird (Menura) and the crow

(Corvus). More reasonably the LRT nests the neotonous sparrow between two similar, but larger seed- eaters, the chicken (Gallus) and the hoatzin. The long-legged, omnivorous lyrebird nests in another clade, between the long-legged, omnivorous trumpeter (Psophia) and the long-legged, omnivorous roadrunner

(Geococcyx). The crow nests in yet another clade between the similar grackle (Quiscalus) and the similar blue jay (Cyanocitta), two extant taxa omitted by the gene study [1].

A pattern arising from the LRT nests many, but not all, of the stilt-legged at the base of the

Neornithes. As some became larger, like and , they retained stilt-legs. Others, like predatory birds, sparrows, and grackles, were smaller, had shorter legs and became more arboreal. Since juveniles of stilt-legged birds have shorter legs, short-legged taxa were probably neotonous. Diving and swimming puffins and auks arise from soaring petrels. Diving and swimming murres and arise from diving kingfishers, which have , loons and grebes in their family tree. The stilt-legged ancestors of ducks kept their stilt legs through many clades, until finally evolving into short-legged ducks.

It is noteworthy that penguins nest in the LRT as highly derived neornithes, with a long list of ancestors, but the most primitive penguins [22], lived during the early Paleocene 61 mya. That argues for a more gradual evolution of neornithes during the entire Cretaceous [15], rather than a rapid radiation following the K-T extinction event [10].

It should be noted that three competing morphological analyses [12–14] also nested ducks, chickens, screamers, scrubfowl and their allies (nominal clade: ‘Galloanseres’) at the base of their Neognathae. (See above for counterarguments). A fifth morphological analysis [19] did not support a duck/chicken relationship, but nested ducks with storks, flamingos, shorebirds, gulls and auks. That’s unacceptably vague. Adding and deleting taxa in phylogenetic analysis are two methods for testing results. If robust, a tree topology should not change when branches are added or chopped off. Deletion of the toothed

Hangshanornis clade has no affect on the rest of the LRT. Deletion of the palaeognaths also changes nothing else.

The LRT presents a logical and continuous hypothesis of evolutionary events for birds and all included taxa. The competing gene study [1] nests dissimilar taxa together.

Materials and methods Lacking firsthand access to most of the specimens employed here, published photographs and drawings provide most of the data used in the present analysis. Traditionally firsthand access has been a stringent requirement in , but taxon exclusion and false positive results are the larger problems here.

This wide-gamut list of 1165 generic, specific and specimen-based taxa minimizes bias and tradition in the process of selecting in-group and out-group taxa for smaller, more focused studies because all major and many minor clades are established here (SuppData). Thus, exclusion problems that arose from less inclusive smaller studies are minimized here.

No characters used in the LRT are specific to the bird clades and their theropod ancestors. Although some characters are similar to those from various prior analyses, the present list (see below) was largely built from scratch. Characters were chosen or invented for their ability to universally lump and split clades, and for their visibility in a majority of taxa. Up to this point, the 231 multi-state character set has proven sufficient to lump and separate virtually all of the 1165 included taxa, typically with high Bootstrap scores.

In the past certain workers considered 231 characters too small for the number of taxa—when the taxon list was a quarter of the size it is now. Complete resolution in the LRT tree and high Bootstrap scores generally falsify any blackwashing levied against the present character list and data scoring. The present hypothesis of reptile interrelations continues to be verified whenever pertinent taxa are included in independent studies. For all of its faults, the LRT continues to work well as more taxa are added every week.

Taxa and characters were compiled in MacClade 4.08 [20], then imported into PAUP* 4.0b [21] and analyzed using parsimony analysis with the heuristic search algorithm. All characters were treated as unordered and no character weighting was used. Bootstrap support figures were calculated for 100 replicates. All taxon subsets of the LRT raise the character/taxon ratio. The cladogram, character list and data matrix are available here: http://www.ReptileEvolution.com/reptile-tree.htm, in Supp. Data, and in permanent repository here: http://www/Treebase.org/xxxxx; and here http://www.DataDryad.org/xxxxxx

(to be completed when the ms. is accepted). Several taxon deletion tests were performed to recover taxonomic affinities in the absence of proximal taxa recovered by the LRT.

References cited

1. Prum, R. O. et al. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526, 569–573 (2015).

2. Hackett, S. J. et al. A phylogenomic study of birds reveals their evolutionary history. Science 320, 1763–

1768 (2008).

3. Ericson, P. G. P. et al. Diversification of : integration of molecular sequence data and fossils.

Biol. Lett. 2, 543–547 (2006).

4. McCormack, J. E. et al. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoSONE 8, e54848 (2013).

5. Mayr, G. Metaves, Mirandornithes, Strisores and other novelties—a critical review of the higher-level phylogeny of neornithine birds. J. Zoological Syst. Evol. Res. 49: 58–76 (2011).

6. Mallick, S. et al. The difficulty in avoiding false positives in genome scans for natural selection. Genome

Res. 19: 922-933 (2009).

7. Graybeal, A. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9–17

(1998).

8. Mayr, G. Morphological evidence for sister group relationship between flamingos (Aves:

Phoenicopteridae) and grebes (Podicipedidae). Zoo. J. Linn. Soc. 140: 157–169. (2004).

9. Mayr, G. The contribution of fossils to the reconstruction of the higher-level phylogeny of birds. Species,

Phylo. Evo. 1:59–64 (2006).

10. Ksepka, D. T. et al. Early Paleocene landbird supports rapid phylogenetic and morphological diversification of crown birds after the K-Pg mass extinction. www.pnas.org/cgi/doi/10.1073/pnas.1700188114

11. Chiappe, L. M. Basal bird phylogeny: Problems and solutions. Chapter 20 in Mesozoic birds above the head of . (Chiappe, L. M. and Witmer, L. M. eds.) U. Calif. Press (2002).

12. Mayr, G., Clarke, J. The deep divergences of neornithine birds: a phylogenetic analysis of morphological characters. Cladistics 19:527–553. (2003). 13. Livezey, B. C., Zusi, R. L. Higher-order phylogeny of modern birds (, Aves: Neornithes) based on comparative anatomy. II. Analysis and discussion. Zoo. J. Linn. Soc. 149:1–95 (2007).

14. Lee, M. S. Y., Cau, A., Naish, Dyke, G. J. Morphological Clocks in Paleontology, and a Mid-

Cretaceous Origin of Crown Aves. Syst. Bio. Oxford Js. 63: 442–449 (2014).

15. Cooper, A., Penny, D. Mass survival of birds across the Cretaceous-Tertiary boundary: molecular evidence. Science 275:1109–1113.( 1997).

16. Haddrath, O., Baker,A.J. Multiple nuclear genes and retroposons support vicariance and dispersal of the palaeognaths, and an Early Cretaceous origin of modern birds. Proc. R. Soc. Lond. B.

279:4617–4625. (2012).

17. Hedges S.B. Poling L.L. A molecular phylogeny of reptiles. Science, 283 pp. 998-1001 (1999).

18. Stanhope MJ et al. Molecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals. Proc. Nat. Acad. Sci. 95 (17): 9967–9972 (1998).

19. Ericson PGP, Parsons TJ, Johansson US. Morphological and molecular support for nonmonophyly of the Galloanseres. In: Gauthier, J., Gall, L.F. (Eds.), New Perspectives on the Origin and

Early Evolution of Peabody. Museum of Natural History Birds, New Haven, CT, pp. 157–168 (2001).

20. Maddison, D. R. and W. P. Maddison. MacClade 4: Analysis of Phylogeny and Character Evolution.

Sinauer Associates, Inc., Sunderland, MA. (1990).

21. Swofford, D. PAUP*: Phylogenetic Analysis Using Parsimony (*And Other Methods). Version 4.0b10.

Sinauer Associates, Inc., Sunderland, MA. (2002).

22. Mayr, G. et al. A new fossil from the mid-Paleocene of New Zealand reveals an unexpected diversity of world's oldest penguins, Sci. Nature (2017). DOI: 10.1007/s00114-017-1441-0 (2017)

Figure 1. Subset of the large reptile tree (LRT, 1165 taxa; www.ReptileEvolution.com/reptile-tree.htm) focusing on the clade Neornithes (gray background) and their proximal ancestors.

90 (Wellnhoferia) grandis 90 Zhongornis 88 Confuciusornis Changchengornis Archaeornithura 78 Vegavis 93 100 Pseudocrypturus Apteryx 92

59 Rhynchotus 80 Casuarius 64 Eudromia 60 Palaelodus 67 71 Aepyornis 71 Patagopteryx Struthio

88 Dingavis Hongshanornis 91 Longicrusavis 67 98 Changzuiornis Yanornis 91 98 Apsaravis 98 79 99 Gansus Hesperornis Megapodius <50 95 Cariama

68 Phoenicopterus 93 Llallawavis 93 Sagittarius Pandion 87 94 Falco 93 82 Torgos 92 Buteo 90 Apus Tyto

87 Ardeotis 95 Ciconia 70 81 Butorides Ardea 71 89 Psophia 92 Menura 83 Geococcyx 69 Monias Coccyzus

87 Anhima 89 Fulica Crex 93 89 Gallus Eogranivora 95 Passer 100 97 Macrocephalon 59 100 Opisthocomus

100 Dinornis 95 Ara macao Gastornis

54 Oedicnemus Quiscalus 61 Corvus 67 Cyanocitta 63 98 Melanerpes Sitta 79 57 Hirundo 80 90 Cinclus Troglodytes

Scopus <50 98 Balaeniceps Pelecanus <50 98 Pelagornis 81 Morus 82 Macronectes <50 Pinguinus 97 Fraterculus 58 82 Aptornis 98 Pezophaps 68 Raphus 51 Coragyps 84 94 Columba Caloenas

100 Buceros 79 Pteroglossus Cyrilavis 99 Septencoarcias 69 90 97 Phaethon Psilopogon Threskiornis 73 Platalea 79 85 Presbyornis 87 Mergus 90 Anas Anser Aramus 93 Chroicocephalus 62 91 Eocypselus 68 Archilochus Himantopus 65 Charadrius 63 Eurypyga 96 <50 Grus

94 Rhynochetos

65 Thalasseus 99 Gavia 68 Aechmophorus 99 Jabiru 57 Megaceryle 64 Pumiliornis 75 Uria Aptenodytes