Complecto Errorem: Embracing Uncertainty in Molecular Phylogenetic Analysis
by

Joseph W. Brown

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Ecology and Evolutionary Biology) in The University of Michigan 2010

Doctoral Committee:
Associate Professor Yin-Long Qiu, Co-Chair
Professor David P. Mindell, California Academy of Sciences, Co-Chair
Assistant Professor Inés Ibáñez
Assistant Professor Aaron A. King

If we view statistics as a discipline in the service of science, and science as being an attempt to understand (i.e., model) the world around us, then the ability to reveal sensitivity of conclusions from fixed data to various model specifications, all of which are scientifically acceptable, is equivalent to the ability to reveal boundaries of scientific uncertainty. When sharp conclusions are not possible without obtaining more information, whether it be more data, new theory, or deeper understanding of existing data and theory, then it must be scientifically valuable and appropriate to expose this sensitivity and thereby direct efforts to seek the particular information needed to sharpen conclusions.
Donald B. Rubin (1984) The Annals of Statistics 12: 1151-1172

We must confine ourselves to those forms which we know how to handle, or for which any tables which may be necessary have been constructed. More or less elaborate form will be suitable according to the volume of the data.
R. A. Fisher (1922) Philosophical Transactions of the Royal Society 222: 309-368

‘...And then comes the grandest idea of all! We made a map of the country, on the scale of a mile to the mile!’ ‘Have you used it much?’ I enquired. ‘It has never been spread out yet,’ said Mein Herr, ‘The farmers objected: they said it would cover the whole country and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.’
Lewis Carroll (1893) Sylvie and Bruno Concluded

I went on to test the program in every way I could devise. I strained it to expose its weaknesses. I ran it for high-mass stars and low-mass stars, for stars born exceedingly hot and those born relatively cold. I ran it assuming the superfluid currents beneath the crust to be absent -- not because I wanted to know the answer, but because I had developed an intuitive feel for the answer in this particular case. Finally I got a run in which the computer showed the pulsar’s temperature to be less than absolute zero. I had found an error. I chased down the error and fixed it. Now I had improved the program to the point where it would not run at all.
George Greenstein (1984) Frozen Star: Of Pulsars, Black Holes and the Fate of Stars

I love deadlines. I like the whooshing sound they make as they fly by.
Douglas Adams

© Joseph W. Brown 2010

To My Wife, Rachel

Acknowledgements

Many people facilitated directly or indirectly the completion of this work. I am grateful for my dissertation committee, both past and present, for encouragement throughout: Thomas Duda, Philip Gingerich, Inés Ibáñez, Aaron King, David Mindell, Robert Payne, and Yin-Long Qiu. Tom, in particular, deserves special thanks for agreeing to serve as my on-site supervisor for the past few years; I regret that the timing of my ultimate defense conflicted with his schedule. I am indebted to Inés and Yin-Long for coming to my support in the eleventh hour (okay, 27th hour, but who is counting?). I wish I had taken advantage of the outside perspectives and modelling expertise of Aaron and Inés much earlier; the work presented here would have been the better for it.
I was able to discuss the ideas contained herein with numerous thoughtful people: Jonathan Bollback, David Cutler, Alexei Drummond, Doug Futuyma, Alan Gelfand, Oliver Haddrath, John Harshman, Simon Ho, Jeff Johnson, Paul Joyce, Aaron King, Karl Kjer, Vladimir Minin, Tom Near, Yin-Long Qiu, Michael Sanderson, Elliott Sober, Alexandros Stamatakis, David Swofford, Jeff Thorne, Marcel van Tuinen, Ziheng Yang, and Derrick Zwickl. These individuals served in several capacities, providing corrections, advice, information, data, encouragement, source code, or simply acting as a sounding board so that I could think out loud.

I would like to acknowledge the support of my collaborators over the past 5 years: Ed Braun, Jaime García-Moreno, Briar Howes, Rebecca Kimball, Jeff Johnson, David Mindell, Robert Payne, Michael Defoin Platel, Yin-Long Qiu, Josh Rest, Michael Sorenson, Ulf Sorhannus, and Marcel van Tuinen. I thank Shing Hei Zhan for correcting my Latin grammar in the title of this dissertation. Thanks to Clay Cressler for assistance in plotting in R.

I am particularly indebted to two people. Paul Lewis, in addition to inspiring me to pursue molecular phylogenetics in the first place lo those many moons ago, provided thoughtful sentiments regarding statistical model selection criteria. Not only did Paul first introduce me to the model selection approach of Alan Gelfand and Sujit Ghosh (1998) (which unfortunately did not find its way into the present dissertation), but he also provided some unpublished Python source code that I was able to rewrite in C++ for my own program. No less deserving of praise, Bob Payne considerately read everything I have ever written, and in each instance offered sound advice for improvement (in particular, turning a shrill condemnation into a publishable scholarly ‘comment’). Bob also furnished me with my first pair of binoculars, which in many ways altered my life trajectory.

Programming and computational phylogenetics are lonely pursuits.
I thank the following for their steadfast companionship over the course of the past 5 years: Ron Asheton, Scott Asheton, Dave Alexander, David Bowie, Nick Drake, Robert Fripp, Kim Gordon, Polly Jane Harvey, Georgia Hubley, Ira Kaplan, György Ligeti, James McNew, Tim Minchin, Thurston Moore, Robert Pollard, Iggy Pop, Sun Ra, Lee Ranaldo, Lou Reed, Trent Reznor, Arnold Schoenberg, Steve Shelley, Dean Ween, Gene Ween, Frank Zappa, and especially, above everyone else, Mr. Don Van Vliet.

I would surely have gotten deported, audited, and/or thrown out of the program if it were not for the remarkable staff of the University of Michigan Museum of Zoology (UMMZ) and the Department of Ecology & Evolutionary Biology: Christy Byks-Jazayeri, Norah Daugherty, Beverley Dole, Janet Griffin-Bell, Janet Hinshaw, Gail Kuhnlein, Vlad Miskevich, Robbin Murrell, and Fritz Paper. I get easily confused by even simple paperwork; thank you all for your patience and for walking me through things. I am eternally beholden to the EEB graduate student coordinators who have helped me over the years: LaDonna Walker, Julia Eussen, and Jane Sullivan; Goddesses, all. Whatever salary they make, it is way too little.

The content of this dissertation was conceived, developed, analyzed, and written using Apple® computers. Part of this work (specifically, replicate MrBayes analyses for chapters 4 and 5) was carried out by using 1) the resources of the Computational Biology Service Unit from Cornell University (partially funded by Microsoft Corporation), 2) the Genomic Diversity Laboratory at UMMZ, and 3) the Center for Comparative Genomics PhyloCluster at the California Academy of Sciences. I thank Brian Simison at the California Academy of Sciences for granting me the privilege of acting as a beta tester on the PhyloCluster. Programs and scripts were composed using BBEdit (Bare Bones Software®) and compiled using the GNU Compiler Collection (gcc; Free Software Foundation©). Figures not involving trees were plotted in R and/or Adobe Illustrator. Twinings® kept my cranium fueled and saturated with the delicious electrolytic solution of bergamot and black tea. How many cups of tea does it take to earn a PhD? 11,750.

All funding for this work was provided internally through various sources within the University of Michigan: University of Michigan Museum of Zoology, Department of Ecology & Evolutionary Biology, Rackham Graduate School, and the International Center. I am particularly thankful to Rackham for providing the very generous Predoctoral Fellowship which has provided all of my funding for the past year.

Graduate students with families do not have much time for extracurriculars. I don’t think I could have better spent what little time I had available than with my friend George Kulesza. A PhD in ornithology I learned, at least. Surely there does not exist a more devoted, patient teacher. I count it amongst my highest achievements that you regard me as one of your ‘successful’ students.

Finally, I must acknowledge my family, the ultimate in extracurricular activities. Calliope and Nova are undoubtedly the coolest kids alive, and provide unlimited amusement. And lastly, my wife, Rachel. You have been around since this roller-coaster of graduate school began, and have been my supporter through the (all too infrequent) highs and the (all too frequent) lows.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures

Part I. Temporal Calibration of the Avian Tree of Life
Chapter 1. Extraction of phylochronological information from molecular sequence data: evolution of our understanding of the ‘evolution of the rate of evolution’
Chapter 2. Nuclear DNA does not reconcile ‘rocks’ and ‘clocks’ in Neoaves: a comment on Ericson et al. (2006)
Chapter 3. Strong mitochondrial DNA support for a Cretaceous origin of modern avian lineages

Part II. Statistical Arbitration of Competing Molecular Phylogenetic Models
Chapter 4. Treatment of branch length parameters in partitioned phylogenetic models: an empirical case study with New World Vultures (Aves: Cathartidae)
Chapter 5. All models are wrong but some are useful: identifying useful partitioned models for phylogenetic inference via posterior predictive approaches

List of Tables

Table 1.1 A comparison of available programs for the dating of non-clocklike molecular genetic sequences.