Rutger Aldo Vos Doctorandus, Universiteit Van Amsterdam, 2000 THESIS SUBMITTED in PARTIAL FULFILLMENT of the REQUIREMENTS for TH
Total Page:16
File Type:pdf, Size:1020Kb
Rutger Aldo Vos Doctorandus, Universiteit van Amsterdam, 2000 THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the Department of Biological Sciences 0Rutger Aldo Vos, 2006 SIMON FRASER UNIVERSITY Summer, 2006 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author. APPROVAL Name: Rutger Aldo Vos Degree: Doctor of Philosophy Title of Thesis: Inferring large phylogenies: The big tree problem Examining Committee: Chair: Dr. H. Hutter, Associate Professor Dr. A. Mooers, Assistant Professor, Senior Supervisor Department of Biological Sciences, S.F.U. Dr. W. Maddison, Professor Departments of Zoology and Botany, University of British Columbia Dr. B. Crespi, Professor Department of Biological Sciences, S.F.U. Dr. S. Graham, Assistant Professor Botanical Garden & Centre for Plant Research, and Department of Botany, University of British Columbia Public Examiner Dr. R. Page, Professor Division of Environmental and Evolutionary Biology, University of Glasgow External Examiner 12 May 2006 Date Approved SIMON FRASER UNI~ER~IW~~brary DECLARATION OF PARTIAL COPYRIGHT LICENCE The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesislproject or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work. The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission. Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence. The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive. Simon Fraser University Library Burnaby, BC, Canada Phylogenetic trees are graph-like structures whose topology describes the inferred pattern of relationships among a set of biological entities, such as species or DNA sequences. Inference of these phylogenies typically involves evaluating large numbers of possible solutions and choosing the optimal topology, or set of topologies, from among all evaluated solutions. Such analyses are computationally intensive, especially when the pattern of relationships among a large number of entities is being sought. This thesis introduces two novel algorithms for the inference of large trees; one is applicable to the likelihood framework, the other to the Bayesian framework. Both approaches rely on the notion of a multi-modal tree 'landscape' through which inferential algorithms traverse. Using sampling techniques, the landscape can be perturbed sequentially, such that local optima can be evaded. The algorithms find good solutions in reasonable time, as demonstrated using real and simulated data sets. An example of large phylogeny inference is presented in the form of a novel estimate of Primate phylogeny - the largest estimate for this Order to date. The phylogeny is based on previously published smaller phylogenies, and hence serves as a summary of the present state of Primate phylogeny. In addition to this 'supertree's' topology, composite estimates of divergence are provided also. These estimates are based on multiple, clock-like genes combined using a novel approach presented here. Handling sets of trees and sequences poses practical problems in terms of conversion of data and the interoperation between computer programs. The thesis therefore concludes with a chapter discussing suitable data structures and programming patterns for phylogenetics. The appendix discusses an implementation of some of these concepts in an object-oriented application programming interface. Keywords: Phylogenetics; maximum likelihood; Bayesian systematics; MRP supertrees; divergence date estimation; tree data structures Voor mijn grootouders en ouders. I dedicate this dissertation to my parents and grandparents. Tweedledum looked round him with a satisfied smile. "I don't suppose, " he said, "there 'I1 be a tree left standing, for ever so far round, by the time we 'vefinished!" -Lewis Carroll, Through the Looking-Glass I am very happy to introduce this thesis by expressing my gratitude to the many people who have helped me on the way. I am thankful to my family and friends, here and in the Netherlands, for their encouragement and support. In particular, I would like to express my gratitude to Arne Mooers, my senior supervisor. Arne was the one to suggest I come to Vancouver, and he has been extremely supportive since. Thank you for guiding me through this stage of my academic career; you are a great supervisor, scientist and friend. I thank Bernie Crespi for his supervision, his generous offer to work in his lab and the great feedback he dispenses in the lab or over cinnamon chicken at his house. I am grateful to Wayne Maddison, for moving to Vancouver so that I could pick his brains. It is with great pleasure that I now move on to the next stage in my academic career where I get to work in the Maddison lab. I thank Rod Page, the external examiner, Sean Graham, the public examiner, and Harald Hutter, the chair at my defence, for their kind contributions of time and effort in evaluating my research. I thank Marlene Nguyen, Barbara Sherman and Penny Simpson, who have been very helpful with the logistics of writing a thesis. I am very grateful to all the wonderful people in the lab where I did much of the work in writing this thesis: Jeff, Christine, Patrik, Mick, Steve, Sampson, Martin and many others whose companionship and cleverness I have had the privilege to enjoy. That lab, the Crespi lab, is part of a group of labs interested in evolution: the FAB* lab. Being part of this group has been a great academic experience; the discussions, during or outside of the lab meetings, have taught me many things and given vi i me many new ideas. The faculty members, Felix Breden, Arne Mooers, Bernie Crespi and Mike Hart are all very supportive, approachable and with an inspiring passion for science. I thank them all for creating this environment. My research has been funded by Simon Fraser University, through Graduate Fellowships and the President's Research Stipend. In addition, I was funded by NSERC, the Society of Systematic Biology, and by CIPRES, the research project in which I will now continue my academic career. I am very grateful to these institutions for making my research possible. ... Vlll .. Approval ............................................................................................................................n ... Abstract .............................................................................................................................ru Dedication .......................................................................................................................... v Quotation ........................................................................................................................ vi .. Acknowledgements ......................................................................................................... vu Table of Contents ............................................................................................................. ix ... List of Figures .................................................................................................................xm List of Tables .................................................................................................................. xiv CHAPTER I .General Introduction .............................................................................. 1 History of Phylogenetics................................................................................................. 2 Present Paradigms ..................................................................................................... 7 Why Study Phylogenetics? ........................................................................................... 10 The Big Tree Problem ................................................................................................. 12 This Dissertation ........................................................................................................... 13 References ..................................................................................................................... 15 CHAPTER I1 .Accelerated Likelihood Surface Exploration: the 'Likelihood Ratchet' .......................................................................................................................