(And Tertiary) Structure of the ITS2 and Its Application for Phylogenetic Tree Reconstructions and Species Identification
Total Page:16
File Type:pdf, Size:1020Kb
Secondary (and tertiary) structure of the ITS2 and its application for phylogenetic tree reconstructions and species identification vorgelegt von Dipl. Biol. Alexander Keller Würzburg, 2010 Kumulative Dissertation zur Erlangung des naturwissenschaftlichen Doktorgrades (Dr. rer. nat.) der Bayerischen Julius-Maximilians-Universität Würzburg Einreichung: in Würzburg Mitglieder der Promotionskommission: Vorsitzender: Prof. Thomas Dandekar 1. Gutachter: Prof. Thomas Dandekar 2. Gutachter: Prof. Ingolf Steffan-Dewenter Promotionskolloquium: in Würzburg Aushändigung Doktorurkunde: in Würzburg iii TABLE OF CONTENTS Acknowledgements........................................ vii Summary..............................................viii Zusammenfassung ........................................ ix I General Introduction1 II Materials and Methods9 1 Materials 11 2 Bioinformatic tools 13 2.1 Annotation Tool....................................... 13 3 Bioinformatic approaches 15 3.1 HMM-Annotation...................................... 15 3.2 Secondary Structure Prediction.............................. 15 3.3 Tertiary Structure Prediction............................... 16 4 Phylogenetic procedures 17 4.1 Alignments.......................................... 17 4.2 Substitution model selection................................ 17 4.3 Tree reconstructions.................................... 18 4.4 CBC analyses........................................ 18 4.5 Tree viewers......................................... 18 5 Simulations 21 5.1 Simulations......................................... 21 5.2 Datasets........................................... 23 5.3 Robustness and Accuracy................................. 23 III Results 25 6 Contributions to the methodological pipeline 27 P.1 5.8S-28S rRNA interaction and HMM-based ITS2 annotation............. 28 P.2 The ITS2 Dababase III-sequences and structures for phylogeny........... 38 v 7 Evaluation of secondary structure phylogenetics 45 P.3 Including RNA secondary structures improves accuracy and robustness in recon- struction of phylogenetic trees.............................. 46 8 Phylogenetic case studies 59 P.4 ITS2 data corroborate a monophyletic chlorophycean DO-group (Sphaeropleales) 60 P.5 ITS2 sequence-structure phylogeny in the Scenedesmaceae with special reference to Coelastrum (Chlorophyta, Chlorophyceae), including the new genera Comasiella and Pectinodesmus ...................................... 74 P.6 Internal transcribed spacer 2 (nu ITS2 rRNA) sequence-structure phylogenetics: Towards an automated reconstruction of the green algal tree of life......... 86 P.7 ITS2 secondary structure improves phylogeny estimation in a radiation of blue butterflies of the subgenus Agrodiaetus (Lepidoptera: Lycaenidae: Polyommatus). 124 9 Secondary structure Phylogenetics in Ecology 153 P.8 Ant-flower networks in Hawai’i: native plants are exploited, introduced plants defended........................................... 154 P.9 Composition of epiphytic bacterial communities differs on flowers and leaves.. 210 10 Future prospects of structural phylogenetics 229 P.10 Ribosomal RNA phylogenetics: the third dimension.................. 230 IV General Discussion and Conclusions 235 V Bibliography and additional Information 245 Bibliography 247 Acronyms 273 List of Figures 275 List of Tables 279 Curriculum Vitae 281 List of Publications 283 Thesis Statistics 285 Erklärung 291 vi ACKNOWLEDGEMENTS Preceding the actual thesis, I want to acknowledge the support of many persons that directly or indirectly enabled me to manage the projects, and of course lately the final writing. First to mention are my supervisors, Prof. Dr. Thomas Dandekar, Dr. Matthias Wolf, Prof. Dr. Jörg Schultz and Dr. Tobias Müller, who designed interesting and prosperous projects for my graduate studies and were always helpful, where needed. Further, I would like to thank all the other companions of our working group on the way to the PhD: these were Dr. Frank Förster, Tina Schleicher, Christian Koetschan, and the bachelor students Benjamin Merget and Gregor Schalk. The remaining members of the Department of Bioinformatics can be added to this list for all their fruitful discussions and useful ideas. In addition to these persons of our department, I want to thank all my external collabo- rators, which are Prof. Dr. Mark Buchheim, Dr. Martin Wiemers, Dr. Eberhard Hegewald, Robert Junker, Dr. Nico Blüthgen, Christina Loewel, Prof. Dr. Curtis Daehler, Dr. Stefan Dötterl, Prof. Dr. Thomas Friedl, Prof. Dr. Roy Gross, Prof. Dr. Lothar Krienitz, Katrin Schlegelmilch, and Prof. Dr. Norbert Schütze, for all their efforts and their investments into our collaborative projects. Furthermore, I would like to especially thank Prof. Dr. Ulf Sorhan- nus as a collaborator and as a very obliging host (together with Prof. Dr. Marty Mitchell and Linda Kightlinger) during my visit at the Edinboro University of Pennsylvania. Many thanks for agreeing to review this thesis to my universitary advisory committee Prof. Dr. Thomas Dandekar and Prof. Dr. Ingolf Steffan-Dewenter and as well the advi- sory committee of my graduate school BIGSS consisting of Prof. Dr. Thomas Dandekar, Prof. Dr. Jörg Schultz and Prof. Dr. Heinrich Sticht. For financial sponsoring and meeting the costs of my travels and research equipment, I gratefully acknowledge the Elite Network Bavaria graduate school BIGSS. Furthermore, as this thesis has been awarded with the Bio- center Science Award, I would like to thank the Biocenter Würzburg Commitee for their choice and I appreciate very much the interest in my work. Especially, I am deeply grateful to my family including Katrin for supporting and encour- aging me during the complete time of my studies; and even more for the reviving times away from these! They and several good friends had to cope with less visits by me and other cut-backs; I really appreciate your patience and hope to see you more in the time after thesis submission! vii SUMMARY Biodiversity may be investigated and explored by the means of genetic sequence informa- tion and molecular phylogenetics. Yet, with ribosomal genes, information for phylogenetic studies may not only be retained from the primary sequence, but also from the secondary structure. Software that is able to cope with two dimensional data and designed to answer taxonomic questions has been recently developed and published as a new scientific pipeline. This thesis is concerned with expanding this pipeline by a tool that facialiates the annotation of a ribosomal region, namely the ITS2. We were also able to show that this states a crucial step for secondary structure phylogenetics and for data allocation of the ITS2-database. This resulting freely available tool determines high quality annotations. In a further study, the complete phylogenetic pipeline has been evaluated on a theoretical basis in a comprehen- sive simulation study. We were able to show that both, the accuracy and the robustness of phylogenetic trees are largely improved by the approach. The second major part of this thesis concentrates on case studies that applied this pipeline to resolve questions in taxonomy and ecology. We were able to determine several indepen- dent phylogenies within the green algae that further corroborate the idea that secondary structures improve the obtainable phylogenetic signal, but now from a biological perspec- tive. This approach was applicable in studies on the species and genus level, but due to the conservation of the secondary structure also for investigations on the deeper level of taxonomy. An additional case study with blue butterflies indicates that this approach is not restricted to plants, but may also be used for metazoan phylogenies. The importance of high quality phylogenetic trees is indicated by two ecological studies that have been conducted. By integrating secondary structure phylogenetics, we were able to answer questions about the evolution of ant-plant interactions and of communities of bacteria residing on different plant tissues. Finally, we speculate how phylogenetic methods with RNA may be further enhanced by integration of the third dimension. This has been a speculative idea that was supplemented with a small phylogenetic example, however it shows that the great potential of structural phylogenetics has not been fully exploited yet. Altogether, this thesis comprises aspects of several different biological disciplines, which are evolutionary biology and biodiversity research, community and invasion ecology as well as molecular and structural biology. Fur- ther, it is complemented by statistical approaches and development of informatical software. All these different research areas are combined by the means of bioinformatics as the central connective link into one comprehensive thesis. viii ZUSAMMENFASSUNG Biologische Diversität kann mit Hilfe molekularer Sequenzinformation und phylogenetis- chen Methoden erforscht und erfasst werden. Bei ribosomalen Genen kann man jedoch wertvolle Information nicht nur aus der Primärsequenz beziehen, sondern auch aus der Sekundärstruktur. In den letzen Jahren wurde Software entwickelt, die solche Daten für taxonomische Fragestellung verwerten kann. Diese Arbeit beschäftigt sich mit einer Er- weiterung dieser Methodik durch eine Software-Anwendung, die die Annotation des ribo- somalen Genes ITS2 deutlich vereinfacht. Mit dieser Studie konnten wir zeigen, dass dies einen entscheidenden Schritt der Sequenz-Struktur-Phylogenie und der Datenerfassung der ITS2-Datenbank darstellt.