Chloroplast Comparative Genomics: Implications for Phylogeny, Evolution, and Biotechnology
Christopher Saski, Clemson University
Clemson University
TigerPrints
All Dissertations, 8-2007

Recommended Citation
Saski, Christopher, "Chloroplast Comparative Genomics: Implications For Phylogeny, Evolution, and Biotechnology" (2007). All Dissertations. 115. https://tigerprints.clemson.edu/all_dissertations/115

CHLOROPLAST COMPARATIVE GENOMICS: IMPLICATIONS FOR PHYLOGENY, EVOLUTION AND BIOTECHNOLOGY

A Dissertation Presented to the Graduate School of Clemson University
In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy
Genetics

by
Christopher Alan Saski
August 2007

Accepted by:
Jeffrey P. Tomkins, Committee Chair
Dr. Hong Luo
Dr. William R. Marcotte Jr.
Dr. Kerry Smith

ABSTRACT

The lack of complete chloroplast genome sequences is still a limiting factor in determining phylogenetic relationships, discerning evolutionary forces, and extending chloroplast genetic engineering to useful crops. Therefore, the chloroplast genomes of six economically important crops were isolated and sequenced. The results will have an impact on chloroplast biology and biotechnology.

The complete soybean chloroplast genome was compared to those of the other completely sequenced legumes, Lotus and Medicago. The rpl22 gene was found to be missing from all three legumes, which makes its loss a very informative phylogenetic marker. A single, large inversion changes the gene order in the legumes from the typical order found in Arabidopsis.
Detailed analysis of repeat elements within the chloroplast genomes indicates that they may play some functional role in evolution, and the psbA and rbcL repeats indicate that the loss of an inverted repeat has occurred only once during the evolutionary history of the legumes. Ideal sites for integration of transgenes were also determined.

Next, the chloroplast genomes of the agriculturally important Solanaceae crops Solanum lycopersicum (tomato) and potato were isolated and sequenced. Analysis of the complete chloroplast genome sequences revealed significant insertions and deletions (indels) within certain coding regions. Photosynthesis, RNA, and ATP synthase genes are the least divergent, whereas clpP, cemA, ccsA, and matK are the most divergent. The repeats characterized across the Solanaceae are similar to those in the legumes and are located in the same genes or intergenic regions, indicating a possible functional role. A comprehensive genome-wide analysis of all coding sequences and intergenic spacer regions was done for the first time in chloroplast genomes. Analysis of RNA editing sites demonstrated that they were less common than previously observed in tobacco and Atropa, suggesting a loss of editing sites and a possible increase in variation at the RNA level.

Finally, the complete chloroplast genome sequences of barley, sorghum, and creeping bentgrass were identified and compared to six published grass chloroplast genomes, revealing that gene content and order are similar but that two microstructural changes have occurred. First, the expansion of the inverted repeat at the small single copy/inverted repeat boundary, which duplicates a portion of the 5’ end of ndhH, is restricted to three genera of the subfamily Pooideae (Agrostis, Hordeum, and Triticum). Second, a 6 bp deletion in ndhK is shared by creeping bentgrass, barley, rice, and wheat, and this event supports the sister relationship between the subfamilies Ehrhartoideae and Pooideae.
Repeat analysis revealed many dispersed repeats shared among the grasses, as well as repeats that flank a major genome rearrangement common only to the grasses, suggesting that these repeats had a functional role in the rearrangement. Examination of simple sequence repeat (SSR) markers identified 16-21 potential SSRs. Distances based on intergenic spacer regions were analyzed, as were RNA editing sites. Phylogenetic trees based on DNA sequences of 61 protein-coding genes from 38 taxa, constructed using both maximum parsimony and likelihood methods, provide moderate support for a sister relationship between the subfamilies Ehrhartoideae and Pooideae.

DEDICATION

I dedicate this manuscript to my wife and parents for all their love, support, inspiration, and dedication.

ACKNOWLEDGMENTS

I would like to acknowledge Dr. Jeff Tomkins as my advisor. I acknowledge Dr. Henry Daniell and Dr. Robert Jansen for insightful discussions, motivation, data interpretation, and for assisting with scope and direction in this study. I would also like to acknowledge my graduate committee: Dr. William R. Marcotte Jr., Dr. Hong Luo, and Dr. Kerry Smith.

TABLE OF CONTENTS

TITLE PAGE
ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
    Endosymbiosis
    Chloroplasts and Other Plastid Types
    Gene Transfer
    Why do Plastids Have Genomes
    Phylogenetic Utility of Chloroplast Genomes
    Chloroplast Molecular Markers
    Plastids and Biotechnology

2. THE COMPLETE CHLOROPLAST GENOME OF GLYCINE MAX AND COMPARATIVE ANALYSIS WITH OTHER LEGUME GENOMES
    Introduction
    Methodology
    DNA Sources
    DNA Sequencing and Data Assembly
    Genome Annotation
    Molecular Evolutionary Comparisons
    Results
    Size, gene content and organization of the Glycine chloroplast genome
    Comparison of genome organization among legumes and Arabidopsis
    Extent of the Inverted Repeat
    Repeat Analysis
    Discussion

3. COMPLETE CHLOROPLAST GENOME SEQUENCES OF SOLANUM BULBOCASTANUM, SOLANUM LYCOPERSICUM AND COMPARATIVE ANALYSIS WITH OTHER SOLANACEAE GENOMES
    Introduction
    Methodology
    DNA Sources
    DNA Sequencing and Genome Assembly
    Genome Annotation
    Molecular Evolutionary Comparisons
    Comparison of Intergenic Regions
    Variations Between Coding Sequences and cDNAs
    Results
    Size, gene content and organization of Solanum lycopersicum and Solanum bulbocastanum chloroplast
    Gene content and Gene Order
    Repeat Structure
    Intergenic Spacer Regions