Detection of Ancient Genome Duplications in Several Flowering Plant
Total Page:16
File Type:pdf, Size:1020Kb
DETECTION OF ANCIENT GENOME DUPLICATIONS IN SEVERAL FLOWERING PLANT LINEAGES AND SYNTE-MOLECULAR COMPARISON OF HOMOLOGOUS REGIONS by JINGPING LI (Under the Direction of Andrew H. Paterson) ABSTRACT Flowering plants have evolved through repeated ancient genome duplications (or paleo- polyploidies) for ~200 million years, a character distinct from other eukaryotic lineages. Paleo-duplicated genomes returned to diploid heredity by means of extensive sequence loss and rearrangement. Therefore repeated paleo-polyploidies have left multiple homeologs (paralogs produced in genome duplication) and complex networks of homeology in modern flowering plant genomes. In the first part of my research three paleo-polyploidy events are discussed. The Solanaceae “T” event was a hexaploidy making extant Solanaceae species rare as having derived from two successive paleo-hexaploidies. The Gossypium “C” event was the first paleo-(do)decaploidy identified. The sacred lotus (Nelumbo nucifera) was the first sequenced basal eudicot, which has a lineage-specific “λ” paleo-tetraploidy and one of the slowest lineage evolutionary rates. In the second part I describe a generalized method, GeDupMap, to simultaneously infer multiple paleo-polyploidy events on a phylogeny of multiple lineages. Based on such inferences the program systematically organizes homeologous regions between pairs of genomes into groups of orthologous regions, enabling synte-molecular analyses (molecular comparison on synteny backbone) and graph representation. Using 8 selected eudicot and monocot genomes I showed this framework of multiple paleo-polyploidy detection and synte-molecular network facilitates genomic comparisons among flowering plants and reveals deep correspondences among their genome structure. INDEX WORDS: Ancient genome duplication, Paleo-polyploidy, Synteny, Synte-molecular comparison, Genome evolution, Flowering plants DETECTION OF ANCIENT GENOME DUPLICATIONS IN SEVERAL FLOWERING PLANT LINEAGES AND SYNTE-MOLECULAR COMPARISON OF HOMOLOGOUS REGIONS by JINGPING LI B.S., Zhejiang University, China, 2005 A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY ATHENS, GEORGIA 2014 © 2014 Jingping Li All Rights Reserved DETECTION OF ANCIENT GENOME DUPLICATIONS IN SEVERAL FLOWERING PLANT LINEAGES AND SYNTE-MOLECULAR COMPARISON OF HOMOLOGOUS REGIONS by JINGPING LI Major Professor: Andrew H. Paterson Committee: Michael Arnold Jessica Kissinger Rodney Mauricio Paul Schliekelman Electronic Version Approved: Julie Coffield Interim Dean of the Graduate School The University of Georgia December 2014 ! iv DEDICATION To my parents. ! ! v ACKNOWLEDGEMENTS I would like to thank generous support from the University of Georgia Graduate School Assistantship for my first 21 months of study. I am deeply grateful to my advisor Dr. Andrew Paterson for guiding me through the intellectual quest of graduate school, and providing me with precious funding and opportunities to participate in several fascinating research projects. I would like to thank many current and previous lab members, and fellow students for lots of direct and indirect help. I am also thankful to many professors in the Institute of Bioinformatics whose wonderful courses prepared me with both biological and computational skills necessary for my research. Last but not least I would like to thank my great committee members for their valuable time and insights into my dissertation research. ! ! vi TABLE OF CONTENTS Page ACKNOWLEDGEMENTS ……………………………………………………………………………… v LIST OF TABLES ………………………………………………………………………………………. ix LIST OF FIGURES ……………………………………………………………………………………… x CHAPTER 1 INTRODUCTION AND LITERATURE REVIEW …………………………………… 1 1.1 Dissertation structure and related publications ………………………………………………… 2 1.2 Brief overview of flowering plant phylogeny ………………………………………………… 4 1.3 Repeated ancient genome duplications prevalent in flowering plant evolution ……………… 6 1.4 Methodological development for homeologous region detection and ancient genome duplication inference ………………………………………………………………………… 11 1.5 Ancient genome duplications identified in sequenced flowering plant lineages ……………… 13 1.6 Framework of plant genome comparison ……………………………………………………… 17 CHAPTER 2 TWO PALEO-HEXAPLOIDIES UNDERLIE FORMATION OF MODERN SOLANACEAE GENOME STRUCTURE……………………………………………………………… 18 2.1 Preface ………………………………………………………………………………………… 19 2.2 Abstract………………………………………………………………………………………… 21 2.3 Introduction …………………………………………………………………………………… 21 2.4 Methods to identify paleo-polyploidy ………………………………………………………… 23 2.5 The paleo-hexaploidy T: triplication of the Solanaceae ancestral genome …………………… 24 2.6 Further circumscribing the T event using additional asterid genomes………………………… 26 2.7 A more ancient hexaploidy γ predated divergence of rosid and asterid plants………………… 34 2.8 The nature and consequences of the γ and T paleo-hexaploidy events………………………… 35 2.9 Summary and perspective……………………………………………………………………… 38 ! ! vii CHAPTER 3 A WELL-RETAINED ANCIENT GENOME DUPLICATION IN THE SLOWLY EVOLVING BASAL EUDICOT NELUMBO (LOTUS) LINEAGE …………………………………… 40 3.1 Introduction …………………………………………………………………………………… 40 3.2 A lineage-specific paleo-tetraploidy (λ) after divergence with core eudicots ………………… 41 3.3 Slow lineage evolutionary rate ………………………………………………………………… 44 3.4 Outgroup for core eudicot and monocot genome comparisons ……………………………… 45 3.5 Discussion……………………………………………………………………………………… 49 CHAPTER 4 THE PAN-GOSSYPIUM PALEO-POLYPLOIDY AND TWO NUMT REGIONS IN TWO OF THE SUBGENOMES IN MODERN G. RAIMONDII ………………………………… 54 4.1 Abstract………………………………………………………………………………………… 54 4.2 Introduction …………………………………………………………………………………… 55 4.3 Methods………………………………………………………………………………………… 57 4.4 Results ………………………………………………………………………………………… 59 4.5 Discussion……………………………………………………………………………………… 67 4.6 Conclusion …………………………………………………………………………………… 73 CHAPTER 5 TWO PALEO-TETRAPLOIDIES IN ANCIENT GRASS AND MONOCOT LINEAGES ……………………………………………………………………………………………… 75 5.1 Introduction …………………………………………………………………………………… 75 5.2 Overview of synteny conservation in studied genomes ……………………………………… 76 5.3 Circumscribing the pan-grass σ duplication event …………………………………………… 78 5.4 Circumscribing the pre-commelinid τ duplication event in early monocots ………………… 79 5.5 The value of ancestral gene order in genome comparisons …………………………………… 80 5.6 Variation of lineage nucleotide evolutionary rates and estimated ages of σ and τ …………… 81 5.7 Origins of high GC3 genes in grasses ………………………………………………………… 84 CHAPTER 6 GeDupMap: SIMULTANEOUS DETECTION OF MULTIPLE ANCIENT GENOME DUPLICATIONS AND A SYNTE-MOLECULAR FRAMEWORK ………………………………… 87 ! ! viii 6.1 Abstract ……………………………………………………………………………………… 87 6.2 Introduction …………………………………………………………………………………… 88 6.3 Methods ……………………………………………………………………………………… 89 6.4 Results ………………………………………………………………………………………… 96 6.5 Discussion …………………………………………………………………………………… 102 6.6 Conclusion …………………………………………………………………………………… 104 CHAPTER 7 CONCLUSION AND PERSPECTIVE ……………………………………………… 106 7.1 Structural evolution of paleo-polyploid angiosperm genomes ……………………………… 106 7.2 Effective synte-molecular comparisons of related plant genomes …………………………… 108 7.3 Closing remarks ……………………………………………………………………………… 109 REFERENCES ………………………………………………………………………………………… 111 ! ! ix LIST OF TABLES Page Table 1.1: Paleo-polyploidy events identified in sequenced angiosperm genomes……………………… 14 Table 3.1: Distribution of lotus loci at different paleo-polyploidy depths ……………………………… 43 Table 4.1: Mean synonymous substitution rate (Ks) values between syntenic gene pairs ……………… 66 Table 5.1: Sources and basic information of angiosperm genomes used in this study. ………………… 77 Table 5.2: Summary of synteny blocks in the six studied genomes. …………………………………… 78 Table 6.1: Sources and basic information of angiosperm genomes used in this study ………………… 96 ! ! x LIST OF FIGURES Page Figure 1.1: Different models to form (paleo)hexaploidy. ………………………………………………… 8 Figure 2.1: The Solanum whole genome triplication …………………………………………………… 20 Figure 2.2: Histograms of Ks (nucleotide substitutions per synonymous site) between paralogous and orthologous gene pairs in tomato, monkey flower, bladderwort, and grape ………………… 28 Figure 2.3: Simplified cladogram of some representative asterid and outgroup lineages ……………… 29 Figure 2.4: Alignment of tomato and kiwifruit genomes ……………………………………………… 30 Figure 2.5: Alignment of tomato and monkey flower genomes ………………………………………… 31 Figure 2.6: LASTZ alignment between the coffee BAC region and tomato genome…………………… 33 Figure 2.7: Schematic representation of orthologous and paralogous regions in tomato (S. lycopersicum) and potato (S. tuberosum) genomes. ………………………………………………………… 37 Figure 3.1: Dot plot alignment of the lotus and grape genomes ………………………………………… 42 Figure 3.2: Distribution of synonymous substitution rate (Ks) between homeologous gene pairs in intra- and inter- genomic comparisons ……………………………………………………… 44 Figure 3.3: Number and percentage of genes in the query genomes having homeologous genes in the reference genome …………………………………………………………………………… 47 Figure 3.4: Multiple alignment of a set of syntenic regions in papaya, peach, grape and lotus ………… 48 Figure 3.5: Ks distributions between orthologous homeologs gene pairs comparing the lotus genome to the grape genome (a) and sorghum genome (b) ………………………………………… 50 Figure 3.6: Differences in mRNA length, CDS length, intron length, and percentage mRNA length difference attributable