Comparative Analysis of Plant Genomes Through Data Integration
Michiel Van Bel

Promoter: Prof. Dr. Yves Van de Peer
Co-Promoter: Prof. Dr. Klaas Vandepoele

Ghent University, Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics
VIB, Department of Plant Systems Biology, Bioinformatics and Systems Biology

Dissertation submitted in fulfillment of the requirements for the degree of Doctor (PhD) in Sciences, Bioinformatics. Academic year: 2012-2013

Examination Committee

Prof. Dr. Geert De Jaeger (chair)
Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University

Prof. Dr. Yves Van de Peer (promoter)
Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University

Prof. Dr. Klaas Vandepoele (co-promoter)
Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University

Prof. Dr. Jan Fostier
Faculty of Engineering, Department of Information Technology, Ghent University

Prof. Dr. Peter Dawyndt
Faculty of Sciences, Department of Applied Mathematics and Computer Science, Ghent University

Dr. Steven Robbens
Bayer Cropscience, Belgium

Dr. Matthieu Conte
Syngenta Seeds, France

Acknowledgements

While the cover of this book carries my name, this thesis did not come to fruition by my hand only. These past years have been a great experience, for which I would like to express my gratitude to several people.

First of all, I would like to thank Thomas Abeel, for getting me in touch with Yves' research group, and encouraging me to start a PhD in bioinformatics. Without a chance encounter with him, I would never have dreamed that obtaining a PhD would be possible.

Secondly, I would like to thank my promoter and co-promoter, Yves Van de Peer and Klaas Vandepoele. The opportunity Yves has given me to pursue a PhD and the great research environment of Yves' lab have proven to be invaluable. The constant support and patience of Klaas in guiding me form the foundation of this PhD.
Our numerous discussions on how to proceed with our shared research were definitely instrumental in my growth as a researcher.

Thirdly, I would like to express my gratitude to the members of my PhD jury, for reading my thesis and evaluating my work.

Next up in line to be thanked is Sebastian Proost: a great colleague and flat-mate. Most of my research was done in collaboration with him, and the results need to be seen as such.

A big thank you as well to my fellow IT-knowledgeable colleagues at the lab: Sofie Van Landeghem, Marijn Vandevoorde, Thomas Van Parys, Frederik Delaere and Kenny Billeau. Their beacon of computer-related jokes, general fun and laughter, and overall support in the darkness of the biology department was definitely important in keeping me happy and working.

I also want to thank Yvan Saeys, for guiding me through the rough first year of my PhD. Though our shared research didn't pan out, it was a good learning experience on what to do and what not to do.

All Binari people, present and past, need to be thanked, as well as all the people within the BSB and biocomp group. The overall fun and interesting discussions we had will be remembered. Honorary mention goes to Lieven Sterck, whose constant presence provides a great atmosphere of collegiality within the lab.

Another group to be thanked consists of the people from the IT staff. Without their unwavering dedication in keeping our servers running and our hard drives spinning, together with their tacit approval of my development skills on the web server, this PhD would have been a lot more difficult.

Outside of the PSB building, I would like to give a big thank you to all my friends, especially my former school mates. The efforts most of you put into obtaining a PhD really gave me that extra boost in confidence to continue my own research.

And last but definitely not least, I would like to thank my family: my brother, sisters and mother.
Their constant support, interest and love gave me the strength these past 6 years to carry on.

Table of Contents

Examination Committee
Acknowledgements

1 Research Purpose and Scope
  1.1 Overview
  1.2 Creation of a Platform for Comparative and Evolutionary Genomics
  1.3 Creation of a Platform for Transcriptome Analysis

2 Introduction
  2.1 Abstract: A history of genetics
  2.2 Comparative and Evolutionary Genomics in Plants
    2.2.1 Duplications in Plant Genomes
    2.2.2 Orthology
  2.3 Functional Genomics
    2.3.1 Gene Ontology
    2.3.2 Protein Domains
    2.3.3 Molecular Interactions
    2.3.4 Text Mining
  2.4 Bioinformatics Tools and Platforms
    2.4.1 Web Visualizations and Technologies
    2.4.2 Online Plant Genomics Platforms
  2.5 Author Contribution

3 PLAZA: a Comparative Genomics Resource to Study Gene and Genome Evolution in Plants
  3.1 Introduction
  3.2 Results
    3.2.1 Data Assembly
    3.2.2 Delineating Gene Families and Subfamilies
    3.2.3 Projection of Functional Annotation Using Orthology
    3.2.4 Exploring Genome Evolution in Plants
    3.2.5 Database Access, User Interface, and Documentation
  3.3 Methods
    3.3.1 Data Retrieval and Delineation of Gene Families
    3.3.2 Comparison of OrthoMCL with Phylogenetic Trees
    3.3.3 Alignments and Phylogenetic Trees
    3.3.4 Functional Annotation
    3.3.5 Detection of Collinearity
    3.3.6 Relative Dating using Synonymous Substitutions
  3.4 Summary and Future Prospects
  3.5 Author Contribution

4 Dissecting Plant Genomes with the PLAZA Comparative Genomics Platform
  4.1 Introduction
  4.2 Results and Discussion
    4.2.1 Gene Annotation and Gene Families
    4.2.2 Core Plant Gene Families and Detection of Clade-specific or Expanded Gene Families
    4.2.3 Integrative Orthology Viewer: an Ensemble Approach to Detect Orthology Relationships
    4.2.4 Clusters of Functionally Related Genes in Eukaryotic Genomes
    4.2.5 Colinearity-based Genome Analysis
    4.2.6 User Interactivity via Workbench and Bulk Downloads
  4.3 Material and Methods
    4.3.1 Gene Models and Gene Families
    4.3.2 Colinearity
    4.3.3 Functional Annotation
    4.3.4 Functional Gene Clusters
    4.3.5 Orthology Prediction and Evaluation
  4.4 Author Contribution

5 PLAZA Applications
  5.1 The Study of Gene Duplicates Using the PLAZA Platform
    5.1.1 Duplicated Resistance Genes in Arabidopsis and Poplar
    5.1.2 Tandem and Block Duplicates in Chlamydomonas reinhardtii
  5.2 Comparative Co-expression Analysis in Plants
    5.2.1 Construction of Co-expression Networks and Comparison Across Species of Co-expression
    5.2.2 Functional Annotation
    5.2.3 Studying Conserved Gene Functions Using Comparative Co-expression Analysis
  5.3 Studying Algal Genomics Using the pico-PLAZA Platform
    5.3.1 Gene Dynamics in Algal Genomes
    5.3.2 Functional Analysis of Large-scale Expression Data
    5.3.3 Environmental Genomics
  5.4 Author contribution

6 TRAPID, an Efficient Online Tool for the Functional and Comparative Analysis of De Novo RNA-Seq Transcriptomes
  6.1 Introduction
  6.2 Results and Discussion
    6.2.1 General Properties of the TRAPID Transcript Analysis Platform
    6.2.2 Evaluation of Homology Assignments
    6.2.3 Evaluation of the ORF Finding Routine
    6.2.4 Comparison of TRAPID with Blast2GO and KAAS
    6.2.5 Detection of Functional Biases in Transcriptome Subsets Using Enrichment Analysis
  6.3 Material and Methods
    6.3.1 Datasets, Construction Reference Protein Databases and Selection of Gene Family Representatives
    6.3.2 Similarity Search, Gene Family Assignment and Functional Transfer Using Homology
    6.3.3 Frame Assignment and Detection of Putative Frameshifts
    6.3.4 Meta-annotation
    6.3.5 Correction Using FrameDP
    6.3.6 Multiple Sequence Alignments and Phylogenetic Trees
    6.3.7 Implementation
  6.4 Conclusion
  6.5 Author Contribution

7 Technology and Development
  7.1 Data Processing
    7.1.1 Data Parsing
    7.1.2 Data Validation
  7.2 Visualizations
    7.2.1 Graphs and Charts
    7.2.2 Phylogenetic Trees
    7.2.3 WGDotplot
    7.2.4 CirclePlot