A Comprehensive Dataset of Genes with Loss-Of-Function Mutant Phenotypes in Arabidopsis Thaliana
Total Page:16
File Type:pdf, Size:1020Kb
A COMPREHENSIVE DATASET OF GENES WITH LOSS-OF-FUNCTION MUTANT PHENOTYPES IN ARABIDOPSIS THALIANA By JOHN PAUL LLOYD Bachelor of Science in Microbiology Oklahoma State University Stillwater, OK 2009 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE July, 2012 A COMPREHENSIVE DATASET OF GENES WITH LOSS-OF-FUNCTION MUTANT PHENOTYPES IN ARABIDOPSIS THALIANA Thesis Approved: David Meinke Thesis Adviser Andrew Doust Gerald Schoenknecht Ming Yang Sheryl Tucker Dean of the Graduate College ii TABLE OF CONTENTS Chapter Page I. INTRODUCTION ......................................................................................................1 Nature of Phenotype Data ........................................................................................1 Availability of Phenotype Data ................................................................................2 Outline and Scope of Thesis ....................................................................................4 II. SINGLE GENE MUTANT PHENOTYPE DATASET ...........................................7 Definition and Classification of Mutant Phenotypes ...............................................7 Confidence Status of Gene-to-Phenotype Associations ........................................14 Dataset Construction ..............................................................................................15 Dataset Overview ...................................................................................................20 Future Updates to the Arabidopsis Phenotype Dataset ..........................................26 III. ARABIDOPSIS GENES WITH GAMETOPHYTE PHENOTYPES ..................31 Phenotypes of Mutant Gametophytes ....................................................................31 Construction and Organization of the Gametophyte Dataset ................................32 Overview of the Gametophyte Dataset ..................................................................36 Analysis of the Gametophyte Dataset ....................................................................37 IV. MULTIPLE MUTANT PHENOTYPE DATASET ..............................................42 Phenotypes Resulting from Mutations in Multiple Genes .....................................42 Construction and Organization of the Multiple Mutant Dataset ............................44 Overview of the Multiple Mutant Dataset .............................................................48 Sequence and Expression Similarity of Redundant Genes ....................................54 V. ANALYSIS OF THE ARABIDOPSIS PHENOTYPE DATASET ......................56 Protein Function and Mutant Phenotype................................................................56 Subcellular Localization and Mutant Phenotype ...................................................59 Protein Connectivity and Mutant Phenotype .........................................................63 Genetic Redundancy and Mutant Phenotype .........................................................65 Phenotypes of Putative Orthologs ..........................................................................69 Genes with No Loss-of-Function Phenotype .........................................................72 REFERENCES ............................................................................................................75 APPENDIX A: Controlled Vocabulary for Mutant Phenotype Descriptions .............81 iii APPENDIX B: Arabidopsis Phenotype Classification System ................................107 APPENDIX C: Single Gene Mutant Phenotype Dataset, Phenotype Information ...110 APPENDIX D: Single Gene Mutant Phenotype Dataset, Gene Information ...........228 APPENDIX E: Arabidopsis Genes with Gametophyte Phenotypes .........................400 APPENDIX F: Protein Function Classification System ...........................................416 APPENDIX G: Multiple Mutant Phenotype Dataset, Gene Information .................418 APPENDIX H: Multiple Mutant Phenotype Dataset, Grouping Information ..........441 APPENDIX I: Phenotypes of Putative Orthologs in Other Plants ...........................469 iv LIST OF TABLES Table Page 1. Biochemical Defect Hierarchy ................................................................................9 2. Information Included in the Dataset .....................................................................18 3. Phenotype Group and Class Distributions ............................................................22 4. Gametophyte Transmission Defect Symbols ........................................................35 5. Transcription of Genes with Embryo and Gametophyte Defects during Pollen Development .................................................................................................41 6. Features and Phenotypes of Multiple Mutant Clusters .........................................50 7. Phenotype Group Similarity of Paired Interactors ................................................66 v LIST OF FIGURES Figure Page 1. Overview of the Phenotype Classification System ...............................................11 2. Distribution of Phenotype Subsets ........................................................................23 3. Historical Perspective on Genes with Mutant Phenotypes ...................................25 4. Chromosomal Distribution of Arabidopsis Phenotype Genes ..............................27 5. Protein Functions of Genes with Embryo and Gametophyte Phenotypes ............39 6. Complex Multiple Mutant Clusters ......................................................................47 7. Chromosomal Distribution of Fully-Redundant Genes ........................................49 8. Phenotype Group Distributions of Single and Multiple Mutants .........................52 9. Protein Functions of Phenotype Gene Products....................................................58 10. Subcellular Localization of Phenotype Genes ....................................................61 11. Genetic Redundancy of Phenotype Genes and the Arabidopsis Genome ..........68 vi CHAPTER I INTRODUCTION Nature of Phenotype Data Mutant phenotypes, both dominant and recessive, are an important tool in the understanding of gene function. Dominant phenotypes usually result from a gain-of- function mutation. Examples include disruptions of promoter regions that cause overexpression or inappropriate expression of a gene product, and changes to protein domains or active sites that create a novel gene function not masked by a wild type allele. In some cases, dominant phenotypes result from dominant negative mutations that produce a defective protein product with an antagonistic effect on other proteins in the cell. Because of the diversity of these defects, it can be difficult to interpret the biological significance of a dominant phenotype. By comparison, recessive phenotypes, which result from the loss of gene function, are more straightforward to interpret. Recessive phenotypes result from either complete abolition or varying degrees of reduction in gene function. Less frequently, semi-dominant phenotypes observed in 1 heterozygotes are produced by the loss of gene function. These are often the result of haploinsufficiency; a single functional allele does not generate enough gene product to meet the needs of a cell. The focus of my Master’s research is genes with recessive, loss- of-function mutant phenotypes in Arabidopsis thaliana . Well-curated and reliable datasets of mutant phenotype information are an important component of model genetic systems. Ideally, these datasets should provide a direct connection between a mutant gene and the resulting phenotype and should include the full range of phenotypes identified in an organism through genetic analysis. Phenotype datasets allow for convenient access to mutant phenotype information in an organism, but they function as more than simple repositories of data. Studying comprehensive mutant phenotype information can offer insights into the biological consequences resulting from disruption of a protein family, metabolic network, or organelle function. They can also be utilized to evaluate the selective pressure exerted to maintain particular morphologies and the role phenotype plays in gene evolution (Hurst and Smith, 1999; Hirsh and Fraser, 2001; Jordan et al., 2002). Phenotype datasets therefore facilitate the investigation of a wide range of important biological questions. Availability of Phenotype Data The importance of collections of phenotype information associated with specific gene disruptions is reflected in the presence of widely-used phenotype datasets in a variety of eukaryotic model systems. The Online Mendelian Inheritance in Man database (McKusick, 2007) and Mouse Genome Database (Blake et al., 2011) are important phenotype datasets in mammals. These two collections represent an important 2 integration between basic research and clinical care. Flybase, Wormbase, and the Saccharomyces Genome Database keep phenotype datasets for the classic model systems of Drosophila , C. elegans , and yeast, respectively (Drysdale and FlyBase Consortium, 2008; Engel et al., 2010; Harris et al., 2010). The emerging model fish system, D. rerio , also has a phenotype database at the Zebrafish Information Network (Bradford et al., 2011). PhenomicDB serves as a single location that synthesizes and houses phenotype data from these databases, as well as information from