Characterization of a Plant Gene Family Expanded in Glycine Max
Scholars' Mine
Masters Theses
Student Theses and Dissertations
Spring 2014

Part of the Biology Commons, and the Environmental Sciences Commons

Recommended Citation
Snoderly-Foster, Lisa, "Characterization of a plant gene family expanded in glycine max" (2014). Masters Theses. 7277. https://scholarsmine.mst.edu/masters_theses/7277

This thesis is brought to you by Scholars' Mine, a service of the Missouri S&T Library and Learning Resources. This work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the permission of the copyright holder.

CHARACTERIZATION OF A PLANT GENE FAMILY EXPANDED IN GLYCINE MAX
by
LISA SNODERLY-FOSTER

A THESIS
Presented to the Faculty of the Graduate School of the
MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY
In Partial Fulfillment of the Requirements for the Degree
MASTER OF SCIENCE IN APPLIED AND ENVIRONMENTAL BIOLOGY
2014

Approved by
Ronald Frank, Advisor
Katie Shannon
Dave Westenberg

ABSTRACT

Glycine max, commonly known as the cultivated soybean, is one of the oldest and most important food crops in the world. Study of the G. max genome provides valuable insight into the molecular mechanisms that govern its reproduction and environmental responsiveness, which are key factors in maximizing crop yield. Since the genome was completely sequenced in 2010, its analysis has become faster and easier, especially with the development of numerous web-based, publicly accessible bioinformatics tools. This research effort uses these tools to characterize a small, unannotated G. max gene family.
Although no definitive evidence was found that these genes produce a functional protein, there is evidence that 3 of the 5 genes are transcribed. Gene model verification, synonymous substitution calculations, structural fold analysis, cis-element identification, and comparisons to molecules of known structure were used in an attempt to define the evolutionary history of this gene family and to pinpoint the putative function of its conceptually translated amino acid sequences.

ACKNOWLEDGMENTS

I would like to thank Dr. Ronald Frank for his mentorship. He saw enough potential in me to take me on as a student and to invest a substantial amount of time and energy in personally guiding me through this process. Dr. Frank, your wisdom has been invaluable.

I would also like to thank Dr. Dave Westenberg and Dr. Katie Shannon, members of my thesis committee. Thank you both for allowing me to do a rotation in your labs. I gained valuable experience and had the opportunity to learn techniques that I might not otherwise have had a chance to learn.

Dr. Gayla Olbricht, thank you for taking the time to look into my research and help me determine whether a statistical analysis could be performed on the data I collected.

Finally, I would like to express my gratitude to my family for their support. Jennifer, my partner, without your constant reassurance that this was the right path for me and that the financial hardships would be worth the end result, I might not have had the resolve to give this my best effort. You have been the foundation of my success. To my parents, thank you for supporting this venture, and for your encouragement and willingness to help in any way possible.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
NOMENCLATURE
SECTION
1. INTRODUCTION
  1.1. GLYCINE MAX
  1.2. GENE DUPLICATION AND GENE FAMILIES
  1.3. EVIDENCE OF GENE EXPRESSION
    1.3.1. ESTs
    1.3.2. Consensus Data
      1.3.2.1. Promoter elements
      1.3.2.2. Polyadenylation signals
      1.3.2.3. Intron/exon borders
      1.3.2.4. Splicing signals within introns
    1.3.3. MicroRNA
    1.3.4. Dyad Symmetry
  1.4. DATABASES AND OTHER BIOINFORMATICS TOOLS
    1.4.1. Proteins: PFAM, Panther, KOG, and PDB
    1.4.2. Phytozome
    1.4.3. NCBI
    1.4.4. DNA Subway
    1.4.5. Phylogeny.fr
    1.4.6. ExPASy
    1.4.7. MEME
    1.4.8. CLUSTALw
    1.4.9. PAL2NAL
    1.4.10. SNAP
    1.4.11. PLACE
    1.4.12. CELLO
    1.4.13. PSIPRED: Protein Sequence Analysis Workbench
    1.4.14. DNA Dot Plots
    1.4.15. I-TASSER
2. MATERIALS AND METHODS
  2.1. BLAST
  2.2. SEQUENCE ALIGNMENTS
  2.3. CHOICE OF GENE FAMILY AND IDENTIFICATION OF MEMBERS
  2.4. EVOLUTIONARY AND EXPRESSION ANALYSIS FOR GENE MODEL CONSTRUCTION
    2.4.1. Neighbor Gene Analysis
    2.4.2. ESTs
    2.4.3. Synonymous and Nonsynonymous Substitution Rates
    2.4.4. Glycine max Family Phylogeny
    2.4.5. Plant Species Family Phylogeny
    2.4.6. Constructing Gene Models
      2.4.6.1. Predicting gene models
      2.4.6.2. Verifying intron/exon borders using EST data
      2.4.6.3. Identification of start codon through ORF (open reading frame) analysis
    2.4.7. Promoter Element Identification
    2.4.8. Evolutionary Analysis of Verified Gene Family Member Resolve Models
      2.4.8.1. Multiple and pairwise alignments to analyze coding capacity and possible mutation sites
      2.4.8.2. Generation of dot plots to assess similarity of sequence outside of the coding area
  2.5. NON-CODING SEQUENCE ANALYSIS
  2.6. FUNCTIONAL ANALYSIS
3. RESULTS
  3.1. CHOICE OF GENE FAMILY AND IDENTIFICATION OF MEMBERS
    3.1.1. Criteria Match
    3.1.2. Association Map Created Using BLAST within Glycine max Genome Browser
    3.1.3. Chromosome Maps
    3.1.4. General Summary of LJFgene Family Member Composition
  3.2. GENE STRUCTURE PREDICTION AND EST EXPRESSION ANALYSIS
    3.2.1. Constructed Gene Models
    3.2.2. Decorated Sequences
    3.2.3. Promoter Element Locations
    3.2.4. EST Data for Intron/Exon Border Verification
  3.3. EVOLUTIONARY ANALYSIS
    3.3.1. Neighbor Gene Analysis
    3.3.2. Synonymous and Nonsynonymous Substitution Rates
    3.3.3. Phylogenetic Trees
    3.3.4. Potential Coding Capacity
      3.3.4.1. Multiple alignment of nucleic acid sequences
      3.3.4.2. Multiple alignment of conceptually translated peptide sequences
      3.3.4.3. Codon alignment of gene family members extended on both the 3’ and 5’ ends
      3.3.4.4. Pairwise dot plot matrices
  3.4. FUNCTIONAL ANALYSIS
    3.4.1. Domain Identification Through Conservation of Sequence
    3.4.2. Promoter Element Analysis
    3.4.3. Subcellular Localization Predictions
      3.4.3.1. CELLO
      3.4.3.2. Hydropathicity analysis
      3.4.3.3. I-TASSER gene ontology results
    3.4.4. Secondary Structure Predictions
    3.4.5. Tertiary Structure and Function Predictions
  3.5. NON-CODING SEQUENCE ANALYSIS
    3.5.1. Nucleotide Sequences, Amino Acid Translations, and Putative Models for Non-coding Sequences Associated with LJFgene Family
    3.5.2. Motif Conservation
    3.5.3. Alignment and Dot Plot of LJFnms Against LJFgene(s)
    3.5.4. MicroRNA Prediction
    3.5.5. Promoter Element Identification
4. DISCUSSION
  4.1. CHOICE OF GENE FAMILY AND IDENTIFICATION OF MEMBERS
  4.2. GENE STRUCTURE PREDICTION AND EST EXPRESSION ANALYSIS
  4.3. EVOLUTIONARY ANALYSIS
  4.4. STRUCTURE, FUNCTION, AND LOCALIZATION PREDICTIONS
  4.5. NON-CODING SEQUENCE ANALYSIS
  4.6. FINAL CONCLUSIONS
APPENDICES
A. GENE FAMILIES MEETING CHOICE