Identification of Regional Motifs in the 5'UTR and Their Implication in Translational Control Mechanisms
Iowa State University Capstones, Theses and Dissertations
Retrospective Theses and Dissertations
1-1-2003

Identification of regional motifs in the 5'UTR and their implication in translational control mechanisms
Sachet Ashok Shukla
Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/rtd

Recommended Citation
Shukla, Sachet Ashok, "Identification of regional motifs in the 5'UTR and their implication in translational control mechanisms" (2003). Retrospective Theses and Dissertations. 20043.
https://lib.dr.iastate.edu/rtd/20043

This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository.

Identification of regional motifs in the 5'UTR and their implication in translational control mechanisms

by

Sachet Ashok Shukla

A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

Major: Bioinformatics and Computational Biology

Program of Study Committee:
Srinivas Aluru, Co-major Professor
Charles Link, Jr., Co-major Professor
Philip Dixon
W. Allen Miller

Iowa State University
Ames, Iowa
2003

Copyright © Sachet Ashok Shukla, 2003. All rights reserved.

Graduate College
Iowa State University

This is to certify that the master's thesis of Sachet Ashok Shukla has met the thesis requirements of Iowa State University

Signatures have been redacted for privacy

To my mother, father and dear brother Sanket.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER 1. INTRODUCTION
  Introduction
  Regulation of gene expression
  Translational control
  Motivation
CHAPTER 2. MATERIALS AND METHODS
  Materials
  Methodology
  Test statistic
  Overdispersion
CHAPTER 3. IMPLEMENTATION
  Cleaning
  Scoring of Kozak sequences
  Partitioning
  Motif length determination
  Further analysis
CHAPTER 4. RESULTS
  Introduction
  Nature of motifs
  Common motifs
  uORF analysis
CHAPTER 5. DISCUSSION
  Human
  Mouse
  Rat
  Drosophila
REFERENCES
APPENDIX. MOTIF TABLES

LIST OF FIGURES

Figure 1. uATG frequencies
Figure 2. uORF frequencies
Figure 3. uORF length distribution
Figure 4. Kozak strength distribution
Figure 5(a). I and N region classification of non-redundant 5'UTRs
Figure 5(b). WIS and W' IS' region classification of non-redundant 5'UTRs
Figure 6. O and U region classification of non-redundant 5'UTRs

LIST OF TABLES

Table 1(a). Kozak base frequencies
Table 1(b). Initiator and upstream start codons: strength distribution
Table 2. Kozak score matrix - human
Table 3. Background 5'UTR and CDS base frequencies
Table 4(a). Motif length determination - .fm1n values at different word lengths
Table 4(b). Motif length determination - percentage of identifiable motifs
Table 5(a). Summary: Analysed data
Table 5(b). Summary: Identified significant motifs
Table 5(c). Summary: Motif overlap results
Table 5(d). Summary: Expected motif overlap results
Table 5(e). Summary: Normalized motif overlap results
Table 6(a). Summary: Analysed data in the uORF analysis
Table 6(b). Summary: Identified significant motifs in the uORF analysis
Table 6(c). Summary: Motif overlap results in the uORF analysis
Table 6(d). Summary: Expected motif overlap results - uORF analysis
Table 6(e). Summary: Normalized motif overlap results - uORF analysis
Table 7(a). Identified significant motifs in different partitions - human
Table 7(b). Motif overlap between different partitions - human
Table 7(c). Identified significant motifs in the uORF analysis - human
Table 7(d). Motif overlap between different partitions in the uORF analysis - human
Table 8(a). Identified significant motifs in different partitions - mouse
Table 8(b). Motif overlap between different partitions - mouse
Table 8(c). Identified significant motifs in the uORF analysis - mouse
Table 8(d). Motif overlap between different partitions in the uORF analysis - mouse
Table 9(a). Identified significant motifs in different partitions - rat
Table 9(b). Motif overlap between different partitions - rat
Table 9(c). Identified significant motifs in the uORF analysis - rat
Table 9(d). Motif overlap between different partitions in the uORF analysis - rat
Table 10(a). Identified significant motifs in different partitions - drosophila
Table 10(b). Motif overlap between different partitions - drosophila
Table 10(c). Identified significant motifs in the uORF analysis - drosophila
Table 10(d). Motif overlap between different partitions in the uORF analysis - drosophila

ACKNOWLEDGEMENTS

The last 3 years at Iowa State have been very exciting, edifying and rewarding for me at various levels. I have had the good fortune of learning from and working with some exceptionally accomplished people who have taken an active interest in my own education. I would like to acknowledge the support and contributions of some of them in the following paragraphs.

First and foremost, I have to thank my family for encouraging me to pursue a Master's degree to begin with. Their constant love, support and guidance have been instrumental to the success of any endeavor in my life.

I would like to thank Dr. Srinivas Aluru for the strong motivation and support he has provided throughout my stay at ISU.
When I first started my research, I was looking for an easy enough project that would be sufficient to get me the degree - it was under Srinivas' influence that I decided to do something more worthwhile (not to mention difficult!), and I am glad I did.

Dr. Charles Link has consistently encouraged my initiative-taking streak in research, and together with Dr. Nicholas Vahanian, provided me a great professional opportunity in his start-up company. The stimulating atmosphere at NewLink has benefitted me tremendously, especially my understanding of biology.

Dr. Philip Dixon has helped me a lot in understanding the original statistical technique that this work is based on, and also suggested the major improvements that are described in the text. His contribution has been vital; the quality of the work would have been seriously compromised without his input.

I have had numerous helpful discussions with Dr. Teresa Dicolandrea at various stages in the project. Her biological insights have greatly influenced this work. She has also painstakingly read through the whole dissertation and made many useful suggestions about the format and layout of the document.

Dr. Won Bin Young took an early interest in my research project and has made useful comments from time to time. He also gave me the initial Science paper, which has played a big role in the subsequent work.

I would also like to thank Brad Powers, who has been a great friend ever since I first arrived at ISU. He has helped me out in several ways, from first introducing me to NewLink to coming to my aid in car lock-out situations; the whole Master's experience was a lot more fun with him.

Lastly, I would like to thank Dr. Drena Dobbs, Kathy Weiderin and Dr. Dan Voytas for patiently putting up with my numerous misadventures in the BCB program.
ABSTRACT

This study uses a novel approach based on the RESCUE technique (Relative Enhancer and Silencer Classification by Unanimous Enrichment) (Fairbrother et al., 2002) to identify region-specific motifs in the 5'UTR. A highly selective screening procedure is described and implemented, which drastically reduces the false positive rate among motifs identified by the original technique. For increased accuracy, we present results only for species that have well-curated mRNA data in the RefSeq curated database.

The results of these computations suggest that there are motifs in the 5'UTR that act in conjunction with the Kozak consensus sequence in the process of translation initiation. Specifically, motifs have been identified in the inter-ATG regions of 5'UTRs with multiple uATGs (upstream ATGs) that may have an effect on translation initiation. Strong and weak Kozak sequences have also been associated with mutually exclusive motif sets both upstream and downstream of the true start codon. Finally, a number of motifs were identified as being preferentially present in the uORF (upstream Open Reading Frame) regions, which argues against the theory that uORF sequences are random. In general, uORF regions are also found to be strongly selective against motifs associated with strong Kozak sequences.

In addition to the above-stated results, which are applicable across species, motif overlap analysis (e.g., motifs that are associated with both strong Kozak sequences and the inter-ATG region upstream of the true start codon) also suggests some species-specific translational control mechanisms. The region-specific identification of motifs itself is probably indicative of higher-order secondary and tertiary structures and interactions. The experimental validation of these results could lead to the discovery of novel primary/secondary motifs and translational control mechanisms encoded in the 5' untranslated regions of different species.

CHAPTER 1. INTRODUCTION

1.1 Introduction

Gene expression broadly refers to the interpretation of the genetic material by the cell to produce functional and structural units, such as proteins, enzymes, rRNAs, tRNAs, etc., that are then deployed to perform necessary life processes. Controlled regulation of gene expression in every tissue is crucial for the proper functioning of the organism as a whole. Many diseases, such as cancer, are caused by a failure of the cellular machinery to adequately regulate the expression of essential genes. Understanding the initiation and control of gene expression is therefore of vital importance.