Comparative Genomic Study of Upstream Open Reading Frames
CHALMERS | GÖTEBORG UNIVERSITY

A project report submitted for the award of MSc in Bioinformatics

Marija Cvijovic
Supervisor: Per Sunnerhagen
Examiner: Olle Nerman

Table of Contents

Preface
Acknowledgements
Abstract
1 INTRODUCTION
  1.1 MOTIVATION
  1.2 OUTLINE
2 BACKGROUND
  2.1 CELL
  2.2 DNA
    2.2.1 Genes
    2.2.2 DNA replication
  2.3 FROM DNA TO PROTEIN
    2.3.1 From DNA to RNA (transcription)
    2.3.2 From RNA to protein (translation)
  2.4 GENE EXPRESSION
    2.4.1 Posttranscriptional control
  2.5 UNTRANSLATED REGIONS OF MRNA
    2.5.1 Structural characteristics of untranslated regions
    2.5.2 Upstream Open Reading Frames
    2.5.3 Different modes of action of uORFs
3 DATA ACQUISITION AND CATALOGUE
  3.1 SACCHAROMYCES CEREVISIAE AS MODEL ORGANISM
  3.2 SACCHAROMYCES GENOME DATABASE (SGD)
  3.3 INFORMAX
  3.4 MAKING THE CATALOGUE
4 STUDY
  4.1 DISTRIBUTION OF UORFS
  4.2 MECHANISM OF UORFS
    4.2.1 GCN4
    4.2.2 YAP1 and YAP2
  4.3 GENOME STRUCTURE AND EVOLUTION
  4.4 GENOME ALIGNMENT AND SYNTENY
    4.4.1 Synteny
    4.4.2 Alignment
  4.5 EXTENDED ANALYSIS
5 RESULTS
6 CONCLUSIONS
  6.1 FUTURE WORK
REFERENCES
APPENDIX A

List of Figures

FIGURE 2-1 - DNA STRUCTURE
FIGURE 2-2 - GENE STRUCTURE
FIGURE 2-3 - MRNA STRUCTURE
FIGURE 3-1 - MAIN PAGE OF THE CATALOGUE
FIGURE 3-2 - CPA1 GENE PAGE
FIGURE 4-1 - DIFFERENT TYPES OF UORF DISTRIBUTION (NOT DRAWN TO SCALE)
FIGURE 4-2 - EVOLUTIONARY TREE
FIGURE 4-3 - SYNTENY VIEWER FOR CLN3 GENE (NEIGHBOURING GENES ARE ALSO REPRESENTED)
FIGURE 4-4 - INFORMAX SCREENSHOT
FIGURE 4-5 - SCHEMATIC REPRESENTATION OF GCN4 UORFS IN DIFFERENT ORGANISMS

List of Tables

TABLE 4-1 - LIST OF STUDIED GENES
TABLE 5-1 - RESULTS

Preface

This report is a Master's thesis in Bioinformatics and is part of the International Master's Programme at Chalmers and Göteborg University. The research project was carried out at the Lundberg Laboratory, Göteborg, within the Department of Cell and Molecular Biology.

Examiner: Olle Nerman, Department of Mathematical Statistics, Chalmers University of Technology, Göteborg, Sweden

Supervisor: Per Sunnerhagen, Department of Cell and Molecular Biology, Göteborg University, Göteborg, Sweden

Acknowledgements

[Figure: sequence alignment of "TACKSAMYCKET" with human "---------PER" over positions 1-12; consensus "E"]

I would like to thank my supervisor Per Sunnerhagen for his invaluable help and for his patience with a mathematician who came into the biology playground. Many thanks also to my examiner Olle Nerman for all his ideas and suggestions. Special thanks go to all the people gathered around the Bioinformatics programme at Chalmers and Göteborg University for the enjoyable and unforgettable time I had in Sweden.

Marija Cvijovic
December 2004, Göteborg

Abstract

The untranslated regions of mRNA molecules are involved in several post-transcriptional regulatory pathways.
The 5’UTR is the sequence between the 5’ terminal cap structure and the initiation codon for protein synthesis. This 5’ end (the leader) can finely regulate the amount of protein synthesised from a specific mRNA. In a fraction of mRNAs, upstream open reading frames (uORFs) are present. The ribosome can recognise the AUG of a uORF as an initiation codon, translate the downstream sequence into protein and terminate before the main ORF is translated. A general method for detecting uORFs with a regulatory role is therefore of great importance. Although the complete genome of the yeast Saccharomyces cerevisiae has been sequenced, the number of mRNAs containing uORFs is not known. It is predicted that 200 genes (3 %) in S. cerevisiae have uORFs. Two facts make the identification of genuine uORFs difficult: mRNA start sites are not known, and some genes have more than one promoter. In this study, a collection of 18 genes with known uORFs was examined. Some genes have a very long leader (CLN3, 864 nt), while others have an extremely short one (DCD1, 33 nt). Moreover, a longer UTR does not necessarily imply a longer uORF or a higher uORF frequency (CLN3: UTR of 864 nt, a single uORF of 4 codons). A comparative analysis was carried out for S. cerevisiae and six related species (S. paradoxus, S. mikatae, S. bayanus, S. castellii, S. kluyveri, S. kudriavzevii). We aligned the sequences (1000 nt upstream of the initiation codon) using a pairwise sequence alignment method, and then labelled all identified uORFs. The uORFs of some genes, for which a functional role had been established by experiment, show striking conservation. Upstream ORFs from other genes, for which no indication of such regulation was available, displayed a lower