CHAPTER 1
NON-CODING RNAs
Alexander Donath^a, Sven Findeiß^a, Jana Hertel^a, Manja Marz^a, Wolfgang Otto^a, Christine Schulz^c, Peter F. Stadler^a,b,c,d, and Stefan Wirth^a
^a Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16-18, D-01407, Leipzig, Germany
^b Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
^c Fraunhofer Institute for Cell Therapy and Immunology, Perlickstrasse 1, D-04103 Leipzig, Germany
^d Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
1.1 INTRODUCTION
The advent of high-throughput techniques that allow comprehensive, unbiased studies of transcription has led to a dramatic change in our understanding of genome organization. A decade ago, the genome was seen as a linear arrangement of separate, individual genes that are predominantly protein-coding, with a small set of ancient non-coding “housekeeping” RNAs such as tRNA and rRNA dating all the way back to an RNA World. In contrast to this simple view, however, more recent studies reveal a much more complex genomic picture. The ENCODE Pilot Project [239], the mouse cDNA project FANTOM [151], and a series of other large-scale transcriptome studies, e.g. [202], leave no doubt that the mammalian transcriptome is characterized by a complex mosaic of overlapping, bi-directional transcripts and a plethora of non-protein-coding transcripts arising from the same locus (Fig. 1.1). This newly discovered complexity is not unique to mammals. Similar high-throughput studies in invertebrate animals [152, 93] and plants [135] demonstrate that the genome organization observed in mammals is general among higher eukaryotes. Even the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, whose genomes had been considered to be well understood, are surprising us with a much richer repertoire of transcripts than previ-
Copyright © 2008 John Wiley & Sons, Inc.
[Figure labels: highly transcribed regions; mosaic transcript; non-coding transcript; protein-coding mRNA]