Master's thesis

Tracking the evolution of mammalian wide interspersed repeats across the mammalian tree of life

Author: Jakob Friedrich Strauß
First marker: Prof. Dr. Joachim Kurtz
Second marker: Prof. Dr. Wojciech Makałowski
Institute of Bioinformatics, WWU Münster
November 2, 2010

Abstract

This work focuses on the detection of mammalian wide interspersed repeats (MIRs) within the mammalian tree of life. MIRs are retroposons that belong to the class of SINEs (short interspersed nuclear elements). The ancient MIR is estimated to have amplified about 130 million years ago, just before the mammalian radiation, and to have become inactive soon afterwards.

The main idea of this project is to cross-detect MIR elements in orthologous loci between fully sequenced mammalian genomes. While a MIR sequence in an extant species may still strongly resemble the ancient MIR element, orthologous sequences in other mammalian species may have diverged beyond recognition, for example in mammals with a high substitution rate, such as rodents. But even such strongly diverged MIR elements may be detectable when the orthologous locus of a closely related species contains a less diverged element.

We analyze into which gene families MIR elements have spread and distinguish between MIR occurrences within UTRs, introns and exons. In this way we hope to illuminate the contribution of MIR elements to mammalian evolution.
Contents

1 Background
  1.1 Transposable Elements
  1.2 Genomic impact of transposable elements
    1.2.1 Long interspersed nuclear elements (LINEs)
    1.2.2 Short interspersed nuclear elements (SINEs)
  1.3 Mammalian wide interspersed repeats (MIRs)
    1.3.1 MIR elements and the tree of life
  1.4 Detection of transposable elements
  1.5 Goals of this study
2 Material and Methods
  2.1 Cross species MIR identification in genome / genome alignments
  2.2 MIR associated annotation features
  2.3 MIR sequence site heterogeneity
  2.4 Genomes
  2.5 Software
  2.6 Databases
3 Results and Discussion
  3.1 Cross species MIR identification in genome / genome alignments
  3.2 Species specific repeatmasker library
    3.2.1 MIR seed alignments
    3.2.2 MIR site heterogeneity
  3.3 MIR associated annotational features
  3.4 MIR elements in the lizard and the bird genomes
  3.5 Building profiles for MIR sequences
4 Conclusion
5 Outlook
References
6 Supplementary material

List of Figures

1 Multicolored corn cobs
2 Classification and hypothetical phylogeny of TEs
3 Schema of a long interspersed nuclear element
4 Schema of a short interspersed nuclear element
5 Schema of the MIR element
6 MIR alignment with Gln-tRNA and L2a tail
7 Phylogeny of Sauria SINEs, mammalian MIR and bird MIRs
8 TE detection tools
9 Concept of genome genome comparison
10 Approach of MIR reannotation
11 Phylogeny of analysed species
12 RepeatMasker: Example MIR alignment output
13 Phylogeny: MIR and MIR like sequences in RepBase
14 Reannotation of MIRs
15 Length histogram MIRs in human and mouse
16 MIR heterogeneity plot: baboon
17 MIR heterogeneity plot: human
18 MIR consensus plot: baboon
19 MIR consensus plot: human
20 Sequence site plot difference
21 Sequence site heterogeneity plots A
22 Sequence site heterogeneity plots B

List of Tables

1 List of genome versions
2 RepBase Update: Mammals
3 RepBase Update: Total
4 RepeatMasker annotation of MIR elements in the chosen mammals
5 BlastZ alignment blocks in the chosen mammals
6 New RM annotation of unannotated genomes with the small MIR library
7 MIR position overlap with selected genomic features

1 Background

1.1 Transposable Elements

Transposable elements (TEs) are stretches of genomic DNA that are able to transpose. Transposition here means a change in location; it can, but does not have to, include an increase in copy number. TEs were first discovered and described by Barbara McClintock. In 1944 she started to study a position on maize chromosome 9 that she called dissociator, because the chromosome was prone to break at this position.
In 1948 she found that the dissociator could actually change its position and induce stable mutations that knock out pigmentation genes, resulting in multicolored maize cobs. This led to the first publication on transposons, in 1950: The origin and behavior of mutable loci in maize [1].

Figure 1: This picture shows multicolored corn cobs. The multicoloring is the result of transposable elements inserting into pigmentation genes. (Source: Wikimedia:Asbestos/GFDL)

For a long time the paradigm for TEs was that they are selfish and parasitic, littering the genome with copies of themselves [2,3]. This paradigm changed over time as more and more functions and roles could be assigned to TEs, such as large scale mutations, gene regulation, gene duplication, pseudogene creation, exon creation and exon shuffling (see section 1.2).

The mechanism by which TEs transpose is used to classify them. Their features on the sequence level can be used to further divide them into subclasses. As of now three main classes of TEs are described: Class I, Class II and Class III elements. Class III elements are not a real class and are usually referred to as unclassified TEs.

Class II elements are called DNA transposons. DNA transposons transpose by a cut and paste mechanism: they are cut out of their position and reinserted at another position. Typical for DNA transposons are the inverted terminal repeats (ITRs), short repeats that flank the transposon. The free intermediate of a DNA transposon is DNA, hence the name. The copy number increases slowly, as only certain events, such as segmental duplication, duplication of the free DNA state and cellular repair mechanisms, can cause an increase. DNA transposons usually encode proteins such as the transposase, which is used in the process of transposition to cut out and reinsert the TE.

Class I elements are called retrotransposons.
In general, the insertion of a TE can be unspecific, that is random within the limitation that the insertion is not lethal, but it can also show a distinct target site preference, as is the case for LTR retrotransposons [4].

Retrotransposons are the most versatile class of TEs. All retrotransposons transpose by a copy and paste mechanism: they are transcribed into RNA and reverse transcribed into DNA. With this mechanism, each transposition results in the gain of a new copy. Retrotransposons can be divided into long terminal repeat (LTR) retrotransposons and non LTR retrotransposons. LTR retrotransposons usually encode several proteins, such as gag, the reverse transcriptase, RNase H (RH) and integrase. The LTRs of these transposons are direct repeats, as opposed to the inverted repeats of DNA transposons, and emerge as an artifact of the reverse transcription.

The subclass of non LTR retrotransposons consists of long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). LINEs are autonomous retrotransposons, that is they encode the proteins needed for their own transposition, whereas SINEs depend on LINEs in that they recruit the LINE transposition machinery.

Figure 2 shows how class I TEs, class II TEs and MITEs could have evolved. The figure suggests a phylogeny that could explain how retroviruses have evolved from LTR retrotransposons.

Class III, unclassified, transposons are called MITEs (miniature inverted-repeat transposable elements). Their mechanism of transposition is currently unknown. In figure 2 they are shown as descendants of DNA transposons and do not have their own class yet.

About 45% of the human genome sequence is reported to be composed of TEs, of which two thirds (30% of the genome) are LINEs and SINEs [5]. As many very old TE sequences have probably diverged beyond recognition, it is likely that a much larger fraction of the human genome is derived from TEs.
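The distinction drawn above between the inverted repeats that flank DNA transposons and the direct repeats of LTR retrotransposons can be illustrated with a minimal Python sketch. It is not part of the analysis pipeline of this thesis. The function names, the toy sequences and the exact-match criterion are assumptions made purely for illustration; real terminal repeats are usually imperfect and would need an alignment-based comparison.

```python
# Toy illustration only: the sequences below are invented, and requiring an
# exact match of the two termini is a simplifying assumption.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_terminal_repeats(element: str, k: int) -> str:
    """Classify the k-base termini of an element.

    Returns "direct" if both ends are identical, "inverted" if the right
    end is the reverse complement of the left end, and "none" otherwise.
    A terminus that is its own reverse complement would satisfy both
    conditions; this sketch reports it as "direct".
    """
    if k <= 0 or 2 * k > len(element):
        raise ValueError("k must be positive and at most half the element length")
    left, right = element[:k].upper(), element[-k:].upper()
    if left == right:
        return "direct"
    if left == reverse_complement(right):
        return "inverted"
    return "none"


# Hypothetical toy elements (not real transposon sequences).
dna_transposon_like = "CAGGTT" + "ACGTACGGATCC" + "AACCTG"  # inverted termini
ltr_like = "TGTTGA" + "ACGTACGGATCC" + "TGTTGA"              # direct termini

print(classify_terminal_repeats(dna_transposon_like, 6))
print(classify_terminal_repeats(ltr_like, 6))
```

The right terminus of the first toy element reads as the reverse complement of its left terminus, as expected for an inverted repeat, while both termini of the second element are identical, as expected for a direct repeat.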
Figure 2: Classification and hypothetical phylogeny of TEs. (Source: Nature, Vol 443, 5 October 2006, News & Views Feature)