Repetitive Sequences in Complex Genomes: Structure and Evolution
Total Page:16
File Type:pdf, Size:1020Kb
ANRV321-GG08-11 ARI 25 July 2007 18:8 Repetitive Sequences in Complex Genomes: Structure and Evolution Jerzy Jurka, Vladimir V. Kapitonov, Oleksiy Kohany, and Michael V. Jurka Genetic Information Research Institute, Mountain View, California 94043; email: [email protected], [email protected], [email protected], [email protected] Annu. Rev. Genomics Hum. Genet. 2007. 8:241–59 Key Words First published online as a Review in Advance on transposable elements, repetitive DNA, regulation, speciation May 21, 2007. The Annual Review of Genomics and Human Genetics Abstract is online at genom.annualreviews.org Eukaryotic genomes contain vast amounts of repetitive DNA de- This article’s doi: rived from transposable elements (TEs). Large-scale sequencing of 10.1146/annurev.genom.8.080706.092416 these genomes has produced an unprecedented wealth of informa- by Stanford University Robert Crown Law Lib. on 11/17/07. For personal use only. Copyright c 2007 by Annual Reviews. tion about the origin, diversity, and genomic impact of what was All rights reserved once thought to be “junk DNA.” This has also led to the identifica- Annu. Rev. Genom. Human Genet. 2007.8:241-259. Downloaded from arjournals.annualreviews.org 1527-8204/07/0922-0241$20.00 tion of two new classes of DNA transposons, Helitrons and Polintons, as well as several new superfamilies and thousands of new families. TEs are evolutionary precursors of many genes, including RAG1, which plays a role in the vertebrate immune system. They are also the driving force in the evolution of epigenetic regulation and have a long-term impact on genomic stability and evolution. Remnants of TEs appear to be overrepresented in transcription regulatory mod- ules and other regions conserved among distantly related species, which may have implications for our understanding of their impact on speciation. 241 ANRV321-GG08-11 ARI 25 July 2007 18:8 INTRODUCTION bling an “arms race.” Eukaryotic hosts con- tinuously suppress activities of TEs, but TE The term “repetitive sequences” (repeats, proliferation persists in virtually all known eu- DNA repeats, repetitive DNA) refers to ho- karyotic species. Of all eukaryotic genomes mologous DNA fragments that are present sequenced to date, only the genome in multiple copies in the genome. Repeti- Plasmod- appears not to host any active tive DNA was originally discovered based ium falciparum TEs (35). on reassociation kinetics and classified into Why do complex, conservative genomes “highly” and “middle” repetitive sequences tolerate the activities of inherently antago- (14), roughly corresponding to tandem and nistic elements? TEs cannot be easily elim- interspersed repeats discussed below. This re- inated and their endurance in the host can be view is centered primarily on repeat research compared to that of parasites. Furthermore, based on DNA sequence analysis and does not if TEs can provide evolutionary advantages cover the so-called low copy repeats (LCRs), to the host, their chances of survival increase. also known as segmental duplications, which The view that TEs are beneficial to the host represent a separate category of duplicated di- is not new (16, 44, 68, 87) but recent progress verse chromosomal segments (105). in the field puts it squarely at the center of the Repeats can be clustered into distinct ongoing debate on eukaryotic evolution. families each traceable to a single ancestral sequence or a closely related group of ances- tral sequences. In contrast to multigene fami- lies, which are defined based on their biologi- STRUCTURE AND cal role, repetitive families are usually defined SYSTEMATICS OF based on their active ancestors, called mas- TRANSPOSABLE ELEMENTS ter or source genes, and on their generation mechanisms. Over time, individual elements General Characteristics from repetitive families may acquire diverse Figure 1 presents a schematic structure of biological roles. TEs. All types of TEs are represented by There are two basic types of repetitive autonomous and nonautonomous variants. sequences: interspersed repeats and tandem Whereas an autonomous element encodes a repeats. Interspersed repeats are DNA frag- complete set of enzymes characteristic of its ments with an upper size limit of 20–30 kb, family and is self-sufficient in terms of trans- inserted more or less at random into host position, a nonautonomous element trans- DNA. In contrast, tandem repeats represent poses by borrowing the protein machinery arrays of DNA fragments immediately adja- encoded by its autonomous relatives. De- cent to each other in head-to-tail orientation. spite their dazzling diversity, all eukaryotic by Stanford University Robert Crown Law Lib. on 11/17/07. For personal use only. This review focuses on interspersed repetitive TEs fall into two basic types: retrotrans- DNA from eukaryotic genomes. Interspersed posons and DNA transposons. Retrotrans- Annu. Rev. Genom. Human Genet. 2007.8:241-259. Downloaded from arjournals.annualreviews.org repeats are mostly inactive and often incom- posons are transposed through an RNA in- plete copies of transposable elements (TEs) termediate. Their messenger RNA (mRNA) inserted into genomic DNA. TEs are seg- is expressed in the host cell, reverse tran- ments of DNA or RNA capable of being re- scribed, and the resulting complementary produced and inserted in the host genome. At DNA (cDNA) copy is integrated back into the the same time, genomes are essentially con- host genome. Reverse transcription and inte- servative structures that have evolved mech- gration are catalyzed by reverse transcriptase anisms to counteract such insertions. There- (RT) and endonuclease/integrase (EN/INT), fore, TEs and host genomes are locked in a which are encoded by autonomous elements. permanent antagonistic relationship resem- Unlike retrotransposons, DNA transposons 242 Jurka et al. ANRV321-GG08-11 ARI 25 July 2007 18:8 a Non-LTR retro(trans)posons - LINEs and SINEs [TT] [ ]ORF1, ORF2 (EN, RT) (Tail) [ ] Autonomous Int. Pol II promoter [TT] [ ] (Tail) [ ] Non-autonomous [TT] [ ] (Tail) [ ] Non-autonomous Pol III promoter b LTR retrotransposons and retrovirus-like elements LTR gag, pol, env LTR (A)n RNA [ ] LTR gag, pol, env LTR [ ] Autonomous [ ] LTR LTR [ ] Non-autonomous TG AATAAA CA LTR Pol II promoter PolA signal (TATA box) c Cut-and-paste transposons (DNA transposons) TIR TIR [ ] Transposase [ ] Autonomous TIR TIR [ ] [ ] Non-autonomous d Rolling-circle transposons (Helitrons) [A] TC REP, helicase CTTR [T] Autonomous [A][TC CTTR T] Non-autonomous by Stanford University Robert Crown Law Lib. on 11/17/07. For personal use only. Annu. Rev. Genom. Human Genet. 2007.8:241-259. Downloaded from arjournals.annualreviews.org e Self-synthesizing transposons (Polintons) AG CT [ ] INT, PolB, CysP, ATPase + ~5 ORFs [ ] Autonomous AG CT [ ] [ ] Non-autonomous Figure 1 A schematic representation of major classes of transposable elements, including nonautonomous elements. www.annualreviews.org • Repeats in Complex Genomes 243 ANRV321-GG08-11 ARI 25 July 2007 18:8 are transposed by moving their genomic DNA TPRT model, reverse transcription is primed copies from one chromosomal location to an- by the free 3 hydroxyl group at the target other without any RNA intermediate. Most DNA nick introduced by EN (29). The model retrotransposons and DNA transposons are was recently enhanced by the finding that ini- flanked by target site duplications (TSDs) re- tiation of the L1 reverse transcription does not sulting from fill-in repair of staggered nicks require base pairing between the primer and generated at the DNA target site upon inser- template (72). Moreover, as expected from the tion of TEs (42). model, EN is not necessary for L1 retrotrans- All currently known eukaryotic retrotrans- position when free 3-hydroxyl groups be- posons can be divided into four classes: come available in disfunctional telomeres (91). non-long terminal repeat (LTR) retrotrans- Both RT and EN domains in L1 are encoded posons, LTR retrotransposons, Penelope, and by the same ORF. An mRNA expressed dur- DIRS retrotransposons. Although the first two ing transcription of a genomic copy of LINE classes (Figure 1a,b) are relatively well estab- retrotransposon serves as a template for re- lished and studied (29), the Penelope and DIRS verse transcription, and the resulting cDNA classes were only recently introduced (2, 30, is inserted in the genome. 81, 98). Members of all four classes of retro- Based on structural features of non- transposons are present in the genomes of vir- LTR retrotransposons and phylogeny of RTs, tually all eukaryotic kingdoms: Protista, Plan- LINEs can be assigned to five groups, called tae, Fungi, and Animalia. The only exception R2, L1, RTE, I, and Jockey, which can be sub- is Penelope, which, so far, has not been identi- divided into 15 clades (29, 70). It is believed fied in plants. that the R2 group is composed of the most Eukaryotic DNA transposons belong to ancient non-LTR retrotransposons, the CRE, three classes: “cut-and-paste” transposons, NeSL, R2, and R4 clades, which are char- Helitrons, and Polintons (Figure 1c,d,e). The acterized by a single ORF coding for RT corresponding mechanisms of transposition and an EN C terminal to the RT domain. are cut-and-paste (23), rolling-circle replica- The R2 EN is similar to different restric- tive (60), and self-synthesizing (65), respec- tion enzymes, and all TEs from the R2 group tively. The cut-and-paste transposons and retrotranspose into highly specific target sites. Helitrons cannot synthesize their own DNA; Members of the remaining four groups en- instead, they multiply using host replication code the apurinic-apyrimidinic endonuclease machinery. (APE), which is always N terminal to the RT domain. In addition to RT and EN, members of the first group code for RNase H (29), in- Non-Long Terminal Repeat and cluding the Ingi, I, LOA, R1, and Tad1 clades. by Stanford University Robert Crown Law Lib. on 11/17/07. For personal use only. Long Terminal Repeat Nonautonomous non-LTR retrotransposons Retrotransposons are usually referred to as short interspersed Annu.