
Interspersed repeats and other mementos of transposable elements in mammalian genomes
Arian FA Smit

The bulk of the human genome is ultimately derived from transposable elements. Observations in the past year have led to some new and surprising ideas on functions and consequences of these elements and their remnants in our genome. The many new examples of human genes derived from single transposon insertions highlight the large contribution of selfish DNA to genomic evolution.

Addresses
Axys Pharmaceuticals, Inc., 11099 North Torrey Pines Road, Suite 160, La Jolla, California 92037-1029, USA; e-mail: [email protected]

Current Opinion in Genetics & Development 1999, 9:657–663

0959-437X/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved.

Abbreviations
ERV    endogenous retrovirus
hAT    hobo-Activator-Tam
LINE   long interspersed nuclear element
LTR    long terminal repeat
MIR    mammalian-wide interspersed repeat
ORF    open reading frame
SINE   short interspersed nuclear element
TIR    terminal inverted repeat
UTR    untranslated region

Introduction
In May 1999, 10% of the human genome had been sequenced, providing a suitable occasion to update an earlier survey of interspersed repeats in our genome [1] in the first part of this review. Recently characterized elements have shed more light on the origin of these repeats. The characteristics of most but not all repeats fit the model that they are remnants of selfish DNA. In particular, indications of a possibly symbiotic relationship of SINEs with mammalian genomes are reviewed. Compared with those of mouse and other model organisms, human transposons are remarkably subdued, leading to speculations on host defense mechanisms. The final section of this review discusses new findings on the many ways in which transposable elements can contribute to genomic evolution.

Updated survey of human interspersed repeats
The observed density of the main classes of repeats in a total of 327 Mb of human genomic DNA, pigeon-holed by the GC richness of 50 kb fragments, is shown in Table 1. The average fraction of euchromatic DNA identifiably derived from transposable elements is over 42%, an increase from the 35% determined in 1996 due to improvements in the detection method [2] and consensus sequences, and to additions of human repeats in the RepBase database [3••].

Among the new additions are a series of consensus sequences for autonomous members of the MER1 family of mammalian DNA transposons [4]. These nine elements (Charlie1–8 [4], Cheshire [5]) and a tenth, more distantly related repeat (MER69) encode proteins related to the hobo-Activator-Tam (hAT) transposases. Besides these and the Tc1-family (5 Tigger and 2 mariner) fossils [4], a third, small group in our DNA is characterized by TTAA-flanking sites. One member (Looper [5]) encodes a protein that is 50% similar to the putative transposase of the TTAA-flanked piggyBac transposon in moths. Most reconstructed DNA transposons belong to the Charlie or Tigger groups, which are so far mammalian-specific. This is somewhat surprising, as the lack of intermediates between any related elements suggests that they all represent independent (lateral) introductions into the genome, whereas most types of eukaryotic DNA transposons may be able to transpose in mammalian cells [6•].

Almost all long terminal repeat (LTR) sequences in RepBase (211 in release 3.06) now have been linked to the three classes of mammalian endogenous retroviruses (ERVs) [4] (see [7,8] for classification); at least 2.3% of our autosomal DNA is derived from class I ERVs and non-autonomous associates (e.g. the MER4 group), 0.3% from class II elements, and 4.6% from foamy virus-like (ERV-L) elements. Three-quarters of the last fraction is taken by the non-autonomous MaLR elements; their relationship to ERV-L [1] is now sealed by the similarity of the single product of older MaLRs to the ERV-L gag protein [4]. On the Y chromosome, class I and II LTR elements are 2.5 times more frequent than on the autosomal and X chromosomes and outnumber the ERV-L elements, which occur with normal copy numbers (AFA Smit, unpublished data).

Non-LTR retrotransposons or LINEs are divided into 11 clades on the basis of protein phylogeny, like the ERV classification [9•]. Besides L1 (L1 clade), L2 (Jockey clade) and the ruminant-specific BovB or BDDF (RTE clade), two relatively […] elements (L3 and CR1_Hs) representing the CR1 clade have now been identified in human DNA [5]. L1 has two open reading frames (ORFs), the first encoding a protein that specifically binds the second in the transcript [10]. A new entry in RepBase, HAL1 [4], contains a single ORF that is very similar to ORF1 of ancient L1s, followed by an untranslated region and a poly(A) tail. As the ~20,000 HAL1 copies are at least as diverged (old) as the oldest known L1 subfamilies, the 'modern' L1 element may be a fusion product of HAL1 with an existing LINE, which would explain why the L1 ORF1 product is unrelated to products of other non-LTR retrotransposons [9•].

Genomes and evolution

Table 1

Interspersed repeat density in different isochores of the human and mouse genome.

GC level               <36%     36–38%   38–40%   40–42%   42–44%   44–46%   46–50%   50–54%   >54%

Human*
(bp analyzed)     A    23 Mb    41 Mb    44 Mb    39 Mb    33 Mb    27 Mb    34 Mb    20 Mb    11 Mb
                  X    6 Mb     12 Mb    13 Mb    9 Mb     4 Mb     2 Mb     2 Mb     0.4 Mb   0.7 Mb
Alu               A    5.0%     6.7%     8.6%     11.4%    13.9%    16.6%    19.2%    22.2%    17.7%
                  X    4.0%     4.5%     5.6%     9.7%     12.8%    15.6%    16.2%    22.2%    13.6%
LINE1             A    20.0%    19.6%    16.9%    14.0%    11.3%    9.9%     7.4%     4.8%     3.0%
                  X    35.3%    33.2%    28.4%    18.6%    15.0%    9.0%     9.0%     6.1%     5.3%
MIR               A    1.5%     1.8%     2.1%     2.3%     2.3%     2.9%     3.3%     3.3%     2.4%
                  X    1.5%     1.8%     2.3%     2.4%     2.0%     2.1%     2.2%     2.7%     1.2%
LINE2             A    3.1%     3.5%     3.7%     3.6%     3.1%     3.7%     3.6%     3.4%     2.7%
                  X    2.7%     3.3%     3.4%     3.9%     3.0%     4.2%     3.9%     2.9%     2.3%
LTR elements      A    6.8%     7.7%     7.8%     8.6%     8.9%     8.5%     6.5%     4.7%     2.9%
                  X    8.8%     10.7%    9.8%     9.2%     9.4%     10.5%    7.7%     2.7%     2.8%
DNA transposon    A    2.9%     3.0%     3.0%     3.1%     3.0%     2.8%     2.5%     1.8%     1.2%
                  X    2.7%     2.7%     2.7%     3.3%     2.8%     3.0%     2.1%     1.1%     1.0%
Total             A    39.4%    42.3%    42.1%    43.0%    42.7%    44.4%    43%      41%      31%
                  X    55.1%    56.2%    52.2%    47.1%    45.0%    44.3%    41%      38%      26%

Mouse
(bp analyzed)          <0.2 Mb  <0.2 Mb  1.2 Mb   1.9 Mb   0.9 Mb   1.5 Mb   3.0 Mb   1.6 Mb   <0.2 Mb
B1                     –        –        1.2%     1.6%     2.7%     4.4%     6.4%     6.9%     –
Other SINEs            –        –        2.2%     3.0%     5.7%     7.4%     8.0%     9.6%     –
LINE1                  –        –        30.4%    25.4%    14.0%    7.7%     3.4%     1.4%     –
LTRs                   –        –        7.9%     11.0%    12.7%    10.6%    9.9%     3.6%     –
Total                  –        –        43.0%    42.2%    36.6%    31.6%    28.9%    20.3%    –

Analysis was performed with RepeatMasker (version 050599, default settings, RepBase version 3.06) on 327 Mb of non-redundant human genomic DNA downloaded from http://www.ncbi.nlm.nih.gov/genome/seq in June 1999 and 10 Mb of >40 kb long mouse genomic DNA entries in GenBank Release 112. *For the human genome, the data are split into autosomal (A) and X chromosomal (X) DNA. The GC level and repeat density of longer database entries were calculated and tallied for 50-kb non-overlapping windows. The opposite distribution patterns of LINE1 and SINEs (Alu, MIR, B1 and tRNA-derived SINEs in mouse) are discussed in the text. Note that after elimination of the Alu and L1 elements, the loci retain the same GC level (the differences in GC level mostly reflect distinct neutral substitution biases). The total interspersed repeat density is between 40 and 44% in all but the highest GC level isochores. In the latter all repeat classes are under-represented, presumably because of high gene density, and the total repeat density would only be 26% if it were not for a large contribution of repeat-rich, GC-rich DNA from chromosome 19. The increased density of LTR elements on X is mostly due to a larger ratio of complete elements over solitary LTRs (a ratio even higher on Y).
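The windowing step described in the legend (GC level tallied over 50-kb non-overlapping windows, then sorted into the nine GC classes of Table 1) can be sketched roughly as follows. This is only an illustration and not the procedure actually used for the table: the function names are hypothetical, and the handling of values that fall exactly on a class boundary, of ambiguous bases, and of a trailing partial window are all assumptions.

```python
from bisect import bisect_right

# Upper edges (% GC) of the Table 1 classes; the last class is open-ended.
BIN_EDGES = [36, 38, 40, 42, 44, 46, 50, 54]
BIN_LABELS = ["<36%", "36-38%", "38-40%", "40-42%", "42-44%",
              "44-46%", "46-50%", "50-54%", ">54%"]


def gc_percent(seq):
    """GC percentage over unambiguous bases; None if the window has none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else None


def gc_bin(percent):
    """GC class label; a value exactly on an edge goes to the higher class (assumption)."""
    return BIN_LABELS[bisect_right(BIN_EDGES, percent)]


def tally_windows(seq, window=50_000):
    """Count full, non-overlapping windows per GC class.

    A trailing partial window is dropped (assumption).
    """
    counts = {label: 0 for label in BIN_LABELS}
    for start in range(0, len(seq) - window + 1, window):
        pct = gc_percent(seq[start:start + window])
        if pct is not None:
            counts[gc_bin(pct)] += 1
    return counts
```

Repeat densities per class would then be obtained by summing the masked bases of each repeat type over the windows assigned to that class and dividing by the total bases in those windows.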

Selfish or symbiotic elements?
Selfish elements that are long-term denizens of a genome should have been under selection for maximal propagation with minimal harm to the host. Thus, one expects their transpositional activity to be limited to the germline, and any target-site preference to be for relatively inert regions of the genome.

Indeed, L1 transcription is more or less germline-specific, and L1 copies have accumulated in the AT-rich regions of our genome (Table 1), which are thinly populated with low-expression genes. The target specificity of L1 for TTTTAA [11] may partially have evolved for this reason and could by itself be responsible for the rather linear relationship of L1 density and GC richness. However, also contributing is a gradient in the average L1 insert length, which is 2–2.5 times greater in the AT-rich than in the GC-rich isochores on all chromosomes (AFA Smit, unpublished data). L1 is also much more abundant on both sex chromosomes (Table 1), possibly reflecting preferred integration in probably inert and less recombinogenic heterochromatic DNA (one X is heterochromatic for a period in both germlines).

All the SINEs shown in Table 1 behave counter-intuitively, however, by accumulating within GC-rich (gene-rich) DNA and having high expression levels in somatic cells under stress. Schmid [12••] put forth a compelling theory that can explain both observations. One argument is that the preference for GC-rich DNA may be linked to the striking hypomethylation of Alus in the male germline (suggesting a function in imprinting or sperm chromatin assembly), but this cannot be extrapolated to other SINEs, because most of these have very few or no CpGs. However, Chu et al. [13•] have shown that increased transcription of Alu, unlike 7SL and other functional RNA, saturates and inhibits the double-stranded RNA-induced kinase PKR, and, probably through this, induces protein translation. The optimal gene to quickly and transiently suppress PKR inhibition of translation upon stress may be a short RNA of unspecific sequence, but one with stable secondary structure, freedom to bind PKR as an inactive monomer, and the potential for instant expression in large quantity. SINEs that are easily inducible fit this description very well, and their location in open chromatin, which tends to correlate with GC-rich DNA, is therefore advantageous. Note that, in AT-rich DNA, Alu and MIR also are over-represented in transcribed DNA (Table 2). The hypothesis that SINEs could be stress genes is supported by the widespread appearance of the above phenomena; in mammals MIRs have been substituted by sometimes unrelated, order-specific SINEs, but all show a bias for GC-rich DNA (Table 1 [14]), and cell stress has now been shown to increase SINE transcription in insects as well [15]. The theory is further strengthened by an ancient correlation between tRNA binding and translation inhibition by the PKR-like yeast GCN2 protein [12••].

Table 2

Number of interspersed repeats per Mb of intronic or intergenic DNA in both orientations.

                      Alu    MIR    L1     L2     LTR    DNA
<44% GC*
  Intergenic   +      141    40     58     36     71     47
               –      134    39     62     35     70     52
  Intronic     +      192    60     34     35     13     51
               –      217    62     56     34     67     61
>44% GC
  Intergenic   +      360    51     24     44     50     38
               –      355    51     28     40     61     39
  Intronic     +      352    68     14     28     11     41
               –      378    70     34     44     41     42

*The numbers are derived from 3.7 Mb of human introns extracted from GenBank112 entry annotations and 9.2 Mb of 'intergenic regions' (the regions in these entries not annotated as introns or coding regions). The many unannotated genes may make existing differences between introns and intergenic regions less obvious. Sequences were sorted into 10 GC level groups, and the tabulated numbers are an average of the numbers observed in these isochores to avoid distortions by the generally higher gene (intron) densities of more GC-rich entries. Only L1 copies with complete 3′ ends were counted. In AT-rich regions, Alu and MIR are 1.5 times more common in introns than in non-transcribed DNA, and intronic L1 and LTR elements are 2 and 5 times under-represented, respectively, in the orientation of the gene. The latter bias is probably due to the transcriptional termination site within these elements (absent from SINEs and weak or absent in DNA transposons), which could debilitate genes when introduced in the 'wrong' orientation. Unless these diverse elements can sense the orientation of a gene during integration (the LINE1 insertion preference TTTTAA is not under-represented in either orientation in introns), the bias should result from purifying selection. Outstanding numbers are in bold and italicized.

The mechanism for accumulation in GC-rich DNA is enigmatic, as Alus use the L1 machinery and prefer integrating in the same AT-rich target site [16]. Arcot et al. [17•] have reported that polymorphic Alus show a random distribution over chromosome 11, which is unrelated to GC richness. Indeed, in the 327 Mb analyzed here, the 1400 Alus with fewer than 4.5% substitutions have a completely flat distribution over the isochores, whereas Alus diverged by 5–7% have the familiar bias (AFA Smit, unpublished data). This may suggest that Alus integrate randomly and then are fixed preferentially in GC-rich DNA, but most of the 1400 young Alus are probably already fixed.

The spread of ERVs in mammalian genomes is sometimes argued to be a symbiotic event, often on the basis of the specific expression of ERVs in the placenta (reviewed in [18•]). However, the best recorded cases (HERV-3 [19,20•] and HERV-W [21,22•]) may represent individual env genes recently co-opted (exapted) by the genome for their functionality, as conservation of the expressed elements appears limited to the env genes. Such exaptation of ERV genes has a precedent in mouse, in which the retrovirus resistance gene Fv1 is derived from an ERV-L gag gene [23].

Approaching the issue of repeat functionality from another direction, Lyon [24••] proposed that an as yet unknown X-chromosome-specific interspersed repeat may promote the spread of Xist mRNA, the agent of X-chromosome inactivation. This theory is supported by observations that Xist does not spread to autosomal translocations on X [25]. It is less likely that the booster element is L1, which has been proposed for its over-representation on X [24••], because Xist mRNA preferentially localizes on GC-rich (L1-poor) R bands [25].

Activity of transposable elements over time
The age distribution of the major interspersed repeats in our genome is shown in Figure 1. L2 and its tag-along SINE MIR appear dominant in ancient times but are slowly replaced by the L1–Alu pair. Also conspicuous is the reduced accumulation of all repeats except L1, especially in the last 30-million-year period. So far, the only elements associated with insertional mutations are Alu [26], L1 [27•] and the composite repeat SVA (SINE-R) [28•,29•]. The best candidate for an active retrovirus is HERV-K, which may have several hundred human-specific copies [30•], some encoding functional reverse transcriptase [31]. The most prolific LTR elements (the ERV-L class) ceased activity 40–60 million years ago [1,8]. There is no evidence of currently active human DNA transposons; the copies of the youngest known elements (MER85) are diverged by 6%, predating the monkey–human split.

The quiescence in our genome contrasts with the commotion of SINEs, LINEs and LTR elements in mouse [27•]. For example, it is estimated that some 3,000 mouse but only 40 human L1 elements are active, leading to 2.5% and 0.07% of the observed spontaneous mutations in mouse and human, respectively [32]. Transposon activity in mouse is dwarfed, in turn, by that in other organisms such as Drosophila, in which over half of observed mutations are transposon related, and maize, whose genome is estimated to have doubled over the last 3 million years by retrotransposon insertions [33••].

Does this mean that the human lineage has had a particularly efficient 'host defense' mechanism against intragenomic parasites? Two general mechanisms seem to be used by host cells to subdue repeated sequences, which are a tell-tale feature of transposon activity. A post-transcriptional mechanism, which operates in I element suppression in Drosophila [34] for example, appears to involve RNA interactions and subsequent degradation, whereas transcriptional suppression is based on DNA interactions and often involves methylation of the repeats. The idea that the primary function of DNA methylation is in host defense is

Figure 1

Activity of major repeats over time in the human genome. As the vast majority of copies accumulate mutations in a neutral fashion, their divergence to their consensus, or approximately original, sequence corresponds roughly to the time elapsed since integration. For reference, 16–18% of unconstrained nucleotides have been substituted since the split of primates from other mammalian orders. Substitution levels of repeats were calculated (with the Jukes–Cantor equation) from divergence levels reported in the Table 1 RepeatMasker analysis output. Because of the high CpG content in Alu consensus sequences, Alu substitution levels were calculated separately for non-CpG positions only. CpG levels of other repeats generally are too low to make a difference in this graph, but may cause a slight distortion.

[Graph: fraction of the human genome (y-axis, 0–9%) against substitution level (x-axis, in the bins <5, 5–10, 10–15, 15–20, 20–25, 25–30, 30–35 and >35%) for Alu, LINE1, MIR, LINE2, LTR and DNA elements.]

Current Opinion in Genetics & Development
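The Jukes–Cantor correction mentioned in the legend converts an observed fraction of mismatched positions p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3), allowing for multiple hits at the same position. A minimal sketch follows; the function name is hypothetical, and the separate treatment of non-CpG positions applied to Alu in the figure is not reproduced here.

```python
import math


def jukes_cantor(p):
    """Estimated substitutions per site from an observed mismatch fraction p.

    Implements d = -(3/4) * ln(1 - 4p/3); the correction is undefined for
    p >= 0.75, the mismatch level expected between unrelated random sequences.
    """
    if not 0 <= p < 0.75:
        raise ValueError("observed divergence must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Example: a copy that differs from its consensus at 15% of aligned positions
# corresponds to roughly 0.167 substitutions per site.
print(round(jukes_cantor(0.15), 3))
```

The correction is small for young copies and grows with age, so it matters most for the old MIR and LINE2 copies at the right-hand side of the graph.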

hotly contested [35,36•,37•]. It was rekindled by an estimate that 90% of human non-CpG island (i.e. mostly methylated) CpGs occur in interspersed repeats [35]; in the current analysis of 327 Mb, 28% of such CpGs are found in Alus (12% of genome), but only 22% are in other interspersed repeats (32% of genome). This under-representation of CpGs is not surprising, because most repeats are old and not under selective pressure, whereas CpGs mutate fast. However, methylation clearly can suppress transposon activity in mouse [38•]. Also, concatemeric transgenes become methylated and transcriptionally suppressed depending on copy number [39•], although it is not yet clear whether methylation can be directed against interspersed copies as well, as observed in plants and fungi [40]. Methylation may also inhibit mitotic recombination between repetitive sequences in mouse [41]. Finally, undermethylation has been correlated with transposon activity in marsupial hybrids [42•].

Benefits of weak transcriptional termination sites
Insertion into coding regions, deregulation of transcription by internal elements, and recombination between interspersed copies are usually considered among the ill effects of transposition. However, more prominent may have been premature transcriptional termination by insertion of L1 and especially LTR elements in introns, which form a comparatively huge target. The biased orientation and under-representation of L1 and LTRs in introns probably result from selection against these events (Table 2). For this reason, L1 may have evolved a weak termination site so that 3′ flanking regions are often included in current L1 transcripts [43••]. Because of 5′ truncation, copies of such elements may not even contain an L1 sequence, and considerably more than 15% (Table 1) of our genome may be derived directly from L1 transposition. Moreover, 3′ flanking regions can contain promoters, enhancers or exons, so that dispersal of such L1 copies can lead to the alteration of expression of existing genes or the creation of new genes by exon shuffling, as dramatically shown by Moran et al. [43••].

Domestication of individual transposable elements
Throughout time, host genomes have rummaged through the novel sequences accumulated by transposition and have recruited numerous elements. Many examples exist of transposable-element copies that now contribute transcriptionally regulatory elements or other small fragments to cellular genes. More interesting is that quite a few cellular genes, such as the HERV-derived genes mentioned above, encompass complete transposable-element genes. Human examples, some with household names, are listed in Table 3.

BC200, the only known fully recruited SINE in human, is a brain-specific RNA that is part of a ribonucleoprotein complex preferentially located in the dendrites of all higher primates. It is presumably derived about 50 million years ago from a monomeric Alu and has since been selectively conserved in all studied descendants [44].

Telomerase is a reverse transcriptase that is related to those of non-LTR retrotransposons. It is debatable whether retrotransposons gave rise to telomerases or vice versa (telomerases are found in all eukaryotes) [45], but a large phylogenetic study of non-LTR elements puts the origin of non-LTR elements before the origin of eukaryotes and telomerases [9•].

A spectacular example of exaptation appears to be at the origin of antigen-specific immunity. The VDJ recombination

system shares two main features with DNA transposons: a recombinase (encoded by the tightly linked RAG1 and RAG2 genes) and mobile DNA bracketed by recombinase-binding sites (the signal sequences), which led to speculations that our immune system may have originated from a transposon that had inserted into the genome of an ancestor of the jawed vertebrates. Similarities in chemistry, binding site and protein sequences between VDJ recombination and Tc1-like transposition have bolstered this idea. This year, this hypothesis got major support from the observation that RAG proteins can mediate a complete transposition event in vitro, in other words, excision of an element bound by signal sequences and reintegration into a new target [46••,47••].

The major centromere-binding protein CENP-B is an ancient descendant of a pogo-like DNA transposase (a centromere-binding CENP-B homolog exists in fission yeast). Csink and Henikoff [48•] proposed that centromeres are defined principally by their late replication in S phase, and that satellite repeats guarantee exclusion of replication origins. The expanded satellites would be required to 'borrow' a DNA-binding protein for stabilization during mitosis [48•]. Expansion of the terminus of a pogo-like DNA transposon would have come with a built-in binding protein. Indeed, the DNA-binding domain of CENP-B is still conserved with that of the pogo transposases [5,49]. Other pogo family derivatives, like Jerky and its close relative JRKL, have not retained the DNA-binding domain. Jerky [50], associated with epileptic seizures in mouse and mapped to the Childhood Absence Epilepsy linkage region in human, fits phylogenetically among the mammalian Tigger transposases.

Table 3

A list of human genes probably derived from transposable elements.

Gene          Protein ID    Related transposon*              Ref.
BC200         3150436#      FLAM Alu                         [43••]
Telomerase    2330017       non-LTR retrotransposon †        [44]
HERV-3 env    1196425       HERV-R                           [19]
HERV-7q env   4773880       HERV-W                           [21,22•]
RAG1-2        131827        Tc1-like †                       [45,46••]
CENP-B        29863         Pogo (Tc1)                       [55]
Jerky         3435201/3     Tigger (Tc1) †                   [48•]
JRKL          4504807       Tigger (Tc1) †
KIAA0461      3413884       Fot1 (Tc1) †§
NIB1543       2231380       Hsmar1 mariner (Tc1) †‡          [49]
Buster1       AF205601      Charlie (hAT) †‡§
Buster2       AF205599      Charlie (hAT) §
Buster3       AF205598      Charlie (hAT) §
KIAA0766      3882253       Charlie (hAT) †§
AAD15420.1    4263748       Charlie8/MER102 (hAT) †‡§
p52rIPK       4758872       MER69 (hAT) §
KIAA0543      3043610       hAT ‡§
KIAA0637      3327087       hAT ‡                            [4]
Tramp         4160548       Tol2 (hAT) ‡                     [50]

*Names are in italics when the specific ancestral element is known. †Proteins that appear to have retained the residues essential in transposition. ‡Proteins formed by fusion of a cellular and transposon gene. §Relationship with transposons not reported before. #GenBank nucleotide ID.

At least seven of the proteins in Table 3 are chimeras with a C-terminal transposase-derived domain. These probably originated from fortuitous in-frame splicing of a cellular gene into the coding region of a DNA transposon insertion (an exotic type of exon shuffling). The NIB1543 cDNA encodes a long chimaeric protein with a transposase domain that is only 2.4% diverged at the DNA level from the reconstructed Hsmar1 mariner [51]. This conservation suggests that this domain may still recognize the terminal inverted repeats (TIRs) of Hsmar1 copies, which are especially abundant as paired TIRs (named Made1 in RepBase) in our genome. Both Tramp [52•] and KIAA0543 appear to be fusion products of zinc-finger domain proteins and hAT transposases.

We have identified three genes derived from Charlie-like DNA transposons in the EST databases. One of these, Buster1, encodes a protein of similar length and 70% identity to Charlie1 transposase and contains all the residues that are strictly conserved between hAT transposases. However, Buster1 is not an active DNA transposon but another fusion gene; the gene is not flanked by TIRs, is expressed in both human and mouse, and the 5′ 300 bp of the mRNA form a separate exon, which is an 80% match to the 5′ UTR and first two codons of the cDNA of eIF4G2. eIF4G2 (or DAP-5) is a translational repressor and a putative modulator of interferon-γ-induced apoptosis [53,54]. The conservation of the 5′ UTR of the gene, including a GUG initiation codon, suggests that Buster1 may have conserved the expression pattern of eIF4G2. Interestingly, another hAT-transposase-derived gene, p52rIPK, is also connected to the interferon response, by indirectly upregulating the activity of the translational inhibitor PKR [55] mentioned above in connection with SINE activity. It is tempting to see a trend here, in which intragenomic parasites have been recruited in antiparasitic defense mechanisms such as the immune system (RAG1/2 and ERV genes) and the interferon response (SINEs, Buster1, p52rIPK).

Conclusions
The recent developments reviewed here make one believe that it is hard to overestimate the influence of transposable elements on genome evolution. Far from merely expanding genomes with interspersed repeats, their legacy ranges from spliceosomal introns and antigen-specific immunity to many recent recruits in highly specialized functions. Much of their influence may even be untraceable, for example, exons shuffled through L1 transposition, chromosomal rearrangements instigated by DNA transposon activity, or adaptations of the host to overcome, balance or exploit the constant mutational pressure of selfish DNA.

References and recommended reading GC-rich DNA) demand consideration in the theory of Alu evolution and pos- Papers of particular interest, published within the annual period of review, sible function. have been highlighted as: 18. Harris JR: Placental (ERV): structural, • • of special interest functional, and evolutionary significance. BioEssays 1998, •• of outstanding interest 20:307-316. This paper summarizes the evidence for a correlation between ERVs and pla- 1. Smit AFA: The origin of interspersed repeats in the human cental function and evolution. An intriguing subject which is worth a look for genome. Curr Opin Genet Dev 1996, 6:743-748. outsiders. 2. RepeatMasker server on World Wide Web URL: 19. Boyd MT, Bax CMR, Bax BE, Bloxam DL, Weiss RA: The human http://repeatmasker.genome.washington.edu endogenous retrovirus ERV-3 is upregulated in differentiating placental trophoblast cells. Virology 1993, 196:349-353. 3. RepBase Update 1999 on World Wide Web URL: •• http://www.girinst.org/~server/repbase.html 20. Lin L, Xu B, Rote NS: Expression of endogenous retrovirus ERV-3 Widely used in the genomics world for the elimination of repeats before • induces differentiation in BeWo, a choriocarcinoma model of sequence analysis, this database of reconstructed transposable elements is human placental trophoblast. Placenta 1999, 20:109-118. still underutilized as a source of information for transposon research. Much Evidence that ERV-3, once an HERV-R envelope gene, is now directly of the data is not published elsewhere nor available in general databases. involved in the development of the placenta, specifically in cell fusions to form the syncytiotrophoblasts. 4. Smit AFA; Jurka J, Kapitonov V, Niak A: Entries in RepBase Update on World Wide Web URL: http://www.girinst.org/~server/repbase.html 21. Alliel PM, Perin JP, Pierig R, Nussbaum JL, Menard A, Rieger F: Endogenous retroviruses and multiple sclerosis. II. HERV-7q. C R 5. 
Smit AFA, Riggs AD: Tiggers and other DNA transposon fossils in Acad Sci III 1998, 321:857-863. the human genome. Proc Natl Acad Sci USA 1996, 93:1443-1448. 22. Blond JL, Beseme F, Duret L, Bouton O, Bedin F, Perron H, 6. Schouten GJ, Van Luenen HGAM, Verra NCV, Valerio D, • • Mandrand B, Mallet F: Molecular characterization and placental Plasterk RHA: Transposon Tc1 of the nematode Caenorhabditis expression of HERV-W, a new human endogenous retrovirus elegans jumps in human cells. Nucleic Acids Res 1998, family. J Virol 1999, 73:1175-1185. 26:3013-3017. This paper reports a relatively young HERV family (previously named HERV17 One of several papers describing the activity of foreign DNA transposons in in RepBase) that is related to multiple sclerosis associated retrovirus (MSRV). mammalian cells. Besides being of evolutionary interest, the importance of At least one copy (on chromosome 7 [21]), like ERV-3, has retained an this work lies in the prospect of using DNA transposons as vehicles in env–ORF and is specifically expressed in the placenta. These env genes may human gene therapy and as tools in mammalian genetics. be involved in the suppression of the maternal immune response against the • 7. Wilkinson DA, Mager DL, Leong JC: Endogenous human foreign looking fetus or the fusion of trophoblast cells [19,20 ]. retroviruses. In The Retroviridae. Edited by Levy. New York: Plenum 23. Stoye JP: Fv1, the mouse retrovirus resistance gene. Rev Sci Tech Press; 1994:465-535. 1998, 17:269-277. ERV-L 8. Benit L, Lallemand JB, Casella JF, Philippe H, Heidmann T: X-chromosome inactivation: a repeat hypothesis. elements: a family of endogenous retrovirus-like elements active 24. Lyon MF: •• Cytogenet Cell Genet 1998, 80:133-137. throughout the evolution of mammals. J Virol 1999, 73:3301-3308. The apparent necessity for an X-chromosome-specific factor to bind Xist 9. 
• Malik HS, Burke WD, Eickbush TH: The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol 1999, 16:793-805.
A phylogenetic study that provides welcome clarity in the relationships of the daunting number of LINE-like elements. Telomerases appear to be descendants of non-LTR retrotransposons, rather than vice versa, when all the currently available information is taken into account.

10. Hohjoh H, Singer MF: Sequence-specific single-strand RNA binding protein encoded by the human LINE1-retrotransposon. EMBO J 1997, 16:6034-6043.

11. Cost GJ, Boeke JD: Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry 1998, 37:18081-18093.

12. •• Schmid CW: Does SINE evolution preclude Alu function? Nucleic Acids Res 1998, 26:4541-4550.
A scholarly study on whether Alu SINEs have spread as intragenomic parasites or have a more symbiotic relationship with the host. As most aspects of Alu biology and evolution are touched upon, this is the review to read when you’re interested in Alus.

13. • Chu WM, Ballard R, Carpick BW, Williams BR, Schmid CW: Potential Alu function: regulation of the activity of double-stranded RNA-activated kinase PKR. Mol Cell Biol 1998, 18:58-68.
A set of carefully performed experiments shows that levels of Alu transcripts observed in cells under stress, but not those of related RNA molecules, can antagonize PKR activation and increase translation of a reporter gene.

14. Matassi G, Labuda D, Bernardi G: Distribution of the mammalian-wide interspersed repeats (MIRs) in the isochores of the human genome. FEBS Lett 1998, 439:63-65.

15. Kimura RH, Choudary PV, Schmid CW: Silk worm Bm1 SINE RNA increases following cellular insults. Nucleic Acids Res 1999, 27:3380-3387.

16. Jurka J: Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc Natl Acad Sci USA 1997, 94:1872-1877.

17. • Arcot SS, Adamson AW, Risch GW, LaFleur J, Robichaux MB, Lamerdin JE, Carrano AV, Batzer MA: High-resolution cartography of recently integrated human chromosome 19-specific Alu fossils. J Mol Biol 1998, 281:843-856.
Although reported earlier by the same authors, these larger scale observations that recent distribution patterns of Alu appear random (not targeted to

RNA and observations of repeat-induced gene silencing in other species are elegantly combined to suggest that contact of an interspersed repeat with Xist RNA may lead to the silencing of these elements and the intervening unique sequences by their conversion to heterochromatin.

25. Duthie SM, Nesterova TB, Formstone EJ, Keohane AM, Turner BM, Zakian SM, Brockdorff N: Xist RNA exhibits a banded localization on the inactive X chromosome and is excluded from autosomal material in cis. Hum Mol Genet 1999, 8:195-204.

26. Deininger PL, Batzer M: Alu repeats and human diseases. Mol Genet Metab 1999, 67:183-193.

27. • Kazazian HH Jr: Mobile elements and disease. Curr Opin Genet Dev 1998, 8:343-350.
A good introduction to the (limited) contribution of transposable elements to human diseases.

28. • Kobayashi K, Nakahori Y, Miyake M, Matsumura K, Kondo-Iida E, Nomura Y, Segawa M, Yoshioka M, Saito K, Osawa M et al.: An ancient retrotransposal insertion causes Fukuyama-type congenital muscular dystrophy. Nature 1998, 394:388-392.

29. • Rohrer J, Minegishi Y, Richter D, Eguiguren J, Conley ME: Unusual mutations in Btk: an insertion, a duplication, an inversion, and four large deletions. Clin Immunol 1999, 90:28-37.
These two papers report the first non-Alu/L1 insertional mutations by a transposable element in human. The inserted SVA or SINE-R is a composite element containing several degraded copies of Alu, a GC-rich, 49-bp unit region, and a fragment of the HERV-K LTR. Although it is entirely unclear how this element ‘gets around’, the variable length target site and poly(A) tail seem to implicate the L1 machinery again.

30. • Medstrand P, Mager DL: Human specific integrations of the HERV-K endogenous retrovirus family. J Virol 1998, 72:9782-9787.
The authors build a phylogenetic tree of HERV-K elements in our genome and come to the conclusion that probably several hundred copies are specific to humans. Out of eight copies, one showed an ancestral (empty) state in some ; however, this copy resides in the HLA and the alleles are an estimated 30 million years old.

31. Berkhout B, Jebbink M, Zsiros J: Identification of an active reverse transcriptase enzyme encoded by a human endogenous HERV-K retrovirus. J Virol 1999, 73:2365-2375.

32. DeBerardinis RJ, Goodier JL, Ostertag EM, Kazazian HH Jr: Rapid amplification of a retrotransposon subfamily is evolving the mouse genome. Nat Genet 1998, 20:288-290.

33. •• SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL: The paleontology of intergene retrotransposons of maize. Nat Genet 1998, 20:43-45.
The influence of transposable elements on genomes always seems to humble observations in mammals. The authors estimate the age of the LTR retrotransposons in a locus from the divergence of 5′ and 3′ LTRs and come to the conclusion that the maize genome may have doubled in size in the last 3 million years because of retrotransposon insertions.

34. Jensen S, Gassama MP, Heidmann T: Taming of transposable elements by homology-dependent gene silencing. Nat Genet 1999, 21:209-212.

35. Yoder JA, Walsh CP, Bestor TH: Cytosine methylation and the ecology of intragenomic parasites. Trends Genet 1997, 13:335-340.

36. • Regev A, Lamb MJ, Jablonka E: The role of DNA methylation in invertebrates: developmental regulation or genome defense? Mol Biol Evol 1998, 15:880-891.
A learned review combined with a survey of methylation patterns in invertebrate genomes that seems to disfavor the idea that the primary and least dispensable function of DNA methylation lies in defense against intragenomic parasites. The authors observe a correlation of the occurrence of DNA methylation with the number of somatic cell divisions, but not with genome size or abundance of transposable elements.

37. • Simmen MW, Leitgeb S, Charlton J, Jones SJ, Harris BR, Clark VH, Bird A: Nonmethylated transposable elements and methylated genes in a chordate genome. Science 1999, 283:1164-1167.
This observation of an ‘opposite’ methylation pattern in a urochordate clearly does not fit in with a host defense function of methylation in this organism.

38. • Walsh CP, Chaillet JR, Bestor TH: Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet 1998, 20:116-117.
The authors observe a marked increase of IAP endogenous virus activity in mice in which the DNA methyltransferase gene dmt1 has been knocked out, consistent with a role for methylation in host defense mechanisms.

39. • Garrick D, Fiering S, Martin DIK, Whitelaw E: Repeat induced gene silencing in mammals. Nat Genet 1998, 18:56-59.
The authors show that suppression of transgene expression involves methylation and is correlated with the number of concatemeric repetitions of the transgene, hinting at the possible existence of a transcriptional host defense mechanism as has been uncovered in plants and fungi.

40. Bender J: Cytosine methylation of repeated sequences in eukaryotes: the role of DNA pairing. Trends Biochem Sci 1998, 23:252-256.

41. Chen RZ, Pettersson U, Beard C, Jackson-Grusby L, Jaenisch R: DNA hypomethylation leads to elevated mutation rates. Nature 1998, 395:89-93.

42. • O’Neill RJW, O’Neill MJ, Graves JAM: Undermethylation associated with retroelement activation and chromosome remodelling in an interspecific mammalian hybrid. Nature 1998, 393:68-72.
A hybrid of different wallaby species shows extensive chromosomal rearrangements and expansion of centromeres. The hybrid has an extensively undermethylated genome, including undermethylated ERVs which appear to have transposed to and expanded in some centromeres. The latter expansion may be due to concatemerization rather than transposition.

43. •• Moran JV, DeBerardinis RJ, Kazazian HH Jr: Exon shuffling by L1 retrotransposition. Science 1999, 283:1530-1534.
Making use of an L1 expression system and clever recombinant L1 constructs, the authors show that 3′ flanking regions of L1 elements very regularly get incorporated in the L1 transcript and that a reporter exon in such a cotranscribed region can be incorporated in new genes quite efficiently, presumably through fortuitous in-frame splicing from a gene upstream of the L1 integration site.

44. Skryabin BV, Kremerskothen J, Vassilacopoulou D, Disotell TR, Kapitonov VV, Jurka J, Brosius J: The BC200 RNA gene and its neural expression are conserved in Anthropoidea (Primates). J Mol Evol 1998, 47:677-685.

45. Nakamura TM, Cech TR: Reversing time: origin of telomerases. Cell 1998, 92:587-590.

46. •• Agrawal A, Eastman QM, Schatz DG: Implications of transposition mediated by V(D)J-recombination proteins RAG1 and RAG2 for origins of antigen-specific immunity. Nature 1998, 394:744-751.
See annotation [47••].

47. •• Hiom K, Melek M, Gellert M: DNA transposition by the RAG1 and RAG2 proteins: a possible source of oncogenic translocations. Cell 1998, 94:463-470.
These two papers [46••,47••] describe experiments that substantiate the DNA transposon origin of our immune system’s recombination mechanism. One cannot ask for a better example of a contribution of transposons and of the quirky nature of evolution.

48. • Csink AK, Henikoff S: Something from nothing: the evolution and utility of satellite repeats. Trends Genet 1998, 14:200-204.
An appealing theory on what makes a centromere (the latest replicating unit in a chromosome), and on the reason for ‘the other’ abundant type of repeats in our genome (satellite repeats block out origins of replications over long regions).

49. Pietrokovski S, Henikoff S: A helix–turn–helix DNA-binding motif predicted for transposases of DNA transposons. Mol Gen Genet 1997, 254:689-695.

50. Toth M, Grimsby J, Buzsaki G, Donovan GP: Epileptic seizures caused by inactivation of a novel gene, jerky, related to centromere-binding protein-B in transgenic mice. Nat Genet 1995, 11:71-75.

51. Robertson HM, Zumpano KL: Molecular evolution of an ancient mariner transposon, Hsmar1, in the human genome. Gene 1997, 205:203-217.

52. • Esposito T, Gianfrancesco F, Ciccodicola A, Montanini L, Mumm S, D’Urso M, Forabosco A: A novel pseudoautosomal human gene encodes a putative protein similar to Ac-like transposases. Hum Mol Genet 1999, 8:61-67.
These two papers [51,52•] describe two proteins that at first sight appear to be transposases, but turn out to contain large amino-terminal domains derived from cellular genes. The existence of cellular transposase-derived proteins with apparently intact active sites like the mariner derivative (Table 3) is intriguing for their potential influence on DNA stability.

53. Imataka H, Olsen HS, Sonenberg N: A new translational regulator with homology to eukaryotic translation initiation factor 4G. EMBO J 1997, 16:817-825.

54. Levy-Strumpf N, Deiss LP, Berissi H, Kimchi A: DAP-5, a novel homolog of eukaryotic translation initiation factor 4G isolated as a putative modulator of gamma interferon-induced programmed cell death. Mol Cell Biol 1997, 17:1615-1625.

55. Gale M Jr, Blakely CM, Hopkins DA, Melville MW, Wambach M, Romano PR, Katze MG: Regulation of interferon-induced protein kinase PKR: modulation of P58IPK inhibitory function by a novel protein, P52rIPK. Mol Cell Biol 1998, 18:859-871.

56. Tudor M, Lobocka M, Goodell M, Pettitt J, O’Hare K: The pogo transposable element family of Drosophila melanogaster. Mol Gen Genet 1993, 232:126-134.