VIEW Open Access Restricting Retrotransposons: a Review John L
Total Page:16
File Type:pdf, Size:1020Kb
Goodier Mobile DNA (2016) 7:16 DOI 10.1186/s13100-016-0070-z REVIEW Open Access Restricting retrotransposons: a review John L. Goodier Abstract Retrotransposons have generated about 40 % of the human genome. This review examines the strategies the cell has evolved to coexist with these genomic “parasites”, focussing on the non-long terminal repeat retrotransposons of humans and mice. Some of the restriction factors for retrotransposition, including the APOBECs, MOV10, RNASEL, SAMHD1, TREX1, and ZAP, also limit replication of retroviruses, including HIV, and are part of the intrinsic immune system of the cell. Many of these proteins act in the cytoplasm to degrade retroelement RNA or inhibit its translation. Some factors act in the nucleus and involve DNA repair enzymes or epigenetic processes of DNA methylation and histone modification. RISC and piRNA pathway proteins protect the germline. Retrotransposon control is relaxed in some cell types, such as neurons in the brain, stem cells, and in certain types of disease and cancer, with implications for human health and disease. This review also considers potential pitfalls in interpreting retrotransposon-related data, as well as issues to consider for future research. Keywords: Alu, Autoimmunity, Epigenetics, LINE-1, Methylation, Restriction, Retrovirus, RNAi, SINE, SVA Background highly mutated. However, some intracisternal A-particle Sixty-five years on from Barbara McClintock’s seminal (IAP) and Etn/MusD family LTR elements remain inser- discovery of mobile DNA [1] we now understand that tionally active in mice [8], and formation of infective virions genomes are dynamic and changeable, with transposable by recombination or phenotypic mixing of intact proteins elements (TEs) being major contributors to their fluidity. from different ERV proviruses has been reported [9–12]. We recognize that TEs, sometimes called “junk DNA”, Among the 31 human endogenous retrovirus subfamilies are major players in genome evolution and have helped extant in the human genome, no replication-competent shape the form and function of many genes [2]. Never- HERVs are known, although their existence has not been theless, TEs are foremost parasitic DNA, and parasites ruled out [13, 14], and recently an unfixed fully intact must be controlled or they will destroy a host. There is HERV-K (HML2) provirus was identified in some individ- far more junk than treasure in mobilomes. uals [15]. Many HERVs, and their lone LTRs that populate DNA transposons comprise about 3 % of the human the genome as a consequence of non-homologous recom- genome and most move by a “cut and paste” mechanism bination, remain capable of expression and may act as tran- involving excising an element and reinserting it elsewhere scriptional regulatory elements for genes (reviewed in [16]). (Fig. 1 [3]). With the exception of at least one family of Non-LTR retrotransposons are as old as the earliest piggyBac elements in little brown bats [4], no active DNA multi-cellular organisms and their 28 clades have origins transposons are known in mammals. There are two clas- in the Precambrian Era of 600 million years ago [17, 18]. ses of retrotransposon. Both move by a “copy and paste” Long Interspersed Elements (LINEs) and Short Inter- mechanism, involving reverse transcription of an RNA spersed Elements (SINEs) comprise most of this group intermediate and insertion of its cDNA copy at a new site in mammals. LINE-1s (L1s), the only currently active in the genome. LTR retrotransposons are named for the autonomous mobile DNA in humans, have been evolv- long terminal repeats that flank their sequences (reviewed ing during at least 150 million years of mammalian radi- in [5–7]). Endogenous retroviruses (ERVs) are relics of ation. Multiple active L1 lineages coexisted in ancestral past germline viral infections and for the most part are primates, but for the past 40 Myr there has been a single unbroken lineage of subfamilies [19, 20]. Expansion of L1s was massive, and roughly 500,000 copies now occupy about Correspondence: [email protected] McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University 17 % of the human genome. Remnant copies of extinct L2 School of Medicine, Baltimore, MD, USA212051 © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Goodier Mobile DNA (2016) 7:16 Page 2 of 30 Fig. 1 Types of transposable elements in mammals. Abbreviations: DR, direct repeat; ITR, inverted terminal repeat; Gag, group-specific antigen; Prt, protease; Pol, polymerase; Env, envelope; RT, reverse transcriptase domain; INT, integrase domain; TSD, target site duplication; LTR, long terminal repeat; EN, endonuclease domain; C, zinc knuckle domain; An, poly (A); A/B, A- and B-box Pol III promoter; SVA, SINE-R, VNTR, Alu element; VNTR, variable number tandem repeats (reproduced from [3]; Elsevier license number 3803340576977) and L3 family elements comprise an additional 4 % SVA copies per human genome, most of which are full- [21]. L1s have also been responsible for genomic inser- length and about 50 of which may be active [24–31]. SVA- tion of 8000 processed pseudogenes and over a million like variants have been described, including a human- non-autonomous SINEs [22]. B1s and Alus, the pre- specific subfamily generated by fusion of the first intron of dominant SINEs of mice and men, respectively, originate the MAST2 gene with an SVA [29, 32, 33], and the LAVA, from the 7SL RNA component of the signal recognition PVAandFVAelementsofnon-humanprimates[34–36]. particle. Alus are about 300 base pairs in length with a From 12 Myr ago, the primate LINE-1 expansion slo- dimeric structure; B1s are monomeric (reviewed in [23]). wed, and most insertions are molecular fossils, truncated, SVAs are hominid-specific SINEs, and the youngest family rearranged, or mutated [20]. However, although most L1s of active human retrotransposons. Their name is an acro- no longer “jump”, at least 100 remain potentially mobile nym reflecting their composite nature: a HERV-K(HML- in any individual diploid human genome [37, 38]. Many 2)-derived SINE-R, variable-number-of-tandem-repeats more L1s are transcribed. Interestingly, only a small num- (VNTR), and an Alu-like region. There are roughly 2700 ber of the active L1s are ”hot” for retrotransposition and Goodier Mobile DNA (2016) 7:16 Page 3 of 30 these have accounted for most de novo insertions. How- together with L1 RNA, ORF2p, and many other RNA- ever, when several of these “hot” Ta-1 L1s were examined binding proteins [58, 59]. Recently, endogenous ORF1p and across diverse human populations, considerable individual ORF2p have been reported to also colocalize in nuclear foci allelic variation affected their ability to retrotranspose of cancer cells [60]. [39]. Up to 5 % of newborn children have a new retro- How retrotransposons impact the mammalian cell transposon insertion, and to date there are 124 known and genome has been the subject of many other reviews human disease-causing germline insertions of L1s, Alus, [3, 41, 42, 61–67]. These effects extend beyond simple and SVAs [40–42]. The current residual activity of human mutation by genomic insertion. L1 RNA and protein retrotransposons is the background that escapes a variety overexpression has been linked with apoptosis, DNA of mechanisms that have evolved to limit replication of damage and repair, tumor progression, cellular plasti- mobile DNA. This review focuses on mammalian non- city, and stress response [68–72]. Consequently, the cell LTR retrotransposons and how the cell controls them. has evolved a battery of defenses to protect against the Non-LTR retrotransposons are mobilized by a mechan- dangers of unfettered retrotransposition. It is not surpris- ism very different from that used by retroviruses and LTR ing that many of the known anti-retrotransposon restric- retrotransposons. Extensive biochemical analyses of insect tion factors are also anti-retroviral. Phylogenetic analyses R1 and R2 elements, together with genomic sequence ana- suggest that eukaryote non-LTR retrotransposons predate lyses, indicate that L1s likely retrotranspose by a process LTR retrotransposons, which in turn gave rise to retrovi- known as target-primed reverse transcription (TPRT) that ruses through the acquisition of an envelope (env) gene occurs at the site of DNA insertion. According to this [73–76]. Indeed, some restriction factors may have first model, L1-encoded endonuclease nicks the bottom strand evolved to control ancient endogenous retroelements and of target DNA exposing a 3'-hydoxyl that primes reverse were later recruited to the fight against exogenous in- transcription of bound L1 RNA. Second-strand DNA syn- vaders. It is reasonable to presume that from the study of thesis follows and the integrant is resolved in a manner still factors controlling endogenous retrotransposition new in- poorly understood [43]. Short target site duplications sights into the control of viral infections will emerge. (TSDs) of variable