<<

Evolutionary Anthropology 19:236–249 (2010)

ARTICLES

LINEs and SINEs of Primate

MIRIAM K. KONKEL, JERILYN A. WALKER, AND MARK A. BATZER

The primate order is a monophyletic group thought to have diverged from the ogy of retrotransposons. We discuss this Euarchonta more than 65 mya.1 Recent paleontological and molecular evolution stud- briefly in the following sections. For fur- ies place the last common ancestor of primates even earlier (‡ 85 mya).2 More than ther details we refer readers to other 300 extant primate species are recognized today,3,4 clearly emphasizing their diversity recent reviews.10–14 Occasionally, LINEs and success. Our understanding of the and the composition of and SINEs are referred to as retrotrans- their genomes has been revolutionized within the last decade through the increasing posons and retroposons, respectively. In availability and analyses of sequenced genomes. However, several aspects of pri- this review, we use the term retrotrans- mate evolution have yet to be resolved. DNA sequencing of a wider array of primate poson for all non-LTR (long terminal species now underway will provide an opportunity to investigate and expand on these repeat) retroelements, if not otherwise questions in great detail. One of the most surprising findings of the human (Homo indicated. sapiens) genome project was the high content of repetitive sequences, in particular of mobile DNA.5 This finding has been replicated in all available and analyzed primate 5–7 draft genome sequences analyzed to date. In fact, transposable elements (TEs) LINE AND SINE contribute about 50% of the genome size of humans,5 chimpanzees (Pan troglo- dytes),6 and rhesus macaques (Macacca mulatta).7 The proportion of TEs among the TEs are classified into different overall genome content is likely even higher due to the decay of older mobile ele- groups on the basis of their transposi- ments beyond recognition, rearrangements of genomes over the course of evolution, tion mechanism and family-specific and the challenge of sequencing and assembling repeat-rich regions of the ge- characteristics. Primate-specific non- nome.8,9 Retrotransposons, in particular L1, long interspersed element 1 (LINE1), and LTR retrotransposons, such as L1, Alu, Alu, a short interspersed element (SINE), are prominent in primate genomes, and and SVA (see Boxes 1and2) belong to B1 B2 have played a major role in genome evolution and architecture. The evolution and the group of Class I elements and prop- success of the primate-specific LINE and SINE subfamilies (L1 and Alu in particular), agate via a ‘‘copy and paste’’ mechanism 13,15 their application in phylogenetic studies, and their impact on the architecture of pri- using an RNA intermediate. Newly mate genomes will be the focus of this review. In addition, we will briefly cover the integrated retrotransposon insertions emergence and impact of SVA (SINE-R/VNTR/Alu), a composite retrotransposon of are usually flanked by a short stretch relatively recent origin, and of other SINEs that are not common to all primates. (6–20 bp) of duplicated unique host DNA called target site duplications (TSDs).16,17 In primates, L1 appears to The evolution of retrotransposons by their amplification mode and inser- be the only currently active autono- has been affected not only by commonly tion mechanism. Consequently, it is im- mous retrotransposon. Autonomous considered aspects such as population portant to have some general under- retrotransposons encode the required and genetic selection, but also standing of the unique features and biol- enzymatic machinery to copy them-

Dr. Mark A. Batzer is currently LSU faculty in scholarship and academics retrotransposons. Both Konkel and Walker System Boyd Professor and Dr. Mary (2008, 2009). Dr. Batzer is a fellow of the have been involved in retrotransposon Lou Applewhite Distinguished Professor, American Association for the Advance- studies and population genetic analyses Department of Biological Sciences at ment of Science and is active in numerous for several whole genome sequencing and Louisiana State University, Baton Rouge, organizations, advisory boards, and pa- analysis projects. Louisiana. Dr. Batzer’s research interests nels. E-mail: [email protected] focus on comparative of hu- Miriam K. Konkel, M.D., is a postdoctoral Abbreviations: mya (million years ago), myrs (million years), ORF (open reading frame), TSD man and nonhuman primates, mobile researcher in Dr. Batzer’s laboratory. Her (target site duplication), SINE (short interspersed DNA, forensic genomics, and computa- current research focuses on the evolution element), LINE (long interspersed element), TE tional biology. Additional research inter- of retrotransposons in diverse species, with (transposable element), LTR (long terminal ests include the identification of genes emphasis on primates and other mammals; repeat), UTR (untranslated region). Please see related to healthy aging in humans. He her background also includes clinical prac- Glossary for more details. has been awarded an LSU Alumni Asso- tice and infectious diseases research. Key words: retrotransposon; phylogenetic ciation Faculty Excellence Award (2005), Jerilyn A. Walker, M.S., is a research subfamily; source element; mobile elements LSU Distinguished Faculty Award (2007), associate in the Batzer laboratory. Her Howard Hughes Medical Institute (HHMI) research centers on the development of VC 2010 Wiley-Liss, Inc. DOI 10.1002/evan.20283 Distinguished Mentor Award (2008), and quantitative assays for species-specific Published online in Wiley Online Library LSU Rainmaker award for the top 100 DNA identification through detection of (wileyonlinelibrary.com). ARTICLES LINEs and SINEs of Primate Evolution 237

GLOSSARY: Nonautonomous element—de- ond-strand DNA synthesis, are not pendent on enzymatic machinery well understood. DNA—desoxyribonucleic acid. from autonomous elements; for SRP9/14—subunit of the human Two anti-parallel backbones com- example, Alu and SVA. Signal Recognition Particle 9/14. prised of the sugar deoxyribose and SINEs—short interspersed ele- SRP9 and SRP14 proteins form a phosphoric acid are joined by phos- ments. They were originally defined stable heterodimer (SRP9/14) that phodiester bonds; attached to each by their interspersed nature and binds to 7SL RNA of Alu elements; sugar is one of four nucleotides (ad- length (75–500bp), but have been impaired binding reduces Alu mo- enine (A), guanine (G), thymine (T), further characterized by their RNA bilization. or cytosine (C). The nucleotides polymerase III transcription. APOBEC3—Apolipoprotein B encode genetic information. LINEs—long interspersed ele- mRNA Editing Complex 3; mRNA—messenger ribonucleic ments. Full-length elements are believed to inhibit L1 and Alu ret- acid; similar to DNA but contains 6kb in length, contain an internal rotransposition. ribose instead of deoxyribose and promoter for polymerase II, two Bottleneck—substantial reduc- uracil (U) instead of thymine (T). ORFs, and end in a polyA-tail. tion in the size of a population over CpG dinucleotide—a 50 cytosine SVA—composite elements named a short time. A bottleneck poten- (C) nucleotide followed 30 by a after their main components, SINE, tially results in radical changes of guanine (G) nucleotide within a a variable number of tandem allele frequencies and reduced linear DNA sequence. The cyto- repeats (VNTR), and Alu. genetic variation. sines of CpG dinucleotides are tar- TSD—target site duplication; a Source element—element that is gets of DNA methylation, resulting short stretch (generally 6–20 bp in both transcriptionally and retro- in 5-methylcytosine. Deamination length) of identical DNA generated transpositionally active and able to of 5-methylcytosine results in thy- at each end of a retrotransposon generate copies. mine. In general, CpG sites mutate integration event as a result of the Precise parallel insertion—inde- 10 times faster than do other di- staggered cut in the target site pendent retrotransposon insertions nucleotide combinations.38–41 For DNA; TSDs are a hallmark of at exactly the same target site. Alu insertions less than 50 myrs in TPRT-mediated retrotransposition. Near-parallel insertion—inde- age, the CpG mutation rate is 6 ORF—open reading frame; a pendent retrotransposon insertions times faster as compared to non- portion of a DNA sequence in within the PCR amplicon or CpG sites.42 which there are no termination co- genomic region, but not at identical Homopolymeric tract—stretch dons (stop codons) in at least one insertion sites. of DNA sequence containing identi- ofthepossiblereadingframes; Homoplasy—shared genetic state cal nucleotides; the simplest form begins with a start codon (initia- or allele that is not inherited from a of a repetitive sequence. tion codon) and ends with a stop common ancestor, but rather is due PolyA-tail—homopolymeric tract codon. ORFs potentially encode to independent events. of adenosine nucleotides; here at for protein or polypeptide. L1 ele- Incomplete lineage sorting—a the 30 end of non-LTR retrotrans- ments contain ORF1 and ORF2; marker (for example, an Alu ele- posons. the product of ORF1 is an RNA- ment) that is polymorphic at the Retrotransposons—Class I ele- binding protein (ORF1p). ORF2 time of the divergence of several ments, including endogenous ret- encodes a protein (ORF2p) with species becomes randomly distrib- roviruses and retrotransposons, endonuclease and reverse tran- uted in the emerging taxa. that move in a genome via a ‘‘copy scriptase activities. Gene conversion—unequal non- and paste’’ mechanism through an TPRT—target-primed reverse reciprocal recombination of a ho- RNA intermediate and are reverse transcription (Fig. 1); term for the mologous sequence (for example, transcribed into DNA by reverse integration mechanism of non- between Alu elements). transcriptase. LTR retrotransposons into the ge- Exonization—a transposable ele- LTR retrotransposons—retro- nome. The bottom strand of chro- ment residing in an intron is transposons with long terminal mosomalDNAiscutatatarget recruited into the coding sequence repeats, such as endogenous retro- site (50-TTTT/AA-30)byanendonu- and thus exonized. In particular, viruses. clease encoded by L1, followed by Alu elements have been commonly Non-LTR retrotransposons—ret- binding of non-LTR retrotranspo- identified in alternatively spliced rotransposons lacking LTRs; SINEs, son RNA at the DNA cleavage site exons.139 LINEs, and SVAs. and reverse transcription by L1- Molecular domestication—the Autonomous element—an ele- encoded reverse transcriptase. The sequence of a transposable element ment that provides its own machin- following steps, such as genera- is incorporated into a novel func- ery for amplification; for example, tion of second strand nick and sec- tion within a genome. full-length LINEs with intact ORFs. 238 Konkel et al. ARTICLES

to join the analyzed genomes in the near Box 1. Introduction to Transposable Elements future. In addition, several smaller scale retrotransposon studies using more In the 1940s, transposable elements into elements that contain the neces- diverged primate species have provided were initially discovered in plant sary enzymatic machinery required insights into retrotransposon evolution genomes by Dr. Barbara McClin- for their movement (termed autono- and amplification patterns.29–32 The tock.140 The elements were associated mous) and those that do not (termed overall physical expansion of primate with variable differences in corn ker- nonautonomous).10–14 The elements genomes is driven by repeats, with nel color that were the result of the are also categorized based on their L1 and Alu elements being the movement of maize mobile elements. overall size as either long inter- major contributors.31 Retrotranspo- These elements are also known as spersed elements (LINEs), in which sons accumulate in primate genomes ‘‘jumping genes,’’ ‘‘mobile DNA,’’ L1 can be over 6 kb in length, or due to the imbalance between their ‘‘selfish DNA,’’ or ‘‘junk DNA.’’ Subse- short interspersed elements (SINEs), rates of insertion and removal, such quent studies of genome sequences, which are shorter than 500 bp.14 as by ectopic recombination. Accord- including the first sequence of the Within primates, the major autono- ingly, the retrotransposon composi- human genome, showed that the ma- mous LINE is termed L1, it is pres- tion of primate genomes is composed jority of eukaryotic genomes contain a ent at a copy number of over of both old and new elements. substantial number of different types 500,000 elements.5–7 LINEs arose in In general, L1 and Alu ele- of mobile elements.5,6,118,119 In fact, mammalian genomes 170 million ments appear to have remained many of the genomes that have been years ago.35 LINEs contain the major active throughout primate evolu- analyzed, particularly the completely proteins required for the movement tion.5–7,30,31,33,34 Since L1 originated sequenced mammalian genomes, are of retroelements, such as an endonu- well before the origin of primates (at composed of nearly 50% transposable clease and reverse transcriptase, least 170 mya),35 primate genomes elements of one type or another.5–7 along with a separate chaperone pro- contain L1 insertions predating the Mobile elements move throughout tein.16,141 In comparison, the major origin of primates, as well as more the genomes in which they reside in SINE is a 300 bp element termed recent primate-specific insertions. In either a ‘‘cut and paste’’ or ‘‘copy and Alu that is present at a copy number contrast, Alu elements are unique to paste’’ type of mechanism.11–14 in excess of 1 million, it arose about primate genomes. Despite their rela- Within mammals, the most common 65 million years ago.5,13 Alu elements tively recent origin, Alu elements have types of mobile elements move by a are SINEs that are specific to the pri- amplified to more than one million ‘‘copy and paste’’ mechanism mate order.13 The third type of mo- copies and account for 10% of the throughanRNAintermediate.Asa bile element is a composite element genome mass in all three sequenced result, these elements are termed ret- termed SVA, which is nonautono- primate genomes.5–7 With 17% of the rotransposons or retroelements. Ret- mous and represented by only a few overall genome content, L1 is arguably rotransposons may be broken down thousand copies in ape genomes.123 the most successful and only currently known active autonomous retrotrans- poson in primates. L1 is responsible not selves.18 L1 shows a strong cis prefer- posons upon primate genomes has been only for its own retrotransposition, but ence in vitro, meaning that the L1 RNA revolutionized. However, even a fully also for the insertion of nonautono- recruits its own translated proteins dur- sequenced genome reveals only selective mous elements and processed pseudo- ing retrotransposition.19–21 However, information and allows, at best, a nar- genes.5,19,36 Consequently, about one- the enzymatic machinery of L1 is also rowwindowintothecurrentstateofa third of the genome mass of all pri- known to insert nonautonomous retro- genome. Most recently integrated mate genomes analyzed to date is transposons such as Alu elements into ‘‘young’’ elements are subject to neutral derived from L1 retrotransposition the genome.13,22 The vast majority of selection, strongly suggesting that the related events.37 In addition, in some retrotransposon insertions in primate vast majority of retrotransposon inser- primate species, such as humans, L1 genomes are believed to insert into a ge- tions are neutral residents in primate is at present the only active driver of nome via target-primed reverse tran- genomes.28 Under neutral selection, retrotransposition due to the lack of 23,24 scription (TPRT; Fig. 1). However, only 1/(2Ne) new insertions (with Ne LTR retrotransposon activity (that is, nonclassical insertion pathways have being the effective population size) endogenous retroviruses).12 also been identified that are far less reach fixation in a population.10 Conse- frequently used.25–27 quently, a large fraction of novel retro- transposon insertions are lost over the NUCLEOTIDE SUBSTITUTIONS course of evolution. At present, three AND CONCEPT OF L1 AND ALU ARE DRIVERS OF primate genomes, H. sapiens, P. troglo- RETROTRANSPOSON dytes,andM. mulatta, have been GENOME EXPANSION SUBFAMILIES sequenced and analyzed. An assembled With the availability of completed ge- draft genome sequence derived from an Retrotransposons have evolved nome sequences, our understanding of orangutan of Sumatran origin (Pongo continuously throughout primate the evolution and impact of retrotrans- abelii) is already available and expected evolution. Sequence alterations of ARTICLES LINEs and SINEs of Primate Evolution 239

Box 2. Structure of Retrotransposons

A full-length L1 is 6 kb in length. length SVA elements are composite gene and the 30LTR-region of It contains an internal polymerase II elements named after their main HERV-K10, an endogenous retrovi- promoter, two ORFs, 30 and 50 components SINE, VNTR (variable rus; and a polyadenylation signal UTRs, and terminates in a poly- number of tandem repeats), and followed by a polyA-tail.123,124 As a adenylation signal (indicated as pA Alu.142 They are nonautonomous consequence of the VNTR region, in Fig. B1) followed by a polyA-tail retrotransposons composed of five full-length SVA elements can vary (Fig. B1). L1s are often 50 truncated, different segments (Fig. B1). From greatly in size. Due to similar inser- inverted, rearranged, and involved 50 to 30, a full-length SVA contains a tion characteristics, SVA elements in transduction events.5–7,12,59 Most hexamer simple repeat region of are thought to use the L1 machinery L1 insertions are severely truncated variable length; an Alu homologous for retrotransposition.123,124 To date, upon insertion.5,93 Alu elements are region composed of two antisense SVA elements have not been very dimeric 300 bp-long elements that Alu fragments, including an addi- well studied. The concrete transcrip- do not encode proteins, do contain a tional sequence of unknown origin; tion mechanism (for example, poly- polymerase III promoter, and end in a VNTR region; a SINE region merase preference) and promoter a polyA-tail (Fig. B1).12,13,51 Full- derived from the 30end of the env- site are subject to debate.

Figure B1. Illustration of L1, Alu, and SVA. Full-length retrotransposons are not drawn to scale. The 50 region of Alu elements contain an internal RNA polymerase III promoter (A and B boxes). The internal polymerase II promoter of L1s is located within the 30UTR. The ORF1 of L1 elements encodes for an RNA-binding protein (ORF1p); ORF2 encodes a protein (ORF2p) with endonuclease and reverse transcriptase activities. The SVA element represents a composite retrotransposon without coding sequence. TSD means target site duplication; pA stands for polyadenylation site. retrotransposons are caused by ran- Nucleotide substitutions can alter tions, which are shared by more ret- dom mutations at a neutral substitu- the ability of retrotransposons to rotransposons than expected through tion rate upon insertion and/or nucle- mobilize and create new copies. It random mutations.8 Reconstruction otide substitutions after insertion.28 has been proposed that host selective of retrotransposon subfamily interre- Consequently, older retrotransposons pressure, such as host defense mech- lationships indicates hierarchical contain, on average, more substitu- anisms against retrotransposons is a characteristics, with the youngest tions than do younger insertions. driver of retrotransposon evolution.44 subfamilies containing the most Thus, the average substitution rate This scenario, similar in nature to in- diagnostic mutations and oldest sub- canbeusedtoestimatetheageof fectious disease host interactions, families the least.13 Some subfami- retrotransposon insertions in pri- creates a constant loop of repression lies have been identified, for example mate lineages. To estimate the age and escape. Host factors evolve con- in some platyrrhines,30 that likely of retrotransposon insertions, it is stantly to keep retrotransposons in arose through gene conversion, a crucial to distinguish between CpG check; selection pressure drives the mechanism that had been suggested and non-CpG bases. This is because evolution of retrotransposons and previously.47,48 Considering the aver- CpG sites have a higher mutation the creation of new subfamilies. The age random substitution rate within rate.38–42 This is of particular inter- concept of subfamilies within a ret- each subfamily and the range of est for Alu elements, as 30% of all rotransposon family was first sug- divergence from the consensus CpG sites reside within them.43 Alto- gested after the identification of spe- sequence between members of a par- gether, more than 40% of CpG dinu- cies-specific substitutions.45,46 Subfa- ticular subfamily, we are able to cleotides are found within TEs in milies can be constructed through reconstruct its reproductive history. primate genomes.5 the identification of diagnostic muta- Network phylogenetic analyses seem 240 Konkel et al. ARTICLES

Figure 1. Retrotransposition via target-primed reverse transcription (TPRT). L1, SVA, and Alu elements are thought to insert into the ge- nome through a mechanism called target-primed reverse transcription (TPRT). A. Shown is host DNA with a predicted target site. B. The L1 endonuclease encoded by the ORF2 loosely recognizes a target site (50-TTTTAA-30) and nicks the bottom strand of the host DNA.16,143 C. The polyA-tail of an mRNA intermediate (gray line with As) of an L1, SVA, or Alu element binds complementary to the cleaved TTTT overhang. D. The retrotransposon is reverse transcribed by the enzymatic activity of ORF2 protein.144 E. The exact mechanism of second strand DNA cleavage is not well understood. F. The details of second strand DNA synthesis are not yet known. G. Illustration of the inte- gration of the new retrotransposon insertion (light gray) into the host DNA (black). Medium gray are incomplete TSDs. H. The retrotrans- poson insertion is flanked by TSDs (medium gray). beneficial for the reconstruction of AluJ; intermediates are members of mate species.13,49,50,59 These subfa- retrotransposon relationships since AluS; and the youngest insertions milies can be short- or long-lived, they allow for persisting nodes, lead- belong to AluY (Fig. 2).13,52,53 AluJ with or without generation of new ing to multiple branching events com- subfamilies were actively amplifying subfamilies. Consequently, the paral- monly observed in retrotransposon early in primate evolution and can lel evolution of several Alu subfamilies phylogenies, in particular with Alu be detected in all primates. The and lineages throughout primate evo- elements.49,50 deepest primate divergence falls into lution has created a diverged, ‘‘bush- the period when the AluJ subfamilies like’’ picture, with several branches EVOLUTION OF ALU AND L1 were expanding.29,34 The Alu lineage and subbranches, and each primate SUBFAMILIES IN PRIMATE in the Tarsiiformes, a sister group to lineage possessing its own unique net- anthropoid primates, might also work of Alu subfamilies (Fig. 2). GENOMES have been derived from AluJ.54–56 The evolution of L1 in diverse pri- The identification of subfamily Before the divergence of platyrrhines mates is altogether less well charac- structure has led to a better under- and catarrhines, AluS derived from terized than that of Alu, with most of standing of the relationship between AluJ and successively took over our understanding derived from retrotransposon and primate genome amplification at approximately 55 detailed analyses of the three evolution. Alu elements are specific mya. More recently, AluY evolved sequenced genomes, in particular the to primates. (The origin of Alu ele- from AluS subfamily members and human genome. While the existence ments is reviewed in detail by Roy- succeeded in the catarrhine line- of more than one L1 subfamily Engel, Batzer and Deininger.)51 Alu age.57,58 Detailed reconstruction of within a species is common, most subfamilies have been grouped to- Alu subfamilies shows parallel retro- studies point toward the propagation gether into three major subfamilies. transposition activity of several dif- of a single L1 lineage with a linear The oldest subfamilies belong to ferent subfamilies in any given pri- evolution pattern in mammalian ARTICLES LINEs and SINEs of Primate Evolution 241

Figure 2. Alu subfamily evolution in primates. The evolution of Alu elements in primate genomes is roughly illustrated. The left panel shows the three major Alu subfamilies, AluJ (green), S (blue), and Y (red). The range of their activity and continuous evolution is indicated through a color gradient. The estimated sizes of AluY, S, and J subfamilies, drawn from Wang and coworkers,145 are given at the bottom left. The major Alu subfamily thought to be active at the time of divergence of each lineage is shown at the base of each lineage branch. Lineage-specific subfamilies are likely derived from that subfamily. The color gradient within each lineage branch indicates that Alu subfamilies continued to evolve in each lineage and created lineage-specific subfamilies. Each major subfamily contains several subfamilies. Several different Alu subfamilies are commonly active in parallel and often evolve, causing diverged Alu subfamily networks. On the right, the evolution of lineage specific Alu subfamilies in the Cercopithecoidea lineage leading to rhesus macaque (M. mulatta) is exemplified. The network was reconstructed with Alu subfamily data from Han and associates with permission from the original pub- lisher (Science).59 [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.] genomes over prolonged periods.60–65 mulatta genome revealed that L1PA5 greatly between different lineages. For However, the coexistence of two or gave rise to the OWM-specific L1 lin- example, the ringtailed lemur (Lemur more L1 lineages over prolonged eage.7,59 The origin of lineage-spe- catta) genome appears to contain the periods has been reported in some cific L1 insertions in the OWM lineage lowest Alu density thus far identified primates.8 Early in primate evolution may have occurred early in L1PA5 in primates, whereas the common as many as three L1 lineages, L1MA4-1, evolution, causing mostly lineage-spe- marmoset (Callithrix jacchus) genome L1PB3-1,andL1PA17-1, have been cific insertions in both OWMs and shows evidence of the highest Alu den- active in parallel for up to 30 myrs hominoids. This illustrates that con- sity.32,34 A burst of both L1 and Alu (Fig. 3).60 Intriguingly, the 50UTRs clusions extrapolated from the per- insertions occurred 35–40 mya in (untranslated regions) of these three spective of one genome onto another anthropoid primates.60,66 Since then, lineages were clearly distinct and the need to be regarded with caution. the overall collective retrotransposition overall combined retrotransposition rate seems to have decelerated in rate was not exceedingly high, indi- anthropoids. The propagation rate of cating that these L1 lineages might RETROTRANSPOSITION INSERTION both L1 and Alu appears to be higher 60 have competed for host factors. RATE VARIATION DURING in OWMs than in human and chim- L1PA succeeded and has remained panzee; in humans, the retrotranspo- PRIMATE EVOLUTION active within the anthropoid lineage sition rate of Alu elements appears leading to humans (Fig. 3).8 The propagation of lineage-specific to be higher than that in chimpan- An analysis of orthologous L1 sites retrotransposon subfamilies and the zee.5,6,31,34,67,68 through the lens of the human ge- accumulation of their respective copy Many factors can affect the via- nome has indicated the absence of numbers in different primate taxa vary bility of actively mobilizing retro- L1PA5 insertions in baboon and greatly over the evolution of primates. transposons and their propagation activity of L1PA7 before and after Retrotransposition rates have varied rates. Extremely active retrotranspo- the divergence of the Cercopitheci- widely over the last 65 myrs of pri- sons are highly susceptible to loss or dae, Old World monkeys (OWMs) mate evolution, with periods of low saturation during speciation events or from the hominoid lineage.60 How- and high activity.8,53,60,66 Moreover, population bottlenecks because they ever, a subsequent analysis of the M. the retrotransposition rate varied are commonly polymorphic within 242 Konkel et al. ARTICLES

Figure 3. Evolution of L1 in primates. The evolution of L1 in primates on the basis of analyses of the human, chimpanzee, and rhesus macaque genome sequences is loosely illustrated. Fading of the lines indicates that the time span of subfamily activity is roughly esti- mated. In general, average age estimates of the different subfamilies were taken from Khan, Smit, and Boissinot60; the activity range of L1PA1-5 was estimated on the basis of lineage-specific analyses.59,78,146 The L1PB and L1MA lineages are not shown as separate subfa- milies, and L1PA6-17 subfamilies have been combined. Subfamilies L1PA1-5 are shown as separate lines to illustrate a typical pattern for the evolution of L1 subfamilies. All lineages show a similar pattern of overlapping activity of different subfamilies. The figure shows that L1PA1 is presently active in humans; chimpanzee-specific subfamilies are derived from L1PA2, with parallel evolution of two L1 lineages over time (branched line) and L1PA5 was the founder for OWM-specific subfamilies including rhesus macaques. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.] a population.10,69 Consequently, the ceivably, environmental stress factors provided the framework to address number of active retrotransposons could alter the retrotransposition these primary questions: How do we can vary greatly and affect the amplifi- rate.74 Different factors may have con- distinguish active from inactive retro- cation rate (increase, decrease, or no tributed to retrotransposition rate var- transposons? How can this be used to change) after speciation or a bottle- iations during primate evolution. study primate evolution? Based on the neck. In addition, it has been pro- typical distribution pattern of SINEs posed that interaction of host factors RETROTRANSPOSON and LINEs observed in primate with the enzymatic L1 machinery AMPLIFICATION MODEL genomes, we know that only a small could cause periods of high and low fraction of retrotransposons are capa- activity.60,70 For example, members of IN PRIMATE GENOMES ble of retrotransposition at any given the ABOBEC family have been found The previous sections describing the time. This is best characterized by a to inhibit L1 and Alu.71–73 Con- dynamics of retrotransposons have modified ‘‘master-gene’’ model.49,50,75 ARTICLES LINEs and SINEs of Primate Evolution 243

In this model, ‘‘master’’ elements of a posons that have reached fixation can the insertion site. Consequently, par- subfamily create copies over a pro- be used for phylogenetic studies to allel insertions of two independent longed period with a few offspring denote branching events, whereas poly- retrotransposons within the ampli- elements that generate the bulk of de morphic insertions within a species con are uncommon events. About novo insertions.75,76 These highly can be used to study the population 0.4% of more than 11,000 primate- active elements are usually relatively genetic structure. specific retrotransposon insertions short-lived due to their highly delete- were identified as parallel insertions, rious nature to their host.75 with all but five insertions caused by RETROTRANSPOSONS AS The identification of potentially so-called near-parallel insertion active L1s is relatively straightfor- PHYLOGENETIC AND events.95 No parallel L1 insertions ward, as only full-length elements POPULATION GENETIC MARKERS have been identified to date, prob- with intact open reading frames Phylogenetics reconstructs evolu- ably because of the size variability of (ORFs) are capable of retrotranspos- L1 insertions resulting from their fre- tionary relationships between various 0 92,99,100 ing themselves. In primate genomes, species. It has been shown that retro- quent 5 truncation. Conse- only a small number of L1 insertions transposons represent highly valua- quently, in contrast to many other satisfy these requirements, as the ble genetic systems to infer the rela- DNA markers, retrotransposon inser- majority of L1s are truncated on 86–88 tions can be considered as nearly tionships of different species. 13,89,95–97,99 insertion and/or have accumulated Consequently, these markers, partic- homoplasy-free markers. random mutations. For example, in ularly Alu elements and, to a lesser A shared retrotransposon insertion the human diploid genome, only extent, L1, are now commonly used between two species or two individu- about 80–100 L1s are considered ret- to investigate phylogenetic and popu- als more than likely indicates a com- rotranspositionally competent on the mon ancestor. Thus, in contrast to 77 lation genetic relationships within basis of their nucleotide sequence. the order of primates.29,30,55,88–97 Alu most other commonly used marker This number appears even lower for elements are used more commonly systems, retrotransposon markers indi- the chimpanzee and rhesus macaque cate identity by descent as opposed to genomes with five and nine intact ele- identity by state.13,95,100 In general, 59,78 ments, respectively. Consequently, Retrotransposons precise deletions of retrotransposons only a limited number of L1s, in par- arerareandunlikelyevents.91,95,101 ticular members of the youngest sub- represent highly Consequently, the ancestral state is families, are active, and an even valuable genetic marked by absence of the retro- smaller number of L1s contribute to transposon.13,96 This is in contrast to the bulk of novel insertions. systems to infer the other commonly used marker systems The identification of Alu source relationships of different within which the ancestral state can- elements13,79 is far more demanding not be unambiguously predicted. Like than it is for L1, since they do not species other markers, polymorphic retrotrans- contain coding sequence and are poson insertions are not immune to highly similar to each other. Recent incomplete lineage sorting. Different research efforts have identified sev- because they are relatively easy to scenarios can commonly result in eral factors that alter the retrotrans- genotype with a single PCR reaction incomplete lineage sorting. Examples position activity of Alu elements. due to their relatively small size include two species with a prolonged These include polyA-tail length, nu- (300 bp). In contrast, the insertion divergence time over several million cleotide substitutions within the size of L1 varies widely, from about years due to, for instance, a large an- polyA-tail, distance of the polymerase 50 bp up to larger than 6 kb.5,98 cestral population size, recurrent re- III TTTT termination signal from the Accordingly, more than one PCR introduction of populations to the end of an Alu element, sequence varia- reaction is often required to geno- gene pool, or divergence of several tion from the consensus sequence of type larger L1 insertions. This makes species over a very short time. In pri- an active subfamily, interaction ability them less convenient than Alu ele- mates, incomplete lineage sorting of SRP9/14 to build RNA/protein com- ments, but they provide equal phylo- has been described, but altogether it plexes, and 50flanking sequence.14,80–84 genetic value and can be used in appears to be a minor problem.95,102 In addition, while not required, conjunction with or as alternatives to In general, the use of several markers ORF1p increases the retrotransposi- Alu elements. for each branch is recommended to tion rate of Alu elements.85 The inter- Retrotransposons are compelling determine lineage sorting events. play of the different factors has not genetic markers with unique proper- yet been studied in detail. Conceiv- ties relative to other commonly used ably, not all factors are required systems, among them single nucleo- RETROTRANSPOSON-BASED simultaneously for source drivers. The tide polymorphisms, microsatellites, PRIMATE PHYLOGENETIC STUDIES combination of varying mobilization and restriction fragment length poly- ratesofsourceelementsandtheir morphisms. Retrotransposons insert Here we have outlined the evolu- continued evolution has shaped each quasi-randomly into the host genome tionary mechanisms by which differ- primate genome uniquely. Retrotrans- and create unique TSDs specific to ent primate taxa accumulate a unique 244 Konkel et al. ARTICLES pattern of retrotransposon insertions, and coworkers109 in a comprehensive divergence from their respective sub- with some shared by other closely SINE-based dichotomous key for the family consensus sequences. Individ- related taxa and others specific to that identification of primates. In this uals within a species remain poly- lineage. This hierarchical accumula- study, a total of 443 Alu loci, 81 of morphic for insertion presence or ab- tion of ‘‘identical by descent’’ retro- which were novel, were evaluated to sence and create discrete differences transposon markers allows researchers characterize some of the deepest within the gene pool. This can be to target subfamilies that were active nodes of the primate phylogenetic tree used to reconstruct the population during the evolutionary period of in- and to refine several previously unre- structure of that species. Detailed terest to identify candidate loci with solved terminal branches.109 More- knowledge about population dynam- phylogenetic value. On the basis of over, this dichotomous key is highly ics is of great interest for under- these retrotransposon insertion pat- valuable to confirm a species and/or to standing the diversity within a spe- terns (presence or absence among dif- identify an unknown species. cies and the complexity of intraspe- ferent species) numerous phylogenetic Retrotransposons have also been cies relationships. This information relationships have been successfully used to exclude species from the is also useful for conservation reconstructed across almost the entire primate order. Schmitz and col- efforts, such as the reintroduction of order of primates (reviewed by Xing leagues110,111 presented clear evidence a species to the wild. and coworkers91). Figure 4 illustrates separating dermopterans (colugos or Retrotransposons have been com- the to-date use of retrotransposon flying lemurs) from primates. In this monly used to infer the population markers to infer primate phylogeny. case, the absence of Alu elements uni- structure of humans as well as non- The availability of sequenced pri- versal to all primates from the flying human primates and to determine mate genomes, in particular the lemur (Cynocephalus variegatus)ge- human geographic origins for foren- human genome, has revolutionized nome placed the flying lemur outside sics.7,89,91,96,97,113–116 The population the field of phylogenetics. Over the the primate order.91,110 The complete structure of human populations and last decade or so, several heavily relationship among Primates, Scan- their history, in addition to the popu- debated questions have been success- dentia, and Dermoptera, also known lation architecture of the worldwide fully resolved with retrotransposon as the ‘‘primate-tree shrew-colugo tri- human population has been investi- markers in the primate order. For chotomy,’’ has yet to be satisfactorily gated intensively with the use of Alu example, a phylogenetic study using resolved. The Tu Type I and Type II retrotransposon markers either alone Alu elements unequivocally resolved families of SINEs identified in the tree or in combination with other the human-chimpanzee-gorilla trichot- shrew (Tupaia belangeri) are derived markers.115–117 Most population struc- omy.103 Three separate studies con- from 7SL RNA, as are Alu SINEs in ture research has focused on humans firmed the monophyly of platyrrhines primates and B1 SINEs in rodents, because of the broad geographic dis- and determined the branching order but as yet no conclusive evidence pla- tribution of the species and the abun- of various families of platyrrhine pri- ces tree shrews closer to either prima- dant genetic information available for mates.30,104,105 This work was recently tes or rodents.112 The CYN-SINE fam- humans. However, retrotransposons confirmed and expanded by Osterholz, ily identified in the C. variegatus ge- have been successfully implemented Walter, and Roos106 usingatotalof nome is specific to Dermoptera and across the mammalian lineage for 128 retrotransposon integrations from thus is uninformative for resolving the inferring the population structure of across all platyrrhine genera. In addi- phylogeny of Dermoptera in relation marsupials such as the opossom tion, several studies have used Alu ele- to Scandentia, Primates, and Roden- (Monodelphis domestica)118 and ments extensively to refine the branch- tia.111 As more sequencing data monotremes such as the platypus (Or- ing pattern of OWMs.59,102,107,108 Xing become available, future studies may nithorhynchus anatinus).119 The only and colleagues107 reported a mobile- identify phylogenetically informative nonhuman primate population element-based phylogeny of OWMs retrotransposon markers that were genetic study using retrotransposon using 285 novel Alu insertions. This active during this evolutionary period. markers (Alu and L1) published to work was further refined within sub- date investigated the population struc- family Cercopithecinae (tribe Cercopi- ture of rhesus macaques.7 In this thecini) using 151 novel Alu insertion RETROTRANSPOSONS IN PRIMATE study, Chinese rhesus macaques could loci from 11 species.108 Recently, Li be clearly distinguished from Indian POPULATION GENETIC STUDIES and coworkers identified 298 new Alu rhesus macaques.7 insertion loci from the genus Macaca The same properties of retrotrans- To infer population structure, the within OWMs and reported a com- posons that make them useful phylo- use of more than 50 or (better, 75 to prehensive, robust resolution of mac- genetic markers (homoplasy free and 100) polymorphic retrotransposon aque phylogeny with higher statisti- identical by descent characters) also loci is required.116,120 The minimal cal support than that in previous make them ideal for population number of insertions necessary for studies.102 Roos, Schmitz, and Zisch- genetic studies. However, instead of reliable analysis of population struc- ler29 used SINE insertions to construct targeting fixed insertions, population ture depends on the level of genetic a strongly supported phylogenetic tree studies focus on recently integrated similarity of the populations.91 representing 20 strepsirrhine species. insertions that are still polymorphic Fewer loci are required to infer the This work was supported by Herke and belong to subfamilies with a low population structure of two distinct ARTICLES LINEs and SINEs of Primate Evolution 245

Figure 4. Primate retrotransposon-based phylogenetic tree. This phylogenetic cladogram of primates is supported by retrotransposon markers. Where possible, we show resolution up to the species level. The core of the cladogram (in white) was reconstructed from a recent review by Xing and colleagues.91 More detailed information on the species level has been integrated for Hominidae (green)109 and Cercopithecidae with Cercopithecinae and Colobinae (yellow),107 Macaca (blue),102 and Cercopithecini (guenon, red).108 Three platyrrhine branching events (asterisk) were resolved by Osterholz, Walter, and Roos.106 We followed the nomenclature of Groves’ Primate Taxonomy.3 [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

(often geographically more removed) LINEAGE-SPECIFIC Type III, it is a monomeric element 121 populations than for populations NONAUTONOMOUS derived from tRNA. Type III ele- with more similar gene pools. The RETROTRANSPOSONS ments have been shown by Southern success of retrotransposon-based pop- blot analyses to be present in galagos ulation genetic studies, the unique We will briefly discuss the emer- and lorises but absent from lemur characteristics of retrotransposon gence of two less common primate species,29 indicating lineage specific- markers, and the relative ease of SINEs, as well as SVA elements ity and origin after the divergence of their use make them an attractive (SINE-R/VNTR/Alu), a composite ret- Lorisiformes and Lemuriformes.29,122 marker system to investigate the rotransposon of relatively recent ori- The second SINE recognized in the population structure of other pri- gin. One SINE, first discovered in galago genome, a Type II element, mate species. Galago crassicaudatus, is termed represents a chimeric SINE most 246 Konkel et al. ARTICLES likely created by the integration of a has shaped the landscape of primate a major influence on genome architec- Type III element into the center of genomes. Primates regularly experi- ture. Due to their continuous insertion an Alu element.121 Both retrotrans- ence de novo retrotransposon inser- throughout primate evolution and posons contain typical hallmarks of tions, occasionally resulting in dis- their unique features, retrotransposons SINEs: TSDs flanking the insertion, ease.10,125,126 For example, the latest serveasvaluablemarkersforthe an A-rich 30 terminus, and a split estimates for Alu, L1, and SVA inser- investigation of phylogenetic, popula- intragenic RNA polymerase III pro- tions within the human species are tion genetic, and forensic relation- moter.122 Type II elements appear to one in 21, 212, and 916 live births, ships. With some evidence of varying have been highly active in galagos respectively.127 This is in good agree- retrotransposition rates in different (G. crassicaudatus and G. senegalen- ment with previous estimates for Alu primate lineages, the evolution of ret- sis).122 To our knowledge, there is no insertion rates.28 Earlier estimates for rotransposons might vary consider- further information available about de novo L1 insertion rates on the basis ably. As more sequenced primate the distribution of Type II elements of transgenic mouse models indicated genomes become available, we will be in other closely related species. a roughly four times higher activity able to draw a more complete picture Another example of lineage-specific rate.12,128 Occasionally, genomic dele- of retrotransposon evolution in the nonautonomous retrotransposons is tions are associated with retrotrans- whole primate lineage. the SVA family of elements, which are poson insertions, potentially resulting specific to the hominoid lineage and in the loss of important genetic infor- ACKNOWLEDGMENTS are most prevalent in their current mation such as exons.129,130 Apart form in the great apes.123,124 However, from insertional mutagenesis, which We thank G. Cook, T. J. Meyer, precursors of SVA have been identified in itself has a major impact on pri- and D. Srikanta for critical discus- in OWMs, indicating that SVA evolved mate genomes, the accumulation of sion during manuscript preparation. over several million years before mobi- very similar sequences makes the ge- Special thanks to B. Ullmer for his lizing in its current state.59 In the pub- nome more susceptible to nonallelic advice on figure designs and manu- lic human genome, 3000 insertions homologous recombination events script edits. We also want to thank have been identified, indicating their that can cause genome rearrange- the four anonymous reviewers for successful propagation in spite of their ments, including deletions and dupli- their thoughtful advice. Research in relatively recent origin.123 Quantitative cations.131–133 Other types of recom- the Batzer laboratory on retrotrans- PCR analyses indicate a similar num- bination events, such as Alu-medi- posons has been supported by grants ber of SVA insertions in the chimpan- ated gene conversion, have been from the State of Louisiana Board of zee, gorilla, and human genomes, a shown to alter gene function.14 An Regents Governor’s Biotechnology lower number in the orangutan example of this is the Alu-mediated Initiative GBI (2002-005) (M.A.B.), (Pongo pygmaeus)(1,000 insertions); loss of the agouti signaling protein National Science Foundation BCS- they also indicate the near absence gene in gibbons.134 Exonization of 0218338 (M.A.B.), and National Insti- of SVA insertions in the siamang retrotransposons, another mecha- tutes of Health PO1 AG022064 and (Symphalangus syndactylus, 40 inser- nism that retrotransposons contrib- RO1 GM59290 (M.A.B.). tions).123 Thereisclearevidenceof ute to structural variation14,135 has active SVA retrotransposition in the taken place occasionally during the human genome. De novo SVA inser- course of primate evolution.136 REFERENCES tions have been identified as the Although exonization is not wide- 1 Chatterjee H, Simon H, Ian B, Colin G. 2009. underlying cause of some human dis- spread, it is estimated that about 5% Estimating the phylogeny and divergence times eases.10,125,126 Whole genome analyses of alternatively spliced exons in of primates using a supermatrix approach. will prove useful in confirming these humans are derived from Alu ele- BMC Evol Biol 9:259. 14 2 Tavare´ S, Marshall C, Will O, Soligo C, Mar- copy number estimates. It is conceiva- ments. Occasionally, molecular tin R. 2002. Using the fossil record to estimate ble that in more diverged species the domestication of retrotransposons the age of the last common ancestor of extant copy number is underestimated by occurs as demonstrated for the SET- primates. Nature 416:726–729. quantitative PCR experiments that use MAR gene.14,137 In addition, L1 and 3 Groves C. 2001. Primate taxonomy. Washing- ton, D.C: Smithsonian Press. human reference sequences. Conceiv- SVA have been identified in 30 and 50 4 Goodman M, Grossman LI, Wildman DE. ably, even more lineage-specific retro- transduction events that occasion- 2005. Moving primate genomics beyond the transposon families will be identified ally can give rise to a new functional chimpanzee genome. Trends Genet 21:511–517. as more sequenced primate genomes gene.12,17,138 5 Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, become available, allowing exhaustive Doyle M, FitzHugh W. 2001. Initial sequencing comparative genomics studies. and analysis of the human genome. Nature CONCLUSIONS 409:860–921. Retrotransposons have had a major 6 Chimpanzee Sequencing and Analysis Con- sortium. 2005. Initial sequence of the chimpan- IMPACT ON GENOME influence on primate genomes and zee genome and comparison with the human have contributed to the expansion of genome. Nature 437:69–87. ARCHITECTURE primate genome sizes. In addition, ret- 7 Rhesus Macaque Genome Sequencing and Analysis Consortium. 2007. Evolutionary and Retrotransposons are major con- rotransposons have shaped each pri- biomedical insights from the rhesus macaque tributors to structural variation that mate genome uniquely and have had genome. Science 316:222–234. ARTICLES LINEs and SINEs of Primate Evolution 247

8 Smit A, To´ th G, Riggs A, Jurka J. 1995. An- 29 Roos C, Schmitz J, Zischler H. 2004. Pri- 49 Cordaux R, Hedges DJ, Batzer MA. 2004. cestral, mammalian-wide subfamilies of LINE-1 mate jumping genes elucidate strepsirrhine Retrotransposition of Alu elements: how many repetitive sequences. J Mol Biol 246:401–417. phylogeny. Proc Natl Acad Sci USA 101:10650– sources? Trends Genet 20:464–467. 9 Smit AF. 1996. The origin of interspersed 10654. 50 Price AL, Eskin E, Pevzner PA. 2004. Whole- repeats in the human genome. Curr Opin Genet 30 Ray DA, Batzer MA. 2005. Tracking Alu evo- genome analysis of Alu repeat elements reveals Dev 6:743–748. lution in New World primates. BMC Evol Biol complex evolutionary history. Genome Res 10 Belancio V, Hedges D, Deininger P. 2008. 5:51–59. 14:2245–2252. Mammalian non-LTR retrotransposons: for bet- 31 Liu G, Zhao S, Bailey JA, Sahinalp SC, 51 Roy-Engel AM, Batzer MA, Deininger PL. ter or worse, in sickness and in health. Genome Alkan C, Tuzun E, Green ED, Eichler EE. 2003. 2008. Evolution of human retrosequences: Alu. Res 18:343–358. Analysis of primate genomic variation reveals a In: Encyclopedia of Life Sciences. Chichester, 11 Goodier J, Kazazian H. 2008. Retrotranspo- repeat-driven expansion of the human genome. UK: John Wiley & Sons. sons revisited: the restraint and rehabilitation Genome Res 13:358–368. 52 Jurka J, Smith T. 1988. A fundamental divi- of parasites. Cell 135:23–35. 32 Boissinot S, Roos C, Furano A. 2004. Differ- sion in the Alu family of repeated sequences. 12 Kazazian HH Jr. 2004. Mobile elements: driv- ent rates of LINE-1 (L1) retrotransposon ampli- Proc Natl Acad Sci USA 85:4775–4778. ers of genome evolution. Science 303:1626–1632. fication and evolution in New World monkeys. 53 Shen MR, Batzer MA, Deininger PL. 1991. J Mol Evol 58:122–130. 13 Batzer MA, Deininger PL. 2002. Alu repeats Evolution of the master Alu gene(s). J Mol Evol and human genomic diversity. Nat Rev Genet 33 Boissinot S, Furano A. 2005. The recent evo- 33:311–320. 3:370–379. lution of human L1 retrotransposons. Cytoge- 54 Zietkiewicz E, Richer C, Sinnett D, Labuda net Genome Res 110:402–406. 14 Cordaux R, Batzer MA 2009. The impact of D. 1998. Monophyletic origin of Alu elements in retrotransposons on human genome evolution. 34 Liu G, Alkan C, Jiang L, Zhao S, Eichler E. primates. J Mol Evol 47:172–182. Nat Rev Genet 10:691–703. 2009. Comparative analysis of Alu repeats in 55 Schmitz J, Ohme M, Zischler H. 2001. SINE primate genomes. Genome Res 19:876–885. 15 Ostertag EM, Kazazian HH Jr. 2001. Biology insertions in cladistic analyses and the phyloge- of mammalian L1 retrotransposons. Annu Rev 35 Smit AF. 1999. Interspersed repeats and netic affiliations of Tarsius bancanus to other Genet 35:501–538. other mementos of transposable elements in primates. Genetics 157:777–784. mammalian genomes. Curr Opin Genet Dev 16 Feng Q, Moran JV, Kazazian HH Jr., Boeke 56 Schmitz J, Roos C, Zischler H. 2005. Pri- 9:657–663. JD. 1996. Human L1 retrotransposon encodes a mate phylogeny: molecular evidence from retro- conserved endonuclease required for retrotrans- 36 Dewannieux M, Heidmann T. 2005. LINEs, posons. Cytogenet Genome Res 108:26–37. position. Cell 87:905–916. SINEs and processed pseudogenes: parasitic 57 Batzer MA, Deininger PL, Hellmann-Blum- 17 Szak ST, Pickeral OK, Makalowski W, strategies for genome modeling. Cytogenet Ge- berg U, Jurka J, Labuda D, Rubin CM, Schmid Boguski MS, Landsman D, Boeke JD. 2002. Mo- nome Res 110:35–48. CW, Zietkiewicz E, Zuckerkandl E. 1996. Stand- lecular archeology of L1 insertions in the 37 Han J, Boeke J. 2005. LINE-1 retrotranspo- ardized nomenclature for Alu repeats. J Mol human genome. Genome Biol 3:research0052. sons: modulators of quantity and quality of mam- Evol 42:3–6. 18 Dombroski BA, Mathias SL, Nanthakumar malian gene expression? Bioessays 27:775–784. 58 Kapitonov V, Jurka J. 1996. The age of Alu E, Scott AF, Kazazian HH Jr. 1991. Isolation of 38 Nachman MW, Crowell SL. 2000. Estimate subfamilies. J Mol Evol 42:59–65. an active human transposable element. Science of the mutation rate per nucleotide in humans. 59 Han K, Konkel MK, Xing J, Wang H, Lee J, 254:1805–1808. Genetics 156:297–304. Meyer TJ, Huang CT, Sandifer E, Hebert K, 19 Esnault C, Maestre J, Heidmann T. 2000. 39 Miyamoto MM, Slightom JL, Goodman M. Barnes EW. 2007. Mobile DNA in Old World Human LINE retrotransposons generate proc- 1987. Phylogenetic relations of humans and monkeys: a glimpse through the rhesus maca- essed pseudogenes. Nat Genet 24:363–367. African apes from DNA sequences in the psi que genome. Science 316:238–240. 20 Wei W, Gilbert N, Ooi SL, Lawler JF, Oster- eta-globin region. Science 238:369–373. 60 Khan H, Smit A, Boissinot S. 2006. Molecu- tag EM, Kazazian HH, Boeke JD, Moran JV. 40 Labuda D, Striker G. 1989. Sequence conser- lar evolution and tempo of amplification of 2001. Human L1 retrotransposition: cis prefer- vation in Alu evolution. Nucleic Acids Res 17: human LINE-1 retrotransposons since the ori- ence versus trans complementation. Mol Cell 2477–2491. gin of primates. Genome Res 16:78–87. Biol 21:1429–1439. 41 Batzer MA, Kilroy GE, Richard PE, Shaikh 61 Furano AV, Hayward BE, Chevret P, Catze- 21 Kulpa DA, Moran JV. 2006. Cis-preferential TH, Desselle TD, Hoppens CL, Deininger PL. flis F, Usdin K. 1994. Amplification of the an- LINE-1 reverse transcriptase activity in ribonu- 1990. Structure and variability of recently cient murine Lx family of long interspersed cleoprotein particles. Nat Struct Mol Biol inserted Alu family members. Nucleic Acids Res repeated DNA occurred during the murine radi- 13:655–660. 18:6793–6798. ation. J Mol Evol 38:18–27. 22 Dewannieux M, Esnault C, Heidmann T. 42 Xing J, Hedges DJ, Han K, Wang H, Cor- 62 Boissinot S, Entezam A, Young L, Munson 2003. LINE-mediated retrotransposition of daux R, Batzer MA. 2004. Alu element mutation PJ, Furano AV. 2004. The insertional history of marked Alu sequences. Nat Genet 35:41–48. spectra: molecular clocks and the effect of DNA an active family of L1 retrotransposons in 23 Luan DD, Korman MH, Jakubczak JL, Eick- methylation. J Mol Biol 344:675–682. humans. Genome Res 14:1221–1231. bush TH. 1993. Reverse transcription of R2Bm 43 Schmid CW. 1991. Human Alu subfamilies 63 Pascale E, Liu C, Valle E, Usdin K, Furano RNA is primed by a nick at the chromosomal and their methylation revealed by blot hybrid- AV. 1993. The evolution of long interspersed target site: a mechanism for non-LTR retro- ization. Nucleic Acids Res 19:5613–5617. repeated DNA (L1, LINE 1) as revealed by the transposition. Cell 72:595–605. 44 Furano AV, Duvernell DD, Boissinot S. 2004. analysis of an ancient rodent L1 DNA family. 24 Cost GJ, Feng Q, Jacquier A, Boeke JD. L1 (LINE-1) retrotransposon diversity differs J Mol Evol 36:9–20. 2002. Human L1 element target-primed reverse dramatically between mammals and fish. 64 Boissinot S, Furano AV. 2001. Adaptive evo- transcription in vitro. Embo J 21:5899–5910. Trends Genet 20:9–14. lution in LINE-1 retrotransposons. Mol Biol 25 Morrish TA, Gilbert N, Myers JS, Vincent 45 Daniels GR, Fox GM, Loewensteiner D, Evol 18:2186–2194. BJ, Stamato TD, Taccioli GE, Batzer MA, Schmid CW, Deininger PL. 1983. Species-spe- 65 Pascale E, Valle E, Furano AV. 1990. Ampli- Moran JV. 2002. DNA repair mediated by endo- cific homogeneity of the primate Alu family of fication of an ancestral mammalian L1 family nuclease-independent LINE-1 retrotransposi- repeated DNA sequences. Nucleic Acids Res of long interspersed repeated DNA occurred tion. Nat Genet 31:159–165. 11:7579–7593. just before the murine radiation. Proc Natl 26 Sen SK, Huang CT, Han K, Batzer MA. 46 Rogers J. 1985. The origin and evolution of Acad Sci USA 87:9481–9485. 2007. Endonuclease-independent insertion pro- retroposons. Genome Evol Prokaryotes Eukar- 66 Britten RJ. 1994. Evidence that most human vides an alternative pathway for L1 retrotrans- yotes 93:187–279. Alu sequences were inserted in a process that position in the human genome. Nucleic Acids 47 Batzer MA, Rubin CM, Hellmann-Blumberg ceased about 30 million years ago. Proc Natl Res 35:3741–3751. U, Alegria-Hartman M, Leeflang EP, Stern JD, Acad Sci USA 91:6148–6150. 27 Srikanta D, Sen S, Huang C, Conlin E, Rhodes Bazan HA, Shaikh TH, Deininger PL, Schmid 67 Watanabe H, Fujiyama A, Hattori M, Taylor R, Batzer M. 2009. An alternative pathway for Alu CW. 1995. Dispersion and insertion polymor- TD, Toyoda A, Kuroki Y, Noguchi H, BenKahla retrotransposition suggests a role in DNA double- phism in two small subfamilies of recently ampli- A, Lehrach H, Sudbrak R. 2004. DNA sequence strand break repair. Genomics 93:205–212. fied human Alu repeats. J Mol Biol 247:418–427. and comparative analysis of chimpanzee chro- 28 Cordaux R, Lee J, Dinoso L, Batzer MA. 48 Kass DH, Batzer MA, Deininger PL. 1995. mosome 22. Nature 429:382–388. 2006. Recently integrated Alu retrotransposons Gene conversion as a secondary mechanism of 68 Hedges DJ, Callinan PA, Cordaux R, Xing J, are essentially neutral residents of the human short interspersed element (SINE) evolution. Barnes E, Batzer MA. 2004. Differential alu mo- genome. Gene 373:138–144. Mol Cell Biol 15:19–25. bilization and polymorphism among the human 248 Konkel et al. ARTICLES

and chimpanzee lineages. Genome Res 14: 88 Shedlock AM, Okada N. 2000. SINE inser- 106 Osterholz M, Walter L, Roos C. 2009. Ret- 1068–1075. tions: powerful tools for molecular systematics. ropositional events consolidate the branching 69 Hedges DJ, Batzer MA. 2005. From the mar- Bioessays 22:148–160. order among New World monkey genera. Mol gins of the genome: mobile elements shape pri- 89 Batzer MA, Stoneking M, Alegria-Hartman Phylogenet Evol 50:507–513. mate evolution. Bioessays 27:785–794. M, Bazan H, Kass DH, Shaikh TH, Novick GE, 107 Xing J, Wang H, Han K, Ray D, Huang C, 70 Furano AV. 2000. The biological properties Ioannou PA, Scheer WD, Herrera RJ. 1994. Chemnick L, Stewart C, Disotell T, Ryder O, and evolutionary dynamics of mammalian African origin of human-specific polymorphic Batzer M. 2005. A mobile element based phy- LINE-1 retrotransposons. Prog Nucleic Acid Alu insertions. Proc Natl Acad Sci USA logeny of Old World monkeys. Mol Phylogenet Res Mol Biol 64:255–294. 91:12288–12292. Evol 37:872–880. 71 Hulme AE, Kulpa DA, Perez JLG, Moran JV. 90 Minghetti P, Dugaiczyk A. 1993. The emer- 108 Xing J, Wang H, Zhang Y, Ray D, Tosi A, 2006. The Impact of LINE-1 retro transposition on gence of new DNA repeats and the divergence of Disotell T, Batzer M. 2007. A mobile element- the human genome. Genomic Disorders 35–55. primates. Proc Nat Acad Sci USA 90:1872–1876. based evolutionary history of guenons (tribe Cercopithecini). BMC Biology 5:5. 72 Bogerd H, Wiegand H, Hulme A, Garcia- 91 Xing J, Witherspoon DJ, Ray DA, Batzer Perez J, O’shea K, Moran J, Cullen B. 2006. Cel- MA, Jorde LB. 2007. Mobile DNA elements in 109 Herke S, Xing J, Ray D, Zimmerman J, lular inhibitors of long interspersed element 1 primate and . Am J Phys Cordaux R, Batzer M. 2007. A SINE-based di- and Alu retrotransposition. Proc Nat Acad Sci Anthropol Suppl 45(suppl):2–19. chotomous key for primate identification. Gene 390:39–51. USA 103:8780–8785. 92 Konkel MK, Wang J, Liang P, Batzer MA. 73 Schumann G. 2007. APOBEC3 proteins: 2007. Identification and characterization of 110 Schmitz J, Ohme M, Suryobroto B, Zisch- major players in intracellular defence against novel polymorphic LINE-1 insertions through ler H. 2002. The colugo (Cynocephalus variega- LINE-1-mediated retrotransposition. Biochem comparison of two human genome sequence tus, Dermoptera): the primates’ gliding sister? Soc Trans 35:637. assemblies. Gene 390:28–38. Mol Biol Evol 19:2308–2312. 74 Farkash EA, Prak ET. 2006. DNA damage 93 Vincent BJ, Myers JS, Ho HJ, Kilroy GE, 111 Schmitz J, Zischler H. 2003. A novel family and L1 retrotransposition. J Biomed Biotechnol Walker JA, Watkins WS, Jorde LB, Batzer MA. of tRNA-derived SINEs in the colugo and two 2006:37285–37292. 2003. Following the LINEs: an analysis of pri- new retrotransposable markers separating der- mate genomic variation at human-specific LINE- mopterans from primates. Mol Phylogenet Evol 75 Han K, Xing J, Wang H, Hedges DJ, Garber 28:341–349. RK, Cordaux R, Batzer MA. 2005. Under the 1 insertion sites. Mol Biol Evol 20:1338–1348. genomic radar: the stealth model of Alu amplifi- 94 Ryan SC, Dugaiczyk A. 1989. Newly arisen 112 Nishihara H, Terai Y, Okada N. 2002. cation. Genome Res 15:655–664. DNA repeats in primate phylogeny. Proc Natl Characterization of novel Alu- and tRNA-related Acad Sci USA 86:9360–9364. SINEs from the tree shrew and evolutionary 76 Leeflang EP, Liu WM, Chesnokov IN, implications of their origins. Mol Biol Evol Schmid CW. 1993. Phylogenetic isolation of a 95 Ray DA, Xing J, Salem AH, Batzer MA. 19:1964. human Alu founder gene: drift to new subfam- 2006. SINEs of a nearly perfect character. Syst ily identity [corrected]. J Mol Evol 37:559–565. Biol 55:928–935. 113 Batzer MA, Gudi VA, Mena JC, Foltz DW, Herrera RJ, Deininger PL. 1991. Amplification 77 Brouha B, Schustak J, Badge RM, Lutz- 96 Batzer MA, Deininger PL. 1991. A human- dynamics of human-specific (HS) Alu family Prigge S, Farley AH, Moran JV, Kazazian HH specific subfamily of Alu sequences. Genomics members. Nucleic Acids Res 19:3619–3623. Jr. 2003. Hot L1s account for the bulk of retro- 9:481–487. 114 Ray DA, Walker JA, Hall A, Llewellyn B, transposition in the human population. Proc 97 Stoneking M, Fontius JJ, Clifford SL, Soo- Natl Acad Sci USA 100:5280–5285. Ballantyne J, Christian AT, Turteltaub K, Batzer dyall H, Arcot SS, Saha N, Jenkins T, Tahir MA. 2005. Inference of human geographic ori- 78 LeeJ,CordauxR,HanK,WangJ,HedgesDJ, MA, Deininger PL, Batzer MA. 1997. Alu inser- gins using Alu insertion polymorphisms. Foren- Liang P, Batzer MA. 2007. Different evolutionary tion polymorphisms and human evolution: evi- sic Sci Int 153:117–124. fates of recently integrated human and chimpan- dence for a larger population size in Africa. Ge- zee LINE-1 retrotransposons. Gene 390:18–27. nome Res 7:1061–1071. 115 Watkins WS, Rogers AR, Ostler CT, Wood- ing S, Bamshad MJ, Brassington AM, Carroll 79 Deininger PL, Batzer MA, Hutchison CA 98 Myers JS, Vincent BJ, Udall H, Watkins WS, ML, Nguyen SV, Walker JA, Prasad BV. 2003. 3rd, Edgell MH. 1992. Master genes in mamma- Morrish TA, Kilroy GE, Swergold GD, Henke J, Genetic variation among world populations: lian repetitive DNA amplification. Trends Genet Henke L, Moran JV. 2002. A comprehensive inferences from 100 Alu insertion polymor- 8:307–311. analysis of recently integrated human Ta L1 phisms. Genome Res 13:1607–1618. elements. Am J Hum Genet 71:312–326. 80 Roy AM, West NC, Rao A, Adhikari P, Ale- 116 Bamshad MJ, Wooding S, Watkins WS, man C, Barnes AP, Deininger PL. 2000. 99 Ho HJ, Ray DA, Salem AH, Myers JS, Batzer Ostler CT, Batzer MA, Jorde LB. 2003. Human Upstream flanking sequences and transcription MA. 2005. Straightening out the LINEs: LINE-1 population genetic structure and inference of of SINEs. J Mol Biol 302:17–25. orthologous loci. Genomics 85:201–207. group membership. Am J Hum Genet 72:578–589. 81 Chesnokov I, Schmid CW. 1996. Flanking 100 Salem AH, Ray DA, Batzer MA. 2005. Iden- 117 Bamshad M, Kivisild T, Watkins WS, sequences of an Alu source stimulate transcrip- tity by descent and DNA sequence variation of Dixon ME, Ricker CE, Rao BB, Naidu JM, Pra- tion in vitro by interacting with sequence-spe- human SINE and LINE elements. Cytogenet sad BV, Reddy PG, Rasanayagam A. 2001. cific transcription factors. J Mol Evol 42:30–36. Genome Res 108:63–72. Genetic evidence on the origins of Indian caste 82 Roy-Engel AM, Salem AH, Oyeniran OO, 101 van de Lagemaat LN, Gagnier L, Med- populations. Genome Res 11:994–1004. Deininger L, Hedges DJ, Kilroy GE, Batzer MA, strand P, Mager DL. 2005. Genomic deletions 118 Mikkelsen TS, Wakefield MJ, Aken B, Ame- Deininger PL. 2002. Active Alu element ‘‘A-tails’’: and precise removal of transposable elements miyaCT,ChangJL,DukeS,GarberM,GentlesAJ, size does matter. Genome Res 12:1333–1344. mediated by short identical DNA segments in Goodstadt L, Heger A. 2007. Genome of the marsu- 83 Comeaux M, Roy-Engel A, Hedges D, Dein- primates. Genome Res 15:1243–1249. pial Monodelphis domestica reveals innovation in inger P. 2009. Diverse cis factors controlling 102 Li J, Han K, Xing J, Kim H, Rogers J, non-coding sequences. Nature 447:167–177. Alu retrotransposition: what causes Alu ele- Ryder O, Disotell T, Yue B, Batzer M. 2009. 119 Warren WC, Hillier LW, Marshall Graves ments to die? Genome Res 19:545–555. Phylogeny of the macaques (Cercopithecidae: JA, Birney E, Ponting CP, Grutzner F, Belov K, 84 Bennett E, Keller H, Mills R, Schmidt S, Macaca) based on Alu elements. Gene 448: Miller W, Clarke L, Chinwalla AT. 2008. Ge- Moran J, Weichenrieder O, Devine S. 2008. 242–249. nome analysis of the platypus reveals unique Active Alu retrotransposons in the human ge- 103 Salem AH, Ray DA, Xing J, Callinan PA, signatures of evolution. Nature 453:175–183. nome. Genome Res 18:1875–1883. Myers JS, Hedges DJ, Garber RK, Witherspoon 120 Witherspoon D, Marchani E, Watkins W, 85 Wallace N, Wagstaff B, Deininger P, Roy- DJ, Jorde LB, Batzer MA. 2003. Alu elements Ostler C, Wooding S, Anders B, Fowlkes J, Bois- Engel A. 2008. LINE-1 ORF1 protein enhances and hominid phylogenetics. Proc Natl Acad Sci sinot S, Furano A, Ray D. 2006. Human popula- Alu SINE retrotransposition. Gene 419:1–6. USA 100:12787–12791. tion genetic structure and diversity inferred 86 Murata S, Takasaki N, Saitoh M, Okada N. 104 Singer SS, Schmitz J, Schwiegk C, Zischler from polymorphic L1 (LINE-1) and Alu inser- 1993. Determination of the phylogenetic rela- H. 2003. Molecular cladistic markers in New tions. Hum Hered 62:30–46. tionships among Pacific salmonids by using World monkey phylogeny (Platyrrhini, Prima- 121 Daniels GR, Deininger PL. 1985. Repeat short interspersed elements (SINEs) as tempo- tes). Mol Phylogenet Evol 26:490–501. sequence families derived from mammalian ral landmarks of evolution. Proc Natl Acad Sci 105 Ray DA, Xing J, Hedges DJ, Hall MA, tRNA genes. Nature 317:819–822. USA 90:6995–6999. Laborde ME, Anders BA, White BR, Stoilova N, 122 Daniels GR, Deininger PL. 1991. Character- 87 Okada N, Shedlock AM, Nikaido M. 2004. Fowlkes JD, Landry KE. 2005. Alu insertion loci ization of a third major SINE family of repeti- Retroposon mapping in molecular systematics. and platyrrhine primate phylogeny. Mol Phylo- tive sequences in the galago genome. Nucleic Methods Mol Biol 260:189–226. genet Evol 35:117–126. Acids Res 19:1649–1656. ARTICLES LINEs and SINEs of Primate Evolution 249

123 Wang H, Xing J, Grover D, Hedges DJ, sion of human segmental duplications. Am J 140 McClintock B. 1950. The origin and behav- Han K, Walker JA, Batzer MA. 2005. SVA ele- Hum Genet 73:823–834. ior of mutable loci in maize. Proc Natl Acad Sci ments: a hominid-specific retroposon family. J 132 Sen SK, Han K, Wang J, Lee J, Wang H, USA 36:344–355. Mol Biol 354:994–1007. Callinan PA, Dyer M, Cordaux R, Liang P, Bat- 141 Babushok DV, Kazazian HH Jr. 2007. 124 Ostertag EM, Goodier JL, Zhang Y, Kaza- zer MA. 2006. Human genomic deletions medi- Progress in understanding the biology of zian HH Jr. 2003. SVA elements are nonautono- ated by recombination between Alu elements. the human mutagen LINE-1. Hum Mutat 28: mous retrotransposons that cause disease in Am J Hum Genet 79:41–53. 527–539. humans. Am J Hum Genet 73:1444–1451. 133 Han K, Lee J, Meyer TJ, Remedios P, Good- 142 Shen L, Wu LC, Sanlioglu S, Chen R, 125 Callinan P, Batzer, MA. 2006. Retrotrans- win L, Batzer MA. 2008. L1 recombination-associ- Mendoza AR, Dangel AW, Carroll MC, Zipf WB, posable elements and human disease. Genome ated deletions generate human genomic variation. Yu CY. 1994. Structure and genetics of the par- Dynamics 1:104–115. Proc Natl Acad Sci USA 105:19366–19371. tially duplicated gene RP located immediately 126 Chen JM, Stenson PD, Cooper DN, Ferec C. 134 Nakayama K, Ishida T. 2006. Alu-mediated upstream of the complement C4A and the C4B 2005. A systematic analysis of LINE-1 endonucle- 100-kb deletion in the primate genome: the loss genes in the HLA class III region. Molecular ase-dependent retrotranspositional events causing of the agouti signaling protein gene in the cloning, exon-intron structure, composite retro- human genetic disease. Hum Genet 117:411–427. lesser apes. Genome Res 16:485. poson, and breakpoint of gene duplication. J Biol 127 Xing J, Zhang Y, Han K, Salem A, Sen S, 135 Lev-Maor G, Sorek R, Shomron N, Ast G. Chem 269:8466–8476. Huff C, Zhou Q, Kirkness E, Levy S, Batzer M. 2003. The birth of an alternatively spliced exon: 143 Jurka J. 1997. Sequence patterns indicate 2009. Mobile elements create structural varia- 30 splice-site selection in Alu exons. Science an enzymatic involvement in integration of tion: analysis of a complete human genome. 300:1288–1291. mammalian retroposons. Proc Natl Acad Sci Gen Res 19:1516–1526. 136 Krull M, Brosius J, Schmitz J. 2005. Alu- USA 94:1872–1877. 128 Ostertag EM, DeBerardinis RJ, Goodier JL, SINE exonization: en route to protein-coding 144 Mathias SL, Scott AF, Kazazian HH Jr., Zhang Y, Yang N, Gerton GL, Kazazian HH Jr. function. Mol Biol Evol 22:1702–1711. Boeke JD, Gabriel A. 1991. Reverse transcrip- 2002. A mouse model of human L1 retrotrans- 137 Cordaux R, Udit S, Batzer MA, Feschotte C. tase encoded by a human transposable element. position. Nat Genet 32:655–660. 2006. Birth of a chimeric primate gene by cap- Science 254:1808–1810. 129 Callinan PA, Wang J, Herke SW, Garber RK, ture of the transposase gene from a mobile ele- 145 Wang J, Song L, Gonder M, Azrak S, Ray Liang P, Batzer MA. 2005. Alu retrotransposi- ment. Proc Natl Acad Sci USA 103:8101–8106. DA, Batzer M, Tishkoff S, Liang P. 2006. Whole tion-mediated deletion. J Mol Biol 348:791–800. 138 Xing J, Wang H, Belancio VP, Cordaux R, genome computational comparative genomics: 130 Han K, Sen SK, Wang J, Callinan PA, Lee Deininger PL, Batzer MA. 2006. Emergence of a fruitful approach for ascertaining Alu inser- J, Cordaux R, Liang P, Batzer MA. 2005. new primate genes by retrotransposon-medi- tion polymorphisms. Gene 365:11–20. Genomic rearrangements by LINE-1 insertion- ated sequence transduction. Proc Natl Acad Sci 146 Boissinot S, Chevret P, Furano AV. 2000. mediated deletion in the human and chimpan- USA 103:17608–17613. L1 (LINE-1) retrotransposon evolution and zee lineages. Nucleic Acids Res 33:4040–4052. 139 Sorek R, Ast G, Graur D. 2002. Alu-con- amplification in recent human history. Mol Biol 131 Bailey JA, Liu G, Eichler EE. 2003. An Alu taining exons are alternatively spliced. Genome Evol 17:915–928. transposition model for the origin and expan- Res 12:1060–1067. VC 2010 Wiley-Liss, Inc.