Nature Reviews Genetics | AOP, published online 9 December 2014; doi:10.1038/nrg3859 PERSPECTIVES

independent occasions via recruitment of OPINION non-homologous endonucleases. Owing to the ubiquity and high abun- Evolution of adaptive immunity from dance of MGEs, their co-evolution with cellular hosts is a perennial parasite–host transposable elements combined ‘arms race’ in which the two sides evolved extremely diverse and elaborate systems of defence and counter-defence18–22. Notably, with innate immune systems many defence systems — including restric- tion–modification enzymatic modules, Eugene V. Koonin and Mart Krupovic toxin–antitoxin and the clustered regularly interspaced short palindromic repeat– Abstract | Adaptive immune systems in prokaryotes and animals give rise to CRISPR-associated protein (CRISPR–Cas) long-term memory through modification of specific genomic loci, such as by systems in prokaryotes, and the apop- insertion of foreign (viral or plasmid) DNA fragments into clustered regularly tosis machinery in eukaryotes — seem interspaced short palindromic repeat (CRISPR) loci in prokaryotes and by V(D)J to be ‘guns for hire’; that is, they are also recombination of immunoglobulin genes in vertebrates. Strikingly, recombinases recruited by viruses and other MGEs for counter-defence23,24. derived from unrelated mobile genetic elements have essential roles in both All organisms have a plethora of innate prokaryotic and vertebrate adaptive immune systems. Mobile elements, which immunity mechanisms, and many also have are ubiquitous in cellular life forms, provide the only known, naturally evolved adaptive immunity25–27. In general, innate tools for genome engineering that are successfully adopted by both innate immunity covers all systems of defence immune systems and genome-editing technologies. In this Opinion article, we against a broad range of pathogens, whereas present a general scenario for the origin of adaptive immunity from mobile adaptive immunity is tailored towards a specific pathogen, and its essential feature elements and innate immune systems. is immunological memory, whereby an organism that survives an encounter with a All cellular organisms persist and evolve in the host genome via the ‘cut-and-paste’ particular pathogen is specifically protected under a perennial onslaught of mobile mechanism, whereby the transposon from that pathogen for the long term (often genetic elements (MGEs), such as transpo- is excised from its initial location and for the lifetime of the individual). Adaptive sons, viral sequences and plasmids. Many, if inserted into a new locus. Most of the class II immunity is highly specific and extremely not most, of these diverse, ‘selfish’ elements transposons have characteristic terminal efficient against many pathogens, despite insert into the chromosomes of the cellular inverted repeats (TIRs) but differ widely numerous powerful counter-defence hosts, either as an obligate part of their life with respect to the element size and gene strategies evolved by the pathogens18–22. cycles or at least sporadically. In multicellular content, the mechanisms of transposition In prokaryotes, innate immunity eukaryotes, MGEs constitute a substantial and the transposases encoded8,10,11. The mechanisms include the well-studied proportion of the host genome, for example, majority of the transposases belong to restriction–modification enzymatic modules >50% of the genome in mammals and >70% the DDE superfamily (which is named after and multiple less thoroughly characterized of the genome in some plants1–3. Integrated two aspartate residues and one glutamate systems28. Notable among the latter is the MGEs are also present in the genomes of residue that form the catalytic triad), but recently described mechanism that uses most bacteria and archaea4,5; although they several other unrelated families of trans- bacterial homologues of the eukaryotic are not as abundant as those in eukaryotes, posases have been identified8,10,11. Some Argonaute proteins — the key enzymes of these elements account for up to 30% of some transposons encode transposases that are RNA interference (RNAi) — to generate bacterial genomes6,7. homologous to the rolling-circle replication guide RNA or DNA molecules that are then Transposons are DNA segments that initiation endonucleases found in single- used to inactivate foreign genomes29–31. Until move from one location in the host genome stranded DNA viruses and plasmids12–14, recently, prokaryotes have not been thought to another. Most of the transposons can be whereas other transposases are homologous to have adaptive immunity. However, this grouped into two classes8,9. Class I elements to bacteriophage tyrosine or serine recom- perception was overturned by the discovery (also known as ) trans- binases15,16, or to eukaryotic APE1‑like DNA of the CRISPR–Cas systems that are repre- pose via an RNA intermediate which, prior repair endonucleases (which function in con- sented in most archaea and many to integration, is copied back to the DNA junction with reverse transcriptases)17. Such bacteria (FIG. 1a). CRISPR–Cas is an immu- form by the element-encoded reverse trans­ diversity of transposases strongly suggests nity mechanism that functions by incorpo- criptase. Class II DNA transposons move that transposons have emerged on multiple rating fragments of foreign (viral or plasmid)

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 1

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES a Adaptation cas genes Repeats Spacers Leader Cas1–Cas2 Viral DNA Integration of new protospacer

Transcription Expression

Transcript processing by Cas nucleases Interference

crRNA Cleavage of invading DNA or RNA b Germline configuration V segments D segments J segments Constant region

V1 V2 V3 V4 D1 D2 D3 D4 J1 J2 J3 C1 C2 C3 23-RSS 12-RSS

RAG1–RAG2 D3 D-to-J recombination J2 V1 V2 V3 V4 D1 D2 J3 C1 C2 C3 + DJ coding Signal joint D4 joint RAG1–RAG2 J1

V-to-DJ recombination D1 V1 V2 V3 D2 J3 C1 C2 C3 + V4 VDJ coding joint Signal joint

Transcription and splicing AAA

Translation and assembly Variable region

Figure 1 | Adaptive immune systems of prokaryotes and eukary- variable region of the immunoglobulin heavy chain is assembled by V(D)J otes. a | The prokaryotic clustered regularly interspaced short palindromic recombination from V (variable; purple rectangle),Nature D (diversity; Reviews green| Genetics rec- repeat–CRISPR-associated protein (CRISPR–Cas) locus consists of cas tangle) and J (joining; brown rectangle) gene segments. The immunoglobu- genes (blue arrows) that encode different Cas proteins, and CRISPR arrays lin light chain is assembled from V and J segments by VJ recombination (not composed of variable spacers (coloured hexagons) interspersed with direct shown). Multiple V, D, J and C (constant region; red rectangles) gene seg- repeats (red triangles). The leader sequence (grey rectangle) contains a pro- ments are available for recombination in the germline genome. The recom- moter for the transcription of the CRISPR array and marks the end where bination is carried out by the RAG1–RAG2 recombinase complex and new spacers are incorporated. Three stages of CRISPR–Cas immunity are involves two types of recombination signal sequences (RSSs), 23‑RSS (red depicted. During the adaptation stage, a Cas1–Cas2 heterohexamer triangles) and 12‑RSS (pink triangles), which flank each gene segment. uptakes a protospacer from the invading plasmid or viral DNA (green) and Joining of the DNA ends requires non-homologous end-joining (NHEJ) pro- incorporates it at the leader-proximal end of the CRISPR array. During the teins (not shown). Two rounds of recombination, D to J and V to DJ, produce expression stage, the CRISPR array is transcribed, and the transcript is pro- a VDJ coding joint and two circular molecules (signal joints); the latter do cessed into small CRISPR RNAs (crRNAs) by different Cas nucleases in a not have any further role and are discarded. Transcription across the VDJ CRISPR–Cas type-dependent manner. During the interference stage, coding joint, followed by splicing, produces the mature transcript of the crRNAs act as guides for the cleavage of invading viral or plasmid DNA or immunoglobulin heavy chain. Subsequent translation of the transcript, RNA that contains regions complementary to the crRNA. b | Lymphocyte assembly of the heavy chain and association with the light chain (beige antigen receptor diversification by V(D)J recombination is shown. The rectangles) complete the assembly of the immunoglobulin receptor.

2 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES

Type I Type II Type III Type IV

Cas1 Cas2 Cas1 Cas2 Cas1 Cas2 Adaptation Spacer acquisition and potential Cas4 Cas4 involvement in programmed cell death

Cas6 RNase III Cas9 Cas6 Expression Transcript cleavage

Cas8 (LS) SS Cas10 (LS) SS Cas8 (LS) SS Interference Cascade complex: crRNA and Cas7 Cas5 Cas9 Cas7 Cas5 Cas7 Cas5 target binding HD-Cas3 Cas3 HD-Cas10 Target cleavage Optional components (function unclear) Csn2 COG1517 DinG

Figure 2 | A general scheme of the organization of CRISPR–Cas binding and target cleavage. Similarly, in TypeNature I and Type Reviews III CRISPR–Cas | Genetics systems. Protein names follow the current nomenclature and classifi- systems, Cas6 is a subunit of the Cascade (CRISPR-associated complex cation32. The general functions and the stages of the clustered regularly for antiviral defence) complex that is involved in both pre-crRNA pro- interspaced short palindromic repeat–CRISPR-associated protein cessing, as well as target recognition and inactivation. Note that RNase (CRISPR–Cas) immunity are shown on the right; the corresponding pro- III, which participates in cleavage of Type II CRISPR transcripts, has other teins in each type of CRISPR–Cas system are shown on the left and are roles in the processing of cellular RNA, particularly ribosomal RNA. Csn2 colour coded. Cas9 of Type II CRISPR–Cas is a multifunctional protein is predicted to be functionally analogous (but not homologous) to Cas4 involved in several stages of the immune response, including processing and participates in spacer acquisition33. HD, histidine–aspartate family of the primary CRISPR transcript into CRISPR RNAs (crRNAs), target nuclease; LS, large subunit; SS, small subunit.

DNA into CRISPR cassettes and then using from an enormous pre-existing repertoire this proposal into a complete evolutionary the transcripts of these unique spacers to of cells with diverse receptors51,52. In con- scenario in which CRISPR–Cas was derived target and inactivate the cognate trast to the prokaryotic adaptive immune from a casposon and an innate immune genomes28–38. Although the immunological system, in vertebrates, immunological system, and discuss the striking parallels memory of the CRISPR–Cas system is short- memory is limited to somatic cells and has with the evolution of adaptive immunity in lived by evolutionary standards, extremely no transgenerational inheritance. Instead, animals, as well as general implications of efficient and specific immunity can be trans- the vast repertoire of immunoglobulin genes the naturally evolved genome engineering mitted across many thousands of genera- is generated via dedicated diversification capacity of MGE-encoded recombinases. tions39. Thus, the CRISPR–Cas system fully processes known as V(D)J recombination satisfies the definition of adaptive immu- — in which variable (V), diversity (D) Evolutionary origin of CRISPR–Cas nity and is also a mechanism of bona fide and joining (J) segments are recombined Prokaryotes have evolved two analogous Lamarckian adaptive evolution40. — and hypermutation53,54 (FIG. 1b). mechanisms of immunity — namely Eukaryotes encompass a variety of Paradoxically, insights into the origins of CRISPR–Cas and Argonaute-based innate and adaptive immunity mechanisms adaptive immune systems in both eukary- systems — that rely on short guide RNA or of their own; some of these mechanisms otes and prokaryotes come from the least DNA molecules for targeting and seem to have their roots in prokaryotes, expected field of research — namely, studies inactivating of the nucleic acids of invading whereas others are eukaryote-specific. on MGEs. It was demonstrated that RAG1, MGEs29,30. Despite similar mechanisms of All eukaryotes seem to have some form which encodes the key enzyme of V(D)J action, the CRISPR–Cas system is adaptive, of RNAi, a powerful defence system that recombination, is derived from a eukary- whereas the Argonaute-centred system is uses RNA guides to inactivate invading otic transposon55,56. More recent studies an embodiment of innate immunity and nucleic acids, primarily those of RNA on bacterial and archaeal mobilomes have is homologous to eukaryotic RNAi. The viruses41–44. In addition, animals encompass provided clues regarding the origin of the key distinction between these adaptive and the paradigmatic system of antibacterial CRISPR–Cas system57. innate immune systems lies in the abil- innate immunity centred around Toll-like Specifically, we identified a novel family ity of the CRISPR–Cas system to keep a receptors45,46­, and vertebrates (and pos- of archaeal and bacterial MGEs that were record of past infections by incorporating sibly other deuterostomes) also have the named ‘casposons’ because they encode Cas1 spacer sequences derived from MGEs into equally well-characterized interferon anti- homologues that are implicated as the trans- the dedicated CRISPR loci28–38 (FIG. 1a). The viral response47,48. Historically, the most posase of these elements57. The discovery of immunization process, known as adapta- well-known form of anti-parasitic defence casposons puts a new twist on the origin tion, is mediated by the concerted action of is adaptive (acquired) immunity, which is of CRISPR–Cas, especially given that in two proteins, Cas1 and Cas2 (REFS 34,58,59). prominent in mammals and is also repre- phylogenetic trees casposon Cas1 does not These two proteins are conserved in the sented in all other vertebrates49,50. The spec- cluster with any particular group of CRISPR- three major types of functionally character- ificity of the adaptive immunity of jawed associated Cas1 proteins, which is compat- ized CRISPR–Cas systems (FIG. 2) and can vertebrates is achieved via proliferation ible with a basal position of casposons in the be considered the signature proteins of the of lymphocyte clones that carry immuno- phylogenetic tree of the Cas1 family. We pro- systems32,60,61. By contrast, other CRISPR– globulin receptors for antigens of the given posed that casposons could have been at the Cas components are mostly type-specific. pathogen and that are selected accordingly ‘root’ of CRISPR–Cas57. Below, we develop These other components include Cascade

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 3

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES

(CRISPR-associated complex for antiviral systems lack Cas1, Cas2 and the CRISPR Casposons are large MGEs that, in defence), which mediates the processing of cassettes but, in this case, the respective addition to the genes encoding Cas1 and a primary CRISPR transcripts, generates the genomes do not typically encompass any family B DNA polymerase (PolB) that are mature guide CRISPR RNAs (crRNAs), and other CRISPR–Cas loci61,67. Thus, although present in each of them, encompass a broad loads them on the target DNA, and ‘execu- none of these ‘minimal’ CRISPR–Cas vari- diversity of protein-coding genes that are tor’ nucleases that are directly involved in ants have been functionally characterized found among different casposons57. Several the cleavage of the target DNA (FIGS 1a,2). so far, they clearly cannot provide adaptive of these dispensable casposon genes encode Consequently, it seems that the CRISPR– immunity via genome manipulation that various nucleases and helicases — enzymes Cas immunity mechanism emerged via the is characteristic of canonical CRISPR–Cas, that are common in CRISPR–Cas systems, fusion of originally independent functional but are most likely to represent a distinct including a homologue of Cas4, a nuclease modules — the block of genes encoding an innate immunity mechanism. In a close present in the majority of CRISPR–Cas sys- RNA- or DNA-guided innate immune sys- analogy to the small interfering RNA tems (FIG. 2). Notably, in some CRISPR–Cas tem, and a module responsible for the adap- (siRNA) branch of the eukaryotic RNAi systems Cas4 is fused to Cas1, suggesting tation process60. The ‘last piece in the puzzle’ system and the Argonaute-based bacterial that it could play a part in spacer sequence is the source of the CRISPR loci. Tracing innate immune systems30,68–70, the ‘solo- acquisition, although a role of Cas4 in the origins of these distinct components of Cascade’ modules might generate small programmed cell death has also been pro- CRISPR–Cas is thus expected to shed light guide RNAs from transcripts of invading posed60,67. We hypothesize that the casposon on the emergence of adaptive immunity in MGE genomes or guide DNAs directly at the origin of CRISPR–Cas incorporated prokaryotes. from such genomes, and use these guide several ancestral cas genes, in particular, cas1 molecules for the inactivation of the cognate and cas4 (FIG. 3). Cascades: the effector modules of CRISPR– foreign DNA. This putative form of innate The Cas2 protein is a homologue of Cas. The Cascade complexes of the Type I immunity could resemble the ancestral state VapD — a typical prokaryotic toxin that has and Type III CRISPR–Cas systems consist of the Cascade complex that was a key the activity of an mRNA interferase, which is of the Cas5 and Cas7 proteins, the large- contributor in the evolution of CRISPR–Cas. a nuclease that specifically cleaves ribosome- subunit protein Cas8 (in Type I) or Cas10 associated mRNAs to induce dormancy or (in Type III) and the small-subunit proteins; Cas1–Cas2: the immunization (informa- to kill the cell28,73,74. Accordingly, although in some Type I systems, Cas6 proteins, tional) module. Recently, the likely source either RNase or DNase activity has been the RNases directly responsible for guide of the CRISPR–Cas immunization module reported for Cas2 proteins from different RNA precursor processing, are also sub­ has also been uncovered. Cas1 and Cas2 prokaryotes71,72, Cas2 is most likely to have units of the Cascade complexes61–63. At the are endonucleases that form a heterohex- originated from a typical toxin–antitoxin heart of the Cascade complexes are RNA amer involved in the acquisition of the module, which could have already been recognition motif (RRM) domains, which protospacer sequences from the invading present in the casposon that gave rise to are common RNA-binding domains in all MGEs and insertion of the spacers into CRISPR–Cas (FIG. 3). Although none of the cellular organisms64–66. The Cas5, Cas6, the CRISPR loci58. It has been demon- currently known casposons carry recogniza- Cas7 and Cas10 proteins all contain one strated that the nuclease activity of Cas2 ble toxin–antitoxin systems, toxin–antitoxin or two RRM domains28,32,61. It has been (REFS 71,72) is not required for this process, modules are common in other bacterial and proposed that the Type III Cascade com- whereas the activity of Cas1 is essential; archaeal MGEs29,75,76. plex is the ancestral form that could have thus, Cas1 is the primary enzyme involved evolved from a simple double-RRM protein in immunization58. The endonuclease and CRISPR cassettes. The key Cas proteins through fusion with a histidine–aspartate DNA strand-rejoining activities of Cas1 might not be the only contribution of cas- (HD) nuclease domain and a series of RRM mechanistically resemble the respective posons to the emergence of CRISPR–Cas; domain duplications61,67. Once in exist- activities of MGE-encoded integrases and the CRISPR cassettes, which are perhaps the ence, the Cascade complex could initially transposases, although Cas1 is not homolo- most enigmatic component of the CRISPR– function as an innate immune system, gous to any of the known recombinase fam- Cas systems, might have also been derived analogous to the extant Argonaute proteins, ilies58,59. Indeed, transposon-like elements from casposons. By definition, are although sequence analysis has unequivo- of the casposon superfamily encode Cas1 clusters of short palindromic repeats that cally showed that the Argonaute-based and apparently use its endonuclease activ- are interspersed with unique spacer system is evolutionarily unrelated to the ity for integration into and excision out sequences. Although not universal, the Cascade complexes68. The credence to this of the cellular genome57 — a role strongly palindromic character of the repeats is wide- hypothesis is given by the fact that many reminiscent of that postulated for the Cas1– spread in the CRISPR cassettes from differ- Type III CRISPR–Cas loci, in particular Cas2 complex during spacer sequence ent organisms77. These repeats are thought those of subtype IIIB, are not associated acquisition in CRISPR–Cas. Deep branch- to be recognized by the Cas1–Cas2 complex with CRISPR cassettes or the Cas1–Cas2 ing of the casposon Cas1 sequences within that introduces a staggered cut to allow the module and apparently use, in trans, the the global Cas1 phylogeny has led to the incorporation of new spacer sequences into adaptation machinery of other Type I proposal that casposons could have played the CRISPR arrays78,79. In the case of the cas- or Type III systems present in the same a pivotal part in the emergence of CRISPR– posons, according to the proposed model57, genomes61,67. Furthermore, a recent com- Cas, specifically by providing the ancestral the Cas1 recognition site lies within the parative genomic analysis has uncovered a cas1 gene57. Under the proposed scenario, TIRs that are present at the extremities of all growing variety of Type IV (formerly Type U) CRISPR–Cas would emerge when a caspo- casposons. Similar to CRISPR repeats, TIRs CRISPR–Cas systems61 (FIG. 2). Similar son inserted into an archaeal genome next from some casposons display a palindromic to some of the Type III systems, Type IV to a solo-Cascade operon (FIG. 3). feature57 and even share sequence similarity

4 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES

with CRISPR repeats from certain organisms cas2 vapX (FIG. 4a,b) . Although TIRs are variable in size Ancestral casposon Toxin–antitoxin 57 (25–602 bp) , their median length is around polB HNH HTH cas4 cas1 50 bp, which is within the reported size TIR TIR range of CRISPR repeats (20–50 bp)35. Thus, Toxin–antitoxin system acquisition casposon TIRs are similar to CRISPR repeats with regard to the sequence, size, secondary polB HNH HTH cas4 cas1 cas2 vapX structure and the (postulated) ability to bind TIR TIR to Cas1. Inactivation of a TIR at one of the Solo-Cascade extremities of an integrated casposon would immobilize the inserted casposon genes cas5 cas6 cas7 SS cas10 and produce a palindromic sequence that is Casposon insertion next reminiscent of a CRISPR unit. to a solo-Cascade

Emergence of CRISPR–Cas. Importantly, cas5 cas6 cas7 SS cas10 polB HNH HTH cas4 cas1 cas2 vapX the casposon Cas1 is expected to be capa- TIR TIR ble of recognizing and acting on its TIR Loss of one TIR and immobilization substrate in trans, similarly to the way in of casposon genes; gene loss which the Cas1–Cas2 complex operates on cas5 cas6 cas7 SS cas10 cas4 cas1 cas2 CRISPR cassettes. Indeed, physical cou- solo-TIR pling of the target sequence (casposon TIR) with the gene encoding the protein that rec- Proliferation of TIRs into repeat clusters ognizes it (casposon Cas1) is not necessary, as indicated by the ability of transposases cas5 cas6 cas7 SS cas10 cas4 cas1 cas2 to mobilize non-autonomous MGEs that cas CRISPR contain the cognate transposase-binding sites within their TIRs8,10. Consequently, Figure 3 | A scenario for the evolution of the CRISPR–Cas system fromNature a casposon, Reviews | aGenetics toxin– recognition of such ‘solo-TIRs’, and their antitoxin module and a solo-Cascade innate immune system. Casposon-derived genes are subsequent amplification within the same shown as dark blue rectangles, toxin–antitoxin genes are depicted in grey and ‘solo-Cascade’ (CRISPR-associated complex for antiviral defence) genes are shown in green. A generic organiza- locus, would eventually result in arrays of tion of a Type III Cascade operon is shown that does not depict any particular genomic locus. Most palindromic repeats, the putative ancestors of the Cascade genes encode proteins that are distinct arrangements of one or two RNA recogni- of CRISPRs (FIG. 3). Indeed, such physical tion motif (RRM) domains and that might have evolved from a simple double-RRM protein through uncoupling of the recombinase from its a series of RRM domain duplications and a fusion with a histidine–aspartate nuclease domain in target could have been a prerequisite for the Cas10 (REF. 61). Terminal inverted repeats (TIRs) are palindromic, which is reminiscent of the emergence of a stably inheritable immune CRISPR unit. polB, family B DNA polymerase; SS, small subunit. system. The scenario of the emergence of a CRISPR–Cas system from a casposon and a solo-Cascade then becomes less complex, requiring only integration of the casposon The Type II CRISPR–Cas system is most recombination brings them together in a next to the Cascade complex, proliferation likely to have evolved when a transposon single exon and, in the process, generates of the repeats originating from the encoding a Cas9 ancestor inserted into a numerous small insertions and deletions at casposon TIR and deletion of some of type I CRISPR–Cas locus and replaced the junctions, creating the enormous com- the casposon genes, in particular, the polB the genes for the Cascade subunits. binatorial diversity that is required to match gene (FIG. 3). Thus, the major components of Type II the vast diversity of antigens84,85. This process Type II CRISPR–Cas systems differ sub- CRISPR–Cas, the type that is used for is mediated by the RAG1–RAG2 recombi- stantially from Type I and Type III systems genome engineering83, apparently evolved nase complex (FIG. 1b), in which RAG1 is the in terms of the organization of the processing- through two transposon insertion events enzymatically active subunit86,87, whereas executive module, which in Type II systems such that this system seems to consist RAG2 acts as a regulatory subunit and is consists of a single large Cas9 protein. This entirely of transposon-derived genes. superficially similar to the function of Cas2 protein binds to the crRNA (which is pro- in the Cas1–Cas2 duet. cessed with the help of bacterial RNase III), MGEs in vertebrate adaptive immunity Strikingly, the recombinase domain mediates its annealing to the target DNA Similar to CRISPR–Cas, the classic ver- of RAG1 is derived from the recombinase of and cleaves the target via its two nuclease tebrate adaptive immunity also involves animal transposons of the Transib family56,88. domains, RuvC and HNH32,80,81. Strikingly, genome manipulation — namely, V(D) J The small Transib transposons are found the Cas9 protein is homologous to a family recombination (FIG. 1b) that, along with among diverse animal species but are absent of transposon-encoded proteins known as somatic hypermutation, generates the diver- in vertebrates, and encode a transposase TnpB (also known as Fanzors) that contain sity of the T cell receptors (TCRs) and B cell that belongs to a distinct family within the the RuvC-like nuclease domain but receptors (BCRs). The three segments DDE superfamily of transposases and that that are not required for transposition81. (V, D and J) of the variable portions of the is homologous to the catalytic core domain Several transposons encode only the TnpB TCRs and BCRs are each encoded in several of RAG1 (REF. 56). The RAG1 protein has protein and use a transposase in trans82. dozens of genomic copies. However, V(D)J undergone substantial evolution since

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 5

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES

the proposed recruitment from a Transib immunoglobulin gene, which was an ele- transposon that donates both the recom- transposon, including an amino-terminal ment of innate immunity. In such models, bination sites and the recombinase (FIG. 5). fusion with a domain containing a RING two conditions would have to be met. First, The emergence of vertebrate adaptive finger ubiquitin ligase, but the mechanism the TIRs of the hypothetical non-autonomous immunity from innate immunity mecha- of DNA cleavage and rejoining during V(D)J transposable elements would have to be nisms following the recruitment of the recombination displays a striking similarity identical to those of the RAG transposon rearrangement machinery (RAG1–RAG2) to the transposition mechanism56. Moreover, in order to be specifically recognized by has been previously discussed93,94. the target site duplications (TSDs) gener- the RAG1–RAG2 transposase. Second, the Although V(D)J recombination, to the ated during the transposition reactions TIRs of the RAG transposon would have to best of the current knowledge, is limited mediated by Transibs and RAG1–RAG2 are be obliterated to ensure the immobilization to jawed vertebrates, the discovery of similar, and there is significant sequence of the RAG1–RAG2 gene pair. However, the RAG1–RAG2 locus in the sea urchin similarity between the TIRs of Transib and non-autonomous transposons carrying genome implies that this pair of closely the recombination signal sequences (RSSs) RSS-like TIRs have not been discovered so linked genes was already present in the of the immunoglobulin genes (FIG. 4c), far55. We propose that a more parsimonious genome of the last common ancestor of indicating that the RSS evolved via Transib scenario for the origin of V(D)J recombina- the Deuterostomes95. Moreover, homo- insertion56,88,89. tion would involve a single insertion of a logues of the RAG1 catalytic domain, The tight linkage between the RAG1 fully functional RAG transposon into an which might be evolutionary intermedi- and RAG2 genes in animal genomes sug- ancestral immunoglobulin gene, followed ates between the Transib transposases and gests that this gene pair was already present by externalization of the RAG1–RAG2 gene RAG1 proteins of Deuterostomes, have in the ancestral Transib-like transposon, block while leaving the native RSS-like been detected in cnidarian genomes56,96. although no such gene combination has TIRs within the immunoglobulin gene. An The function (or functions) of RAG1– been detected in the currently identified important consequence of the disconnec- RAG2 before the emergence of adaptive transposons56. Several variations of the tion of the TIRs from the transposon genes immunity is intriguing: is it possible that ‘RAG transposon’ hypothesis for the origin is that the TIRs would be immobilized there may be additional mechanisms of of adaptive immunity in animals have been within the genome, thus ensuring stable naturally evolved genome engineering that proposed55,90–92. Typically, these scenarios inheritance of the immune system (FIG. 5). remain to be discovered? postulate two independent transposition A general analogy to the scenario of events, whereby insertion of one trans- CRISPR–Cas evolution is apparent; in both MGEs and natural genome engineering poson (the RAG transposon) gave rise cases, the potential to form immunological Insertion of MGEs into the host genome, to the RAG1–RAG2 gene pair, whereas memory, which is the essence of adaptive by definition, modifies the content of these insertion of a related non-autonomous ele- immunity, is conferred on a pre-existing genomes. Given that MGEs are ubiquitous ment introduced the RSS into an ancestral innate immune system by insertion of a in cellular life forms and account for the

a TIR TIR c 23-TIR 13-TIR polB HNH HTH HTH cas1 MTase RAG1-like transposase

Heptamer Nonamer 13-TIR GGAGGGGGATATATATACATCCCCTCTTAAGTTCCCTTTT 40 bp CACAGTGGTTCCG------TCAGAGGCCAAAAATA 30 bp GTTTCCATCCC-TCTTAAGTTCGATTGAAAC 23-TIR CACAGTGGTTCCGCTTGTATGGAGCAGCGGCCATAAATA * * * * * * * * * * * * 12-RSS CACAGTGnnnnn------nnnnnnnACAAAAACC HD-cas5 cas5 cas7 cas8 cas2 cas1 cas4 cas6 23-RSS CACAGTGnnnnnnnnnnnnnnnnnnnnnnnACAAAAACC n = 19 b CRISPR repeat Figure 4 | Comparison between TIR, CRISPR and RSS. a | Schematic organization of the casposon from Aciduliprofundum boonei T469 (NC_013926, nucleotide coordinates: 380320-389403) is shown Casposon TIR T A T A at the top, whereas the clustered regularly interspaced short palindromic repeat–CRISPR-associated A T C G protein (CRISPR–Cas) system of Thermotoga thermarum DSM 5069 (NC_015707, nucleotide coordi- T A T T nates: 1706198-1717565) is shown at the bottom. cas genes are colour-coded according to the scheme A T C T provided in FIG. 2. The alignment of the corresponding casposon terminal (TIR) and C C CRISPR sequence is shown in the middle. Identical nucleotides are indicated by the black background. T A C G C b | Predicted secondary structures of the A. boonei casposon TIR (left) and T. thermarum CRISPR repeat A T A A (right) are shown. c | Comparison between the Transib TIRs and recombination signal sequences (RSSs) G T A T is shown. The Transib5 transposon from Drosophila melanogaster (top) is flanked by TIRs that consist G C C T of conserved heptamer and nonamer sequences separated by a variable spacer of either 13 bp (pink G C T C C G triangle) or 23 bp (red triangle). Sequence alignment of the Transib5 TIRs and the consensus recombi- G C T C T A nation recognition sequence (RSS) is depicted. The variable spacers in RSSs are marked by ‘n’. The G C G C T A most conserved nucleotides in the RSS heptamer and nonamer, which are necessary for efficient V(D)J A T A T T A recombination, are highlighted by the red background. The RSS and TIR sequences data are derived from REF. 56. HD, histidine–aspartate family nuclease; polB, family B DNA polymerase. 5′ G G C T T A T T T 3′ 5′ G C 3′ Nature Reviews | Genetics

6 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES

CRISPR–Cas V(D)J recombination

cas1 cas2 RAG1 RAG2 Casposon or transposon TIR TIR TIR TIR + (TIR recognition in cis) +

Solo-Cascade V-type Ig gene Innate immunity module

Insertion adjacent to or Solo-Cascade cas1 cas2 into the innate immunity pre-V RAG1 RAG2 pre-J TIR TIR module TIR TIR Gene externalization; TIR inactivation in Solo-Cascade cas1 cas2 CRISPR–Cas (TIR and V J RAG1 RAG2 solo-TIR RSS recognition in trans) RSS RSS

TIR proliferation Solo-Cascade cas1 cas2 by tandem V V J J J RAG1 RAG2 duplication cas CRISPR Nature Reviews | Genetics Figure 5 | Comparison of the proposed evolutionary paths to the triangles) or 23 bp (red triangles). V and J represent variable and joining prokaryotic and eukaryotic versions of adaptive immu- gene segments of the immunoglobulin (Ig) gene, respectively. The nity. Terminal inverted repeats (TIRs) and recombination signal dashed line indicates that RAG1 and RAG2 proteins are not encoded sequences (RSSs) are depicted as triangles. In the case of V(D)J recom- in proximity of the cognate recombination sites. CRISPR–Cas, clus- bination, TIRs and RSSs consist of conserved heptamer and nonamer tered regularly interspaced short palindromic repeat–CRISPR- sequences separated by a variable spacer of either 12 bp (pink associated protein.

majority of the DNA in some genomes — system that is responsible for the mechan- ruled out. Focused searches for novel genome particularly those of many vertebrates and ics of the interaction with the target (the manipulation systems that exploit MGE- plants — the consequences of the genome Cascade complex and immunoglobulins) is encoded recombinases could be a promising modifications caused by MGEs are diverse derived from pre-existing innate immune research direction. and fundamental for cellular organisms97,98. systems. By contrast, transposons give rise The high specificity and genome engi- It is well known that MGE sequences are to the ‘informational’ module that consists neering capacity of adaptive immune often recruited for various cellular func- of the specific integration sites and the enzy- systems translate into almost unlimited tions, typically as regulatory elements and, matic machinery of recombination and/or potential for experimental tool develop- in some cases, as novel protein-coding integration. Further research on the molecu- ment. The utility of antibodies as tools for sequences99–101. Besides the recruitment of lar mechanisms at play during the CRISPR– protein detection is obvious. Recently, the MGE sequences for diverse functions, cellular Cas action and casposon mobility should immense promise of CRISPR–Cas has been organisms directly exploit the capacity of provide key details to help to refine the pro- realized, in particular for Type II systems, MGEs to modify the host genome. A major posed model on the origin of the prokaryotic in which the transposon-derived Cas9 pro- case in point is the catalytic subunit of telom- CRISPR–Cas immunity. tein is the only protein required for target erase, a reverse transcriptase derived from The finding that two unrelated classes recognition and cleavage83,107,108. This prop- prokaryotic retroelements (group II introns) of MGEs apparently gave rise to adaptive erty of CRISPR–Cas is a direct extension of and responsible for the replication of chro- immune systems in prokaryotes and animals its extremely high specificity achieved via mosomal ends () in most eukary- along strikingly similar evolutionary routes genome manipulation by an MGE-derived otes102,103. Remarkably, in organisms that have suggests that these two systems might not recombinase. lost the ancestral telomerase, such as insects, be the only versions of adaptive immunity Eugene V. Koonin is at the National Center for replication is mediated by reverse or, more broadly, of genome manipulation Biotechnology Information, National Library of transcriptases of other retrotransposons104. mechanisms that make use of MGE-derived Medicine, Bethesda, Maryland 20894, USA. Evolution of adaptive immunity can recombinases as naturally evolved devices for Mart Krupovic is at Institut Pasteur, Unité Biologie perhaps be considered as the pinnacle genome engineering. Given that genomes of Moléculaire du Gène chez les Extrêmophiles, 25 rue du of this strategy that makes exquisite use of almost all cellular organisms are replete with Docteur Roux, 75015 Paris, France. the ability of recombinases (transposases integrated MGEs, some of which are domesti- e-mails: [email protected]; or integrases) to insert foreign DNA into cated and conserved through long evolution- [email protected] specific sites in the host genome. The two ary timespans, it seems unlikely that these doi:10.1038/nrg3859 Published online 9 December 2014 best-characterized adaptive immune sys- two adaptive immune systems are unique. 1. Huda, A. & Jordan, I. K. Epigenetic regulation of tems, the prokaryotic CRISPR–Cas system For example, the HARBI1 protein that is mammalian genomes by transposable elements. and the immunoglobulin-centred adaptive conserved across vertebrates is a derivative of Ann. NY Acad. Sci. 1178, 276–284 (2009). 2. Lopez-Flores, I. & Garrido-Ramos, M. A. The repetitive immunity of jawed vertebrates, seem to the transposase of the widespread Harbinger DNA content of eukaryotic genomes. Genome Dyn. 7, have evolved completely independently, yet transposons105,106. The function of HARBI1 1–28 (2012). (FIG. 5) 3. Defraia, C. & Slotkin, R. K. Analysis of through strikingly similar scenarios . remains obscure, but a role in a yet unknown activity in plants. Methods Mol. Biol. 1112, 195–210 In both cases, the ‘executive’ part of the genome manipulation pathway cannot be (2014).

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 7

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES

4. Cortez, D., Forterre, P. & Gribaldo, S. A hidden 30. Hur, J. K., Olovnikov, I. & Aravin, A. A. Prokaryotic 59. Wiedenheft, B. et al. Structural basis for DNase reservoir of integrative elements is the major source of Argonautes defend genomes against invasive DNA. activity of a conserved protein implicated in recently acquired foreign genes and ORFans in Trends Biochem. Sci. 39, 257–259 (2014). CRISPR-mediated genome defense. Structure 17, archaeal and bacterial genomes. Genome Biol. 10, 31. Swarts, D. C. et al. The evolutionary journey of 904–912 (2009). R65 (2009). Argonaute proteins. Nature Struct. Mol. Biol. 21, 60. Makarova, K. S., Aravind, L., Wolf, Y. I. & Koonin, E. V. 5. Makarova, K. S. et al. Dark matter in archaeal 743–753 (2014). Unification of Cas protein families and a simple genomes: a rich source of novel mobile elements, 32. Makarova, K. S. et al. Evolution and classification of scenario for the origin and evolution of CRISPR–Cas defense systems and secretory complexes. the CRISPR–Cas systems. Nature Rev. Microbiol. 9, systems. Biol. Direct 6, 38 (2011). Extremophiles 18, 877–893 (2014). 467–477 (2011). 61. Makarova, K. S., Wolf, Y. I. & Koonin, E. V. The basic 6. Casjens, S. Prophages and bacterial genomics: what 33. van der Oost, J., Jore, M. M., Westra, E. R., building blocks and evolution of CRISPR–Cas systems. have we learned so far? Mol. Microbiol. 49, 277–300 Lundgren, M. & Brouns, S. J. CRISPR-based adaptive Biochem. Soc. Trans. 41, 1392–1400 (2013). (2003). and heritable immunity in prokaryotes. Trends 62. Brouns, S. J. et al. Small CRISPR RNAs guide antiviral 7. Busby, B., Kristensen, D. M. & Koonin, E. V. Biochem. Sci. 34, 401–407 (2009). defense in prokaryotes. Science 321, 960–964 Contribution of phage-derived genomic islands to the 34. Wiedenheft, B., Sternberg, S. H. & Doudna, J. A. (2008). virulence of facultative bacterial pathogens. Environ. RNA-guided genetic silencing systems in bacteria and 63. Rouillon, C. et al. Structure of the CRISPR interference Microbiol. 15, 307–312 (2013). archaea. Nature 482, 331–338 (2012). complex CSM reveals key similarities with cascade. 8. Wicker, T. et al. A unified classification system for 35. Sorek, R., Lawrence, C. M. & Wiedenheft, B. Mol. Cell 52, 124–134 (2013). eukaryotic transposable elements. Nature Rev. Genet. CRISPR-mediated adaptive immune systems in 64. Anantharaman, V., Koonin, E. V. & Aravind, L. 8, 973–982 (2007). bacteria and archaea. Annu. Rev. Biochem. 82, Comparative genomics and evolution of proteins 9. Kapitonov, V. V. & Jurka, J. A universal classification of 237–266 (2013). involved in RNA metabolism. Nucleic Acids Res. 30, eukaryotic transposable elements implemented in 36. Barrangou, R. & Marraffini, L. A. CRISPR–Cas 1427–1464 (2002). Repbase. Nature Rev. Genet. 9, 411–412 (2008). systems: prokaryotes upgrade to adaptive immunity. 65. Anantharaman, V., Aravind, L. & Koonin, E. V. 10. Jurka, J., Kapitonov, V. V., Kohany, O. & Jurka, M. V. Mol. Cell 54, 234–244 (2014). Emergence of diverse biochemical activities in Repetitive sequences in complex genomes: structure 37. Manica, A. & Schleper, C. CRISPR-mediated defense evolutionarily conserved structural scaffolds of and evolution. Annu Rev Genomics Hum Genet 8, mechanisms in the hyperthermophilic archaeal genus proteins. Curr Opin Chem Biol 7, 12–20 (2003). 241–259 (2007). Sulfolobus. RNA Biol 10, 671–678 (2013). 66. Clery, A., Blatter, M. & Allain, F. H. RNA recognition 11. Curcio, M. J. & Derbyshire, K. M. The outs and ins of 38. van der Oost, J., Westra, E. R., Jackson, R. N. & motifs: boring? Not quite. Curr Opin Struct Biol 18, transposition: from mu to kangaroo. Nature Rev. Mol. Wiedenheft, B. Unravelling the structural and 290–298 (2008). Cell Biol. 4, 865–877 (2003). mechanistic basis of CRISPR–Cas systems. Nature 67. Koonin, E. V. & Makarova, K. S. CRISPR–Cas: 12. Chandler, M. et al. Breaking and joining single- Rev. Microbiol. 12, 479–492 (2014). evolution of an RNA-based adaptive immunity system stranded DNA: the HUH endonuclease superfamily. 39. Weinberger, A. D. et al. Persisting viral sequences in prokaryotes. RNA Biol. 10, 679–686 (2013). Nature Rev. Microbiol. 11, 525–538 (2013). shape microbial CRISPR-based immunity. PLoS 68. Makarova, K. S., Wolf, Y. I., van der Oost, J. & 13. Ilyina, T. V. & Koonin, E. V. Conserved sequence motifs Comput. Biol. 8, e1002475 (2012). Koonin, E. V. Prokaryotic homologs of Argonaute in the initiator proteins for rolling circle DNA 40. Koonin, E. V. & Wolf, Y. I. Is evolution Darwinian or/and proteins are predicted to function as key components replication encoded by diverse replicons from Lamarckian? Biol. Direct 4, 42 (2009). of a novel system of defense against mobile genetic eubacteria, eucaryotes and archaebacteria. Nucleic 41. Wang, X. H. et al. RNA interference directs innate elements. Biol. Direct 4, 29 (2009). Acids Res. 20, 3279–3285 (1992). immunity against viruses in adult Drosophila. Science 69. Olovnikov, I., Chan, K., Sachidanandam, R., 14. Krupovic, M. Networks of evolutionary interactions 312, 452–454 (2006). Newman, D. K. & Aravin, A. A. Bacterial Argonaute underlying the polyphyletic origin of ssDNA viruses. 42. Shabalina, S. A. & Koonin, E. V. Origins and evolution samples the transcriptome to identify foreign DNA. Curr Opin Virol 3, 578–586 (2013). of eukaryotic RNA interference. Trends Ecol. Evol. 23, Mol. Cell 51, 594–605 (2013). 15. Goodwin, T. J. & Poulter, R. T. A new group of tyrosine 578–587 (2008). 70. Swarts, D. C. et al. DNA-guided DNA interference by a recombinase-encoding retrotransposons. Mol. Biol. 43. Carthew, R. W. & Sontheimer, E. J. Origins and prokaryotic Argonaute. Nature 507, 258–261 Evol. 21, 746–759 (2004). mechanisms of mi­RNAs and siRNAs. Cell 136, (2014). 16. Boocock, M. R. & Rice, P. A. A proposed mechanism 642–655 (2009). 71. Beloglazova, N. et al. A novel family of sequence- for IS607‑family serine transposases. Mob. DNA 4, 24 44. Zhou, R. & Rana, T. M. RNA-based mechanisms specific endoribonucleases associated with the (2013). regulating host–virus interactions. Immunol. Rev. 253, clustered regularly interspaced short palindromic 17. Weichenrieder, O., Repanas, K. & Perrakis, A. 97–111 (2013). repeats. J. Biol. Chem. 283, 20361–20371 (2008). Crystal structure of the targeting endonuclease of 45. Medzhitov, R. Approaching the asymptote: 20 years 72. Han, D. & Krauss, G. Characterization of the the human LINE‑1 retrotransposon. Structure 12, later. Immunity 30, 766–775 (2009). endonuclease SSO2001 from Sulfolobus solfataricus 975–986 (2004). 46. Kawai, T. & Akira, S. The role of pattern-recognition P2. FEBS Lett. 583, 771–776 (2009). 18. Aswad, A. & Katzourakis, A. Paleovirology and virally receptors in innate immunity: update on Toll-like 73. Makarova, K. S., Anantharaman, V., Aravind, L. & derived immunity. Trends Ecol. Evol. 27, 627–636 receptors. Nature Immunol. 11, 373–384 (2010). Koonin, E. V. Live virus-free or die: coupling of (2012). 47. Malmgaard, L. Induction and regulation of IFNs antivirus immunity and programmed suicide or 19. Feschotte, C. & Gilbert, C. Endogenous viruses: during viral infections. J. Interferon Cytokine Res. 24, dormancy in prokaryotes. Biol. Direct 7, 40 (2012). insights into viral evolution and impact on host 439–454 (2004). 74. Kwon, A. R. et al. Structural and biochemical biology. Nature Rev. Genet. 13, 283–296 (2012). 48. Le Page, C., Genin, P., Baines, M. G. & Hiscott, J. characterization of HP0315 from Helicobacter pylori 20. Duggal, N. K. & Emerman, M. Evolutionary conflicts Interferon activation and innate immunity. Rev as a VapD protein with an endoribonuclease activity. between viruses and restriction factors shape Immunogenet 2, 374–386 (2000). Nucleic Acids Res. 40, 4216–4228 (2012). immunity. Nature Rev. Immunol. 12, 687–695 49. Cooper, M. D. & Alder, M. N. The evolution of 75. Unterholzner, S. J., Poppenberger, B. & Rozhon, W. (2012). adaptive immune systems. Cell 124, 815–822 Toxin–antitoxin systems: biology, identification, 21. Forterre, P. & Prangishvili, D. The major role of viruses (2006). and application. Mob. Genet. Elements 3, e26219 in cellular evolution: facts and hypotheses. Curr Opin 50. Boehm, T. Design principles of adaptive immune (2013). Virol 3, 558–565 (2013). systems. Nature Rev. Immunol. 11, 307–317 (2011). 76. Krupovic, M., Gonnet, M., Hania, W. B., Forterre, P. & 22. Koonin, E. V. & Dolja, V. V. A virocentric perspective on 51. Davis, M. M. The evolutionary and structural ‘logic’ Erauso, G. Insights into dynamics of mobile genetic the evolution of life. Curr Opin Virol 3, 546–557 (2013). of antigen receptor diversity. Semin. Immunol. 16, elements in hyperthermophilic environments from five 23. Labrie, S. J., Samson, J. E. & Moineau, S. 239–243 (2004). new Thermococcus plasmids. PLoS ONE 8, e49044 Bacteriophage resistance mechanisms. Nature Rev. 52. Cannon, J. P., Haire, R. N., Rast, J. P. & Litman, G. W. (2013). Microbiol. 8, 317–327 (2010). The phylogenetic origins of the antigen-binding 77. Kunin, V., Sorek, R. & Hugenholtz, P. Evolutionary 24. Seed, K. D., Lazinski, D. W., Calderwood, S. B. & receptors and somatic diversification mechanisms. conservation of sequence and secondary structures in Camilli, A. A bacteriophage encodes its own Immunol. Rev. 200, 12–22 (2004). CRISPR repeats. Genome Biol. 8, R61 (2007). CRISPR/Cas adaptive response to evade host innate 53. Market, E. & Papavasiliou, F. N. V(D)J recombination 78. Datsenko, K. A. et al. Molecular memory of prior immunity. Nature 494, 489–491 (2013). and the evolution of the adaptive immune system. infections activates the CRISPR–Cas adaptive 25. Boehm, T. Evolution of vertebrate immunity. Curr. Biol. PLoS Biol. 1, e16 (2003). bacterial immunity system. Nature Commun. 3, 945 22, R722–732 (2012). 54. Jung, D. & Alt, F. W. Unraveling V(D)J recombination; (2012). 26. Rimer, J., Cohen, I. R. & Friedman, N. Do all creatures insights into gene regulation. Cell 116, 299–311 79. Yosef, I., Goren, M. G. & Qimron, U. Proteins and possess an acquired immune system of some sort? (2004). DNA elements essential for the CRISPR adaptation Bioessays 36, 273–281 (2014). 55. Fugmann, S. D. The origins of the Rag genes — from process in Escherichia coli. Nucleic Acids Res. 40, 27. Akira, S., Uematsu, S. & Takeuchi, O. transposition to V(D)J recombination. Semin. 5569–5576 (2012). Pathogen recognition and innate immunity. Cell 124, Immunol. 22, 10–16 (2010). 80. Chylinski, K., Le Rhun, A. & Charpentier, E. 783–801 (2006). 56. Kapitonov, V. V. & Jurka, J. RAG1 core and V(D)J The tracrRNA and Cas9 families of type II CRISPR–Cas 28. Makarova, K. S., Grishin, N. V., Shabalina, S. A., recombination signal sequences were derived from immunity systems. RNA Biol. 10, 726–737 (2013). Wolf, Y. I. & Koonin, E. V. A putative RNA-interference- Transib transposons. PLoS Biol. 3, e181 (2005). 81. Chylinski, K., Makarova, K. S., Charpentier, E. & based immune system in prokaryotes: computational 57. Krupovic, M., Makarova, K. S., Forterre, P., Koonin, E. V. Classification and evolution of type II analysis of the predicted enzymatic machinery, Prangishvili, D. & Koonin, E. V. Casposons: a new CRISPR–Cas systems. Nucleic Acids Res. 42, functional analogies with eukaryotic RNAi, and superfamily of self-synthesizing DNA transposons at 6091–6105 (2014). hypothetical mechanisms of action. Biol. Direct 1, 7 the origin of prokaryotic CRISPR–Cas immunity. BMC 82. Bao, W. & Jurka, J. Homologues of bacterial (2006). Biol. 12, 36 (2014). TnpB_IS605 are widespread in diverse eukaryotic 29. Makarova, K. S., Wolf, Y. I. & Koonin, E. V. 58. Nunez, J. K. et al. Cas1–Cas2 complex formation transposable elements. Mob DNA 4, 12 (2013). Comparative genomics of defense systems in archaea mediates spacer acquisition during CRISPR–Cas 83. Kim, H. & Kim, J. S. A guide to genome engineering and bacteria. Nucleic Acids Res. 41, 4360–4377 adaptive immunity. Nature Struct. Mol. Biol. 21, with programmable nucleases. Nature Rev. Genet. 15, (2013). 528–534 (2014). 321–334 (2014).

8 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2014 Macmillan Publishers Limited. All rights reserved PERSPECTIVES

84. Tonegawa, S. Somatic generation of antibody diversity. 93. Du Pasquier, L. Speculations on the origin of the 103. Gladyshev, E. A. & Arkhipova, I. R. A widespread Nature 302, 575–581 (1983). vertebrate immune system. Immunol. Lett. 92, 3–9 class of reverse transcriptase-related cellular genes. 85. Papavasiliou, F. N. & Schatz, D. G. Somatic (2004). Proc. Natl Acad. Sci. USA 108, 20311–20316 hypermutation of immunoglobulin genes: merging 94. Du Pasquier, L. Innate immunity in early chordates (2011). mechanisms for genetic diversity. Cell 109, S35–S44 and the appearance of adaptive immunity. C. R. Biol. 104. Pardue, M. L. & De Baryshe, P. G. Retrotransposons (2002). 327, 591–601 (2004). provide an evolutionarily robust non-telomerase 86. Oettinger, M. A., Schatz, D. G., Gorka, C. & 95. Fugmann, S. D., Messier, C., Novack, L. A., mechanism to maintain telomeres. Annu. Rev. Genet. Baltimore, D. RAG‑1 and RAG‑2, adjacent genes that Cameron, R. A. & Rast, J. P. An ancient evolutionary 37, 485–511 (2003). synergistically activate V(D)J recombination. Science origin of the Rag1/2 gene locus. Proc. Natl Acad. Sci. 105. Kapitonov, V. V. & Jurka, J. Harbinger transposons and 248, 1517–1523 (1990). USA 103, 3728–3733 (2006). an ancient HARBI1 gene derived from a transposase. 87. Swanson, P. C. The bounty of RAGs: recombination 96. Hemmrich, G., Miller, D. J. & Bosch, T. C. DNA Cell Biol. 23, 311–324 (2004). signal complexes and reaction outcomes. Immunol. The evolution of immunity: a low-life perspective. 106. Sinzelle, L. et al. Transposition of a reconstructed Rev. 200, 90–114 (2004). Trends Immunol. 28, 449–454 (2007). Harbinger element in human cells and functional 88. Panchin, Y. & Moroz, L. L. Molluscan mobile elements 97. Bowen, N. J. & Jordan, I. K. Transposable elements homology with two transposon-derived cellular genes. similar to the vertebrate recombination-activating and the evolution of eukaryotic complexity. Proc. Natl Acad. Sci. USA 105, 4715–4720 (2008). genes. Biochem. Biophys. Res. Commun. 369, Curr. Issues Mol. Biol. 4, 65–76 (2002). 107. Mali, P. et al. RNA-guided human genome engineering 818–823 (2008). 98. Kazazian, H. H. Jr. Mobile elements: drivers of genome via Cas9. Science 339, 823–826 (2013). 89. Sakano, H., Huppi, K., Heinrich, G. & Tonegawa, S. evolution. Science 303, 1626–1632 (2004). 108. Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. Sequences at the somatic recombination sites of 99. Jordan, I. K., Rogozin, I. B., Glazko, G. V. & & Doudna, J. A. DNA interrogation by the CRISPR immunoglobulin light-chain genes. Nature 280, Koonin, E. V. Origin of a substantial fraction of human RNA-guided endonuclease Cas9. Nature 507, 62–67 288–294 (1979). regulatory sequences from transposable elements. (2014). 90. Thompson, C. B. New insights into V(D)J Trends Genet. 19, 68–72 (2003). recombination and its role in the evolution of the 100. Conley, A. B., Piriyapongsa, J. & Jordan, I. K. Acknowledgements immune system. Immunity 3, 531–539 (1995). Retroviral promoters in the human genome. The authors thank K. Makarova for critical reading of the 91. Roth, D. B. & Craig, N. L. VDJ recombination: a Bioinformatics 24, 1563–1567 (2008). manuscript and comments. E.V.K is supported by intramural transposase goes to work. Cell 94, 411–414 101. Jurka, J. Conserved eukaryotic transposable elements funds of the US Department of Health and Human Services (1998). and the evolution of gene regulation. Cell. Mol. Life (to the National Library of Medicine). 92. Schatz, D. G. Antigen receptor genes and the Sci. 65, 201–204 (2008). evolution of a recombinase. Semin. Immunol. 16, 102. Nakamura, T. M. & Cech, T. R. Reversing time: origin Competing interests statement 245–256 (2004). of telomerase. Cell 92, 587–590 (1998). The authors declare no competing interests.

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 9

© 2014 Macmillan Publishers Limited. All rights reserved