1 A superfamily of single stranded DNA nucleases in the life of mobile genetic 2 elements 3 4 5 Michael Chandler 1*, Fernando de la Cruz 2, Fred Dyda 3, Alison B. Hickman 3, Gabriel Moncalian 2, 6 Bao Ton-Hoang 1 7 8 9 1 Laboratoire de Microbiologie et Génétique Moléculaires, Centre National de Recherche 10 Scientifique, Unité Mixte de Recherche 5100, 118 Rte de Narbonne, F31062 Toulouse Cedex, 11 France 12 13 2 Departamento de Biología Molecular e Instituto de Biomedicina y Biotecnología de Cantabria, 14 Universidad de Cantabria-Consejo Superior de Investigaciones Científicas-SODERCAN, 15 Santander, Spain; 16 17 3Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney 18 Diseases, NIH, Bethesda, MD 19 USA 20 21 22 23 24 25 26 27 28 29 30 * corresponding author 31 32 Michael Chandler: [email protected] 33 Fernando de la Cruz: [email protected] 34 Fred Dyda: [email protected] 35 Alison B. Hickman: [email protected] 36 Gabriel Moncalian: [email protected] 37 Bao Ton-Hoang: [email protected]

1 1 Abstract: HUH endonucleases are numerous and widespread in all three domains of life. 2 The major role of these is in processing a variety of by 3 catalysing cleavage and rejoining of single-stranded DNA using an 4 residue to make a transient 5' phosphotyrosine bond with the DNA substrate. They play a 5 role in rolling circle and bacteriophage replication, in plasmid transfer, in the 6 replication of various eukaryotic and in different types of transposition. They have 7 also been appropriated for cellular processes such as intron homing and processing of 8 bacterial repeated extragenic palindromes. We provide an overview of these fascinating 9 enzymes, their different roles, functions and the mechanisms used in their different 10 activities. We also explore the functional diversity of HUH enzymes using well- 11 characterised examples of Rep proteins, relaxase proteins and transposases. 12

2 1 While the double helix is certainly the most iconic feature of DNA, during replication, 2 transcription, , as well as a variety of repair and recombination 3 processes, DNA must assume a transient single-stranded form. Single-stranded DNA 4 (ssDNA) also serves as the packaged genetic material in some viruses and 5 bacteriophages. It is therefore no surprise that there is a dedicated endonuclease 6 superfamily, the HUH endonucleases, named after a characteristic double- motif, 7 whose members exclusively process ssDNA using particular recognition and reaction 8 mechanisms for site-specific ssDNA cleavage and ligation. 9 HUH endonucleases are numerous and widespread in all three domains of life. Two major 10 classes within this superfamily are the Rep (for replication) and relaxase (Mob for 11 mobilisation) proteins involved in DNA processing during plasmid replication and 12 conjugation, respectively. However, HUH enzymes have also been identified in other DNA 13 transactions, such as replication of certain bacteriophages and eukaryotic viruses, different 14 types of transposition, and have also been appropriated for cellular processes such as 15 intron homing and processing of bacterial repeated extragenic palindromes. Here, we 16 provide an overview of these fascinating enzymes, their different roles, functions and the 17 mechanisms used in their different activities. We will explore the functional diversity of 18 HUH enzymes using well-characterised examples of Rep proteins, relaxase proteins and 19 transposases. 20 21 Mechanism and Overall Protein Architecture. 22 The first member of the HUH superfamily, protein A (gpA) of bacteriophage φX174, was 23 identified in early studies of phage replication (see 1). Subsequent bioinformatic 24 approaches 2,3 revealed many related proteins forming a vast superfamily, which includes 25 members involved in the of viral and plasmid rolling circle replication (RCR), 26 conjugative plasmid transfer, and DNA transposition 4,5,6,7,8,9. This familial relationship is 27 based on several conserved protein motifs, including the "HUH" motif composed of two 28 histidine (H) residues separated by a bulky hydrophobic (U) residue, and the Y-motif 29 containing either one or two tyrosine (Tyr) residues (found in Y1 and Y2 enzymes 30 respectively). 31 Y1 HUH enzymes (Fig. 1C) include Rep proteins of some that replicate 32 using ssDNA intermediates (such as pUB110 10 ), a wide range of eukaryotic viruses 11 , 33 most relaxases, transposases of ISCRs (insertion sequences related to IS 91 8) and of the

3 1 IS 200 /IS 605 insertion sequence family 6,7. Y2 enzymes include φX174 gpA itself, Rep 2 proteins of other isometric ssDNA and dsDNA phages (such as P2 12 ), some 3 cyanobacterial and archaeal plasmids and parvoviruses (e.g. adeno-associated , 4 4 AAV) as well as transposases of the IS 91 and helitron families , and MOB F family 5 relaxases. In some cases, both Y residues are mechanistically important while for others, 6 only one of the pair appears to be essential. 7 HUH enzymes catalyse ssDNA breakage and joining with a unique mechanism. 8 They use a Y-motif tyrosine to create a 5'-phosphotyrosine intermediate and a free 3'-OH 9 at the cleavage site (Fig. 1A). Subsequently, the 3'-OH can be used for different tasks. The 10 most obvious task is to prime replication, as observed for HUH domains in Rep proteins of 11 single-stranded bacteriophages, RCR plasmids and conjugative relaxases. The 3'-OH 12 group can also act as the nucleophile for strand transfer to resolve the phosphotyrosine 13 intermediate in the termination step of RCR replication, conjugative transfer and 14 transposition (see below). 15 The cleavage polarity of HUH enzymes is inverse to that of tyrosine recombinases, 16 which make 3' phosphotyrosine intermediates and generate free 5'-OH groups that cannot 17 be used as replication primers 13 . A further contrast to the -independent tyrosine 18 recombinases is that HUH activities require a divalent metal ion to facilitate 19 cleavage by localizing and polarising the scissile . Depending on the 20 enzyme, Mg 2+ , Mn 2+ or other divalent metal ions can be used in vitro 14,15,16,17,18,19 although 21 it is often presumed that Mg 2+ or Mn 2+ are the physiological cofactors. The histidine pair of 22 the HUH motif provides two of the three ligands necessary for metal ion coordination (Fig. 23 1A). The location and identity of the third, invariably polar (Glu, Asp, His or Gln) residue 24 varies across the superfamily. 25 Three-dimensional structures of several Rep and relaxase HUH domains have been 26 determined with and without bound DNA (see references in Table S1), and have been 27 crucial to understanding their mechanisms. The common element of the HUH fold (Fig. 28 1B) is a central, four or five stranded antiparallel β-sheet generally sandwiched between α- 29 helices. The HUH motif is invariably located on a single β-strand whereas the catalytic Tyr 30 residue(s) is(are) located on a nearby α-helix (Fig. 1B). The order of HUH and Y motifs 31 varies in the primary sequence: in the Relaxase group, the Y-motif is upstream of the 32 HUH-motif whereas in the Rep group it is downstream (Fig. 1B and C). This circular 33 permutation 3,20,21 of the motifs relative to the primary sequence (Fig. 1C) changes the

4 1 topology of the domains (Fig. 1B). Nevertheless, the three-dimensional constellation of 2 active site residues is virtually identical across the superfamily. 3 Given the diverse functions of HUH proteins, it is not surprising that other domains 4 are often appended to the HUH domain (Fig. 1C). In many cases, these additional 5 domains are of unknown function. However, ATP dependent , zinc binding, 6 and multimerisation domains are recurring themes 4,22,23,24,25,26,27 . For example, 7 the ssDNA substrates needed by HUH enzymes can be generated by a dedicated DNA 8 helicase domain C-terminal to the HUH domain 4,23,28,29 or alternatively by recruitment of a 9 host helicase 24,25,26,27 . RCR processes use 3´-5´helicase activity acting on the template 10 strand to facilitate DNA unwinding at the replication fork while in conjugation, (as 11 part of the relaxase) are transported into the recipient cell and track 5´to 3´on the 12 transported ssDNA. 13 14 DNA Recognition 15 A feature of many HUH nucleases is that they can recognize and bind DNA hairpin 16 structures and the cleavage sites are located either within the hairpin or in the ssDNA 17 located on the 5' or 3' side of the stem. The crucial role of hairpins has been firmly 18 established in many systems including plasmid conjugation, eukaryotic viral and plasmid 19 replication and transposition 6,7,15,30,31,32,33 . In other systems, palindromic sequences that 20 can form DNA hairpins are present near the probable HUH nuclease cleavage sites 34 . 21 Such hairpins can be formed in vivo under a number of physiological conditions (see 35 ). 22 Structural studies on HUH nucleases revealed that small DNA hairpins can be 23 recognized in several different ways (Fig. 2): sequence-specific recognition of the dsDNA 24 stem; structure-specific recognition of irregularities in the stem; or sequence-specific 25 recognition of the hairpin loop 7,14,19 ,21,22,32 . 26 The hairpin-flanking DNA - in many cases in single-stranded form - is also often 27 important for recognition. Relaxases, for example, make extensive contacts with the bases 28 extending between the hairpin and cleavage site 19,21,36 (Fig. 2), and for a relative of

29 IS 200 /IS 608 transposases, TnpA REP , it has been shown that nucleotides on the 5' side of 30 the hairpin are crucial for binding and for sequence-specific recognition 32 . Other family 31 members 22,37 , have more complex binding modes (see below). 32 Below, we explore the functional diversity of HUH enzymes using well-characterised 33 examples of various Rep and Relaxase proteins and transposases. 34

5 1 Rep proteins: rolling circle replication in phages, plasmids and viruses 2 RCR (Fig. 3), proposed over 40 years ago 38 for single-stranded phage replication, is now 3 also known to be a well-established mechanism adopted by certain eukaryotic viruses and 4 bacterial plasmids and has been extensively reviewed 1,34,39 . 5 6 At the origin: bacteriophage φφφX174 7 Studies on φX174 gpA protein revealed several functional details that are 8 recapitulated in other HUH domain-containing systems, and φX174 provides a general 9 framework for understanding DNA transactions involving HUH enzymes. 10 When φX174 infects its E. coli host, its circular ss (+) strand DNA genome is injected into 11 the cell and converted to a duplex form resembling supercoiled dsDNA plasmids by host 12 enzymes. RCR is initiated when gpA recognizes the 3' region of a short (approximately 13 30bp) sequence at the (Fig 3), unwinds DNA upstream and nicks the 14 (+) strand at a specific site upstream of the recognition sequence to form a covalent 5' 15 phosphotyrosine intermediate and a free 3'-OH 40,42 . Although the DNA recognition 16 sequence and cleavage site are known, surprisingly, no detailed analysis of how gpA 17 recognises and binds its target DNA has been published. Replication is initiated using the 18 3'-OH end, and the (+) strand is "peeled off". As replication proceeds, the recognition and 19 site are regenerated. gpA recognizes this regenerated sequence and nicks the (+) 20 strand using its second Tyr in the Y-motif 43 . The newly generated 3'-OH then attacks the 21 first phosphotyrosine linkage, generating a free ssDNA circle. Alternating use of the active 22 site Tyr residues allows gpA to spin off many ssDNA circles 44 . 23 This process led to a model in which the Tyr residues, located on the same face of 24 an α-helix, can be positioned alternatively to attack the scissile phosphate that is localized 25 and polarized by the HUH bound metal ion 42,44 . Alternating use of Tyr residues probably 26 necessitates conformational changes that may be a crucial feature of the catalytic cycle of 27 other HUH enzymes (e.g. 12,45 ). 28 29 RCR plasmids 30 Plasmid RCR replication closely mirrors that of phage. However, the lifestyle of 31 viruses benefits from the synthesis of multiple genomic copies, whereas that of bacterial 32 plasmids requires close coordination with host cell growth. Plasmids must therefore 33 possess mechanisms to prevent unregulated premature reinitiation of replication.

6 1 Several models have been proposed to explain how this is achieved. For the 2 monomeric Y1 Rep protein of pC194, the final step in replication can be accomplished by 3 an essential active site glutamate (E) which replaces the second Y. This permits cleavage 4 of the phosphotyrosine bond by a bound water molecule (Fig. 3) and release of the 5 enzyme. Importantly, an E-Y mutant, which emulates the phage Rep protein sequence, 6 generates multimeric ssDNA plasmid copies 46 . An alternative, suicide, mechanism was 7 proposed for other plasmid Y1 RCR systems, where an inactive Rep containing one active 8 and one inactive Rep monomer is generated upon termination by covalent modification 47 . 9 The only example of an RCR plasmid Rep structure is that of the Y1 RepB of 10 pMV158 16 . In addition to an N-terminal HUH domain, RepB carries a small C-terminal 11 oligomerisation domain which assembles six nuclease domains into a ring with a small 12 central channel, potentially providing multiple second for catalysis. RepB DNA 13 recognition is complex, with two sets of binding sites, the distal and proximal direct 14 repeats, downstream of the hairpin-containing cleavage site 37 . 15 16 Viral Rep proteins 17 Eukaryotic parvoviral replication provides an intriguing variation of the RCR cycle 48 . 18 Parvoviruses such as the adeno-associated virus (AAV) have linear ssDNA genomes, and 19 replicate by a "rolling hairpin" mechanism (Fig. 4). Crucial for this are the final ~160 nts on 20 each end of the ss viral genome, which can fold into fully complementary three-way DNA 21 junctions known as Inverted Terminal Repeats (ITRs; represented two-dimensionally as a 22 T-shaped folded structure; Fig. 4 insert). As a result, replication of one end is 23 straightforward: one ITR provides a free 3'-OH group from which leading strand synthesis 24 can be initiated using host cell enzymes. Replication of the other viral end requires a site- 25 specific nick to generate a second 3'-OH at the "terminal resolution site" ( trs ). Cleavage is 26 catalyzed by the viral Rep protein which possesses an N-terminal HUH domain and a C- 27 terminal superfamily 3 (SF3) hexameric 3'-5' helicase domain separated by a functionally 28 important linker region 49,50 (Fig 1C). Once a nick is introduced at the second ITR, the 29 cellular replication apparatus completes viral genome replication. 30 Structural characterization of the AAV serotype 5 Rep N-terminal HUH domain with 31 and without portions of its ITR substrate revealed that several AAV5 Rep monomers 32 recognise and bind tandem tetranucleotide repeats within the Rep located on 33 one branch of the ITR (RBS; Fig. 4 insert), to form a spiral curling around the RBS site 34 22,51 . In this case, the HUH nuclease domain recognizes dsDNA via major and minor

7 1 groove interactions and contacts two adjacent tetranucleotide RBS repeats. The tip of one 2 hairpin branch (RBE', for Rep Binding Element) is also recognized by AAV Rep, although 3 via a different surface of the HUH domain 22 (Fig. 2), an interaction important for AAV 4 replication 30 . Rep helicase activity is thought to generate ssDNA for trs hairpin formation 5 (Fig. 4) 28 . After trs cleavage, AAV Rep remains covalently attached to the viral DNA 6 through its active site Tyr. In addition, the helicase domain also directs packaging of the 7 newly replicated genomes into capsids 52 . Although AAV Rep is a Y2 nuclease, there is no 8 evidence that the second tyrosine is required for replication. 9 The HUH nuclease domains of several other eukaryotic viral Rep proteins 10 belonging to viruses with circular ssDNA genomes have been structurally characterized. 11 The replication of these presumably proceeds using an RCR mechanism similar to that 12 used by bacteriophages and plasmids. These Rep proteins include those of tomato yellow 13 leaf curl virus (TYLCV), a geminivirus 53 ; thenanovirus Faba Bean Necrotic Yellows Virus 14 (FBNYV); and porcine circovirus (PCV) 54,55 . The structure of the dimeric Rep protein from 15 the archaeal Rudivirus SIRV1 has also been determined and its biochemical activity 16 characterised. The SIRV1 genome is linear and does not replicate by RCR but could share 17 some mechanistic aspects with the parvoviruses 56 . All these viral Rep proteins have only 18 one conserved Tyr at their active sites, raising the possibility that a second is provided by 19 another protomer or that an alternate second nucleophile such as water is used, as 20 proposed for RCR plasmid copy number control 46,57,58,59 . 21 22 Plasmid Relaxases 23 Plasmid conjugation (Fig. 5), which was discovered over sixty years ago 60 , involves 24 transfer of a ssDNA plasmid copy from one cell to another through a pore formed by a 25 specialized T4SS secretion system 61 . Conjugation is a key process primarily responsible 26 for , for example of antibiotic resistance genes between bacterial 27 species. More recently a large family of non-autonomous elements, Conjugative 28 transposons or Integrative Conjugative Elements (ICE), has been identified. These 29 elements are extremely widespread 62 . They are integrated into the host genome and are 30 excised as circular intermediates and transferred by conjugation. Like conjugative 31 plasmids, they encode relaxases with a potential divalent metal ion binding cluster (but not 32 necessarily HUH ligands) and a conserved Y, first identified in Tn 916 63 and subsequently

33 in many examples from clinically important gram positive bacteria (see 9). 34

8 1 General Mechanism 2 The biochemical signal for initiating conjugative DNA processing is unknown but is 3 thought to be triggered by donor-recipient cell interaction. Conjugation begins with nicking 4 of the circular dsDNA plasmid at a specific site ( nic site) within its origin of transfer ( oriT ) 5 by the HUH nuclease of the plasmid-encoded relaxase protein (Fig.5). This initiates 6 "conjugative RCR", which peels off the strand-specific transferred ssDNA copy of the 7 plasmid. Relaxase remains covalently attached to this ssDNA and is transported into the 8 recipient cell where it tracks 5'-3' along the ssDNA 64 . Relaxase catalyzes recircularization 9 of the transferred ssDNA plasmid in the recipient cell where complementary strand 10 synthesis by a host cell polymerase converts the plasmid into its dsDNA form. Donor 11 plasmid replication occurs in the donor cell concomitantly with transfer, but is not essential 12 for the transfer process 65 . Termination and generation of a circular plasmid copy can 13 occur in several ways depending on whether the second Tyr is carried in the same 14 relaxase molecule or not and whether the termination occurs in the donor or in the 15 recipient cell (see below). 16 Catalytic centre 17 Relaxases (Fig. 1C) have been classified into six families based on their N-terminal 9 18 HUH domains . Four, MOB F, MOB Q, MOB P and MOB V, generally include the HUH motif 19 and a third residue for metal coordination, usually another His, and conserved Tyr

20 residue(s) whereas two, MOB H and MOB C, show a different architecture. Thus, as with 21 rolling-circle plasmid replication (where not all RCR plasmids use HUH Rep proteins), 22 there may be variations in the way strand cleavage and strand transfer occurs for

23 conjugative plasmid transfer. Of the four HUH relaxase families, the MOB F family

24 comprises Y2 enzymes, whereas MOB Q, MOB P and MOB V are all Y1 relaxases. To date, 17,21,66 25 crystallographic structures of the nuclease domains of three MOB F enzymes and 19,67 26 two MOB Q proteins have been determined. In addition to the HUH and Y motifs a third 27 motif with a conserved Asp or Glu might enhance the histidine triad interaction with the 28 divalent cation 51,68 . 29 DNA binding 30 HUH relaxases are thought to initially recognize the dsDNA form of an inverted 31 repeat close to the nic site in the supercoiled plasmid (Fig. 5 insert) 69,70 and, with the help 32 of accessory proteins 61, locally melt the DNA around the nic site (Fig. 5 insert). In the 33 recipient cell, both arms of the inverted repeat in the transferred ssDNA will form a hairpin 34 that is again recognized by the relaxase (Fig. 5 insert).

9 21 1 In the MOB F TrwC-DNA co-crystal structure (Fig. 2), the hairpin is firmly

2 embraced by the protein through a β-ribbon conserved in all three available MOB F 19 3 structures while, in the MOB Q co-crystal, it is bound by two loops . Three C-terminal α-

4 helices form the “fingers” of MOB F relaxases which position the ssDNA U-turn with the nic

5 site in the active site. These fingers are absent in MOB Q proteins and the nic site is instead 6 positioned by burying an adjacent nucleotide in the protein.

7 For MOB F (Y2) relaxases, one Tyr is involved in RCR initiation in the donor cell 8 whereas the other catalyzes termination in the recipient cell (Fig 5 left) 71 . The regenerated 9 nic site in the transferred ssDNA is recognized by the initiating relaxase which remains

10 covalently linked to the 5’ ssDNA end. In all MOB F relaxases, the C-terminal half contains 11 a RecD-like domain which could serve to track on the incoming ssDNA and to 12 reload the nic site onto the relaxase catalytic center in the recipient cell (Fig. 2B). Then, a 13 second nic site cleavage circularises the transferred ssDNA but also joins the relaxase to 14 the newly synthesised nic site (Fig. 5 left). This, in principle, might lead to transfer of a 15 second plasmid copy. It is uncertain how relaxase is removed from the ssDNA to terminate 16 the reaction. A second model proposes that the second cleavage occurs in the donor cell, 17 so the termination reaction will simply be the reversal of the cleavage reaction (Fig. 5 right 18 72 ).

19 MOB Q, MOB P and MOB V HUH family relaxases have variable C-terminal domains 20 73,74 , such as primase, helicase or DNA binding domains. For plasmid RSF1010, the 21 primase domain of the relaxase (Fig. 1) is required for initiation of replication in the 22 recipient cell. 23 24 HUH enzymes as transposases 25 HUH enzymes also act as transposases for members of the prokaryotic IS 200 /IS 605 6, 26 IS 91 75 and ISCR 8 insertion sequence families and the eukaryotic helitrons 4 (Box.1). Of 27 the HUH transposases, the best understood are those of the IS 200 /IS 605 family. 28 IS200/IS605 family 29 This family includes two subgroups: IS 200 and IS 605 . IS 200 group members carry 30 only the transposase gene, tnpA . Those from the IS 605 group carry both tnpA and a 31 second orf , tnpB , (Fig. 6A) whose function is unclear but is not required for transposition. 32 The first family member, IS 200 , was identified about 30 years ago 76 in Salmonella . 33 However, the family paradigms are the more recently identified IS 608 (in Helicobacter 34 pylori 77 ) and IS Dra2 (in Deinococcus radiodurans 78 ). These IS family members use

10 1 obligatory ssDNA intermediates for their mobility. The transposon excises as an ssDNA 2 circle with abutted left (LE) and right (RE) ends and inserts 3' to a conserved element- 3 specific penta- or tetra-nucleotide target. While not part of the IS, this sequence is 4 essential for further transposition. The transposons do not carry terminal inverted repeats 5 but include hairpins in the LE and RE ends recognized and bound by the transposase, 6 TnpA (Fig. 6A). 7 The IS family members excise from, and insert preferentially into, the lagging strand 8 template of chromosome and plasmid replication forks (Fig. 6B). Their lagging strand 9 preference generates an insertion bias reflecting the mode of replication (uni- or bi- 10 directional) of the target 79 . Transposition of IS Dra2 is strongly induced upon 11 recovery of the highly radiation resistant D. radiodurans host from irradiation 80 , certainly 12 owing to the large amount of ssDNA generated during the reassembly of the shattered D. 13 radiodurans genome 81 . 14 IS 200/ IS 605 family transposases are single domain proteins containing only the

15 essential HUH motif and a single catalytic Tyr (Y1 transposases, Fig. 1C). Both TnpA IS 608

16 and TnpA IS Dra2 are obligatory dimers and the active sites are believed to adopt two 17 functionally important conformations, one in which each is composed of the HUH motif 18 from one monomer and the Tyr residue carried by an alpha-helix ( αD) from the other ( trans 19 configuration), and the other in which both motifs are contributed by the same monomer 20 (cis configuration). Only the former has been observed crystallographically. 21 Like relaxases, TnpA also binds hairpins and cleaves ssDNA at some distance. LE 22 and RE cleavage sites (which differ; "C" in Fig 6D) are not recognised directly by the 23 protein but by particular base interactions with short “guide” sequences ("G" in Fig 6D) 5' 24 to each hairpin (Fig. 6D). Cleavage-site recognition requires a network of base interactions 25 14,45,82 which also stabilise the nucleoprotein complex, the transpososome, within which the 26 DNA strand cleavages and transfers are carried out 82 . Changes in the guide sequences 27 generate predictable changes in insertion site specificity 83 . 28 Excision and integration probably occur by alternating the catalytic sites between 29 “trans” and “cis” configurations (Fig. 6C). On cleavage in the “trans” dimer, LE and the RE 30 flank are attached by their 5' P to the attacking Tyr residue, leaving a 3’OH at the LE flank 31 and at RE. Strand joining probably occurs by reciprocal rotation of the two Tyr-carrying 32 alpha-helices and attack of the phosphotyrosine bonds by the 3'OH carried by the bound 33 RE and by the LE flank. This joins LE to RE and the RE flank to the LE flank. The αD

11 1 helices assume a “ cis” configuration at this stage and must be reset to the “trans” 2 configuration for integrating the circular transposon 82 . 3 RCR transposons: IS91, ISCR and Helitron families 4 The earliest identified transposable elements with HUH domain transposases were 5 members of the IS 91 family 5. These are significantly larger than Y1 transposases (Fig. 1), 6 carry a Y2 motif and include an N-terminal zinc binding motif and additional domains of yet 7 unidentified function. 8 More recently, a group of related elements, the ISCRs, has been described (see 8). 9 The CR is an orf resembling IS 91 family transposases but with only a single Tyr (Fig. 1C). 10 ISCRs are often associated with a variety of antibiotic resistance genes. In addition, 11 eukaryotic relatives, the Helitrons, have been identified by bioinformatic approaches (Box 12 1). 13 There is little information concerning either ISCR or Helitron transposition but IS 91 14 is thought to transpose by an RCR mechanism related to RCR plasmid replication. Both 15 Tyr residues are essential for IS 91 transposition in vivo. Although IS 91 ends contain two 16 short flanking inverted repeat sequences, it is not clear whether this is a general feature of

17 the family. Like IS 200 /IS 605 family members, IS 91 inserts with the right end (IR R) 3' to a 18 specific tetranucleotide also required for further transposition (see 84,85 ). Transposition is

19 postulated to initiate at IR R (called ori-IS) by transposase-mediated cleavage and the 3' 20 OH generated in the donor molecule would act as a primer. Transfer of the single strand 21 donor to a target is driven by replication in the donor where displacement of the active 22 transposon strand would be driven by leading strand replication. Transfer of a ssDNA IS

23 copy into the recipient is accomplished when IR L (called ter-IS) is reached and cleaved 24 (reviewed in 5). Termination fails at a relatively high frequency (1% of all insertions). This is 25 known as one-ended transposition since it results in insertion of additional flanking donor 26 plasmid genes 3' to ter-IS 75 . While ori-IS is essential for activity, ter-IS is not: its removal 27 simply increases the relative frequency of one-ended transposition. Acquisition of flanking 28 antibiotic resistance genes by ISCRs could be a manifestation of this property. 29 Despite this attractive model, in vivo studies have identified both ssDNA and dsDNA 30 transposon circles containing covalently joined termini without the flanking tetranucleotide. 31 Circle formation depended on an intact transposase Y2 motif 85 , and thus they may be 32 transposition intermediates analogous to the IS 608 ssDNA circular transposition 33 intermediates. The dsDNA circles could result from replication restart or by extension of a 34 trapped Okazaki fragment on the excised ssDNA circle.

12 1 The relationship between IS 91 transposition and replication of the donor and target 2 replicons is not known. 3 4 Domestication, diversity and versatility. 5 Domestication. 6 A number of studies, particularly among , have provided many examples 7 of transposable elements evolving to assume alternative roles within their host 86 . There 8 are few prokaryotic examples of such "domestication", yet two of these involve HUH 9 enzymes closely related to IS 200 /IS 605 family transposases (Box 2). These are: TnpA-like

10 transposases (TnpA REP ) which can cleave and rejoin so-called Repeated Extragenic 11 Palindromes (REP) in vitro and are thought to be responsible for their dissemination within 12 bacterial genomes; and TnpA-like endonucleases present in some group I introns, called 13 IStrons (Box 2) which presumably play the role of homing endonucleases. 14 Diversity in the catalytic centre 15 HUH enzymes require divalent metal ions and bind these using a triad of amino 16 acids: the HUH His pair and a third residue variably Glu, Asp, His or Gln. In addition to 17 variation in the third metal ion ligand, certain viral Reps with almost identical Rep folds e.g. 18 circoviruses and nanoviruses 54,55,87 include variations in the HUH motif itself as do certain 88 19 Mob P relaxases . In the circoviruses and nanoviruses the motif becomes HUQ . The 20 relationship of these enzymes with HUH nucleases could not be recognised in 21 bioinformatic searches. Indeed, there are even proteins catalysing plasmid RCR or 3,9,39 22 conjugation that do not contain an HUH motif at all (e.g. ). For example, MOB H 23 relaxases contain an HHH motif and a motif (both potential metal divalent metal

24 ion binding clusters) and an upstream conserved Y, while MOB C relaxases contain a 25 conserved, a potential metal binding DEE triad together with the Y motif 9. Their 26 relationship to HUH nucleases must await three dimensional structural information. Thus 27 the family of nucleases which use the "HUH" mechanism of strand cleavage and rejoining 28 is likely to extend significantly further than the HUH group. 29 Similarities with other enzymes 30 HUH nucleases are structurally similar to the replication origin-binding domains, 31 OBD, of some dsDNA viruses 51,53 , and also show strong similarity with the RNA 32 recognition motif, RRM 51,53,89 , suggesting intricate and ancient evolutionary relationships 33 between these proteins. 34 Helicases and the sources of ssDNA

13 1 A central question in the activity of HUH nucleases is how their substrates become 2 available in the ssDNA form. This role could be fulfilled by the associated helicase 3 domains of many HUH enzymes (Fig. 1C) or by their interaction with cellular helicases 4 capable of unwinding dsDNA. For AAV Rep, the associated SF3 helicase activity is 5 needed for cleavage, suggesting that it contributes to formation of the trs -containing 6 hairpin (Fig. 4 insert). 7 In prokaryotic systems, where plasmids are supercoiled, a certain level of 8 supercoiling can lead to hairpin extrusion without the need for helicase activity. φX174 9 replication is absolutely dependent on the cellular Rep helicase (a SF1 3'-5' helicase) 90 , 10 but its role is in unwinding the phage dsDNA form during RCR after gpA-mediated nicking. 11 The 5'-3' helicase activity of relaxases on the other hand is thought to promote 12 tracking of the enzyme along the single transferred strand to position it correctly for the 13 termination step (Fig. 5). While relaxase is capable of binding to one arm of the dsDNA 14 site at oriT , additional accessory proteins are involved in melting this region to enable 15 hairpin formation and nicking (Fig. 5 insert). Prokaryotic ssDNA transposases of the 16 IS 200 /IS 605 family rely on sources of ssDNA 79 such as the replication fork lagging-strand 17 template, ssDNA intermediates generated during DNA repair and possibly R-loops 18 generated during intensive transcription, and therefore do not require a dedicated helicase. 19 However, the situation is not clear for IS 91 , and by extension, for the related ISCRs and 20 for the eukaryotic helitrons which are thought to require transposon-specific replication 5. 21 While helitrons encode a 5'-3' SF1 superfamily helicase, both IS 91 and ISCR 22 transposases have a C-terminal domain of unknown function, and it is tempting to 23 speculate that these might interact with a cellular helicase. 24 Interchangeable functions? 25 As an illustration of the versatility of HUH nucleases, members of the Rep and 26 relaxase families can also promote intermolecular strand transfer leading to integration. 27 AAV serotype 2 can integrate site-specifically into the human chromosome (chr19) at a 28 locus, AAVS1 , carrying an RBS-like tandem repeat sequence and a nearby trs site 91 . Site- 29 specific integration is Rep-dependent but the mechanism is unclear. Synapsis between the 30 viral ITR and AAVS1 may be facilitated for instance by Rep protomers assembled on 31 AAVS1 as hexa- or octameric rings 92,93 . In sharp contrast to Y1 transposases, integration 32 is not precise 94 . Most events occur hundreds of bps away from AAVS1 in an apparently 33 asymmetric fashion.

14 1 Some relaxases can also site-specifically integrate the transferred strand into the 2 genome of a recipient bacterial cell 64 , provided a second oriT is present in the target 3 genome. Both the relaxase and a second TrwC domain in the 600 N-terminal residues are 4 required for this activity. The reaction involves a complex sequence of events to resolve 5 the presumed intermediates 95,96,97 . This site-specific integration reaction has wide 6 potential applications in biotechnology and biomedicine to directly transfer DNA from 7 bacteria to eukaryotic cells 98,99 . In this regard, relaxase can be specifically targeted to the 8 nucleus 100 although relaxase-mediated, site-specific integration of the transferred DNA in 9 the recipient chromosome has yet to be demonstrated. 10 11 Conclusions 12 HUH enzymes discussed here use the same basic catalytic mechanism to perform a 13 diverse set of biological functions. They all use tyrosine residues as nucleophiles and form 14 covalent 5'-phosphotyrosine bonds with the cleaved DNA strand. The phosphotyrosine 15 bond stores energy that can be harnessed by a 3'OH located at the other end of the 16 tyrosine-linked strand or by a 3'OH from a different ssDNA strand allowing for either intra- 17 or inter-molecular strand-transfer. 18 While HUH enzymes are involved in a wide variety of biological processes in 19 nature, this is not a consequence of different catalytic reactions. It is principally a 20 consequence of the different functional modules appended to the HUH domain or recruited 21 as separate entities. The choice between RCR, conjugation or transposition is dictated by 22 the topological and temporal coordination between these various alternative functional 23 modules. 24 There remain many unanswered questions concerning this widespread class of 25 enzymes. It is still unknown whether and how these proteins might interact with the 26 replication apparatus at the fork and, even in the case of φX174, the paradigm of rolling 27 circle replication studies, no structural data is available. In addition, ISCRs represent an 28 important vector of multiple antibiotic resistance which clearly impact on public health 8. An 29 understanding of the mechanism of their acquisition and transmission of these genes is 30 clearly a principal concern. Finally, the impact of helitrons particularly in shaping plant 31 genomes, makes study of their detailed mechanism an area of priority for understanding 32 genome structure.

15 1 Figure legends 2 Figure 1. Reaction mechanism and organisation of selected HUH proteins. A) 3 Conserved polarity of ssDNA cleavage by HUH nuclease domains: the cleavage 4 mechanism . Protein residues are shown in blue together with the divalent metal ion ligand 5 (in this case Mg 2+ ) and DNA in green. Orange DNA in the right panel is the leaving group 6 that results from DNA nicking and covalent intermediate formation. B) Circular 7 permutation of HUH and Y motifs. Structural comparison of the Rep domain of AAV5 8 Rep (PDB id: 1M55) and the Relaxase domain of TrwC (PDB id: 1OMH). Both have a 5- 9 stranded β-sheet with the HUH motif located on the central strand. The core of the fold is a 10 βαββ structure with the HUH motif on the third β-strand. Due to the circular permutation, 11 this motif is in different places in the central β-sheet in Reps when compared to Relaxases. 12 The secondary stucture topology for Reps is βαββ αβαβ and for Relaxases, βαβα βαββ . 13 The N-termini of both proteins are colored in salmon. C) Organization of representative 14 HUH domain-containing proteins. The HUH domains are indicated in beige boxes: 15 helicase domains as red elipses; the oligomerization domain (OD) in magenta; and the 16 proposed Zn-binding domains in different shades of blue since they are not necessarily 17 structurally related. The length of each protein is indicated in numbers of amino acids at 18 the right. Those proteins for which HUH domain structures are available are indicated by 19 "*" . The HUH is motif 2 3, motif III 9. The Y motif is motif 3 3, motif I 9). The Helitron 20 HeliBat1 is a consensus sequence from bioinformatic prediction and the only protein which 21 does not have experimental support 101 . The assigned domain organizations are taken 22 from: ΦX174 gpA 16 ; AAV Rep78 92 ; TYLCV Rep 16 ; pMV158 RepB 16 ; R388 TrwC 21 ; 67 7 7 23 RSF1010 MobA ; TnpA IS 608 ; IS 91 TnpA ; ISCR1 (S. Messing, personal 24 communication); HeliBat1 101 . 25 Fig. 2 : Structures of various HUH enzymes with their substrate DNA. Structures are 26 as follows: A) Top. Rep AAV interaction with the 20 bp Rep binding site (RBS) (PDB 27 1RZ9) and (bottom) with the Rep binding element (RBE') of the ITR hairpin (PDB 1UUT); 28 B) TrwC interaction with a 25 nt oligonucleotide containing both the "nic" site and the 29 recognition hairpin (PDB 2CDM); C) NES (Nicking Enzyme in Staphylococcus ) interaction 30 with its oriT, represented by an oligonucleotide of 30 nts (PDB 4HT4); D) TnpA of IS 608

31 interaction with the transposon Right End (RE) (PDB 2VHG); E) TnpA REP interaction with a 32 REP sequence (PDB 4ER8 ).

16 1 Fig.3 RCR replication. Replication is shown subsequent to formation of the dsDNA 2 circular form. The (+) strand is green. The small filled circles represent the nic site and the 3 5' recognition hairpin. The Rep protein is indicated in brown with two active-site tyrosine 4 residues (1 and 2). Hairpin recognition (i) is followed by cleavage at the nic site 3' to the 5 haipin (green circle) and formation of a photphotyrosine bond with tyrosine residue #1 (ii). 6 (iii) and (iv) recruitment of the replication apparatus (beige elipse) and initiation of 7 replication (pink line and arrowhead). (v) replication reconstitutes a new nic site 8 (pink/green circle). (vi left) the active-site tyrosine #2 cleaves the new nic site to form a 9 new phosphotyrosine bond and the resulting 3' OH attacks the original phosphotyrosine 10 bond to create a single strand circular phage or plasmid copy (green) and reconstitutes the 11 replication cycle (vii left). In the case of copy number-regulated plasmid replication (vi and 12 vii right), no second phosphotyrosine bond is formed and the plasmid regains its initial 13 double strand circular form awaiting replication reinitiation. This could arise from use of a 14 second nucleophile such as water instead of a tyrosine 46 . Failure to terminate at this stage 15 would lead to multimeric single strand products. 16 Fig.4 Rolling hairpin replication of AAV. For simplicity secondary structures at the viral 17 genome ends (Inverted terminal Repeats; ITR) are shown as "T" junctions with 18 complementary base pairing. Parental DNA is green and newly synthesised DNA is pink. 19 (i) Replication is initiated using host replication functions (beige elipse) the 3'OH as a 20 primer and continues to the end (ii) duplicating the terminal structure (iii). For AAV1, five

21 Rep molecules bind to the 20nt (GCTC 5) Rep Binding Site (RBS) and contact the RBE' 22 hairpin tip (iv) provoking a conformational change in DNA at the palindromic terminal 23 resolution site and cleavage to generate the phosphotyrosine bond ( trs ) (v). Recruitment of 24 the host replication machinery then allows replication of the terminal structure (a step 25 called terminal resolution) (vi). Refolding of the ends (viii) would generate structures (viii) 26 resembling those shown in (i). 27 The insert shows a schematic organisation of the ITR with the RBS, RBE' hairpin and trs 28 site indicated. The blue dotted lines and circles indicate the interaction of these sequences 29 with the Rep protein(s). 30 Fig. 5 RCR conjugation. DNA colour schemes are as for Fig. 3. Conjugative transfer 31 occurs through a T4SS pore between the donor (left) and recipient (right) cells shown in 32 black. It is initiated by recognition of one side of a double strand inverted repeat structure 33 (insert left) proximal to a 3' nic site by the relaxase. Nucleophilic attack of the nic site 34 generates the phosphotyrosine bond (i) which is transferred with the relaxase into the

17 1 recipient (ii). Replication (pink) is initiated in the donor (iii) using host replication machinery 2 (beige ellipse) and continues until the nic site is replicated (iv) and cleaved (v) to generate 3 a single strand circular copy in the recipient (vi). Replication in the donor is generally 4 concomitant with transfer but is not essential. Replication of the transferred single strand is 5 performed by the replication machinery of the recipient cell. 6 Fig. 6 Single strand transposons. A) Genetic organisation of IS 608 , a member of the 7 IS 200 /IS 605 family. Grey arrows represent the tnpA and tnpB orfs coding for the Y1 8 transposase, TnpA, and a protein of unknown function, TnpB. LE and RE, left and right 9 ends, are shown in red and blue respectively. The potential secondary structures within 10 each end are schematised. The specific target tetranucleotide, TTAC, required for 11 integration and further transposition is incidatred in black. Note that for the other well 12 characterised IS of this family, IS Dra2 , the target is a pentanucleotide, TTGAT. B) 13 Coupling of IS 200 /IS 605 transposition to the replication fork: preferential excision 14 and integration from and into lagging strand template. Left: Excision. The lagging 15 strand template is shown in green and the IS as a looped-out elipse with the 16 tetranucleotide TTAC shown in green. The dotted pink line represents an Okazaki 17 fragment. The leading strand template is shown in blue and the newly synthesised leading 18 strand as pink. the transposon ends are shown in all cases as filled half circles of the 19 appropriate colour. The dimeric TnpA transposase is shown in brown and the catalytic 20 tyrosine residues as black sticks. Excision results in the deletion of the single-strand IS 21 copy on the lagging stand template as a ssDNA circle. Right: Insertion. The ssDNA IS 22 circle (green) then attacks the lagging strand template in the target molecule (light blue) 23 and inserts 3' to the TTAC target tetranucleotide. C) TnpA Cis/Trans conformational 24 changes during the transposition cycle. Each monomer of the IS 608 transposase TnpA 25 dimer (pale green and pale orange) is shown with its HUH motif (H) on the protein body 26 and the catalytic Tyrosine residue (Y) situated on αD-helix (dark green and dark orange 27 cylinders). Excision : i) Binding and Cleavage of LE and RE by dimeric TnpA with the 28 active sites in the trans configuration, composed of an HUH motif from one TnpA monomer 29 and a Tyrosine residue (Y) situated on αD-helix of the other. ii) Formation of circular ss 30 transposon and donor joint involving reciprocal rotation (indicated by large arrows) of the 31 two αD-helices. iii) Formation of a cis active site configuration. iv) Reset of the cis- 32 configuration to trans . Integration: v) trans configuration: cleavage of the ss transposon 33 junction and target. vi) Strand transfer by reciprocal rotation of the two αD-helices. vii) 34 Transposon integration: return to a cis active site configuration. D) Recognition of cleavage

18 1 sites (C L and C R) by complementarity with guide sequences (G L and G R): interaction of C L

2 with G L and C R with G R. The canonical (Watson and Crick) and non-canonical base 3 interactions were determined from the co-crystal structures 4 5 Boxes 6 7 Box 1 8 Helitrons 9 All HUH transposases described above are specific to bacteria and Archaea. Another 10 mobile element family, the helitrons, occurs in eukaryotic genomes. Helitrons are found in 11 high numbers in Arabidopsis thaliana , Zea maize , Caenorhabditis elegans 102 and the little 12 brown bat, Myotis lucifugus 101 where they represent ~3% of the entire genome. 13 Bioinformatic analysis suggests that they may also be mobilised by RCR transposition 102 . 14 It has been proposed that they may have evolved from integrated geminiviruses 103 15 although their helicase domains differ from the SF3 helicases generally associated with 16 geminiviruses. 17 Helitrons encode a particularly long orf with both HUH and Y2 motifs together with 18 putative N-terminal Zn-finger and C-terminal SF1-like helicase domains (Fig. 1C). Helitrons 19 have relatively conserved ends, generally composed of 5'-TC and often CTAG-3', and 20 short 16- to 20-bp subterminal hairpins at the right end but no inverted repeats. Helitrons 21 are often flanked by 5'T-A3'. A characteristic feature, their ability to sequester additional 22 genes, might result from occasional failure to terminate the rolling circle process, and 23 might also explain the occurrence of tandem multimers 101 . 24 25 26 Box 2 27 28 IS 200 /IS 605 family transposase domestication 29 30 REP sequences 31 REP sequences (Repeated Extragenic Palindromes not to be confused with the 32 Rep proteins), identified 30 years ago 104 , are inverted DNA repeats able to form hairpin 33 structures and almost exclusively located in intergenic regions of bacterial genomes 104 . 34 REPs are often present in high copy numbers (~590 copies in E. coli K12, >1600 copies in 35 Stenotrophomonas maltophilia ) grouped into pairs in structures called BIMES (Bacterial 36 Interspersed Mosaic Elements). BIMES are composed of a REP and a copy of an inverted

19 1 REP (iREP) 105 . They may play several key roles in host physiology including organizing 2 genome structure, regulating gene expression, and genome plasticity (see 31 ). 106 31 3 Recently, a group of REP-associated proteins, RAYTS also called TnpA REP ,

4 was identified and found to be closely related to IS 200 /IS 605 family transposases. tnpA REP

5 is generally present in a single copy located between flanking REP sequences. TnpA REP 6 catalyses REP cleavage and joining in vitro 31 but is less specific than IS 200 /IS 605 family 7 transposases: it cleaves at dinucleotide CT sequences located either 5' or 3' to a REP 8 hairpin.

9 TnpA REP is structurally closely related to TnpA IS608 and TnpA ISDra2 but unlike the 32 10 dimeric TnpAs, is a monomer (Fig. 2) . In the structure of TnpA REP bound to a REP 11 hairpin extended on its 5' side by 4 nts (analogous to the IS 608 /IS 200 guide sequence; 12 Fig. 6D), the double-stranded hairpin stem and ssDNA "guide sequence" is extensively

13 contacted by the protein. As in the case of TnpA ISDra2 , hairpin recognition requires a 14 mismatch in the stem, and the 5' guide sequence is important for cleavage 32 . In spite of

15 evident catalytic properties of TnpA REP and the high prevalence of REPs, only scant

16 information is available about the origin and behaviour of REPs or the role of TnpA REP in 17 vivo. Recently, bioinformatic approaches detected a BIME excision event in the 18 Pseudomonas fluorescence GW25 genome suggesting that BIMES are in fact mobile 19 elements 107 , however, there is no direct evidence for RAYT involvement in this. As

20 TnpA REP -mediated cleavage and rejoining requires only a single REP (the iREP is 31 21 refractory to TnpA REP activity ), it is intriguing that REPs are grouped in inverted pairs 22 BIMEs. This paired configuration might be linked to genome replication because there the 23 iRep sequence on the lagging strand would become a Rep on the leading strand. The 24 mechanism by which REP sequences invade and propagate within genomes remains to 25 be addressed. 26 Group I introns. 27 A final example of known TnpA-like proteins occurs in certain group I bacterial introns, the 28 IStrons. These elements rich in secondary structure carry a TnpA-like potential homing 29 endonuclease 108 similar to TnpA of IS Dra2 and all IStron copies analysed so far are found 30 to be inserted 3' to the same pentanucleotide, TTGAT, which is the target sequence used 31 also by IS Dra2 . TTGAT is complementary to the intron internal guide sequence (IGS) and 32 presumably interacts at the RNA level in the splicing reaction. There is little knowledge 33 about IStron behaviour. 34

20 1 Acknowledgements. We would like to thank C. Guynet, S. Messing and I. Molineau for 2 discussions. This work was supported by intramural CNRS (MC, BTH) and NIH (AZBH, 3 FD) funding and grants from the Agence National de Recherche (France) Mobising (ANR- 4 12-BSV8-0009-01) (BTH), Spanish Ministry of Science and Innovation BIO2010-14809 5 (G.M.), Spanish Ministry of Education (BFU2011-26608) (FdlC) and the European VII 6 Framework Program 248919/FP7-ICT-2009-4 and 282004/FP7-HEALTH.2011.2.3.1-2 7 (FdlC).

21 1 2 3 GLOSSARY 4 5 RCR: Rolling Circle Replication is a process of unidirectional DNA replication of circular 6 DNA molecules such as plasmids and bacteriophages in which a single strand product is 7 "peeled" from a circular DNA template. 8 9 IS elements : Insertion Sequences are short transposable DNA segments including one or 10 more genes involved in their mobility. 11 12 ISCR: Insertion Sequence with a Common Region. This refers to a common open reading 13 frame which shows similarity to that of the IS 91 transposase. These elements appear to be 14 able to sequester additional neighbouring genes during their transposition. 15 16 helitron: A type of eukaryotic transposon thought to transpose using a rolling-circle 17 replication mechanism similar to IS 91 . 18 19 helicase superfamilies: Helicases are classified into 6 major groups (SF1-SF6) based on 20 the motifs and consensus sequences shared by the molecules and by their activities (e.g. 21 5' to 3' or 3' to 5' directionality). These major groups constitute the helicase superfamilies. 22 23 R-loop: A region of DNA in which the two strands are separated by a short RNA segment 24 which forms a complementary RNA/DNA hybrid with one of the DNA strands. 25 26 ICE: Integrative Conjugative Elements are mobile genetic elements sharing the transfer 27 properties of conjugative plasmids but which are generally unable to replicate 28 autonomously and possess dedicated integration systems which allow them to be 29 maintained by integration into the host chromosome.

22 1 References. 2 3 1 Kornberg, A. & Baker, T. A. DNA Replication . (W.H.Freeman, 1992). 4 2 Ilyina, T. V. & Koonin, E. V. motifs in the initiator proteins for rolling 5 circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and 6 archaebacteria. Nucleic.Acids.Res. 20 , 3279-3285 (1992). 7 3 Koonin, E. V. & Ilyina, T. V. Computer-assisted dissection of rolling circle DNA 8 replication. Biosystems 30 , 241-268 (1993). 9 4 Kapitonov, V. V. & Jurka, J. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci U 10 S A 98 , 8714-8719, doi:10.1073/pnas.151269298 (2001). 11 5 Garcillan-Barcia, M. P., Bernales, I., Mendiola, M. V. & De la Cruz, F. in Mobile DNA Vol. 12 II (eds N. L. Craig, R. Craigie, M. Gellert, & A. Lambowitz) 891-904 (ASM press, 2002). 13 6 Ton-Hoang, B. et al. Transposition of ISHp608, member of an unusual family of bacterial 14 insertion sequences. Embo J 24 , 3325-3338 (2005). 15 7 Ronning, D. R. et al. Active site sharing and subterminal hairpin recognition in a new class 16 of DNA transposases. Mol Cell 20 , 143-154 (2005). 17 8 Toleman, M. A., Bennett, P. M. & Walsh, T. R. ISCR Elements: Novel Gene-Capturing 18 Systems of the 21st Century? Microbiol Mol Biol Rev 70 , 296-316 (2006). 19 9 Garcillan-Barcia, M. P., Francia, M. V. & de la Cruz, F. The diversity of conjugative 20 relaxases and its application in plasmid classification. FEMS Microbiol Rev 33 , 657-687 21 (2009). 22 10 Gruss, A. & Ehrlich, S. D. The family of highly interrelated single-stranded 23 deoxyribonucleic acid plasmids. Microbiol Rev 53 , 231-241 (1989). 24 11 Rosario, K., Duffy, S. & Breitbart, M. A field guide to eukaryotic circular single-stranded 25 DNA viruses: insights gained from metagenomics. Arch Virol 157 , 1851-1871, 26 doi:10.1007/s00705-012-1391-y (2012). 27 12 Odegrip, R. & Haggard-Ljungquist, E. The two active-site tyrosine residues of the a protein 28 play non-equivalent roles during initiation of rolling circle replication of bacteriophage p2. J 29 Mol Biol 308 , 147-163, doi:10.1006/jmbi.2001.4607 (2001). 30 13 Grindley, N. D., Whiteson, K. L. & Rice, P. A. Mechanisms of site-specific recombination. 31 Annu Rev Biochem 75 , 567-605 (2006). 32 14 Hickman, A. B. et al. DNA recognition and the precleavage state during single-stranded 33 DNA transposition in D. radiodurans. Embo J 29 , 3840-3852 (2010). 34 15 Boer, R. et al. Unveiling the molecular mechanism of a conjugative relaxase: The structure 35 of TrwC complexed with a 27-mer DNA comprising the recognition hairpin and the 36 cleavage site. J Mol Biol 358 , 857-869, doi:10.1016/j.jmb.2006.02.018 (2006). 37 16 Boer, D. R. et al. Plasmid replication initiator RepB forms a hexamer reminiscent of ring 38 helicases and has mobile nuclease domains. EMBO J 28 , 1666-1678, 39 doi:10.1038/emboj.2009.125 (2009). 40 17 Datta, S., Larkin, C. & Schildbach, J. F. Structural insights into single-stranded DNA 41 binding and cleavage by F factor TraI. Structure 11 , 1369-1379 (2003). 42 18 Larkin, C. et al. Inter- and intramolecular determinants of the specificity of single-stranded 43 DNA binding and cleavage by the F factor relaxase. Structure 13 , 1533-1544 (2005). 44 19 Edwards, J. S. et al. Molecular basis of antibiotic multiresistance transfer in Staphylococcus 45 aureus. Proc Natl Acad Sci U S A , doi:10.1073/pnas.1219701110 (2013). 46 20 Dyda, F. & Hickman, A. B. A mob of reps. Structure (Camb) 11 , 1310-1311 (2003). 47 21 Guasch, A. et al. Recognition and processing of the origin of transfer DNA by conjugative 48 relaxase TrwC. Nat Struct Biol 10 , 1002-1010, doi:10.1038/nsb1017 (2003).

23 1 22 Hickman, A. B., Ronning, D. R., Perez, Z. N., Kotin, R. M. & Dyda, F. The nuclease 2 domain of adeno-associated virus rep coordinates replication initiation using two distinct 3 DNA recognition interfaces. Mol Cell 13 , 403-414 (2004). 4 23 Clerot, D. & Bernardi, F. DNA helicase activity is associated with the replication initiator 5 protein rep of tomato yellow leaf curl geminivirus. J Virol 80 , 11322-11330, 6 doi:10.1128/JVI.00924-06 (2006). 7 24 Odegrip, R., Schoen, S., Haggard-Ljungquist, E., Park, K. & Chattoraj, D. K. The 8 interaction of bacteriophage P2 B protein with Escherichia coli DnaB helicase. J Virol 74 , 9 4057-4063 (2000). 10 25 Petit, M. A. et al. PcrA is an essential DNA helicase of Bacillus subtilis fulfilling functions 11 both in repair and rolling-circle replication. Mol Microbiol 29 , 261-273 (1998). 12 26 Bruand, C. & Ehrlich, S. D. UvrD-dependent replication of rolling-circle plasmids in 13 Escherichia coli. Mol Microbiol 35 , 204-210 (2000). 14 27 Chang, T. L. et al. Biochemical characterization of the Staphylococcus aureus PcrA helicase 15 and its role in plasmid rolling circle replication. J Biol Chem 277 , 45880-45886, 16 doi:10.1074/jbc.M207383200 (2002). 17 28 Brister, J. R. & Muzyczka, N. Rep-mediated nicking of the adeno-associated virus origin 18 requires two biochemical activities, DNA helicase activity and transesterification. J Virol 19 73 , 9325-9336 (1999). 20 29 Im, D. S. & Muzyczka, N. The AAV origin binding protein Rep68 is an ATP-dependent 21 site-specific endonuclease with DNA helicase activity. Cell 61 , 447-457 (1990). 22 30 Brister, J. R. & Muzyczka, N. Mechanism of Rep-mediated adeno-associated virus origin 23 nicking. J Virol 74 , 7762-7771 (2000). 24 31 Ton-Hoang, B. et al. Structuring the bacterial genome: Y1-transposases associated with 25 REP-BIME sequences. Nucleic Acids Res 40 , 3596-3609, doi:10.1093/nar/gkr1198 (2012). 26 32 Messing, S. A. et al. The processing of repetitive extragenic palindromes: the structure of a 27 repetitive extragenic palindrome bound to its associated nuclease. Nucleic Acids Res 40 , 28 9964-9979, doi:10.1093/nar/gks741 (2012). 29 33 Orozco, B. M. & Hanley-Bowdoin, L. A DNA structure is required for geminivirus 30 replication origin function. J Virol 70 , 148-158 (1996). 31 34 del Solar, G., Giraldo, R., Ruiz-Echevarria, M. J., Espinosa, M. & Diaz-Orejas, R. 32 Replication and control of circular bacterial plasmids. Microbiol Mol Biol Rev 62 , 434-464 33 (1998). 34 35 Bikard, D., Loot, C., Baharoglu, Z. & Mazel, D. Folded DNA in action: hairpin formation 35 and biological functions in . Microbiol Mol Biol Rev 74 , 570-588 (2011). 36 36 Larkin, C., Datta, S., Nezami, A., Dohm, J. A. & Schildbach, J. F. Crystallization and 37 preliminary X-ray characterization of the relaxase domain of F factor TraI. Acta Crystallogr 38 D Biol Crystallogr 59 , 1514-1516 (2003). 39 37 Ruiz-Maso, J. A., Lurz, R., Espinosa, M. & del Solar, G. Interactions between the RepB 40 initiator protein of plasmid pMV158 and two distant DNA regions within the origin of 41 replication. Nucleic Acids Res 35 , 1230-1244, doi:10.1093/nar/gkl1099 (2007). 42 38 Gilbert, W. & Dressler, D. DNA replication: the rolling circle model. Cold Spring Harb 43 Symp Quant Biol 33 , 473-484 (1968). 44 39 Khan, S. A. Plasmid rolling-circle replication: recent developments. Mol Microbiol 37 , 477- 45 484 (2000). 46 40 Brown, D. R. et al. DNA structures required for phi X174 A-protein-directed initiation and 47 termination of DNA replication. Cold Spring Harb Symp Quant Biol 47 Pt 2 , 701-715 48 (1983).

24 1 41 Brown, D. R., Schmidt-Glenewinkel, T., Reinberg, D. & Hurwitz, J. DNA sequences which 2 support activities of the bacteriophage phi X174 gene A protein. J Biol Chem 258 , 8402- 3 8412 (1983). 4 42 Roth, M. J., Brown, D. R. & Hurwitz, J. Analysis of bacteriophage phi X174 gene A 5 protein-mediated termination and reinitiation of phi X DNA synthesis. II. Structural 6 characterization of the covalent phi X A protein-DNA complex. Journal of Biological 7 Chemistry 259 , 10556-10568 (1984). 8 43 Hanai, R. & Wang, J. C. The mechanism of sequence-specific DNA cleavage and strand 9 transfer by phi X174 gene A* protein. J.Biol.Chem. 268 , 23830-23836 (1993). 10 44 van Mansfeld, A. D., van Teeffelen, H. A., Baas, P. D. & Jansz, H. S. Two juxtaposed 11 tyrosyl-OH groups participate in phi X174 gene A protein catalysed cleavage and ligation of 12 DNA. Nucleic Acids Res 14 , 4229-4238 (1986). 13 45 Barabas, O. et al. Mechanism of IS200/IS605 family DNA transposases: activation and 14 transposon-directed target site selection. Cell 132 , 208-220 (2008). 15 46 Noirot-Gros, M. F. & Ehrlich, S. D. Change of a catalytic reaction carried out by a DNA 16 replication protein. Science 274 , 777-780 (1996). 17 47 Novick, R. P. Contrasting lifestyles of rolling-circle phages and plasmids. Trends Biochem 18 Sci 23 , 434-438 (1998). 19 48 Tattersall, P. & Ward, D. C. Rolling hairpin model for replication of parvovirus and linear 20 chromosomal DNA. Nature 263 , 106-109 (1976). 21 49 James, J. A. et al. Crystal structure of the SF3 helicase from adeno-associated virus type 2. 22 Structure 11 , 1025-1035 (2003). 23 50 Smith, R. H. & Kotin, R. M. The Rep52 gene product of adeno-associated virus is a DNA 24 helicase with 3'-to-5' polarity. J Virol 72 , 4874-4881 (1998). 25 51 Hickman, A. B., Ronning, D. R., Kotin, R. M. & Dyda, F. Structural unity among viral 26 origin binding proteins: crystal structure of the nuclease domain of adeno-associated virus 27 Rep. Mol Cell 10 , 327-337 (2002). 28 52 King, J. A., Dubielzig, R., Grimm, D. & Kleinschmidt, J. A. DNA helicase-mediated 29 packaging of adeno-associated virus type 2 genomes into preformed capsids. EMBO J 20 , 30 3282-3291, doi:10.1093/emboj/20.12.3282 (2001). 31 53 Campos-Olivas, R., Louis, J. M., Clerot, D., Gronenborn, B. & Gronenborn, A. M. The 32 structure of a replication initiator unites diverse aspects of nucleic acid metabolism. Proc 33 Natl Acad Sci U S A 99 , 10310-10315, doi:10.1073/pnas.152342699 (2002). 34 54 Vega-Rocha, S., Gronenborn, B., Gronenborn, A. M. & Campos-Olivas, R. Solution 35 structure of the endonuclease domain from the master replication initiator protein of the 36 nanovirus faba bean necrotic yellows virus and comparison with the corresponding 37 geminivirus and circovirus structures. Biochemistry 46 , 6201-6212, doi:10.1021/bi700159q 38 (2007). 39 55 Vega-Rocha, S., Byeon, I. J., Gronenborn, B., Gronenborn, A. M. & Campos-Olivas, R. 40 Solution structure, divalent metal and DNA binding of the endonuclease domain from the 41 replication initiation protein from porcine circovirus 2. J Mol Biol 367 , 473-487, 42 doi:10.1016/j.jmb.2007.01.002 (2007). 43 56 Oke, M. et al. A dimeric Rep protein initiates replication of a linear archaeal virus genome: 44 implications for the Rep mechanism and viral replication. J Virol 85 , 925-931, 45 doi:10.1128/JVI.01467-10 (2011). 46 57 Noirot-Gros, M. F., Bidnenko, V. & Ehrlich, S. D. Active site of the replication protein of 47 the rolling circle plasmid pC194. EMBO J 13 , 4412-4420 (1994). 48 58 Rasooly, A. & Rasooly, R. S. How rolling circle plasmids control their copy number. Trends 49 Microbiol 5, 440-446, doi:10.1016/S0966-842X(97)01143-8 (1997).

25 1 59 Laufs, J. et al. Geminivirus replication: genetic and biochemical characterization of Rep 2 protein function, a review. Biochimie 77 , 765-773 (1995). 3 60 Lederberg, J. & Tatum, E. L. Sex in bacteria; genetic studies, 1945-1952. Science 118 , 169- 4 175 (1953). 5 61 de la Cruz, F., Frost, L. S., Meyer, R. J. & Zechner, E. L. Conjugative DNA metabolism in 6 Gram-negative bacteria. FEMS Microbiol Rev 34 , 18-40 (2010). 7 62 Guglielmini, J., Quintais, L., Garcillan-Barcia, M. P., de la Cruz, F. & Rocha, E. P. The 8 repertoire of ICE in prokaryotes underscores the unity, diversity, and ubiquity of 9 conjugation. PLoS Genet 7, e1002222, doi:10.1371/journal.pgen.1002222 (2011). 10 63 Rocco, J. M. & Churchward, G. The integrase of the conjugative transposon Tn916 directs 11 strand- and sequence-specific cleavage of the origin of conjugal transfer, oriT, by the 12 endonuclease Orf20. J Bacteriol 188 , 2207-2213 (2006). 13 64 Draper, O., Cesar, C. E., Machon, C., de la Cruz, F. & Llosa, M. Site-specific recombinase 14 and integrase activities of a conjugative relaxase in recipient cells. Proc Natl Acad Sci U S A 15 102 , 16385-16390, doi:10.1073/pnas.0506081102 (2005). 16 65 Kingsman, A. & Willetts, N. The requirements for conjugal DNA synthesis in the donor 17 strain during flac transfer. J Mol Biol 122 , 287-300 (1978). 18 66 Nash, R. P., Habibi, S., Cheng, Y., Lujan, S. A. & Redinbo, M. R. The mechanism and 19 control of DNA transfer by the conjugative relaxase of resistance plasmid pCU1. Nucleic 20 Acids Res 38 , 5929-5943, doi:10.1093/nar/gkq303 (2010). 21 67 Monzingo, A. F., Ozburn, A., Xia, S., Meyer, R. J. & Robertus, J. D. The structure of the 22 minimal relaxase domain of MobA at 2.1 A resolution. J Mol Biol 366 , 165-178, 23 doi:10.1016/j.jmb.2006.11.031 (2007). 24 68 Larkin, C., Haft, R. J. F., Harley, M. J., Traxler, B. & Schildbach, J. F. Roles of Active Site 25 Residues and the HUH Motif of the F Plasmid TraI Relaxase 10.1074/jbc.M703210200. J. 26 Biol. Chem. 282 , 33707-33713 (2007). 27 69 Lucas, M. et al. Relaxase DNA binding and cleavage are two distinguishable steps in 28 conjugative DNA processing that involve different sequence elements of the nic site. J Biol 29 Chem 285 , 8918-8926, doi:10.1074/jbc.M109.057539 (2010). 30 70 Kim, Y. J., Lin, L. S. & Meyer, R. J. Two domains at the origin are required for replication 31 and maintenance of broad-host-range plasmid R1162. J Bacteriol 169 , 5870-5872 (1987). 32 71 Gonzalez-Perez, B. et al. Analysis of DNA processing reactions in bacterial conjugation by 33 using suicide oligonucleotides. EMBO J 26, 3847-3857, doi:10.1038/sj.emboj.7601806 34 (2007). 35 72 Dostal, L., Shao, S. & Schildbach, J. F. Tracking F plasmid TraI relaxase processing 36 reactions provides insight into F plasmid transfer. Nucleic Acids Res 39 , 2658-2670, 37 doi:10.1093/nar/gkq1137 (2011). 38 73 Meyer, R. Replication and conjugative mobilization of broad host-range IncQ plasmids. 39 Plasmid 62 , 57-70, doi:10.1016/j.plasmid.2009.05.001 (2009). 40 74 Geibel, S., Banchenko, S., Engel, M., Lanka, E. & Saenger, W. Structure and function of 41 primase RepB' encoded by broad-host-range plasmid RSF1010 that replicates exclusively in 42 leading-strand mode. Proc Natl Acad Sci U S A 106 , 7810-7815, 43 doi:10.1073/pnas.0902910106 (2009). 44 75 Mendiola, M. V., Bernales, I. & de la Cruz, F. Differential roles of the transposon termini in 45 IS91 transposition. Proc.Natl.Acad.Sci.U.S.A. 91 , 1922-1926 (1994). 46 76 Lam, S. & Roth, J. R. IS200: a Salmonella-specific insertion sequence. Cell 34 , 951-960 47 (1983). 48 77 Kersulyte, D. et al. ISHp608 of Helicobacter pylori: nonrandom 49 geographic distribution, functional organization, and insertion specificity. J.Bacteriol. 184 , 50 992-1002 (2002).

26 1 78 Islam, S. M. et al. Characterization and distribution of IS8301 in the radioresistant 2 bacterium Deinococcus radiodurans. Genes Genet Syst 78 , 319-327 (2003). 3 79 Ton-Hoang, B. et al. Single-stranded DNA transposition is coupled to host replication. Cell 4 142 , 398-408 (2010). 5 80 Pasternak, C. et al. Irradiation-induced Deinococcus radiodurans genome fragmentation 6 triggers transposition of a single resident insertion sequence. PLoS Genet 6, e1000799 7 (2010). 8 81 Zahradka, K. et al. Reassembly of shattered chromosomes in Deinococcus radiodurans. 9 Nature 443 , 569-573 (2006). 10 82 He, S. et al. Reconstitution of a functional IS608 single-strand transpososome: role of non- 11 canonical base pairing. Nucleic Acids Res 39 , 8503-8512 (2011). 12 83 Guynet, C. et al. Resetting the site: redirecting integration of an insertion sequence in a 13 predictable way. Mol Cell 34 , 612-619 (2009). 14 84 Garcillan-Barcia, M. P. & de la Cruz, F. Distribution of IS91 family insertion sequences in 15 bacterial genomes: evolutionary implications. FEMS Microbiol Ecol 42 , 303-313, 16 doi:10.1111/j.1574-6941.2002.tb01020.x (2002). 17 85 Pilar Garcillan-Barcia, M., Bernales, I., Mendiola, M. V. & De La, C. F. Single-stranded 18 DNA intermediates in IS91 rolling-circle transposition. Mol.Microbiol. 39 , 494-502 (2001). 19 86 Sinzelle, L., Izsvak, Z. & Ivics, Z. Molecular domestication of transposable elements: from 20 detrimental parasites to useful host genes. Cell Mol Life Sci 66 , 1073-1093 (2009). 21 87 Vega-Rocha, S., Gronenborn, A. M., Gronenborn, B. & Campos-Olivas, R. 1H, 13C, and 22 15N NMR assignment of the master Rep protein nuclease domain from the nanovirus 23 FBNYV. J Biomol NMR 38 , 169, doi:10.1007/s10858-006-9085-y (2007). 24 88 Varsaki, A., Lucas, M., Afendra, A. S., Drainas, C. & de la Cruz, F. Genetic and 25 biochemical characterization of MbeA, the relaxase involved in plasmid ColE1 conjugative 26 mobilization. Mol Microbiol 48 , 481-493 (2003). 27 89 Burd, C. G. & Dreyfuss, G. Conserved structures and diversity of functions of RNA-binding 28 proteins. Science 265 , 615-621 (1994). 29 90 Scott, J. F., Eisenberg, S., Bertsch, L. L. & Kornberg, A. A mechanism of duplex DNA 30 replication revealed by enzymatic studies of phage phi X174: catalytic strand separation in 31 advance of replication. Proc Natl Acad Sci U S A 74 , 193-197 (1977). 32 91 Kotin, R. M. et al. Site-specific integration by adeno-associated virus. Proc Natl Acad Sci U 33 S A 87 , 2211-2215 (1990). 34 92 Smith, R. H., Spano, A. J. & Kotin, R. M. The Rep78 gene product of adeno-associated 35 virus (AAV) self-associates to form a hexameric complex in the presence of AAV ori 36 sequences. J Virol 71 , 4461-4471 (1997). 37 93 Mansilla-Soto, J. et al. DNA structure modulates the oligomerization properties of the AAV 38 initiator protein Rep68. PLoS Pathog 5, e1000513, doi:10.1371/journal.ppat.1000513 39 (2009). 40 94 McCarty, D. M., Young, S. M., Jr. & Samulski, R. J. Integration of adeno-associated virus 41 (AAV) and recombinant AAV vectors. Annu Rev Genet 38 , 819-845, 42 doi:10.1146/annurev.genet.37.110801.143717 (2004). 43 95 Cesar, C. E., Machon, C., de la Cruz, F. & Llosa, M. A new domain of conjugative relaxase 44 TrwC responsible for efficient oriT-specific recombination on minimal target sequences. 45 Mol Microbiol 62 , 984-996 (2006). 46 96 Cesar, C. E. & Llosa, M. TrwC-Mediated Site-Specific Recombination Is Controlled by 47 Host Factors Altering Local DNA Topology 10.1128/JB.01152-07. J. Bacteriol. 189 , 9037- 48 9043 (2007).

27 1 97 Agundez, L., Gonzalez-Prieto, C., Machon, C. & Llosa, M. Site-specific integration of 2 foreign DNA into minimal bacterial and human target sequences mediated by a conjugative 3 relaxase. PLoS ONE 7, e31047, doi:10.1371/journal.pone.0031047 (2012). 4 98 Schröder, G., Schuelein, R., Quebatte, M. & Dehio, C. Conjugative DNA transfer into 5 human cells by the VirB/VirD4 type IV secretion system of the bacterial pathogen 6 Bartonella henselae. Proceedings of the National Academy of Sciences 108 , 14643-14648, 7 doi:10.1073/pnas.1019074108 (2011). 8 99 Fernandez-Gonzalez, E. et al. Transfer of R388 derivatives by a pathogenesis-associated 9 type IV secretion system into both bacteria and human cells. J Bacteriol 193 , 6257-6265, 10 doi:10.1128/JB.05905-11 (2011). 11 100 Agundez, L. et al. Nuclear targeting of a bacterial integrase that mediates site-specific 12 recombination between bacterial and human target sequences. Appl Environ Microbiol 77 , 13 201-210, doi:10.1128/AEM.01371-10 (2011). 14 101 Pritham, E. J. & Feschotte, C. Massive amplification of rolling-circle transposons in the 15 lineage of the bat Myotis lucifugus. Proc Natl Acad Sci U S A 104 , 1895-1900, 16 doi:10.1073/pnas.0609601104 (2007). 17 102 Kapitonov, V. V. & Jurka, J. Helitrons on a roll: eukaryotic rolling-circle transposons. 18 Trends Genet 23 , 521-529, doi:10.1016/j.tig.2007.08.004 (2007). 19 103 Feschotte, C. & Wessler, S. R. Treasures in the attic: rolling circle transposons discovered in 20 eukaryotic genomes. Proc Natl Acad Sci U S A 98 , 8923-8924, doi:10.1073/pnas.171326198 21 (2001). 22 104 Higgins, C. F., Ames, G. F., Barnes, W. M., Clement, J. M. & Hofnung, M. A novel 23 intercistronic regulatory element of prokaryotic operons. Nature 298 , 760-762 (1982). 24 105 Bachellier, S., Clément, J.-M. & Hofnung, M. Short palindromic repetitive DNA elements 25 in enterobacteria: a survey. Research in Microbiology 150 , 627-639 (1999). 26 106 Nunvar, J., Huckova, T. & Licha, I. Identification and characterization of repetitive 27 extragenic palindromes (REP)-associated tyrosine transposases: implications for REP 28 and dynamics in bacterial genomes. BMC Genomics 11 , 44 (2010). 29 107 Bertels, F. & Rainey, P. B. Within-Genome Evolution of REPINs: a New Family of 30 Miniature Mobile DNA in Bacteria. PLoS Genet 7, e1002132 (2011). 31 108 Braun, V. et al. A chimeric ribozyme in clostridium difficile combines features of group I 32 introns and insertion elements. Mol Microbiol 36 , 1447-1459 (2000). 33

28 N

A B Rep Mob C

N

C N R388 TrwC AAV5 Rep

C Protein class Example Domain Organization Rep ˓X174 gp A all ˞ HUH Y2 all ˞ -513

AAV Rep78* HUH Y2 SF3 helicase Zn -621

TYLCV Rep* HUH Y1 ˞ SF3 helicase -359

pMV158 RepB* HUH Y1 OD -212

Mob/relaxase R388 TrwC* Y2 HUH SF1 helicase -966

MobA RSF1010 Y1 HUH primase -709

Transposase TnpAIS608* HUH Y1 -157

IS91 TnpA Zn HUH Y2 -426

ISCR ISCR1 Zn HUH Y1 -513

Helitron HeliBat1 Zn HUH Y2 SF1 helicase ~1500

1 2

i)

2

ii)

2

iii)

2

iv)

2

v)

Phage Plasmid

2 2

vi)

1 1 2 vii) RBE' hairpin 3'OH

ITR

RBS trs 20nt 7nt

i) 3'OH 5'

ii)

RBS

iii) 3'OH 5' trs

3'OH iv) 5' trs

v) 3'OH 5'

trs

3'OH vi) 5'

vii) 3'OH 5'

5' 3'OH viii)

3'OH 5' nic

nic nic

i)

ii)

iii)

iv)

v)

vi) A

TTAC TCAA LE tnpA tnpB RE

B Excision Insertion

TTAC TTAC 2 1

C HuH Y Trans vii) HuH i) Y

Y HuH HuH Y Cis

Cleavage

Y Y

HuH vi) ii) HuH

Strand HuH HuH Transfer Y

Y Strand Transfer

iii) v) Cleavage

Cis Y Trans HuH HuH Y

Y HuH

Y HuH Y

iv) HuH

Cis to Trans HuH

Y

D

Y Supplementary Table 1

group protein Ys HUH Biochem Structure

(element)

phages GpA (φX174) Y343 yes Y343 and Y347 involved (Van Mansfeld et al., 1986) NO Y347 Flip-flop mechanism (Hanai and Wang, 1993)

A (P2) Y454 yes Initiation (Y454) and termination (Y450) Tyr are NO Y450 different (Odegrip and Haggård-Ljungquist, 2001)

GpII (M13) Y197 no No stable covalent complex (Asano et al., 1999) NO

SIRV1 Rep Y108 yes Stable covalent complex (OKE et al., 2011) 2X3G (Oke et al., 2011) dimer

REPs viruses Rep (AAV5) Y156 yes Site-specific endonuclease activity (Im and Muzyczka, 1M55 (Hickman et al., 2002) 1990) 1RZ9 (Hickman et al., 2004)

Rep (TYLCV) Y103 yes Cleavage and joining domain (Heyraud-Nitschke et al., 1L2V (Campos-Olivas et al., 2002) 1995)

Rep (FBNYV) Y79 HUQ In vitro endonuclease activity (Vega-Rocha et al., 2HWT (Vega-Rocha et al., 2007b) 2007b)

Rep (PCV) Y96 HUQ In vitro endonuclease activity (Vega-Rocha et al., 2HW0 (Vega-Rocha et al., 2007a) 2007a)

plasmids RepB Y99 Yes Chiral phosphorothioate (Moscoso et al., 1997) 3DKY (Boer et al., 2009) pMV158 Supplementary Table 1

RepA pC194 Y214 yes Termination hydrolysis by Glu (Noirot-Gros et al., NO E210 1994) Flip-flop with E210Y (Noirot-Gros and Ehrlich, 1996)

RepC pT181 Y191 No Short oligonuclotide inactivation (Rasooly and Novick, NO 1993)

MOBF TrwC (R388) Y18 yes Initiation and termination Tyr (Grandoso et al., 2000) 1OMH (Guasch et al., 2003) Y26 Intramolecular reaction (Gonzalez-Perez et al., 2007) TrwC transfer to the recipient (Draper et al., 2005) Termination reaction in the recipient (Garcillán-Barcia et al., 2007)

TraI (F) Y16 Two contiguous catalytic Tyr (Street et al., 2003) 1P4D, (Datta et al., 2003) Y17 Differential metal affinity (Larkin et al., 2005)

TraI (pCU1) Y18-19 yes Tyrosine partners (Nash et al., 2011) 3L57 (Nash et al., 2010) Y26-27

MOBs MOBQ MobA Y25 yes In vitro SC and SS DNA cleavage (Scherzinger et al., 2NS6 (Monzingo et al., 2007) (RSF1010) 1992)

MOBP TraI (RP4) Y22 Yes Three motifs (Pansegrau et al., 1994) NO

MbeA (ColE1) Y19 HEN Divergent HUH motif (Varsaki et al., 2003) NO

MOBV MobM ? yes In vitro relaxase activity (Guzmán and Espinosa, 1997) NO (pMV158) Supplementary Table 1

MOBC MobC ? No No covalent complex (Núñez and De La Cruz, 2001) NO (CloDF13)

MOBH TraI Y93 HHH Covalent intermediate (Salgado-Pabón et al., 2007) NO

(Neisseria Y201 chromosome)

Tpases TnpA IS608 Y127 HUH Covalent intermediate (Ton-Hoang et al., 2005) (Ronning et al., 2005)

(H. pylori) (Barabas et al., ) dimer

TnpA ISDra2 (D. HUH Covalent intermediate (Hickman et al., 2010) (Hickman et al., 2010) dimer radiodurans)

IS..2 ? HUH

REP TnpA E.coli HUH Covalent intermediate (Ton Hoang et al., 2012) (Messing et al., 2012) monomer REP

Tpase IS91 HUH NO

Tpase ISCR HUH NO

Supplementary Table 1

REFERENCES

Asano, S., Higashitani, A., and Horiuchi, K. (1999). Filamentous phage replication initiator protein gpII forms a covalent complex with the 5′ end of the nick it introduced. Nucl. Acids Res. 27, 1882–1889.

Boer, D.R., Ruíz-Masó, J.A., López-Blanco, J.R., Blanco, A.G., Vives-Llàcer, M., Chacón, P., Usón, I., Gomis-Rüth, F.X., Espinosa, M., Llorca, O., et al. (2009). Plasmid replication initiator RepB forms a hexamer reminiscent of ring helicases and has mobile nuclease domains. EMBO J. 28, 1666–1678.

Campos-Olivas, R., Louis, J.M., Clerot, D., Gronenborn, B., and Gronenborn, A.M. (2002). The structure of a replication initiator unites diverse aspects of nucleic acid metabolism. Proc. Natl. Acad. Sci. U.S.A. 99, 10310–10315.

Datta, S., Larkin, C., and Schildbach, J.F. (2003). Structural insights into single-stranded DNA binding and cleavage by F factor TraI. Structure 11, 1369–1379.

Draper, O., César, C.E., Machón, C., De la Cruz, F., and Llosa, M. (2005). Site-specific recombinase and integrase activities of a conjugative relaxase in recipient cells. Proc. Natl. Acad. Sci. U.S.A. 102, 16385–16390.

Garcillán-Barcia, M.P., Jurado, P., González-Pérez, B., Moncalián, G., Fernández, L.A., and De la Cruz, F. (2007). Conjugative transfer can be inhibited by blocking relaxase activity within recipient cells with intrabodies. Mol. Microbiol. 63, 404–416.

Gonzalez-Perez, B., Lucas, M., Cooke, L.A., Vyle, J.S., De la Cruz, F., and Moncalián, G. (2007). Analysis of DNA processing reactions in bacterial conjugation by using suicide oligonucleotides. EMBO J. 26, 3847–3857.

Grandoso, G., Avila, P., Cayón, A., Hernando, M.A., Llosa, M., and De la Cruz, F. (2000). Two active-site tyrosyl residues of protein TrwC act sequentially at the origin of transfer during plasmid R388 conjugation. J. Mol. Biol. 295, 1163–1172.

Guasch, A., Lucas, M., Moncalián, G., Cabezas, M., Pérez-Luque, R., Gomis-Rüth, F.X., De la Cruz, F., and Coll, M. (2003). Recognition and processing of the origin of transfer DNA by conjugative relaxase TrwC. Nat. Struct. Biol. 10, 1002–1010.

Guzmán, L.M., and Espinosa, M. (1997). The mobilization protein, MobM, of the streptococcal plasmid pMV158 specifically cleaves supercoiled DNA at the plasmid oriT. J. Mol. Biol. 266, 688–702. Supplementary Table 1

Hanai, R., and Wang, J.C. (1993). The mechanism of sequence-specific DNA cleavage and strand transfer by phi X174 gene A* protein. J. Biol. Chem. 268, 23830–23836.

Heyraud-Nitschke, F., Schumacher, S., Laufs, J., Schaefer, S., Schell, J., and Gronenborn, B. (1995). Determination of the origin cleavage and joining domain of geminivirus Rep proteins. Nucleic Acids Res. 23, 910–916.

Hickman, A.B., Ronning, D.R., Kotin, R.M., and Dyda, F. (2002). Structural unity among viral origin binding proteins: crystal structure of the nuclease domain of adeno-associated virus Rep. Mol. Cell 10, 327–337.

Hickman, A.B., Ronning, D.R., Perez, Z.N., Kotin, R.M., and Dyda, F. (2004). The nuclease domain of adeno-associated virus rep coordinates replication initiation using two distinct DNA recognition interfaces. Mol. Cell 13, 403–414.

Im, D.S., and Muzyczka, N. (1990). The AAV origin binding protein Rep68 is an ATP-dependent site-specific endonuclease with DNA helicase activity. Cell 61, 447–457.

Larkin, C., Datta, S., Harley, M.J., Anderson, B.J., Ebie, A., Hargreaves, V., and Schildbach, J.F. (2005). Inter- and Intramolecular Determinants of the Specificity of Single-Stranded DNA Binding and Cleavage by the F Factor Relaxase. Structure 13, 1533–1544.

Van Mansfeld, A.D., Van Teeffelen, H.A., Baas, P.D., and Jansz, H.S. (1986). Two juxtaposed tyrosyl-OH groups participate in phi X174 gene A protein catalysed cleavage and ligation of DNA. Nucleic Acids Res. 14, 4229–4238.

Monzingo, A.F., Ozburn, A., Xia, S., Meyer, R.J., and Robertus, J.D. (2007). The Structure of the Minimal Relaxase Domain of MobA at 2.1 Å Resolution. J Mol Biol 366, 165–178.

Moscoso, M., Eritja, R., and Espinosa, M. (1997). Initiation of replication of plasmid pMV158: mechanisms of DNA strand-transfer reactions mediated by the initiator RepB protein. J. Mol. Biol. 268, 840–856.

Nash, R.P., Habibi, S., Cheng, Y., Lujan, S.A., and Redinbo, M.R. (2010). The mechanism and control of DNA transfer by the conjugative relaxase of resistance plasmid pCU1. Nucleic Acids Res. 38, 5929–5943.

Nash, R.P., Niblock, F.C., and Redinbo, M.R. (2011). Tyrosine partners coordinate DNA nicking by the Salmonella typhimurium plasmid pCU1 relaxase enzyme. FEBS Lett. 585, 1216–1222. Supplementary Table 1

Noirot-Gros, M.F., Bidnenko, V., and Ehrlich, S.D. (1994). Active site of the replication protein of the rolling circle plasmid pC194. EMBO J. 13, 4412–4420.

Noirot-Gros, M.F., and Ehrlich, S.D. (1996). Change of a catalytic reaction carried out by a DNA replication protein. Science 274, 777–780.

Núñez, B., and De La Cruz, F. (2001). Two atypical mobilization proteins are involved in plasmid CloDF13 relaxation. Molecular Microbiology 39, 1088–1099.

Odegrip, R., and Haggård-Ljungquist, E. (2001). The two active-site tyrosine residues of the a protein play non-equivalent roles during initiation of rolling circle replication of bacteriophage p2. J. Mol. Biol. 308, 147–163.

Pansegrau, W., Schröder, W., and Lanka, E. (1994). Concerted action of three distinct domains in the DNA cleaving-joining reaction catalyzed by relaxase (TraI) of conjugative plasmid RP4. J. Biol. Chem. 269, 2782–2789.

Rasooly, A., and Novick, R.P. (1993). Replication-specific inactivation of the pT181 plasmid initiator protein. Science 262, 1048–1050.

Salgado-Pabón, W., Jain, S., Turner, N., Van der Does, C., and Dillard, J.P. (2007). A novel relaxase homologue is involved in chromosomal DNA processing for type IV secretion in Neisseria gonorrhoeae. Mol Microbiol 66, 930–947.

Scherzinger, E., Lurz, R., Otto, S., and Dobrinski, B. (1992). In vitro cleavage of double- and single-stranded DNA by plasmid RSF1010-encoded mobilization proteins. Nucleic Acids Res. 20, 41–48.

Street, L.M., Harley, M.J., Stern, J.C., Larkin, C., Williams, S.L., Miller, D.L., Dohm, J.A., Rodgers, M.E., and Schildbach, J.F. (2003). Subdomain organization and catalytic residues of the F factor TraI relaxase domain. Biochimica Et Biophysica Acta (BBA) - Proteins and Proteomics 1646, 86–99.

Varsaki, A., Lucas, M., Afendra, A.S., Drainas, C., and De la Cruz, F. (2003). Genetic and biochemical characterization of MbeA, the relaxase involved in plasmid ColE1 conjugative mobilization. Mol. Microbiol. 48, 481–493.

Vega-Rocha, S., Byeon, I.-J.L., Gronenborn, B., Gronenborn, A.M., and Campos-Olivas, R. (2007a). Solution Structure, Divalent Metal and DNA Binding of the Endonuclease Domain from the Replication Initiation Protein from Porcine Circovirus 2. Journal of Molecular Biology 367, 473–487.

Vega-Rocha, S., Gronenborn, B., Gronenborn, A.M., and Campos-Olivas, R. (2007b). Solution Structure of the Endonuclease Domain from the Master Replication Initiator Protein of the Nanovirus Faba Bean Necrotic Yellows Virus and Comparison with the corresponding Geminivirus and Circovirus Structures. Biochemistry 46, 6201–6212.