Vol. 48 No. 4/2001

935–967

QUARTERLY

Review

Understanding the evolution of restriction-modification sys- tems: Clues from sequence and structure comparisons.

½ Janusz M. Bujnicki

Bioinformatics Laboratory, International Institute of Molecular and Cell Biology, ks. Trojdena 4, 02-109 Warszawa, Poland; BioInfoBank Institute, Limanowskiego 24A, 60-744 Poznañ, Poland Received: 24 September, 2001; accepted: 3 December, 2001

Key words: methyltransferases, , protein structure, molecular evolution, bioinformatics

Restriction-modification (RM) systems comprise two opposing enzymatic activities: a restriction , that targets specific DNA sequences and performs endonucleolytic cleavage, and a modification methyltransferase that renders these se- quences resistant to cleavage. Studies on molecular genetics and biochemistry of RM systems have been carried out over the past four decades, laying foundations for mod- ern and providing important models for mechanisms of highly spe- cific protein–DNA interactions. Although the number of known, relevant sequences 3D structures of RM proteins is growing steadily, we do not fully understand their functional diversities from an evolutionary perspective and we are not yet able to en- gineer new sequence specificities based on rational approaches. Recent findings on the evolution of RM systems and on their structures and mechanisms of action have led to a picture in which conserved modules with defined function are shared between different RM proteins and other enzymes involved in nucleic acid biochemistry. On the other hand, it has been realized that some of the modules have been replaced in the evolution by unrelated domains exerting similar function. The aim of this review is to give a survey on the recent progress in the field of structural phylogeny of RM en- zymes with special emphasis on studies of sequence–structure–function relationships and emerging potential applications in biotechnology.

. The author’s research on RM enzymes is supported by the State Committee for Scientific Research (KBN, Poland) grant 6P04 B00519. ½ tel: (48 22) 668 5384; fax: (48 22) 668 5288; e-mail [email protected] Abbreviations: RM, restriction-modification; MTase, methyltransferase; ENase, endonuclease; M/HsdM, R/HsdR, S/HsdS, protein subunits responsible for: modification, restriction and sequence recognition; TRD, target recognition domain; m6A, N6-methyladenine; m4C, N4-methylcytosine; m5C, 5 C5-methylcytosine; hm C, C5-hydroxymethylcytosine; AdoMet, S-adenosyl-L-methionine. 936 J.M. Bujnicki 2001

Restriction-modification (RM) systems oc- the scope of this review to fully cover all re- cur exclusively in unicellular organisms and search articles on the biochemistry and genet- their viruses. They comprise opposing intra- ics of RM systems; instead I will focus on the cellular enzyme activities: DNA endo- recent studies of their sequences and struc- deoxyribonuclease (ENase), that recognizes tures and rather recommend several excellent and cleaves its target site, and a DNA reviews that can provide a complementary methyltransferase (MTase), that transfers viewpoint to that presented in this article methyl group from S-adenosyl-L-methionine [6–10]. (AdoMet) onto specific nucleobases within the target, thereby protecting it from the action of the ENase. Methylation occurs either at ade- CLASSIFICATION OF RM SYSTEMS nine or cytosine, yielding N6-methyladenine (m6A), N4-methylcytosine (m4C) or C5-me- RM systems were subdivided into three ba- thylcytosine (m5C). In symmetrical se- sic types (I, II, and III) based on the number quences, the same base is methylated on both and organization of subunits, regulation of strands. The methyl groups lie in the major their expression, cofactor requirements, enzy- groove of the DNA helix, in positions that do matic mechanism, and sequence specificity not interfere with base-pairing, but that [1]. However, further types and subtypes have change the “epigenetic” information content been proposed as new, distinct RM systems of DNA. For instance, methylation of only one have been discovered. The biochemical prop- strand of the target (hemimethylation) is usu- erties of the “novel” systems are intermediate ally sufficient to prevent cleavage by sterically to those of the “old” ones, and their most strik- hindering binding of the ENase to the target. ing feature is that they seem to combine pro- This guarantees that after DNA replication tein domains originated from the “old” sys- hemimethylated daughter duplexes eventu- tems in unprecedented structural contexts. ally become fully re-methylated rather than Recently, a novel nomenclature for the “sub- being cleaved [1]. types” has been proposed at the “DNA En- RM systems were originally suggested to zymes: Structures & Mechanisms” conference evolve as a defense mechanism against phage in Bangalore (December 2000) [11]. infection and other types of DNA invasion [2], and serve evolutionary purposes by producing Type I RM systems -size fragments of foreign DNA to be inte- grated into the host chromosome via recombi- Type I are the most complex systems — they nation [3]. There are also cases of seemingly comprise three subunits: S for sequence rec- quite typical RM proteins known, which are ognition, M for modification and R for involved in quite sophisticated physiological restriction (reviewed in refs. [7, 8, 12]). The S processes, such as regulating competence for and M subunits form a DNA: m6A MTase DNA uptake [4]. Moreover, some DNA repair (with a stoichiometry of M2S1), which recog- systems in Eubacteria can be regarded as de- nizes and modifies DNA within the specific se- scendants of RM systems or vice versa. Pres- quence and exhibiting a strong preference for ently there is a large body of evidence that hemimethylated DNA, which is quite unusual many RM systems are highly mobile elements among prokaryotic MTases. The complex of involved in various genome rearrangements, all three subunits (R2M2S1) becomes a potent and that many of them exhibit ”selfish” behav- [13]. A schematic diagram ior, regardless of potential benefits for the showing the complex architecture of type I host they may confer (reviewed in ref. [5]). RM protein is presented in Fig. 1. If the type I With the abundance of literature it is beyond ENase encounters unmodified target, it Vol. 48 Restriction-modification systems 937

considerable distances from the unmethyl- ated target sequences, they have so far failed to provide useful analytical reagents for mod- ern molecular biology.

Type III RM systems Type III systems were initially grouped to- gether with type I systems as one family of ATP-dependent restriction enzymes [17]. However, once it was recognized that they comprised only two subunits (termed M or Mod for modification and R or Res for restric- tion), their recognition sites were only 5–6 bp long and not bipartite, and they cleaved at about 25 bp downstream of the recognition se- quence, they were classified as a novel type [18]. Nevertheless, they are mechanistically similar to type I enzymes: the M subunit alone acts as a MTase and in a complex with the R subunit elicits ATP-dependent DNA trans- Figure 1. Schematic organization of typical type I location and cleavage [19]. A schematic dia- RM enzymes, exemplified by EcoKI [12]. gram showing the domain architecture of type a) The M (HsdM) subunit comprising a single MTase III RM proteins is shown in Fig. 2. AdoMet is module with N- and C- terminal extensions, b) the S required for methylation, but also for the effi- (HsdS) subunit that exhibits circular pseudosymmetry, cient cleavage [20]. Type III ENases do not di- comprising variable TRDs and conserved spacer do- gest the substrate completely, leaving some mains, c) the R (HsdR) subunit comprising modules fraction of sites always uncut. Another pecu implicated in DNA cleavage, DNA translocation and - liarity of type III systems is that they binding to the M2S complex, d) proposed architecture of the M2R2S complex recognizing its bipartite target methylate only one strand of the target, which using two TRDs, generating DNA loops and cleaving leads to generation of unmethylated targets DNA at a distance. For the sake of clarity, only one after each round of chromosome replication. M2R2S complex is shown, although dimerization is nec- However, it has been found that cleavage by essary for DNA translocation and cleavage to occur type III enzymes requires two copies of the [16], and the aspect of other possible interactions be- tween the domains cleavage is ignored. target sequence in a head to head orientation. In contrast only one sequence copy is needed for methylation to occur, which promotes re-methylation rather than degradation of the dimerizes rapidly [14] and initiates an unmethylated strand [21]. It has been re- ATP-dependent translocation of DNA towards cently shown that type III enzymes exhibit itself simultaneously from both directions R2M2 stoichiometry and that two such com- [15]. This process causes the extrusion or con- plexes cooperate in double stranded (ds) DNA traction of DNA loops and results in extensive cleavage on the 3¢ side of either recognition supercoiling of DNA. Cleavage is elicited at site [22]. Interestingly, the top strand is cut by variable distance from the recognition se- the ENase proximal to the cleavage site, while quence once translocation stalls [16]. Since the bottom strand is cut by the distal ENase in type I systems cleave DNA nonspecifically at the collision complex. 938 J.M. Bujnicki 2001

MTases have been intensively studied from the structure–function perspective — they are the only RM proteins for which crystal struc- tures have been solved to date (September 2001: atomic coordinates for 12 ENases and 7 MTases are available; see Table 1).

Table 1. Structurally characterized ENases and MTases

ENase PDB MTase PDB BamHI 2bam M.DpnM* 2dpm BglI 1dmu M.HaeIII 2dct BglII 1d2i M.HhaI 5mht BsoBI 1dc1 M.MboII* 1g60 Cfr10I 1cfr M.PvuII* 1boo EcoRI 1ckq M.RsrI* 1eg2 EcoRV 1az0 M.TaqI 1g38 FokI 1fok MunI 1d02 NaeI 1iaw NgoMIV 1fiu Figure 2. Schematic organization of typical type III RM enzymes, exemplified by EcoPI [22]. PvuII 1pvi a) The M (Mod) subunit comprising a MTase module The most representative entry from the Protein Data Bank with the TRD localized within an insert, b) the R (Res) (PDB) [23] (http://www.rcsb.org) has been chosen for each enzyme, with the preference for protein-DNA complexes and subunit comprising modules implicated in DNA cleav- structures solved at possibly highest resolution. * indicates age and DNA translocation, c) proposed architecture of the enzymes for which protein-DNA cocrystal structures are the M2R2 complex comprising two enzymes bound to not available. sites in a head to head orientation. For the sake of clar- ity only one R and one M subunit in each complex inter- acts with the DNA and possible contacts between ele- Type II MTases (Fig. 3a) are the most di- ments other than the ENase domains are ignored. verse — though DNA: m6A MTases are com- mon to all major types of RM systems, so far all Prokaryotic m4C and m5C-generating en- zymes were classified as bona fide type II, with Type II RM systems only few exceptions among “solitary” en- zymes believed to be very closely related to Type II systems are the simplest and most type II MTases. Type II m5C MTases became a abundant of the RM systems, with MTase and paradigm for nucleic acid enzymes that in- ENase activity exerted by two distinct en- duce ”flipping” of the target base into the cata- zymes encoded by gene pairs. The archetypal lytic pocket [24, 25]. They also served as a (“orthodox”) type II enzymes recognize short model for the studies on mechanism of palindromic sequences 4 to 8 bp in length and AdoMet-dependent methylation of nucleic ac- methylate or cleave within or immediately ad- ids [26–28] and helped to understand the jacent to the recognition sequence, however mode of action of different types of DNA numerous exceptions to that rule have been MTases from Prokaryota [6] and Eukaryota identified (see below). Type II ENases and [29]. They usually function as monomers that Vol. 48 Restriction-modification systems 939

(like EcoRI [32] or BamHI [33]), or 3¢-over- hangs (like BglI [34]). Over time, several sub- types of type II RM enzymes with distinct properties have been identified (shown in Figs. 3c–e, 4, 5 and 6).

Figure 4. Type IIS RM enzymes. a) the MTase component comprises two type II-like Figure 3. Type II RM enzymes MTase domains fused within a single polypeptide or two separate enzymes (a dotted line shows the pres- a) The “standalone” MTase comprising a MTase mod- ence of a possible linker sequence) or a single MTase ule with the TRD localized within an insert or fused to able to methylate different sequences on both strands its C-terminus, b) the orthodox type II ENase of the target, b) the type IIS ENase homodimer bound homodimer, c) the type IIT heterodimer, d) the type IIE to two targets [43] and generating a ds break at a fixed homodimer that uses two pairs of distinct domains for distance in respect to one of the sites (compare with binding two identical sequences, e) the type IIF Fig. 2c). homotetramer that cleaves two sites in a concerted re- catalyze methylation of the specific base in Type IIT restriction endonucleases are com- both strands of the palindromic target in two posed of two different subunits (Fig. 3c). For separate reactions. instance, Bpu10I is a heterodimer that recog- Type II ENases, owing to their outstanding nizes an asymmetric sequence (it probably sequence specificity, became an indispensable evolved from an orthodox homodimeric type tool in recombinant DNA technology, with ap- II enzyme in which two subunits diverged, or plications in both basic science and molecular is a hybrid of two related type II systems) [35]. medicine. They have been also used as a On the other hand, BslI is a heterotetrameric a b model system for studying aspects of specific enzyme ( 2 2) that recognizes a palindromic protein–DNA interactions and mechanisms sequence [36]. of Mg2+-dependent phosphodiester hydrolysis Type IIE ENases, like EcoRII or NaeI are (which, ironically, have not yet been estab- allosterically activated by binding of a second lished for any RM enzyme) [10]. The orthodox recognition sequence and therefore require type II ENases are homodimers (Fig. 3b) that two recognition sites for cleavage (Fig. 3d). cleave DNA in two strands producing a They have two separate binding sites for the 5¢-phosphate and a 3¢-OH end; depending on identical “target” and “effector” DNA se- orientation of the two subunits in respect to quences [37, 38]. each other and to the recognized sequence, Type IIF enzymes, like type IIE require they can produce blunt ends like EcoRV [30] binding of two identical sequences for cleav- or PvuII [31] or sticky ends with 5¢-overhangs age, however they cleave them both in a con- 940 J.M. Bujnicki 2001 certed reaction (Fig. 3e). Those proteins char- in the case of NgoBVIII [49], or one fusion pro- acterized to date are tetrameric, for example tein with two MTase domains with distinct NgoMIV [39] and SfiI [40]. specificities, like in the case of FokI (GGATG Type IIS ENases cut at a fixed distance near and CATCC), [50]. Another possibility is to their short, asymmetric target site [41]. This employ a MTase, which recognizes a degener- makes them similar to type III enzymes, but ated sequence and is able to methylate both type IIS ENases do not require ATP or strands, like it has been suggested for AdoMet or the presence of the MTase subunit GASTC-specific (S=G or C) M.BstNBI (unpub- for cleavage. They exist as monomers, with lished data cited in ref. [48]) or for the hypo- the DNA recognition and cleavage functions thetical SSATSS-specific ancestor of the located on distinct domains (Fig. 4); however C-terminal MTase domain of M.FokI [50]. a dimerization of cleavage domains from two Type IIG (formerly type IV) RM systems DNA-bound complexes is obligatory for ds are composed of two MTases, of which one DNA cleavage, as demonstrated for FokI [42, modifies both strands of the asymmetric sub- 43]. Since the TRDs of type IIS ENases effec- strate, while the other modifies only one tively interact with two sites, of which only strand, but in addition exhibits also the ENase one is cut is a single catalytic event, they can activity (Fig. 5), cutting the target 16/14 bp in be regarded as a subclass of type IIE enzymes. Because of the unusual bipartite structure, type IIS ENases have proven particularly use- ful in creating chimeric enzymes by attaching the nonspecific cleavage domain to the DNA-binding domain of transcription factors [44–46]. The enzyme N.BstNBI related to type IIS ENases has been characterized as a “nicking” Figure 5. Schematic organization of type IIG RM ENase, which cleaves only on the top strand 4 enzymes. bp away from its recognition sequence [47]. a) the type II-like MTase, b) the ENase/MTase subunit, Interestingly, it has been shown that its close whose mechanism of interaction with the target or the homologs, MlyI and PleI introduce nicks prior possible multimerization mode is unknown, but may be to ds cleavage, which presumably occurs only related to that of type III and type IIS ENases (Figs. 2c after the ENases dimerize [48]. Hence, it has and 4b) been suggested that the peculiar limited bot- tom strand cleavage activity of N.BstNBI re- 3¢ direction from the recognition site [51]. sults from the inability of its cleavage domain Some type IIG enzymes exhibit peculiar bio- to dimerize. These results suggest that type chemical properties that make them similar to IIS enzymes exert ds DNA cleavage in a simi- type III enzymes (see below): For instance lar manner to type III enzymes, i.e. the top Eco57I cleaves the substrate only partially strand is cut by the ENase bound to the target and is stimulated by AdoMet [51], while for sequence proximal to the cleavage site, while BseMII AdoMet is essential for cleavage [52]. the bottom strand is cut by the distal ENase On the other hand, cleavage at a fixed dis- (Figs. 3, 4). tance from the target resembles both type IIS Type IIS MTases must methylate an asym- and type III enzymes. Hence, type IIG en- metric target, hence this kind of RM systems zymes were suggested to be the evolutionary comprises two MTases specific for each link between type III and type IIS systems, strand, which may methylate different bases, however this hypothesis has never been sup- like adenine (GGTGA) and cytosine (TCACC) ported by a genuine phylogenetic study [53]. Vol. 48 Restriction-modification systems 941

Type IIB (formerly type V or “BcgI-like”) likely that its TRD maps to the C-terminus RM systems encode both ENase and MTase [56]. activities within one polypeptide chain, simi- Generally, many type IIB enzymes exhibit larly to the type IIG bifunctional various peculiarities, which may be or may be ENase/MTase, but with the ability to modify not specific to other proteins of this class. For both strands of the symmetric, bipartite tar- instance, HaeIV was shown to release an get sequence [54]. The pattern of cleavage, asymmetric fragment after cleavage [56] and which makes them distinct from other types, BcgI requires two bipartite target sites for results from unprecedented combination of cleavage [57] similarly to the enzymes of types previously known features: all type IIB en- I, IIE, IIF, and III. It is tempting to speculate zymes cleave DNA on both sides of their bind- that type IIB enzymes are a compact variant ing site (like type I ENases) at a fixed distance of type I enzymes that lack the DNA trans- (like type IIs, IIG and III), resulting in exci- locase module, but may show the same mecha- sion of a short DNA fragment (Fig. 6). Some nism of DNA binding and cleavage on both sides of the target (compare Figs. 1 and 6). To my best knowledge, interactions between a pair of the ENase domains, each cleaving one strand of the double strand target, has been shown only for the orthodox type II and related “standalone” ENases (types IIT, IIE and IIF) and for the ENase modules of type IIS and type III RM enzymes. It is tempting to speculate that other RM enzymes, including type I, type IIG and type IIB ENases also re- quire a dimer of ENase domains to exert cleav- age as opposed to a single domain that would introduce two nicks in both strands of the tar- Figure 6. Schematic organization of an archetypal type IIB RM enzyme BcgI [54]. get, thereby making a ds break. If this hypoth- esis is corroborated by experiment, it would a) The ENase/MTase subunit, b) the S subunit, c) pro- be interesting to learn if in those complex en- posed architecture of the (MR)2S complex of the BcgI RM system that cleaves DNA at a limited distance at zymes that possess two ENase domains, the both sides of its bipartite type-I like target (compare catalytically competent dimers are formed in with Fig. 1d). The aspect of dimerization required for cis (i.e. by the ENase domains of a multi- the bilateral cleavage is ignored for clarity and because protein complex bound to the same target) or it is unclear if and how the four ENase domains of the in trans (i.e. by the ENase domains that belong [(MR)2S]2 complex cooperate during the cleavage. to different proteins, as in the case of type IIS enzymes). Remarkably, different in trans con- of them, like BcgI, require a separate subunit figurations can be envisaged for proteins with (S) to bind to DNA and recognize the target, more than two ENase domains in the catalytic but others, like CjeI [55] and HaeIV [56] seem unit [22]. to exert all three functions with one chain. Some type II RM enzymes recognize The S subunit of the BcgI RM system is re- lengthy, discontinuous sites, such as SfiI lated to the type I S subunits, while in CjeI the (GGCCNNNNNGGCC), BglI (GCCNNNNNG- S subunit is fused to the C-terminus of the GC) or XcmI (CCANNNNNNNNNTGG), but ENase/MTase subunit. In the HaeIV RM sys- most likely they acquired this functional pecu- tem, no region homologous to the typical S liarity independently in the evolution [58] and subunits has been identified to date, but it is they have not been classified as a separate 942 J.M. Bujnicki 2001 type or subtype. There have been several ex- containing methylated or hydroxymethylated cellent reviews articles in the last decade fo- cytosine (m4C, m5Corhm5C, respectively), cusing on various aspects of type II ENases [9, unless it is glucosylated as in wild type T-even 10, 59–61], however only recently experimen- coliphages [65, 66]. Together with the E. coli tal and computational studies on their se- Mrr enzyme, which targets modified adenine quences and structures provided new data or cytosine in a poorly defined sequence con- and interpretations, considerably broadening text [67] and Streptococcus pneumoniae DpnI our view on these enzymes and their relation- ENases [68] they make up a separate type of ship to other protein families (see the para- modification-directed restriction (MDR) en- graph devoted to the ENase domain within zymes. Another unusual enzyme of this class the subsequent section of this paper). is PvuRts1I, which restricts DNA containing hm5C, even when it is glucosylated. A MTase-like gene has been found near RM systems of other types PvuRts1I, but neither its activity as a modifi- There are also some RM systems that do not cation MTase nor influence on the fit into any of these classes — they likely repre- PvuRts1I-mediated restriction could be dem- sent genuine hybrids of ”regular” types, which onstrated [69]. The MDR enzymes can be arose by fusions of their separated compo- thought of as free-standing predecessors of nents, but so far no robust phylogenetic study RM system components or as that has been undertaken to infer the pathways of abandoned RM systems (for instance follow- their evolution. For example it has been also ing the ”death” of their cognate MTase) to be- suggested that type II ENases may couple come “ENases on the loose”. Alternatively, with type I MTases with a cognate sequence the MDR systems may be seen as products of specificity, giving rise to the chimerical “type the “arms race” between bacteria developing 1 I& /2” systems (G.G. Wilson, cited as per- new defensive weapons against T-even phages sonal communication in ref. [6]). On the other and the viruses protecting their DNA using in- hand, the LlaI system consists of four pro- creasingly more complex modifications (re- teins, one of which is a fusion of two type viewed in ref. [70]). II-like m6A MTases, a typical IIS MTase simi- Another class of sequence-specific nu- lar to FokI (see above) [62] and the other three cleases, whose relationships with restriction are remotely related to the McrBC enzymes were not known until very recently, (see below). There are also RM systems com- are the so called “homing” ENases (reviewed prised of multiple ENases and MTases; in sev- in refs. [71, 72]). A large number of these en- eral such cases, like DpnII [63] or BcnI [64], zymes has been identified in Eukaryotic nu- one of the two MTases of the same specificity clear and organellar , but there are also may also methylate single stranded DNA. a few, which have been found in Prokaryota and their phages. They function in dissemina- tion of certain mobile introns and inteins by Solitary ENases cleavage of long, asymmetric, and degenerate Paradoxically, the first restriction enzymes sequences. Creation of recombinogenic ends described were McrA (RglA) and McrBC promotes gene conversion, which leads to du- (RglB) from E. coli, which do not form a part plication of the intron. Homing ENases and of a RM system since they do not associate some freestanding intergenic ENases, which functionally with any particular MTase and share functional properties and sequence sim- their ENase activity is not inhibited by ilarities, can be grouped into three families of methylation of the target. Conversely, they presumably independent evolutionary origin specifically recognize and cleave sequences (LAGLIDADG, HNH, and GIY-YIG) [73]. In Vol. 48 Restriction-modification systems 943 this review I will refer only to the structural Some phages carry MTases with Dam-like data on members of HNH and GIY-YIG fami- specificity, but it is unclear whether they have lies, which are relevant to the evolutionary regulatory functions or serve to counteract re- studies on genuine restriction enzymes. striction enzymes with cognate specificities [6]. An intriguing group of “antirestriction” MTases has been identified in several Bacillus Solitary MTases subtilis phages — these enzymes can each rec- Another group of enzymes related to RM en- ognize and m5C-methylate several different zymes are DNA MTases not associated with targets, which are also targets for RM systems restriction enzymes. They are generally of the host. Based on the analysis of the thought to be involved in gene regulation, multispecific MTases carried out by chromosome replication, and DNA repair, Trautner’s group a modular model of MTase though only few enzymes of this category are organization has been proposed, in which characterized in enough detail to justify un- specificity of the core enzyme was achieved by equivocal definition of their physiological a combination with a variety of sequence-spe- function. The best studied examples is the cific modules [81, 82]. GATC-specific Dam (DNA m6A MTase) of E. coli and related g-Proteobacteria, which has STRUCTURAL AND FUCTIONAL been implicated in numerous regulatory pro- DOMAINS OF RM SYSTEMS cesses including control of expression of viru- lence determinants, and in methyl-directed Dryden [6] suggested that the MTase com- mismatch repair (reviewed in ref. [74]). The posed of the target-recognizing domain (TRD; mismatch-specific MutHSL excision appara- see next section), catalytic subdomain and tus uses Dam methylation to distinguish be- AdoMet-binding subdomain can be thought of tween the parental and daughter strands after as the structural core of a typical RM system. chromosome replication. Nevertheless, Dam In this respect, the RM system is made up by is not essential for viability [75]. The association of the MTase with a DNA cleavage GANTC-specific m6A MTase CcrM is an es- (ENase) module and in some cases a DNA sential enzyme involved in cell-cycle control of translocase module. Thus, all polypeptide sub- Caulobacter [76]. Another well-studied “soli- units either exert their activity in a protein tary” MTase is the CCWGG-specific Dcm complex containing MTase, which interacts (DNA m5C MTase) of E. coli, whose function with the target DNA sequence via its TRD, or however still remains a mystery [77]. Mis- they have functional autonomy owing to a sep- matches resulting from spontaneous arate TRD analog. For instance the ENase deamination of m5C to U are repaired by the module can exist as a separate protein com- so called very short patch (VSP) system, prising one or more structural domains (type which includes the C(T:G or U:G mis- II systems, Figs. 3b–e, 4), or as a fusion with match)WGG-specific single-strand nicking the DNA translocase module (type I and III, ENase Vsr [78]. Interestingly, both the Figs. 1, 2) or with the MTase module (type IIG Dam-associated nicking ENase MutH and the and IIB, Figs. 5, 6). The orthodox type II Dcm-associated Vsr are evolutionarily related ENases developed their own target-reco- to genuine restriction enzymes [79, 80]. gnizing elements, functioning either as a Other MTases not associated with bona fide clearly distinguishable TRD or an ensemble of restriction enzymes are specified by viral loops protruding from the catalytic interface. genomes or conjugative , and serve On the other hand, the multifunctional R sub- to self-protect the invasive DNA from restric- units of type I and type III RM systems exert tion endonucleases when it enters a new host. their function of DNA translocase/ENase 944 J.M. Bujnicki 2001 only when complexed with the MTase. In type gions is recognized by an independent TRD I R subunits a special domain responsible for (reviewed in ref. [91]). Most of the S subunits establishing protein–protein contacts has carry two separable TRDs, each approxi- been identified in the C-terminus [12] (Fig. 1); mately 150 aa in length, within a single poly- to my knowledge, such domain has not been peptide. It has been proposed that the TRDs delineated to date in primary structures of and the “conserved” domains in the S sub- type III R subunits. The apparent modular ar- units have a circular organization (Fig. 1) pro- chitecture of all enzyme types suggested that viding the symmetry for their interaction with shuffling of a quite limited repertoire of mod- the other subunits and with the bipartite, ules and domains conferring particular func- asymmetric DNA target [92]. However, a nat- tions is the main force driving their functional urally or artificially truncated S subunit com- diversification (Figs. 1–6). prising a single TRD and a set of conserved motifs can function as a dimer, specifying the bipartite, symmetric DNA target, suggesting The target recognition domain (TRD) that the present day S subunits are the result Target recognition domains have been oper- of a gene duplication [93]. The conserved re- ationally defined as regions responsible for se- gions can be thought of as a scaffold upon quence-specific binding of RM proteins to the which TRDs are mounted, allowing them to be target DNA. They have been initially (and swapped among type I RM systems to gener- most clearly) defined for mono- and ate new specificities. Indeed, natural combi- multi-specific m5C MTases [81] and the S sub- natorial variation of the S subunits and the units of type I RM systems [83], in which they half-subunits in certain type I RM systems are long, variable sequences, surrounded by have been reported [91, 94–96]. well conserved motifs. In the multi-specific By analogy, the large variable regions found m5C MTases from several bacteriophages of in most m4C and m6A MTases were also pre- Bacillus subtilis, certain mutations in the vari- dicted to function as TRDs [97]. X-Ray crystal- able region can abolish one target specificity lographic studies of the m5C MTases M.HhaI while leaving the others intact. By mapping [98] and M.HaeIII [99], m6A MTases M.TaqI the mutations and studying the specificity of [100], M.DpnM [101] and M.RsrI [102], and chimeric proteins, Trautner and coworkers m4C MTase M.PvuII [28] demonstrated that determined that each target sequence is rec- the TRDs of all these proteins (excepting the ognized by its own TRD and defined its mini- pair of m5C MTases) are structurally dissimi- mal size as approximately 40 amino acids. lar (Fig. 7). It is not clear if these similar TRDs Nevertheless, they failed to generate enzymes result from independent gene fusion events or with novel specificities by shuffling of gene evolutionary convergence. Based on structure fragments except for instances where entire prediction and random mutagenesis, Dryden TRDs were exchanged [81, 84–87]. TRD swap- and coworkers suggested that the TRDs of ping has also been successfully applied to al- type I enzymes may be similar to the TRDs of ter the DNA sequence specificity of mono- m5C MTases [103, 104]. Nevertheless, it is un- specific m5C MTases from Bacteria and clear to what degree the “alternative” TRDs Eukaryota [88, 89], in agreement with the con- are conserved in individual MTase sub- clusion of a recent phylogenetic study focused families and if there are novel types of TRD on the m5C MTase family ([90], J.M. Bujnicki, yet to be discovered. For instance, sequence unpublished). analysis demonstrated that certain mono- In type I RM systems, which recognize two specific MTases possess several variable re- short defined regions separated by a non-spe- gions, which may share the function of a spa- cific spacer of fixed length, each of these re- tially-discontinuous TRD [97, 105]. Some Vol. 48 Restriction-modification systems 945

Figure 7. Cartoon diagrams of four structurally characterized DNA MTases depicting similarities be- tween their catalytic domains and differences between their TRDs. The core of the consensus MTase fold, recognizable by the 7-stranded b-sheet, is in the same relative orientation in all four images. a) The m5C MTase M.HhaI co-crystalized with its target DNA (PDB coordinate file 5mht [109]), the TRD is “behind” the DNA, b) the g-m6A MTase M.TaqI (1g38 [110]) co-crystalized with its target DNA, the C-terminal TRD is on the left hand side, c) the a-m6A MTase M.DpnM (2dpm [101]) manually docked to its target, the TRD (localized within an insert in the catalytic domain) is on the right hand side, d) the b-m4C MTase M.PvuII (1boo [28]) manually docked to its target DNA, the proposed TRD (localized within an insert in the catalytic domain that maps to the upper left hand side of the image) is disordered in the crystal of the DNA-free form and therefore not shown. small MTases seem TRD-less, and it has been from their cognate MTases, may recognize suggested that their specificity determinants target sequences using either an autonomous reside within the short loops protruding from TRD fused to the catalytic domain, an ensem- the catalytic face of the catalytic domain [106, ble of elongated loops projected from the cata- 107]. Moreover, even the typical TRD-con- lytic domain or combination of both (reviewed taining enzyme M.EcoRV (and presumably its in ref. [10]). Generally, the first strategy is numerous homologs) has recruited residues characteristic for type IIS enzymes that cleave from at least two loops in the catalytic domain at a distance and the latter two strategies for to make specific protein–DNA contacts [108]. most other type II enzymes. For instance, In addition, it is not known how the series of X-ray crystallography demonstrated that type TRDs are arranged in the multispecific m5C IIS FokI endonuclease comprises a non-spe- MTases, or how these complex enzymes inter- cific cleavage domain and a large, compact act with their multiple targets. TRD composed of three subdomains resem- ENases also have to achieve sequence speci- bling helix-turn-helix domains [111, 112]. Sim- ficity. In the type I systems, the ENase speci- ilar bipartite architecture, albeit comprising ficity is provided by the same S subunit that is structurally dissimilar TRDs and catalytic do- used by the MTase. Type II ENases, which in- mains, has been predicted from computa- teract with their DNA targets independently tional sequence analysis for the type IIS en- 946 J.M. Bujnicki 2001 zymes BfiI [113] and MboII [114], and for homing nucleases from the GIY-YIG super- family [115]. It should be stressed that identi- fication of potential TRDs in sequences of re- striction enzymes is particularly difficult, since unlike in MTases the catalytic domains of ENases contain no obviously conserved se- quence motifs, which renders the simplistic criterion of sequence variability inadequate. Moreover, the key functions of type II restric- tion enzymes, i.e. multimerization, se- quence-specific DNA binding and cleavage are interwoven such that some regions and resi- dues are crucial for more than one aspect of the ENase function [10]. Figure 8. Conserved fold and variable topology of the common MTase domain. The MTase domain a) The “circularized” topology diagram with triangles The MTase domain, which transfers the representing b-strands, circles representing a- and methyl group from AdoMet onto the target 310-helices, and connecting lines representing loops; base, is the only truly conserved domain the thick lines correspond to the loops at the catalytic among RM systems; that is, representatives of face of the protein that harbor residues that take part in binding and catalysis. Circled Roman numerals rep- only one of several unrelated protein families resent nine motifs, the key motifs I and IV shown in known to catalyze this kind of reaction have bold and underlined. Arrows show the topological been identified in the context of RM systems breakpoints (N/C for generation of N- and C-termini) (reviewed in ref. [116]). Other enzymes, which and sites of TRD insertion characteristic for the indi- generate different modifications to inhibit re- vidual classes of MTases. b) The linear organization of six classes of amino-MTases (-) postulated in ref. [97] striction, are evolutionarily unrelated and 5 and m C MTases (the prevailing archetypal topology structurally dissimilar, including the only en- labeled as m5C, and the two underrepresented classes zyme that generates a chemically similar and DRM2). The AdoMet-binding region is shown as a product, the tetrahydrofolate-dependent cyto- solid arrow, the catalytic region is shown as a striped sine-C5 hydroxymetyltransferase of T-even arrow. Conserved motifs are labeled accordingly. coliphages [117]. The conserved ”MTase fold” is characterized by an a/b domain with a cen- tral seven-stranded b-sheet sandwiched be- tide-binding . This observation has tween two layers of a-helices (Figs. 7, 8a). It led to the suggestion that the ancestral MTase strongly resembles the architecture of the du- arose after gene duplication converted an plicated Rossmann-fold, with the only excep- AdoMet-binding protein into a protein that tion of a characteristic b-hairpin, involving bound two molecules of AdoMet and that the strands 6 and 7, which is absent from two halves then diverged [119]. An alternative Rossmann-fold proteins [118]. All DNA hypothesis has been put forward that various MTase structures exhibit very similar fold, MTases could have originated independently with only minor variations of orientation and from Rossmann-fold proteins [101]. Sup- number of peripheral secondary structural el- porting this view, a subsequent phylogenetic ements. The approximate two-fold pseudo study using both atomic coordinates and cor- symmetry reflects the structural similarity of responding amino-acid sequences suggested the AdoMet binding site to the target nucleo- that MTases exhibiting the “typical fold” origi- Vol. 48 Restriction-modification systems 947 nated from one common Rossmann-fold an- adjacent motif IX (present only in m5C cestor [118]. MTases) was recognized as the TRD, sug- Based on the methylated nucleotide that is gested to be acting as an autonomous struc- generated, DNA MTases can be divided into tural and functional domain [6, 97, 126]. That three different groups: m6A, m4C, and m5C alignment has been validated and its details MTases. m6A and m4C MTases methylate the refined by comparison with crystal structures exocyclic amino group of the nucleobase and of m6A MTases TaqI [100], DpnM [101], and are collectively termed “amino-MTases”, RsrI [102] and m4C MTase PvuII [28]. while m5C MTases methylate the C-5 atom of According to the possible linear arrange- cytosine. It has been suggested that m4C and ments of the AdoMet-binding subdomain, the m6A MTases are more closely related to each active site subdomain, and the variable region other than to m5C MTases [97]. Remarkably, assumed to function as a TRD, the certain m6A MTases display cryptic m4Cac- amino-MTases were subdivided into 6 classes: tivity on mismatched cytosines [120] and a, b, g, d, e and z [97] (Fig. 8). The majority of some m4C MTases may methylate mis- known DNA amino-MTases fall into the a, b, matched adenine [121]. Moreover, experimen- and g classes, with no bona fide g-m4C MTases tal and bioinformatics studies suggested that discovered yet. M.NgoMXV and its homolog m4C-specific enzymes may have evolved inde- M.LmoA118I are the only experimentally pendently multiple times from m6A MTases, characterized m4C MTases relatively closely although no consensus has been reached re- similar to g-m6A MTases, however they lack a garding the evolutionary pathways leading to well-defined TRD [106, 127]. Similarly, se- the present-day distribution of specificities quence analysis and structure prediction for a [105, 106, 120, 122]. Recently, it has been small group of viral g-like Dam MTases indi- shown that a change of the target base speci- cated that due to the lack of TRD they cannot ficity from m6A to m4C is possible with only a be put into any of the proposed classes [107, few amino acid substitutions. In an elegant ex- 128]. Besides, we have identified two families periment Roth and Jeltsch reduced the size of of enzymes closely related to DNA amino- the target base binding pocket of M.EcoRV by MTases, namely 16S rRNA: guanine-N2 site-directed mutagenesis, generating an en- MTases and the HemK family of putative nu- zyme variant that no longer methylated ade- cleic acid MTases that possess a large variable nine and whose activity towards mismatched region at the N-terminus, and therefore cytosine was reduced only 17-fold [108, 123]. should be classified as putative members of Nevertheless, such variant was not able to the z class [129, 130]. It has been also found methylate cytosine if it was base-paired with that the m4C MTase M.MwoI exhibits the d ar- guanine, suggesting that additional mutations chitecture [131], rather than previously pro- are needed to change the base flipping mecha- posed b [97]. Nearly all m5C MTases differ nism of amino-MTase. from the group g MTases only in the position Amino-acid sequence alignments of MTases of motif X, corresponding to a helix packing revealed 9 relatively weakly conserved motifs against the central beta-sheet next to motif I: and a variable region, localized differently in in m5C MTases it is as the C-terminus, while distinct families [124, 125] (Fig. 8b). Based on in g MTases it is in N-terminus. Nevertheless, the results of X-ray crystallography of m5C two exceptions to this rule have been identi- MTase HhaI [98] and on structure-based mul- fied: the M.BssHII MTase, which is a typical tiple sequence alignment, motifs IV–VIII member of the z class with the TRD at the were assigned to the active-site subdomain, N-terminus followed by the conserved motifs motifs X and I-III to the AdoMet-binding IX, X, I–VIII [132], and a family of putative de subdomain, and the variable region with the novo DNA MTases from Arabidopsis and 948 J.M. Bujnicki 2001 maize (DRM2), that contain a MTase module are quite distant in space [100]. Still, this with a unique arrangement of motifs: scheme may be valid for enzymes, which have VI–VIII–TRD–IX–X, I–V [133] (Fig. 8). not been identified yet, or whose sequences Based on careful sequence analysis and molec- have not been studied in enough detail. None- ular modeling it has been proposed that the theless, I believe that in most cases permuta- atypical architecture of M.BssHII is not a re- tion of m4C and m6A MTases occurred via sult of a simple gene permutation event, but intragenic relocations of gene segments (i.e. rather a series of recombination events be- “domain shuffling” [137]), which left no evi- tween of fragments of genes coding for up to dent intermediates or fusions and rearrange- three different m5C MTases [134]. ments of gene fragments [105], rather than Lately, models of circular permutation dur- solely according to the “duplicate and get rid ing evolution of m4C [105] and m6A MTases of redundant termini” scheme. However, to [135] have been proposed. Jeltsch argued that my knowledge, no systematic study has been the domain permutation process needs dupli- published, which would infer the evolutionary cation of a MTase gene, producing one en- history of shuffled fragments of MTase do- zyme with two catalytic domains. For in- mains in enzymes other than M.BssHII [134]. stance, after formation of new start and stop codons in a hypothetical tandem gg-class ENase domain MTase, a z-orb-like permutant would arise. This model corresponds to the widely ac- ENase exerts the second key activity of the cepted concept that a permuted protein may RM system and therefore could be predicted arise naturally from tandem repeats by ex- to exhibit the degree of conservation at least traction of the C-terminal portion of one re- similar to that of the MTase counterpart. peat together with the N-terminal portion of However, among numerous ENase sequences the subsequent repeat, if the protein’s N and known there are only a few that exhibit statis- C termini are in close spatial proximity [136]. tically significant similarity. The lack of se- Although the idea itself offers a plausible ex- quence conservation has led to speculation planation for the origin of permutants within that despite common features, such as a re- many protein families, the only duplicated quirement for Mg2+ and outstanding se- m6A MTases known to date are the type IIS quence specificity, most ENases may be unre- enzymes of the aa-class, whose permutation lated to one another [138]. Initially, the only would eventually produce enzymes of the d or similarities were detected between type II e classes that have not been identified to date. izoschizomers, enzymes with identical cleav- M.MwoI, the only plausible candidate for the age specificity, which may be regarded as di- d class known to date, is closely related to b rect descendants of one ancestor, transferred MTases, and its putative TRD seems to have horizontally to different hosts [59, 139]. Nev- “jumped” from the position in the middle of ertheless, X-ray crystallographic studies of 13 the protein to the C-terminus without convinc- seemingly dissimilar type II ENases demon- ing evidence for duplication of the entire strated unequivocally that they share a com- MTase gene (Ref. [131] and J.M. Bujnicki and mon structural core and metal-binding/cata- M. Radlinska, unpublished data). In my opin- lytic site, arguing for extreme divergence ion, simple interconversions of topologies rather than independent evolution of a similar from gg to b or from bb to g are rather implau- fair-sized domain (for the most recent reviews sible, since the TRDs of known MTases from b see [10, 38, 61, 140]). This domain, termed and g classes are unrelated [100, 102]. More- ”PD-(D/E)XK” for a very weakly conserved over, the N- and C-termini of M.TaqI, the only signature of the active site, turned out to be g-m6A MTase whose 3D structure is known, common to other nucleases, including phage Vol. 48 Restriction-modification systems 949 exonuclease [141], two Archaeal Holliday the RNase A-like active site in a distinct loca- junction resolvases Hjc [142, 143], phage T7 tion [153]. It is tempting to speculate that Endonuclease I [144], transposase TnsA [145] EndA may be related to a ”common ancestor” and two enzymes exerting ssDNA nicking in of the PD-(D/E)XK superfamily (Fig. 9), how- the context of methyl-directed and very short ever this hypothesis must await a thorough patch DNA repair: MutH [79] and Vsr [80]. It structure-based phylogenetic study with is particularly interesting that MutH and Vsr atomic coordinates of more ancient nucleases are genetically linked with DNA MTases Dam available. and Dcm, respectively. Since the sequences of Ironically, following the series of crystallo- structurally characterized PD-(D/E)XK cleav- graphic studies suggesting common origin of age domains seemed too divergent for ”regu- all ENase domains in restriction enzymes and lar” phylogenetic analysis, a structure-based related DNA repair and recombination en- treeing has been carried out in a similar man- zymes, bioinformatics studies provided evi- ner to that performed for MTase domains dence that some bona fide type II ENases are [140]. From this and other structure-based in fact diverged members of other well-studied comparative studies it can be concluded that nuclease superfamilies, unrelated to the the PD-(D/E)XK superfamily can be divided PD-(D/E)XK enzymes (Fig. 10). It has been into two lineages, roughly corresponding to found that the N-terminal part of the type IIS “5¢ four-base overhang cutters” like EcoRI or restriction enzyme exhibits low sequence sim- BamHI that interacts with the target DNA ilarity to an EDTA-resistant nuclease (Nuc) of predominantly via an a-helix and a loop and Salmonella typhimurium, and the relationship the “blunt end cutters” like PvuII and EcoRV of these nuclease domains has been confirmed that use a b-strand for DNA recognition [38]. experimentally [113]. We have also identified A hypothetical evolutionary scenario of evolu- the Nuc-like domain in type II restriction en- tion of the two main ENase lineages based on zymes NgoFVII, NgoAVII, and CglI (J.M. comparison of publicly available crystal struc- Bujnicki, M. Radliñska, V. Siksnys, unpub- tures is shown in Fig. 9. lished data). Another evolutionarily unrelated Recently, despite limitations resulting from nuclease domain, similar to the catalytic do- extreme divergence of the PD-(D/E)XK do- main of nucleases from the HNH superfamily, main, state-of-the-art algorithms for sequence has been identified in the m5C-specific restric- comparisons and structure prediction allowed tion enzyme McrA, type II restriction en- to identify it in a variety of other genuine and zymes HpyI, NlaIII, SphI, SapI, NspHI, NspI putative nucleases, including the (m6Aor and KpnI, and in type IIS enzyme MboII and m5C)-specific restriction enzyme Mrr and its its homologs from Helicobacter pylori by our homologs, the McrC subunit of the (m4C, m5C group [114, 154] and by Eugene Koonin’s or hm5C)-specific restriction enzyme McrBC, group [147]. We have also found that type II the hm5C-specific restriction enzyme enzymes Eco29kI, NgoMIII, NgoAIII, and PvuRts1I, herpesvirus alkaline exonucleases, MraI are homologous to the GIY-YIG endo- Archaeal-type Holliday junction resolvases nuclease domain present in certain homing Hjc, various proteins containing the NTPase endonucleases and DNA repair and recombi- module like the RecB and DNA2 nuclease fam- nation enzymes [114] and that the HgiDII en- ilies or other enzymes involved in DNA recom- zyme is related to the DNA repair enzyme bination and repair [146–151]. It has been MutL, which also possesses a distinct fold also found out that the catalytic domain of (J.M. Bujnicki, unpublished, and P. Friedhoff, tRNA splicing endonuclease EndA bears strik- cited as personal communication in ref. [10]). ing resemblance to the minimal core of the Presently, most of these predictions await ex- PD-(D/E)XK fold [152], although it developed perimental confirmation, however even in the 950 J.M. Bujnicki 2001

Figure 9. Proposed scheme of evolution of the PD-(D/E)XK family of proteins that depicts radiation and divergence of the a and b subfamilies of restriction enzymes [38, 140]. Secondary structural elements in the topological diagrams are coded as described in Fig. 7a. Evolutionary steps (ac- quisition and loss of structural elements) are indicated by arrows, elements that are conserved in a given step and in a given sub-lineage are shaded, novel elements are shown in white. The major features that allow distinction be- tween the two lineages are depicted by dotted circles: i) the directionality of the 5th b-strand (parallel in the a-lineage and antiparallel in the b-lineage) and ii) the appearance of an additional small b-sheet that participates in target rec- ognition in the b-lineage. The additional b-sheet of l-exo and other b-enzymes is a topologically different and hence independently acquired feature. Other peculiarities are the unusual left-handed b-a-b element at the C-teriminal edge of the b-strand in Vsr [80], as opposed to the typical right-handed structure in other proteins, and the fact that the core of T7 Endo I is made of fragments of two polypeptide chains forming a swapped dimer [144]. absence of crystal structures of ENases with multiple occasions. Moreover, analysis of the any of the three “alternative” folds it became various combinations of structural modules clear that restriction enzymes have evolved on present in homing endonucleases and type IIS Vol. 48 Restriction-modification systems 951

linear DNA and nucleotide triphosphate (NTP) hydrolysis before DNA cleavage can oc- cur [70]. Type I and III restriction enzymes re- quire ATP for activity (reviewed in ref. [8]), while McrBC requires GTP [157]. Type I en- zymes and McrBC exhibit a similar mecha- nism: they translocate along DNA from their recognition sites in a reaction powered by NTP hydrolysis until they encounter a block to translocation, which stimulates DNA cleav- age [158, 159]. The block is normally another enzyme molecule translocating from another site or a topological barrier resulting from supercoiling of the loop between the two en- zymes, explaining the dependence of reaction on two sites. However, other non-specific blocks to translocation, such as a bound Figure 10. Cartoon diagrams of four structurally and evolutionarily distinct nuclease families, repressor or a Holliday junction also stimu- whose members have been identified as alterna- late cleavage. One peculiarity of type I en- tive ENase domains in the context of RM systems. zymes is that they do not turn over in the cleavage reaction, but they hydrolyze ATP a) The canonical PD-(D/E)XK domain exemplified by a non-specific cleavage domain of FokI (PDB code 1fok long after DNA cleavage has stopped [160]. In [111]), which shows relatively few elaborations of the contrast, type III enzymes, require a specific minimal common fold as compared to other, se- contact between the two translocating enzyme quence-specific enzymes (Fig. 8), b) aMg2+-independ- molecules and non-specific blocks are inhibi- ent and hence EDTA-resistant Nuc/phospholipase D tory [19]. Bickle and coworkers demonstrated domain; a homology model of NgoAVII (J.M. Bujnicki, that cooperation between two enzymes is nec- unpublished data) based on coordinates of the S. typhimurium nuclease (1byr), c) the DNase domain of essary for ds DNA cleavage, since each colicin E7, a HNH superfamily member (7cei [155]), d) translocating enzyme complex cuts only one a model of the GIY-YIG nuclease catalytic domain ob- strand of DNA [22]. tained from an ab initio folding simulation based on The R subunit of all type I RM systems and published NMR restraints [156]. the Res subunit of all III RM systems com- prise two modules: a large DNA translocase module, exhibiting sequence similarity to cer- and certain multimodular type II restriction tain DNA and RNA helicases (Fig. 11a) [161] enzymes suggests that the “remote cutters” and a small PD-(D/E)XK cleavage domain arose independently multiple times from vari- (Figs. 1c, 2b). In type I enzymes the ous combinations of “cleavage domains” and PD-(D/E)XK domain is located at the TRDs with alternative folds and therefore rep- N-terminus of the DNA translocase domain, resent an interesting example of convergent while in type III enzymes it is located at its evolution. C-terminus [12], implying another case of se- quence permutation in RM proteins. Helicases are enzymes that separate duplex DNA translocase (helicase-like) domain DNA or RNA into single strands with the help All type I and III restriction enzymes, to- of ATP; on the basis of sequence comparison, gether with the modification-dependent en- they have been classified into five “super- zyme McrBC, require two recognition sites in families” (reviewed in refs. [162, 163]). How- 952 J.M. Bujnicki 2001

of the RecA protein [164], and several re- gions, which are not conserved between “superfamilies” and which in type I ENases were suggested to form additional domains re- quired for protein–protein interactions [12] (Fig. 1c). McrBC is the only known nuclease, which re- quires GTP [157]. Deletion mutagenesis stud- ies demonstrated that the N-terminal domain of McrB, missing from the naturally truncated form McrBS, is solely responsible for DNA binding and can be regarded as the TRD [165, 166]. On the other hand, GTP-binding motifs were identified in the amino-acid sequence of the central and C-terminal region of McrB [66], which also harbors determinants for Figure 11. Cartoon diagrams of components of binding of McrC [167]. However, site-directed the DNA translocase modules in NTP-dependent mutagenesis studies suggested that McrB is restriction enzymes. functionally and presumably structurally dis- a) The two RecA-like domains of the EcoAI R subunit tinct from the classic GTP-binding proteins homology-modeled (J.M. Bujnicki, unpublished) based [168]. Recently, based on extensive bioin- on atomic coordinates of the ATP-dependent “DEAD-box” proteins Mj0669 and Eif-4A (1hv8 and formatics analysis, it has been suggested that 1qva, respectively). The detailed mode of protein–DNA the GTPase module of McrB is related to the interactions and the mutual position of the two do- so-called AAA-ATPases (ATPases associated mains in the active enzyme is unknown, b) the E. coli with a variety of cellular activities) [169, 170], McrB monomer homology-modeled (J.M. Bujnicki, un- as well as the DnaA and RuvB helicases, the published) based on atomic coordinates of the Clp/Hsp100 family, clamp loading subunits AAA+-superfamily members RuvB (1hqc), Cdc6P for DNA polymerase, dynein motors and (1fnn) and the D2 domain of N-ethylmaleimide-sen- sitive fusion protein (1d2n). other proteins that appear to function as mo- lecular matchmakers in the assembly, opera- tion, and disassembly of diverse protein ma- ever, many proteins containing motifs com- chines or DNA–protein complexes [171] (Fig. mon to one or more of the “superfamilies” and 11b). In many cases, AAA domains assemble described initially as “putative helicases”, do into hexameric rings that are likely to change not appear to catalyze an unwinding reaction their shape during the ATPase cycle (reviewed [163]. Remarkably, the strand separation and in ref. [172]). However, the results of gel filtra- translocation activity could not be demon- tion and scanning transmission electron mi- strated for type I and III ENases, however it is croscopy analysis indicate that McrB and its believed that they accomplish dsDNA truncated version McrBS form forms single translocation via a helicase-like mechanism heptameric rings as well as tetradecamers, [12]. The DNA translocase module of type I with the latter being more stable when McrC and III ENases belongs to the large group of is bound [173]. However, the location and ex- evolutionarily related enzymes, which in- act stoichiometry of McrC in the McrBC cludes helicase superfamilies I and II and vari- nuclease could not be identified. Moreover, it ous DNA recombination and repair enzymes is still unclear, why McrBC is dependent on [12, 146]. This module spans two structurally GTP and not on ATP, like virtually all of its similar domains, whose fold is related to that homologs. Vol. 48 Restriction-modification systems 953

Regulatory proteins The regulatory protein from the unusual LlaI RM system [62] was shown to enhance The characterization of type II RM systems expression of LlaI restriction at a post-trans- has shown that some systems contain other criptional level rather than to function as a components in addition to the requisite transcriptional activator, despite its sequence endonuclease and methyltransferase. One of similarity to HTH proteins [182]. Similarly, these is the C (controller) protein, which has regulation of the ENase activity by inhibiting been proposed to allow establishment of RM intracellular subunit association was reported systems in new hosts by delaying the appear- for the PvuII enzyme and a 28-amino-acid pep- ance of restriction activity; its gene generally tide, designated W.PvuII [183]. precedes and in some cases partially overlaps the ENase gene [174]. C proteins have not yet been structurally characterized, but their Other elements associated with RM systems amino-acid sequences reveal that they are There have been several reports of the close probably helix-turn-helix proteins similar to association between enzymes involved in numerous known activators and repressors of DNA mobility and RM systems. Genes and gene expression (reviewed in ref. [175]). Se- partial genes encoding phage-like integrases quence comparisons have identified a con- and other proteins from the tyrosine served DNA sequence element termed a recombinase (Int) superfamily occur next to “C box” immediately upstream of most the sinIR [184], accIM [185], ecoHK31IM, and C genes [176]. It has been shown that C.PvuII eaeIM genes [186]. Genes for putative pro- and C.BamHI are DNA-binding proteins that teins similar to DNA invertases and bind to the C box and by autogenous activa- resolvases are found near the PaeR7I [187], tion of the polycistronic pvuIICR or bamHICR BglII [188], and ApaLI [189] RM systems. A promoter contribute to the temporal activa- complete copy of the IS982 element with a tion of the ENase gene expression (ref. [177] DDE-superfamily transposase-encoding gene and A. Sohail, I. Ghosh, R.M. Fuentes, and was identified between the llaKR2IR and J.E. Brooks, unpublished results cited llaKR2IM genes [190]; a putative transposase therein). It has been also demonstrated that was also found in the intergenic area between there is some cross-complementation between the eco47IR and eco47IIM genes [191]. These the C genes from different RM systems [178]. proteins may facilitate the transfer of RM Kobayashi and coworkers reported that genes among different bacterial strains. some type II RM systems on plasmids resist displacement by a bearing RM sys- tems with ENase and MTase of distinct speci- Genomic context of evolution, structure, ficity but the C protein of the same specific- and function of RM systems ity. An apparent cell suicide results from chromosome cleavage at unmodified sites by Currently hundreds of sequences of func- prematurely expressed ENase from an in- tionally characterized DNA MTases and coming RM system [179]. In general, C genes ENases are available in public databases were found to play important roles in the [192]. Although this number is still growing, maintenance, establishment, and mutual ex- we are also faced with a virtual explosion in clusion of RM systems. These roles are remi- the number of sequences of putative RM pro- niscent of the strategies of temperate bacte- tein deduced from data produced by numer- riophages [180] and are in accord with the ous Prokaryotic genome-sequencing projects. “selfish gene” hypothesis for the spread and 75% of completely sequences genomes appear maintenance of RM gene complexes [5, 181] to contain multiple RM systems (up to two (see also below). dozens in the case of Helicobacter pylori J99), 954 J.M. Bujnicki 2001 most of which have never been assayed bio- article. For instance, it seems plausible that chemically. However, as emphasized based on some restriction enzymes comprise cleavage the recent results of genome-wide analyses domains homologous to the LAGLIDADG, carried out for putative RM systems of H. AP, RusA, RuvC/RNase H or other nuclease pylori J99 [193], H. pylori 26695 [194] and superfamilies [147, 198], rather than the Cyanobacterium Anabaena strain PCC7120 PD-(D/E)XK, HNH, GIY-YIG and Nuc [122], many of the candidate genes are in fact superfamilies described to date. On the other pseudogenes in various states of decomposi- hand, the numerous ongoing structural tion. Nevertheless, as demonstrated for the genomics programs will undoubtedly provide Hpy99I system, which has been identified more insight into cases like the yeast RPB5 based on sequence analysis and subsequently subunit of RNA polymerase from Saccharo- characterized biochemically, the remaining myces cerevisiae, which comprises a active RM genes, may be a rich source of novel PD-(D/E)XK-like domain without the nuclease specificities [193]. Evidently, the genome- active site [199] or the EndA enzyme, whose based screening method has several impor- PD-(D/E)XK-like domain acquired the RNase tant advantages over conventional methods A-like active site on an opposite face of the employing testing the crude cell extracts for protein [153]. The latter case is especially in- their restriction activity: it can save the fer- teresting, since it suggests that additional mentation of large amount of microbes, which binding or catalytic sites could be engineered may pathogenic or very difficult to grow; and in structures of restriction ENases from the allows cloning and expression of RM systems, PD-(D/E)XK superfamily [152]. whose activity is not detectable in cell ex- The existence of specific relationships be- tracts. tween certain restriction enzymes and other Genome-wide comparisons carried out for evolutionarily conserved nucleases inferred pairs of related strains of: e-Proteobacteria H. from structural studies and sequence compar- pylori [195] and Archaea Pyrococcus abyssii isons on a genome scale suggests that they and P. horikoshii [196] suggested that the have arisen on multiple occasions from differ- presence of RM systems is often associated ent nuclease lineages [147]. It is tempting to with various types of genome polymorphisms. speculate that most of restriction ENases It has been noted, that certain chromosomal evolved as self-propagating, “selfish” ele- loci in different strains of related bacteria ments from DNA repair enzymes or other cel- may be associated with unrelated or very re- lular nucleases, however the available data do motely related RM systems that exhibit differ- not allow to draw definite conclusions. None- ent specificities [197]. This suggests that the theless, in the course of comparative analysis representational difference analysis may be of sequences and structures of various nu- used for isolation of novel RM systems based cleases carried out by our group and by others on genomic sequence analysis even if the se- it became clear that the major families of se- quence of the genome of the strain of interest quence-specific restriction enzymes are re- is not available. lated to either structure-specific or nonspe- From the data generated by combined theo- cific nucleases [114, 140, 146, 147, 149, 150, retical and experimental genomic approaches 154, 198]. It suggests that evolutionary path- many more surprises can be expected, not ways leading from non-specific nucleases to only as a result of enzymes with new speci- highly sequence-specific restriction enzymes ficities or new “types” combining old domains or vice versa can be inferred, provided suffi- in unprecedented manner, but also because cient number of sequences and structures cor- some RM systems may comprise novel do- responding to “evolutionary intermediates”. mains, not related to those described in this Even though many putative RM genes are in- Vol. 48 Restriction-modification systems 955 active, their sequences may aid in generation cific cleavage domains that are able to bind to of multiple sequence alignments and phylo- DNA on their own seems more promising genetic trees. The use of “intermediate se- than modifying the highly elaborated quences” is also helpful in molecular model- DNA-binding surface of enzymes like EcoRV. ing, where one attempts to predict the Unfortunately, only a few crystal structures three-dimensional structure of a protein of in- are available for the non-specific nucleases terest based on sequence alignment to a ho- [112, 206] and none for the “sloppy” MTases. mologous protein of known structure [200, Our unpublished results suggest that the 201]. three-dimensional structure of certain Such information could guide mutagenesis ENases can be predicted based on results of experiments aiming at rational engineering of sequence-structure threading even in the ab- restriction enzymes with new specificities. To sence of significant sequence similarity, how- date, attempts to change the specificity of ever it remains to be verified experimentally if type II restriction enzymes using site-directed such models are of sufficient resolution to or random mutagenesis were rather unsuc- guide knowledge-based redesign of DNA-bin- cessful [202, 203]. It has been concluded that ding determinants. Nevertheless, it is obvious even for the very well characterized restric- that good insight into evolutionary plasticity tion enzymes, like EcoRV, properties that de- of functionally important elements in RM pro- termine specificity and selectivity are difficult teins can be obtained in the course of compar- to model on the basis of the available struc- ative analysis carried out using advanced tural information [204]. However, with the computational methods. In my opinion the broad range of enzymes with different elusive goal of creating MTases and ENases specificities in hand one can systematically with novel specificities will be achieved only if analyze the structure–function relationships the large-scale bioinformatics and experimen- and follow the evolutionary history of selected tal approaches are combined. families of RM proteins. Since MTases show much greater sequence similarity than ENases, several projects have been launched CONCLUSIONS aiming at engineering of their specificity based on phylogenetic analysis and identifica- This review covers recent results on the tion of mutations correlated with functional structure and evolution of RM enzymes. One modifications. To date, there has been no immediately obvious fact is the rapid accelera- spectacular success; it has been concluded tion in the production of new data in this field. that the evolutionary pathway for specificity This has allowed the demonstration of phylo- change leads through a stage of relaxed speci- genetic and mechanistic links between RM en- ficity (ref. [108], S. Klimasauskas, personal zymes and other proteins that often possess communication, J.M. Bujnicki and M. similar biochemical or enzymatic properties. Radlinska, unpublished). It suggests that best The wealth of new data becoming available targets for specificity engineerning would be should help to answer many open questions not the highly specific enzymes studied pres- concerning the structure–function relation- ently, but the “sloppy” ones [205], which make ships of RM proteins. No doubt the approach only a few key protein–DNA contacts to recog- of functional genomics will play a significant nize their target (or rather a broad range of role in identifying genes coding for novel targets). A similar approach seems applicable ENases and MTases, and the newly developed for engineering of ENases with novel speci- computational tools will guide their experi- ficities. In my opinion, engineering specificity mental characterization and engineering, sug- into polypeptide loops of inherently non-spe- gesting that a new era in research on the 956 J.M. Bujnicki 2001 structure and function of RM systems has just 5. Kobayashi, I., Nobusato, A., Kobayashi- begun. Takahashi, N. & Uchiyama, I. (1999) Shaping the genome-restriction-modification systems I would like to thank Drs. Ashok Bhagwat, as mobile genetic elements. Curr. Opin. Genet. Thomas Bickle, Robert Blumenthal, Xiaodong Dev. 9, 649–656. Cheng, David Dryden, Jeff Elhai, Alan Fried- 6. Dryden, D.T. (1999) Bacterial DNA man, Peter Friedhoff, Arvydas Janulaitis, Al- bert Jeltsch, Antal Kiss, Saulius methyltranferases; in: S-Adenosylmethio- nine-dependent Methyltransferases: Structures Klimasauskas, Ichizo Kobayashi, Daniel Panne, Andrzej Piekarowicz, Alfred Pingoud, and Functions (Cheng, X. et al., eds.) pp. 283– Monika Radlinska, Virgis Siksnys, and 340, World Scientific Inc., Singapore. Geoffrey Wilson for stimulating discussions 7. Murray, N.E. (2000) Type I restriction sys- and kind provision of unpublished data. I tems: Sophisticated molecular machines thank Drs. Xiaodong Cheng, Alan Friedman, (a legacy of Bertani and Weigle). Microbiol. Richard Gumport, Sanford A. Lacks, Michael Mol. Biol. Rev. 64, 412–434. Topal, and Simon E. Phillips for sending me 8. Rao, D.N., Saha, S. & Krishnamurthy, V. coordinates of crystal structures of RM en- zymes before they were made publicly avail- (2000) ATP-dependent restriction enzymes. able. I also thank numerous colleagues for Prog. Nucleic Acid Res. Mol. Biol. 64, 1–63. sending reprints and preprints of their arti- 9. Pingoud, A. and Jeltsch, A. (1997) Recognition cles and apologize for not being able to cite all and cleavage of DNA by type-II restriction references due to space limitations. Finally, I endonucleases. Eur. J. Biochem. 246, 1–22. am indebted to Drs. Robert Blumenthal and Monika Radlinska for critical reading of the 10. Pingoud, A. & Jeltsch, A. (2001) Structure and manuscript and to Leszek Rychlewski for his function of type II restriction endonucleases. constant support. Nucleic Acids Res. 29, 3705–3727. 11. Pingoud, A., Jeltsch, A., Maxwell, A. & Sherratt, D. (2001) Enzymes that keep DNA REFERENCES under control: Meeting: DNA enzymes: struc- tures and mechanisms. EMBO Rep. 2, 1. Wilson, G.G. & Murray, N.E. (1991) Restric- 271–276. tion and modification systems. Annu. Rev. Genet. 25, 585–627. 12.Davies, G.P., Martin, I., Sturrock, S.S., Cronshaw, A., Murray, N.E. & Dryden, D.T. 2. Arber, W. & Dussoix, D. (1962) Host specific- (1999) On the structure and operation of type I ity of DNA produced by Escherichia coli. Host DNA restriction enzymes. J. Mol. Biol. 290, controlled modification of bacteriophage l. J. 565–579. Mol. Biol. 5, 18–36. 13. Dryden, D.T., Cooper, L.P., Thorpe, P.H. & By- 3. Arber, W. (1979) Promotion and limitation of ron, O. (1997) The in vitro assembly of the genetic exchange. Science 205, 361–365. EcoKI type I DNA restriction/modification en- 4. Lacks, S.A., Ayalew, S., de la Campa, A.G. & zyme and its in vivo implications. Biochemis- try, 36, 1065–1076. Greenberg, B. (2000) Regulation of compe- tence for genetic transformation in Streptococ- 14. Ellis, D.J., Dryden, D.T., Berge, T., cus pneumoniae: Expression of dpnA, a late Edwardson, J.M. & Henderson, R.M. (1999) competence gene encoding a DNA methyl- Direct observation of DNA translocation and transferase of the DpnII restriction system. cleavage by the EcoKI endonuclease using Mol. Microbiol. 35, 1089–1098. Vol. 48 Restriction-modification systems 957

atomic force microscopy. Nat. Struct. Biol. 6, 24.Roberts, R.J. & Cheng, X. (1998) Base flip- 15–17. ping. Annu. Rev. Biochem. 67, 181–198.

15. Yuan, R., Hamilton, D.L. & Burckhardt, J. 25.Hornby, D.P. & Ford, G.C. (1998) Pro- (1980) DNA translocation by the restriction tein-mediated base flipping. Curr. Opin. enzyme from E. coli K. Cell 20, 237–244. Biotechnol. 9, 354–358. 16. Berge, T., Ellis, D.J., Dryden, D.T., 26.Ho, D.K., Wu, J.C., Santi, D.V. & Floss, H.G. Edwardson, J.M. & Henderson, R.M. (2000) (1991) Stereochemical studies of the Translocation-independent dimerization of C-methylation of deoxycytidine catalyzed by the EcoKI endonuclease visualized by atomic HhaI methylase and the N-methylation of force microscopy. Biophys. J. 79, 479–484. deoxyadenosine catalyzed by EcoRI methylase. Arch. Biochem. Biophys. 284, 17. Boyer, H.W. (1971) DNA restriction and mod- 264–269. ification mechanisms in bacteria. Annu. Rev. Microbiol. 25, 153–176. 27. Ahmad, I. & Rao, D.N. (1996) Chemistry and biology of DNA methyltransferases. Crit. Rev. 18. Kauc, L. & Piekarowicz, A. (1978) Purification Biochem. Mol. Biol. 31, 361–380. and properties of a new restriction endonuclease from Haemophilus influenzae Rf. 28.Gong, W., O’Gara, M., Blumenthal, R.M. & Eur. J. Biochem. 92, 417–426. Cheng, X. (1997) Structure of PvuII DNA-(cy- tosine N4) methyltransferase, an example of 19. Meisel, A., Mackeldanz, P., Bickle, T.A., domain permutation and protein fold assign- Kruger, D.H. & Schroeder, C. (1995) Type III ment. Nucleic Acids Res. 25, 2702–2715. restriction endonucleases translocate DNA in a reaction driven by recognition site-specific 29.Vertino, P.M. (1999) Eukaryotic DNA ATP hydrolysis. EMBO J. 14, 2958–2966. methyltransferases; in: S-Adenosylmethionine- dependent Methyltransferases: Structures and 20.Bist, P., Sistla, S., Krishnamurthy, V., Functions (Cheng, X. et al., eds.) pp. 341–372, Acharya, A., Chandrakala, B. & Rao, D.N. World Scientific Inc., Singapore. (2001) S-Adenosyl-L-methionine is required for DNA cleavage by type III restriction en- 30.Winkler, F.K., Banner, D.W., Oefner, C., zymes. J Mol. Biol. 310, 93–109. Tsernoglou, D., Brown, R.S., Heathman, S.P., Bryan, R.K., Martin, P.D., Petratos, K. & Wil- 21. Meisel, A., Bickle, T.A., Kruger, D.H. & son, K.S. (1993) The crystal structure of Schroeder, C. (1992) Type III restriction en- EcoRV endonuclease and of its complexes with zymes need two inversely oriented recognition cognate and non-cognate DNA fragments. sites for DNA cleavage. Nature 355, 467–469. EMBO J. 12, 1781–1795. 22.Janscak, P., Sandmeier, U., Szczelkun, M.D. & 31. Cheng, X., Balendiran, K., Schildkraut, I. & Bickle, T.A. (2001) Subunit assembly and Anderson, J.E. (1994) Structure of PvuII mode of DNA cleavage of the type III restric- endonuclease with cognate DNA. EMBO J. tion endonucleases EcoP1I and EcoP15I. J. 13, 3927–3935. Mol. Biol. 306, 417–431. 32.Kim, Y., Grable, J.C., Love, R., Green, P.J. & 23.Berman, H.M., Westbrook, J., Feng, Z., Rosenberg, J.M. (1990) Refinement of EcoRI Gilliland, G., Bhat, T.N., Weissig, H., endonuclease crystal structure: A revised pro- Shindyalov, I.N. & Bourne, P.E. (2000) The tein chain tracing. Science 249, 1307–1309. Protein Data Bank. Nucleic Acids Res. 28, 235–242. 33.Newman, M., Strzelecka, T., Dorner, L.F., Schildkraut, I. & Aggarwal, A.K. (1995) Struc- 958 J.M. Bujnicki 2001

ture of BamHI endonuclease bound to DNA: quired for DNA cleavage. Proc. Natl. Acad. Sci Partial folding and unfolding on DNA binding. U.S.A. 95, 10570–10575. Science 269, 656–663. 43.Vanamee, E.S., Santagata, S. & Aggarwal, 34.Newman, M., Lunnen, K., Wilson, G., Greci, A.K. (2001) FokI requires two specific DNA J., Schildkraut, I. & Phillips, S.E. (1998) Crys- sites for cleavage. J. Mol. Biol. 309, 69–78. tal structure of restriction endonuclease BglI 44.Kim, Y.G. & Chandrasegaran, S. (1994) Chi- bound to its interrupted DNA recognition se- meric restriction endonuclease. Proc. Natl. quence. EMBO J. 17, 5466–5476. Acad. Sci U.S.A. 91, 883–887. 35.Stankevicius, K., Lubys, A., Timinskas, A., 45.Kim, Y.G., Smith, J., Durgesha, M. & Vaitkevicius, D. & Janulaitis, A. (1998) Clon- Chandrasegaran, S. (1998) Chimeric restric- ing and analysis of the four genes coding for tion enzyme: Gal4 fusion to FokI cleavage do- Bpu10I restriction-modification enzymes. Nu- main. Biol. Chem. 379, 489–495. cleic Acids Res. 26, 1084–1091. 46.Chandrasegaran, S. & Smith, J. (1999) Chime- 36.Hsieh, P.C., Xiao, J.P., O’loane, D. & Xu, S.Y. ric restriction enzymes: what is next? Biol. (2000) Cloning, expression and purification of Chem. 380, 841–848. a thermostable nonhomodimeric restriction enzyme, BslI. J Bacteriol. 182, 949–955. 47. Morgan, R.D., Calvet, C., Demeter, M., Agra, R. & Kong, H. (2000) Characterization of the 37. Kruger, D.H., Barcak, G.J., Reuter, M. & specific DNA nicking activity of restriction Smith, H.O. (1988) EcoRII can be activated to endonuclease N.BstNBI. Biol. Chem. 381, cleave refractory DNA recognition sites. Nu- 1123–1125. cleic Acids Res. 16, 3997–4008. 48.Higgins, L.S., Besnier, C. & Kong, H. (2001) 38.Huai, Q., Colandene, J.D., Chen, Y., Luo, F., The nicking endonuclease N.BstNBI is closely Zhao, Y., Topal, M.D. & Ke, H. (2000) Crystal related to Type IIs restriction endonucleases structure of NaeI-an evolutionary bridge be- MlyI and PleI. Nucleic Acids Res. 29, tween DNA endonuclease and topoisomerase. 2492–2501. EMBO J. 19, 3110–3118. 49.Gunn, J.S. & Stein, D.C. (1997) The Neisseria 39.Deibert, M., Grazulis, S., Sasnauskas, G., gonorrhoeae S.NgoVIII restriction/modifica- Siksnys, V. & Huber, R. (2000) Structure of tion system: A type IIs system homologous to the tetrameric restriction endonuclease the Haemophilus parahaemolyticus HphIre- NgoMIV in complex with cleaved DNA. Nat. striction/modification system. Nucleic Acids Struct. Biol. 7, 792–799. Res. 25, 4147–4152. 40.Bilcock, D.T. & Halford, S.E. (1999) DNA re- 50.Friedrich, T., Fatemi, M., Gowhar, H., striction dependent on two recognition sites: Leismann, O. & Jeltsch, A. (2000) Specificity Activities of the SfiI restriction-modification of DNA binding and methylation by the system in Escherichia coli. Mol. Microbiol. 31, M.FokI DNA methyltransferase. Biochim. 1243–1254. Biophys. Acta 1480, 145–159. 41. Szybalski, W., Kim, S.C., Hasan, N. & 51. Janulaitis, A., Petrusyte, M., Maneliene, Z., Podhajska, A.J. (1991) Class-IIS restriction en- Klimasauskas, S. & Butkus, V. (1992) Purifica- zymes — a review. Gene 100, 13–26. tion and properties of the Eco57I restriction 42.Bitinaite, J., Wah, D.A., Aggarwal, A.K. & endonuclease and methylase-prototypes of a Schildkraut, I. (1998) FokI dimerization is re- new class (type IV). Nucleic Acids Res. 20, 6043–6049. Vol. 48 Restriction-modification systems 959

52.Jurenaite-Urbanaviciene, S., Kazlauskiene, 61. Kovall, R.A. & Matthews, B.W. (1999) Type II R., Urbelyte, V., Maneliene, Z., Petrusyte, M., restriction endonucleases: Structural, func- Lubys, A. & Janulaitis, A. (2001) Characteriza- tional and evolutionary relationships. Curr. tion of BseMII, a new type IV restriction-mo- Opin. Chem. Biol. 3, 578–583. dification system, which recognizes the penta- 62.O’Sullivan, D.J., Zagula, K. & Klaenhammer, nucleotide sequence 5¢-CTCAG(N)(10/8). Nu- T.R. (1995) In vivo restriction by LlaIisen- cleic Acids Res. 29, 895–903. coded by three genes, arranged in an operon 53.Janulaitis, A., Vaisvila, R., Timinskas, A., with llaIM, on the conjugative Lactococcus Klimasauskas, S. & Butkus, V. (1992) Cloning plasmid pTR2030. J. Bacteriol. 177, 134–143. and sequence analysis of the genes coding for 63.de la Campa, A.G., Kale, P., Springhorn, S.S. Eco57I type IV restriction-modification en- & Lacks, S.A. (1987) Proteins encoded by the zymes. Nucleic Acids Res. 20, 6051–6056. DpnII restriction gene cassette. Two 54.Kong, H. (1998) Analyzing the functional orga- methylases and an endonuclease. J. Mol. Biol. nization of a novel restriction modification 196, 457–469. system, the BcgI system. J. Mol. Biol. 279, 64.Merkiene, E., Vilkaitis, G. & Klimasauskas, S. 823–832. (1998) A pair of single-strand and dou- 55.Vitor, J.M. & Morgan, R.D. (1995) Two novel ble-strand DNA cytosine-N4 methyltrans- restriction endonucleases from Campylobacter ferases from Bacillus centrosporus. Biol. Chem. jejuni. Gene 157, 109–110. 379, 569–571.

56.Piekarowicz, A., Golaszewska, M., Sunday, 65.Revel, H.R. (1967) Restriction of non- A.O., Siwinska, M. & Stein, D.C. (1999) The glucosylated T-even bacteriophage: Properties HaeIV restriction modification system of of permissive mutants of Escherichia coli B Haemophilus aegyptius is encoded by a single and K12. Virology 31, 688–701. polypeptide. J Mol. Biol. 293, 1055–1065. 66.Dila, D., Sutherland, E., Moran, L., Slatko, B. 57. Kong, H. & Smith, C.L. (1897) Does BcgI, a & Raleigh, E.A. (1990) Genetic and sequence unique restriction endonuclease, require two organization of the mcrBC locus of Escherichia recognition sites for cleavage? Biol. Chem. coli K-12. J. Bacteriol. 172, 4888–4900. 379, 605–609. 67. Waite-Rees, P.A., Keating, C.J., Moran, L.S., 58.Gormley, N.A., Bath, A.J. & Halford, S.E. Slatko, B.E., Hornstra, L.J. & Benner, J.S. (2000) Reactions of BglI and other type II re- (1991) Characterization and expression of the striction endonucleases with discontinuous Escherichia coli Mrr restriction system. J recognition sites. J Biol. Chem. 275, Bacteriol. 173, 5207–5219. 6928–6936. 68.Lacks, S. & Greenberg, B. (1977) Complemen- 59.Jeltsch, A. & Pingoud, A. (1996) Horizontal tary specificity of restriction endonucleases of gene transfer contributes to the wide distribu- Diplococcus pneumoniae with respect to DNA tion and evolution of type II restric- methylation. J. Mol. Biol. 114, 153–168. tion-modification systems. J. Mol. Evol. 42, 69.Janosi, L., Yonemitsu, H., Hong, H. & Kaji, A. 91–96. (1994) and expression of a 60.Jeltsch, A., Wenz, C., Wende, W., Selent, U. & novel hydroxymethylcytosine-specific restric- Pingoud, A. (1996) Engineering novel restric- tion enzyme (PvuRts1I) modulated by tion endonucleases: Principles and applica- glucosylation of DNA. J. Mol. Biol. 242, tions. Trends Biotechnol. 14, 235–238. 45–61. 960 J.M. Bujnicki 2001

70.Bickle, T.A. & Kruger, D.H. (1993) Biology of crystal structure of very short patch repair DNA restriction. Microbiol. Rev.57, 434–450. endonuclease in complex with a DNA duplex. Cell 99, 615–623. 71. Jurica, M.S. & Stoddard, B.L. (1999) Homing endonucleases: Structure, function and evolu- 81. Behrens, B., Noyer-Weidner, M., Pawlek, B., tion. Cell Mol. Life Sci. 55, 1304–1326. Lauster, R., Balganesh, T.S. & Trautner, T.A. (1987) Organization of multispecific DNA 72.Gimble, F.S. (2000) Invasion of a multitude of methyltransferases encoded by temperate Ba- genetic niches by mobile endonuclease genes. cillus subtilis phages. EMBO J. 6, 1137–1142. FEMS Microbiol. Lett. 185, 99–107. 82.Tran-Betcke, A., Behrens, B., Noyer-Weidner, 73.Belfort, M. & Perlman, P.S. (1995) Mecha- M. & Trautner, T.A. (1986) DNA methyl- nisms of intron mobility. J. Biol. Chem. 270, transferase genes of Bacillus subtilis phages: 30237–30240. Comparison of their nucleotide sequences. 74. Marinus, M.G. (1996) Methylation of DNA; in Gene 42, 89–96. Escherichia coli and Salmonella typhimurium 83.Fuller-Pace, F.V. & Murray, N.E. (1986) Two (Neidhardt, F.C., ed.) pp. 782–791, ASM DNA recognition domains of the specificity Press, Washington DC. polypeptides of a family of type I restriction 75.Peterson, K.R., Wertman, K.F., Mount, D.W. enzymes. Proc. Natl. Acad. Sci. U.S.A. 83, & Marinus, M.G. (1985) Viability of Esche- 9368–9372. richia coli K-12 DNA adenine methylase (dam) 84.Wilke, K., Rauhut, E., Noyer-Weidner, M., mutants requires increased expression of spe- Lauster, R., Pawlek, B., Behrens, B. & cific genes in the SOS regulon. Mol. Gen. Trautner, T.A. (1988) Sequential order of tar- Genet. 201, 14–19. get-recognizing domains in multispecific 76. Wright, R., Stephens, C. & Shapiro, L. (1997) DNA-methyltransferases. EMBO J. 7, The CcrM DNA methyltransferase is wide- 2601–2609. spread in the alpha subdivision of 85.Lange, C., Jugel, A., Walter, J., Noyer- proteobacteria and its essential functions are Weidner, M. & Trautner, T.A. (1991) ‘Pseudo’ conserved in Rhizobium meliloti and domains in phage-encoded DNA methyl- Caulobacter crescentus. J. Bacteriol. 179, transferases. Nature 352, 645–648. 5869–5877. 86.Lange, C., Wild, C. & Trautner, T.A. (1996) 77. Gomez-Eichelmann, M.C. & Ramirez- Santos, Identification of a subdomain within DNA-(cy- J. (1993) Methylated cytosine at Dcm tosine-C5)-methyltransferases responsible for (CCATGG) sites in Escherichia coli: Possible the recognition of the 5¢ part of their DNA tar- function and evolutionary implications. J. get. EMBO J. 15, 1443–1450. Mol. Evol. 37, 11–24. 87. Trautner, T.A., Pawlek, B., Behrens, B. & 78.Lieb, M. & Bhagwat, A.S. (1996) Very short Willert, J. (1996) Exact size and organization patch repair: Reducing the cost of cytosine of DNA target-recognizing domains of multi- methylation. Mol. Microbiol. 20, 467–473. specific DNA-(cytosine-C5)-methyltransfera- 79.Ban, C. & Yang, W. (1998) Structural basis for ses. EMBO J. 15, 1434–1442. MutH activation in E.coli mismatch repair and 88.Mi, S. & Roberts, R.J. (1992) How M.MspI and relationship of MutH to restriction endo- M.HpaII decide which base to methylate. Nu- nucleases. EMBO J. 17, 1526–1534. cleic Acids Res. 20, 4811–4816. 80.Tsutakawa, S.E., Jingami, H. & Morikawa, K. (1999) Recognition of a TG mismatch: The Vol. 48 Restriction-modification systems 961

89.Pradhan, S. & Roberts, R.J. (2000) Hybrid of the HhaI DNA methyltransferase com- mouse-prokaryotic DNA (cytosine-5) methyl- plexed with S-adenosyl-L-methionine. Cell 74, transferases retain the specificity of the paren- 299–307. tal C-terminal domain. EMBO J. 19, 99.Reinisch, K.M., Chen, L., Verdine, G.L. & 2103–2114. Lipscomb, W.N. (1995) The crystal structure 90.Bujnicki, J.M. & Radlinska, M. (1999) Molecu- of HaeIII methyltransferase convalently lar phylogenetics of DNA 5mC-methyl- complexed to DNA: An extrahelical cytosine transferases. Acta Microbiol. Pol. 48, 19–33. and rearranged base pairing. Cell 82, 143– 153. 91. Gann, A.A., Campbell, A.J., Collins, J.F., Coulson, A.F. & Murray, N.E. (1987) 100. Labahn, J., Granzin, J., Schluckebier, G., Reassortment of DNA recognition domains Robinson, D.P., Jack, W.E., Schildkraut, I. & and the evolution of new specificities. Mol. Saenger, W. (1994) Three-dimensional struc- Microbiol. 1, 13–22. ture of the adenine-specific DNA methyl- transferase M.TaqI in complex with the co- 92.Kneale, G.G. (1994) A symmetrical model for factor S- adenosylmethionine. Proc. Natl. the domain structure of type I DNA Acad. Sci. U.S.A. 91, 10957–10961. methyltransferases. J. Mol. Biol. 243, 1–5. 101. Tran, P.H., Korszun, Z.R., Cerritelli, S., 93.MacWilliams, M.P. & Bickle, T.A. (1996) Gen- Springhorn, S.S. & Lacks, S.A. (1998) Crys- eration of new DNA binding specificity by tal structure of the DpnM DNA adenine methyltransferase from the DpnII restriction truncation of the type IC EcoDXXI hsdS gene. system of Streptococcus pneumoniaebound to EMBO J. 15, 4775–4783. S-adenosylmethionine. Structure 6, 1563– 94.Thorpe, P.H., Ternent, D. & Murray, N.E. 1575. (1997) The specificity of StySKI, a type I re- 102. Scavetta, R.D., Thomas, C.B., Walsh, M.A., striction enzyme, implies a structure with ro- Szegedi, S., Joachimiak, A., Gumport, R.I. & tational symmetry. Nucleic Acids Res. 25, Churchill, M.E. (2000) Structure of RsrI 1694–1700. methyltransferase, a member of the N6-ade- nine beta class of DNA methyltransferases. 95.Dybvig, K., Sitaraman, R. & French, C.T. Nucleic Acids Res. 28, 3950–3961. (1998) A family of phase-variable restriction enzymes with differing specificities generated 103. Dryden, D.T., Sturrock, S.S. & Winter, M. by high-frequency gene rearrangements. Proc. (1995) Structural modelling of a type I DNA methyltransferase. Nat. Struct. Biol. 2, 632– Natl. Acad. Sci. U.S.A. 95, 13923–13928. 635. 96.Schouler, C., Gautier, M., Ehrlich, S.D. & Cho- 104. O’Neill, M., Dryden, D.T. & Murray, N.E. pin, M.C. (1998) Combinational variation of (1998) Localization of a protein-DNA inter- restriction modification specificities in Lacto- face by random mutagenesis. EMBO J. 17, coccus lactis. Mol. Microbiol. 28, 169–178. 7118–7127. 97. Malone, T., Blumenthal, R.M. & Cheng, X. 105. Bujnicki, J.M. & Radlinska, M. (1999) Molec- (1995) Structure-guided analysis reveals nine ular evolution of DNA-(cytosine-N4) methyl- sequence motifs conserved among DNA transferases: Evidence for their polyphyletic amino-methyltransferases and suggests a cat- origin. Nucleic Acids Res. 27, 4501–4509. alytic mechanism for these enzymes. J. Mol. 106. Radlinska, M., Bujnicki, J.M. & Piekarowicz, Biol. 253, 618–632. A. (1999) Structural characterization of two 98.Cheng, X., Kumar, S., Posfai, J., Pflugrath, tandemly arranged DNA methyltransferase J.W. & Roberts, R.J. (1993) Crystal structure genes from Neisseria gonorrhoeae MS11: N4-cytosine specific M.NgoMXV and non- 962 J.M. Bujnicki 2001

functional 5-cytosine-type M.NgoMorf2P. computational and molecular findings. Nu- Proteins 37, 717–728. cleic Acids Res. 27, 2115–2125.

107. Radlinska, M. & Bujnicki, J.M. (2001) Clon- 116. Dixon, M., Fauman, E.B. & Ludwig, M.L. ing of enterohemorrhagic Escherichia coli (1999) The black sheep of the family: phage VT-2 Dam methyltransferase. Acta AdoMet-dependent methyltransferases that Microbiol. Pol. 50, 151–156. do not fit the consensus structural fold; in: S-Adenosylmethionine-dependent Methyltrans- 108. Beck, C., Cranz, S., Solmaz, M., Roth, M. & ferases: Structures and Functions (Cheng, X., Jeltsch, A. (2001) How does a DNA interact- et al., eds.) pp. 39–54, World Scientific Inc., ing enzyme change its specificity during mo- Singapore. lecular evolution? A site directed mutagene- sis study at the DNA binding site of the 117. Song, H.K., Sohn, S.H. & Suh, S.W. (1999) 6 DNA-(adenine-N )-methyltransferase EcoRV. Crystal structure of deoxycytidylate Biochemistry 40, 10956–10965. hydroxymethylase from bacteriophage T4, a component of the deoxyribonucleoside 109. Klimasauskas, S., Kumar, S., Roberts, R.J. & triphosphate-synthesizing complex. EMBO Cheng, X. (1994) HhaI methyltransferase J. 18, 1104–1113. flips its target base out of the DNA helix. Cell 76, 357–369. 118. Bujnicki, J.M. (1999) Comparison of protein structures reveals monophyletic origin of the 110. Goedecke, K., Pignot, M., Goody, R.S., AdoMet-dependent methyltransferase family Scheidig, A.J. & Weinhold, E. (2001) Struc- and mechanistic convergence rather than re- ture of the N6-adenine DNA methyl- cent differentiation of N4-cytosine and transferase M.TaqI in complex with DNA and N6-adenine DNA methylation. In Silico Biol. a cofactor analog. Nat. Struct. Biol. 8,121– 1, 1–8, 125. http://www.bioinfo.de/isb/1999–01/ 0016/. 111. Wah, D.A., Hirsch, J.A., Dorner, L.F., 119. Lauster, R. (1989) Evolution of type II DNA Schildkraut, I. & Aggarwal, A.K. (1997) methyltransferases. A gene duplication Structure of the multimodular endonuclease model. J. Mol. Biol. 206, 313–321. FokI bound to DNA. Nature 388, 97–100. 120. Jeltsch, A., Christ, F., Fatemi, M. & Roth, M. 112. Wah, D.A., Bitinaite, J., Schildkraut, I. & (1999) On the substrate specificity of DNA Aggarwal, A.K. (1998) Structure of FokI has methyltransferases. Adenine-N6 DNA implications for DNA cleavage. Proc. Natl. methyltransferases also modify cytosine res- Acad. Sci U.S.A. 95, 10564–10569. idues at position N4. J. Biol. Chem. 274, 113. Sapranauskas, R., Sasnauskas, G., Laguna- 19538–19544. vicius, A., Vilkaitis, G., Lubys, A. & Siksnys, 121. Jeltsch, A. (2001) The cytosine V. (2000) Novel subtype of type IIs restric- N4-methyltransferase M.PvuII also modifies tion enzymes. J Biol. Chem. 275, 30878– adenine residues. J. Biol. Chem. 382, 30885. 707–710. 114. Bujnicki, J.M., Radlinska, M. & Rychlewski, 122. Matveyev, A.V., Young, K.T., Meng, A. & L. (2001) Polyphyletic evolution of type II re- Elhai, J. (2001) DNA methyltransferases of striction enzymes revisited: Two independ- the Cyanobacterium Anabaena PCC7120. ent sources of second-hand folds revealed. Nucleic Acids Res. 29, 1491–1506. Trends Biochem. Sci. 26, 9–11. 123. Roth, M. & Jeltsch, A. (2001) Changing the 115. Kowalski, J.C., Belfort, M., Stapleton, M.A., target base specificity of the EcoRV DNA Holpert, M., Dansereau, J.T., Pietrokovski, methyltransferase by rational de novo pro- S., Baxter, S.M. & Derbyshire, V. (1999) Con- tein-design. Nucleic Acids Res. 29, 1–8. figuration of the catalytic GIY-YIG domain of intron endonuclease I-TevI: Coincidence of Vol. 48 Restriction-modification systems 963

124. Posfai, J., Bhagwat, A.S., Posfai, G. & Rob- 132. Xu, S.Y., Xiao, J.P., Posfai, J., Maunus, R.E. erts, R.J. (1989) Predictive motifs derived & Benner, J.S. (1997) Cloning of the BssHII from cytosine methyltransferases. Nucleic restriction-modification system in Esche- Acids Res. 17, 2421–2435. richia coli: BssHII methyltransferase con- tains circularly permuted cytosine-5 125. Klimasauskas, S., Timinskas, A., Menke- methyltransferase motifs. Nucleic Acids Res. vicius, S., Butkiene, D., Butkus, V. & 25, 3991–3994. Janulaitis, A. (1989) Sequence motifs charac- teristic of DNA[cytosine-N4]methyltransfe- 133. Cao, X., Springer, N.M., Muszynski, M.G., rases: Similarity to adenine and cytosine-C5 Phillips, R.L., Kaeppler, S. & Jacobsen, S.E. DNA-methylases. Nucleic Acids Res. 17, (2000) Conserved plant genes with similarity 9823–9832. to mammalian de novo DNA methyltrans- ferases. Proc. Natl. Acad. Sci. U.S.A. 97, 126. Kumar, S., Cheng, X., Klimasauskas, S., Mi, 4979–4984. S., Posfai, J., Roberts, R.J. & Wilson, G.G. (1994) The DNA (cytosine-5) methyltrans- 134. Bujnicki, J.M. (2000) Homology modelling of ferases. Nucleic Acids Res. 22, 1–10. the DNA 5mC methyltransferase M.BssHII. is permutation of functional subdomains 127. Bujnicki, J.M. & Radlinska, M. (2001) Clon- common to all subfamilies of DNA methyl- ing and characterization of M.LmoA118I, a transferases? Int. J. Biol. Macromol. 27, novel DNA:m4C methyltransferase from the 195–204. Listeria monocytogenes phage A118, a close homolog of M.NgoMXV. Acta Microbiol. Pol. 135. Jeltsch, A. (1999) Circular permutations in 50, 151–156. the molecular evolution of DNA methyltransferases. J. Mol. Evol. 49, 128. Piekarowicz, A. & Bujnicki, J.M. (1999) Clon- 161–164. ing of the Dam methyltransferase gene from Haemophilus influenzae bacteriophage HP1. 136. Heinemann, U. & Hahn, M. (1995) Circular Acta Microbiol. Pol. 48, 123–129. permutation of polypeptide chains: Implica- tions for protein folding and stability. Prog. 129. Bujnicki, J.M. & Radlinska, M. (1999) Is the Biophys. Mol. Biol. 64, 121–143. HemK family of putative S-adenosylmethio- nine-dependent methyltransferases a “miss- 137. Heringa, J. & Taylor, W.R. (1997) Three-di- ing” zeta subfamily of adenine methyl- mensional domain duplication, swapping transferases? A hypothesis. IUBMB Life 48, and stealing. Curr. Opin. Struct. Biol. 7, 247–250. 416–421. 130. Bujnicki, J.M. (2000) Phylogenomic analysis 138. Heitman, J. (1993) On the origins, structures of 16S rRNA:(guanine-N2) and functions of restriction-modification en- methyltransferases suggests new family zymes. Genet. Eng. N.Y. 15, 57–108. members and reveals highly conserved mo- tifs and a domain structure similar to other 139. Jeltsch, A., Kroger, M. & Pingoud, A. (1995) nucleic acid amino-methyltransferases. Evidence for an evolutionary relationship FASEB J. 14, 2365–2368. among type-II restriction endonucleases. Gene 160, 7–16. 131. Bujnicki, J.M. & Rychlewski, L. (2000) Diver- gence and retroconvergence in the evolution 140. Bujnicki, J.M. (2000) Phylogeny of the re- of sequence specificity and reaction mecha- striction endonuclease-like superfamily in- nism of DNA methyltransferases and their ferred from comparison of protein struc- relatives; in: Proceedings of the IUBMB Sym- tures. J. Mol. Evol. 50, 39–44. posium “DNA Enzymes: Structures and Mech- 141. Kovall, R.A. & Matthews, B.W. (1998) Struc- anisms”, Anonymous pp. 61, Bangalore, In- tural, functional and evolutionary relation- dia. ships between lambda-exonuclease and the 964 J.M. Bujnicki 2001

type II restriction endonucleases. Proc. Natl. tive site in the methyl-directed restriction en- Acad. Sci. U.S.A. 95, 7893–7897. zyme Mrr and its homologs. Gene 267, 183–191. 142. Bond, C.S., Kvaratskhelia, M., Richard, D., White, M.F. & Hunter, W.N. (2001) Struc- 151. Bujnicki, J.M. & Rychlewski, L. (2001) The ture of Hjc, a Holliday junction resolvase, herpesvirus alkaline exonuclease belongs to from Sulfolobus solfataricus. Proc. Natl. the restriction endonuclease PD-(D/E)XK Acad. Sci U.S.A. 98, 5509–5514. superfamily: Insight from molecular model- ing and phylogenetic analysis. Virus Genes 143. Nishino, T., Komori, K., Tsuchiya, D., Ishino, 22, 219–230. Y. & Morikawa, K. (2001) Crystal structure of the Archaeal Holliday junction resolvase 152. Bujnicki, J.M. & Rychlewski, L. (2001) Un- Hjc and implications for DNA recognition. usual evolutionary history of the tRNA splic- Structure 9, 197–204. ing endonuclease EndA: Relationship to the LAGLIDADG and PD-(D/E)XK deoxyribonu- 144. Hadden, J.M., Convery, M.A., Declais, A.C., cleases. Protein Sci. 10, 656–660. Lilley, D.M. & Phillips, S.E. (2001) Crystal structure of the Holliday junction resolving 153. Li, H., Trotta, C.R. & Abelson, J. (1998) Crys- enzyme T7 endonuclease I. Nat. Struct. Biol. tal structure and evolution of a transfer RNA 8, 62–67. splicing enzyme. Science 280, 279–284. 145. Hickman, A.B., Li, Y., Mathew, S.V., May, 154. Bujnicki, J.M., Radlinska, M. & Rychlewski, E.W., Craig, N.L. & Dyda, F. (2000) Unex- L. (2000) Atomic model of the 5-methylcyto- pected structural diversity in DNA recombi- sine-specific restriction enzyme McrA re- nation: The restriction endonuclease connec- veals an atypical zinc-finger and structural tion. Mol. Cell. 5, 1025–1034. similarity to bba Me endonucleases. Mol. Microbiol. 37, 1280–1281. 146. Aravind, L., Walker, D.R. & Koonin, E.V. (1999) Conserved domains in DNA repair 155. Ko, T.P., Liao, C.C., Ku, W.Y., Chak, K.F. & proteins and evolution of repair systems. Nu- Yuan, H.S. (1999) The crystal structure of cleic Acids Res. 27, 1223–1242. the DNase domain of colicin E7 in complex with its inhibitor Im7 protein. Structure Fold. 147. Aravind, L., Makarova, K.S. & Koonin, E.V. Des. 7, 91–102. (2000) Holliday junction resolvases and re- lated nucleases: Identification of new fami- 156. Bujnicki, J.M., Rotkiewicz, P., Kolinski, A. & lies, phyletic distribution and evolutionary Rychlewski, L. (2001) Three-dimensional trajectories. Nucleic Acids Res. 28, modeling of the I-TevI homing endonuclease 3417–3432. catalytic domain, a GIY-YIG superfamily member, using NMR restraints and Monte 148. Kvaratskhelia, M., Wardleworth, B.N., Nor- Carlo dynamics. Protein Eng. 14, 717–721. man, D.G. & White, M.F. (2000) A conserved nuclease domain in the Archaeal Holliday 157. Sutherland, E., Coe, L. & Raleigh, E.A. junction resolving enzyme Hjc. J. Biol. Chem. (1992) McrBC: A multisubunit GTP-de- 275, 25540–25546. pendent restriction endonuclease. J. Mol. Biol. 225, 327–348. 149. Bujnicki, J.M. & Rychlewski, L. (2001) Grouping together highly diverged PD-(D/ 158. Janscak, P., MacWilliams, M.P., Sandmeier, E)XK nucleases and identification of novel U., Nagaraja, V. & Bickle, T.A. (1999) DNA superfamily members using struc- translocation blockage, a general mecha- ture-guided alignment of sequence profiles. nism of cleavage site selection by type I re- J. Mol. Microbiol. Biotechnol. 3, 69–72. striction enzymes. EMBO J. 18, 2638–2647.

150. Bujnicki, J.M. & Rychlewski, L. (2001) Identi- 159. Panne, D., Raleigh, E.A. & Bickle, T.A. (1999) fication of a PD-(D/E)XK-like domain with a The McrBC endonuclease translocates DNA novel configuration of the endonuclease ac- Vol. 48 Restriction-modification systems 965

in a reaction dependent on GTP hydrolysis. J 170. Patel, S. & Latterich, M. (1998) The AAA Mol. Biol. 290, 49–60. team: Related ATPases with diverse func- tions. Trends Cell Biol. 8, 65–71. 160. Yuan, R. (1981) Structure and mechanism of multifunctional restriction endonucleases. 171. Neuwald, A.F., Aravind, L., Spouge, J.L. & Annu. Rev. Biochem. 50, 285–319. Koonin, E.V. (1999) AAA+: A class of chape- rone-like ATPases associated with the assem- 161. Gorbalenya, A.E. & Koonin, E.V. (1991) bly, operation and disassembly of protein Endonuclease (R) subunits of type-I and complexes. Genome Res. 9, 27–43. type-III restriction-modification enzymes contain a helicase-like domain. FEBS Lett. 172. Vale, R.D. (2000) AAA proteins. Lords of the 291, 277–281. ring. J Cell Biol. 150, F13–F19 162. Gorbalenya, A.E. & Koonin, E.V. (1993) 173. Panne, D., Muller, S.A., Wirtz, S., Engel, A. Helicases: amino acid sequence comparisons & Bickle, T.A. (2001) The McrBC restriction and structure-function relationships. Curr. endonuclease assembles into a ring struc- Opin. Struct. Biol. 3, 419–429. ture in the presence of G nucleotides. EMBO J. 20, 3210–3217. 163. Hall, M.C. & Matson, S.W. (1999) Helicase motifs: The engine that powers DNA unwind- 174. Tao, T., Bourne, J.C. & Blumenthal, R.M. ing. Mol. Microbiol. 34, 867–877. (1991) A family of regulatory genes associ- ated with type II restriction-modification 164. Korolev, S., Yao, N., Lohman, T.M., Weber, systems. J. Bacteriol. 173 , 1367–1375. P.C. & Waksman, G. (1998) Comparisons be- tween the structures of HCV and Rep heli- 175. Wintjens, R. & Rooman, M. (1996) Struc- cases reveal structural similarities between tural classification of HTH DNA-binding do- SF1 and SF2 super-families of helicases. Pro- mains and protein-DNA interaction modes. tein Sci. 7, 605–610. J. Mol. Biol. 262, 294–313. 165. Gast, F.U., Brinkmann, T., Pieper, U., 176. Rimseliene, R., Vaisvila, R. & Janulaitis, A. Kruger, T., Noyer-Weidner, M. & Pingoud, A. (1995) The Eco72IC gene specifies a (1997) The recognition of methylated DNA trans-acting factor which influences expres- by the GTP-dependent restriction sion of both DNA methyltransferase and endonuclease McrBC resides in the endonuclease from the Eco72I restriction- N-terminal domain of McrB. Biol Chem. 378, modification system. Gene 157, 217–219. 975–982. 177. Vijesurier, R.M., Carlock, L., Blumenthal, 166. Pieper, U., Schweitzer, T., Groll, D.H. & R.M. & Dunbar, J.C. (2000) Role and mecha- Pingoud, A. (1999) Defining the location and nism of action of C.PvuII, a regulatory pro- function of domains of McrB by deletion mu- tein conserved among restriction-modifica- tagenesis. Biol Chem. 380, 1225–1230. tion systems. J Bacteriol. 182, 477–487. 167. Panne, D., Raleigh, E.A. & Bickle, T.A. (1998) 178. Ives, C.L., Sohail, A. & Brooks, J.E. (1995) McrBs, a modulator peptide for McrBC activ- The regulatory C proteins from different re- ity. EMBO J. 17, 5477–5483. striction-modification systems can cross- complement. J. Bacteriol. 177 , 6313–6315. 168. Pieper, U., Schweitzer, T., Groll, D.H., Gast, F.U. & Pingoud, A. (1999) The GTP-binding 179. Nakayama, Y. & Kobayashi, I. (1998) Restric- domain of McrB: more than just a variation tion-modification gene complexes as selfish on a common theme? J. Mol. Biol. 292, gene entities: Roles of a regulatory system in 547–556. their establishment, maintenance and apo- ptotic mutual exclusion. Proc. Natl. Acad. 169. Confalonieri, F. & Duguet, M. (1995) A Sci. U.S.A. 95, 6442–6447. 200-amino acid ATPase module in search of a basic function. BioEssays 17, 639–650. 966 J.M. Bujnicki 2001

180. Campbell, A. (1994) Comparative molecular NspHI, SacI, Sca I and SapI restric- biology of lambdoid phages. Annu. Rev. tion-modification systems in Escherichia coli. Microbiol. 48, 193–222. Mol. Gen. Genet. 260, 226–231. 181. Kobayashi, I. (1998) Selfishness and death: 190. Twomey, D.P., McKay, L.L. & O’Sullivan, Raison d’etre of restriction, recombination D.J. (2000) Molecular characterization of the and mitochondria. Trends Genet. 14, Lactococcus lactis LlaKR2I restriction-modi- 368–374. fication system and effect of an IS982 ele- ment positioned between the restriction and 182. O’Sullivan, D.J. & Klaenhammer, T.R. (1998) modification genes. J. Bacteriol. 180,5844– Control of expression of LlaI restriction in 5854. Lactococcus lactis. Mol. Microbiol. 27, 1009–1020. 191. Stankevicius, K., Povilionis, P., Lubys, A., Menkevicius, S. & Janulaitis, A. (1995) Clon- 183. Adams, G.M. & Blumenthal, R.M. (1995) ing and characterization of the unusual re- Gene pvuIIW: A possible modulator of Pvu II striction-modification system comprising endonuclease subunit association. Gene 157, two restriction endonucleases and one 193–199. methyltransferase. Gene 157, 49–53. 184. Karreman, C. & de Waard, A. (1988) Cloning 192. Roberts, R.J. & Macelis, D. (2001) and complete nucleotide sequences of the REBASE-restriction enzymes and type II restriction-modification genes of Sal- methylases. Nucleic Acids Res. 29, 268–269. monella infantis. J. Bacteriol. 170, 2527– 2532. 193. Kong, H., Lin, L.F., Porter, N., Stickel, S., Byrd, D., Posfai, J. & Roberts, R.J. (2000) 185. Brassard, S., Paquet, H. & Roy, P.H. (1995) A Functional analysis of putative restric- transposon-like sequence adjacent to the tion-modification system genes in the AccI restriction-modification operon. Gene Helicobacter pylori J99 genome. Nucleic Acids 157, 69–72. Res. 28, 3216–3223. 186. Lee, K.F., Shaw, P.C., Picone, S.J., Wilson, 194. Vitkute, J., Stankevicius, K., Tamulaitiene, G.G. & Lunnen, K.D. (1998) Sequence com- G., Maneliene, Z., Timinskas, A., Berg, D.E. parison of EcoHK31I and EaeI restric- & Janulaitis, A. (2001) Specificities of eleven tion–modification systems suggest an different DNA methyltransferases of Helico- intergenic transfer of genetic material. Biol. bacter pylori strain 26695. J Bacteriol. 183, Chem. 379, 437–442. 443–450. 187. Vaisvila, R., Vilkaitis, G. & Janulaitis, A. 195. Nobusato, A., Uchiyama, I., Ohashi, S. & (1995) Identification of a gene encoding a Kobayashi, I. (2000) Insertion with long tar- DNA invertase-like enzyme adjacent to the get duplication: A mechanism for gene mobil- PaeR7I restriction-modification system. ity suggested from comparison of two re- Gene 157, 81–84. lated bacterial genomes.Gene 259, 99–108. 188. Anton, B.P., Heiter, D.F., Benner, J.S., Hess, 196. Chinen, A., Uchiyama, I. & Kobayashi, I. E.J., Greenough, L., Moran, L.S., Slatko, (2000) Comparison between Pyrococcus B.E. & Brooks, J.E. (1997) Cloning and char- horikoshii and Pyrococcus abyssi genome se- acterization of the BglII restriction-mo- quences reveals linkage of restriction-mo- dification system reveals a possible evolu- dification genes with large genome poly- tionary footprint. Gene 187, 19–27. morphisms. Gene 259, 109–121. 189. Xu, S.Y., Xiao, J.P., Ettwiller, L., Holden, M., 197. Claus, H., Friedrich, A., Frosch, M. & Vogel, Aliotta, J., Poh, C.L., Dalton, M., Robinson, U. (2000) Differential distribution of novel D.P., Petronzio, T.R., Moran, L., Ganatra, M., restriction-modification systems in clonal Ware, J., Slatko, B. & Benner, J. (1998) Clon- lineages ofNeisseria meningitidis. J Bacteriol. ing and expression of the ApaLI, NspI, 182, 1296–1303. Vol. 48 Restriction-modification systems 967

198. Lilley, D.M. & White, M.F. (2000) Resolving 203. Schottler, S., Wenz, C., Lanio, T., Jeltsch, A. the relationships of resolving enzymes. Proc. & Pingoud, A. (1998) Protein engineering of Natl. Acad. Sci. U.S.A. 97, 9351–9353. the restriction endonuclease EcoRV-struc- ture-guided design of enzyme variants that 199. Todone, F., Weinzierl, R.O., Brick, P. & recognize the base pairs flanking the recog- Onesti, S. (2000) Crystal structure of RPB5, nition site. Eur. J. Biochem. 258, 184–191. a universal eukaryotic RNA polymerase sub- unit and transcription factor interaction tar- 204. Lanio, T., Jeltsch, A. & Pingoud, A. (2000) get. Proc. Natl. Acad. Sci U.S.A. 97, 6306– On the possibilities and limitations of ratio- 6310. nal protein design to expand the specificity of restriction enzymes: A case study employ- 200. Park, J., Teichmann, S.A., Hubbard, T. & ing EcoRV as the target. Protein Eng. 13, Chothia, C. (1997) Intermediate sequences 275–281. increase the detection of homology between sequences. J. Mol. Biol. 273, 349–354. 205. Radlinska, M. & Piekarowicz, A. (1998) Clon- ing and characterization of the gene encod- 201. Sauder, J.M., Arthur, J.W. & Dunbrack, R.L. ing a new DNA methyltransferase from Neis- (2000) Large-scale comparison of protein se- seria gonorrhoeae. Biol. Chem. 379, 1391– quence alignment algorithms with structure 1395. alignments. Proteins 40, 6–22. 206. Miller, M.D., Tanner, J., Alpaugh, M., 202. Lanio, T., Jeltsch, A. & Pingoud, A. (1998) Benedik, M.J. & Krause, K.L. (1994) 2.1 A Towards the design of rare cutting restric- structure of Serratia endonuclease suggests a tion endonucleases: Using directed evolution mechanism for binding to double-stranded to generate variants of EcoRV differing in DNA. Nat. Struct. Biol. 1, 461–468. their substrate specificity by two orders of magnitude. J. Mol. Biol. 283, 59–69.