A Thesis

entitled

Naturally-Occurring Fusion Between the Regulatory and Catalytic Components of Type

IIP Restriction-Modification Systems

by

Jixiao Liang

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the

Master of Science Degree in Biomedical Sciences

______Dr. Robert Blumenthal, Committee Chair

______Dr. Steve Patrick, Committee Member

______Dr. Jason Huntley, Committee Member

______Dr. Patricia R. Komuniecki, Dean College of Graduate Studies

The University of Toledo

December 2013

Copyright 2013, Jixiao Liang

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

Naturally-Occurring Fusion Between the Regulatory and Catalytic Components of Type IIP Restriction-Modification Systems

by

Jixiao Liang

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Sciences

The University of Toledo

December 2013

Restriction-modification (R-M) systems play key roles in controlling gene flow among bacteria and archaea, and their own genetic mobility depends critically on their regulation, but the regulation of these systems is poorly understood. The PvuII R-M system is a Type IIP R-M system in that the protective DNA methyltransferase (MTase) is a separate and independently active protein from the potentially lethal restriction endonuclease (REase). PvuII is one of the best studied of the R-M systems that use a positive feedback regulatory loop, involving a transcriptional regulator called C protein, to delay expression of the REase relative to that of the MTase. This allows protective methylation of a new host cell's DNA before the REase is produced. In searching for R-M systems related to PvuII, in order to study evolution and variation of its regulatory system, a putative system was found in the genome sequence of the bacterium Niabella soli strain DSM 19437, in which the regulatory C protein and the REase are translationally fused. The hypothesis is that N. soli truly produces a fused C-R protein, and that it is active both as a REase and as an autogenous regulator. The genes for the N. soli R-M system were synthesized, the fusion protein was produced and purified with affinity tags, and the production of full-length C-REase fusion protein was confirmed. The dual activity of the fusion protein was determined by in vitro restriction of known DNAs, and in vivo transcriptional activation of a lacZ fusion to the promoter on which the C protein acts.


This work is dedicated to my parents, Zhao-jun Liang and Gui-ying Xu for their love and support.

Acknowledgements

This thesis and the associated research would not have been possible without the ever-patient guidance of my mentor and major advisor, Dr. Robert Blumenthal. I would like to express my sincere gratitude to him for his continuous support of my graduate study and research, and for his patience, encouragement, and guidance. He recognizes my strengths and weaknesses, which keeps me motivated. I am also grateful for all his advice about life, career and everything else.

I would additionally like to thank my committee members, Dr. Jason Huntley and Dr. Steve Patrick, for their valuable time, constructive suggestions, and criticisms during my study.

Further, for her constant support as an instructor in lab and a friend in life, I would like to sincerely thank my lab mate Dr. Kristen Williams. Also, my friends Dr. Guo-ping Ren and Dr. Gang Ren have offered me valuable advice and help on my experiments. Last but not least, I would like to thank all the students, faculty, and staff in the Medical Microbiology and Immunology Department. Thank you all!


Table of Contents

Abstract

Acknowledgements

Table of Contents

List of Figures

Chapter 1: Literature Review

Chapter 2: Materials and Methods

Chapter 3: Results

Chapter 4: Discussion and Conclusion

References


List of Figures

Figure 1 Complex formed by R.PvuII and its cognate DNA.

Figure 2 PvuII R-M system control region.

Figure 3 Structure of C.AhdI.

Figure 4 Sequence of synthesized NsoJS138I R-M system.

Figure 5 Vector map of constructed plasmids.

Figure 6 Alignment of CR fusion proteins orthologous to C.PvuII and R.PvuII.

Figure 7 Test of CR fusion protein production.

Figure 8 Test of CR fusion protein production.

Figure 9 Assessment of REase activity in CR.NsoJS138I.

Figure 10 Confirmation of specific digestion conditions.

Figure 11 Assessment of C activity in CR.NsoJS138I.

Figure 12 Possible interactions of C-REase fusion polypeptides.

Chapter 1

Literature Review

1. Restriction-modification (R-M) systems

The biological phenomena of restriction and modification were first recognized in the early 1950s, and the first R-M system was cloned in E. coli in the late 1970s [1]. R-M systems are present in the great majority of bacteria and archaea, with more than 3000 being found to date (most by detecting MTase gene sequences) [2]. As the term indicates, a typical R-M system comprises two activities: a restriction endonuclease (REase) that cleaves DNA at a target sequence, and a methyltransferase (MTase) that modifies the same sequence to protect it from the cognate REase [2]. Four broad types of R-M systems have been reported so far, each with unique characteristics, and the two enzymes have been combined into a single multi-subunit protein in some of the systems [3]. However, in Type IIP R-M systems, the REase and MTase separately execute their opposing intracellular enzymatic activities [3].

1.1 Restriction Endonuclease (REase)

The REase catalyzes the cleavage of double-stranded DNA, generally on both strands. REases recognize specific sequences on the target DNA, and the cleavage occurs via hydrolysis of one phosphate-deoxyribose bond in the backbone of each DNA strand [4]. Typically, such enzymatic activity takes place without energy input, but commonly requires Mg2+ or a similar divalent cation; some REases also require, or are stimulated by, ATP or S-adenosylmethionine (AdoMet) [5]. REases appear to come from very different backgrounds, and are difficult to identify from their sequences alone [6-8].

1.2 Modification Methyltransferase (MTase)

REase cleavage of DNA could be lethal to cells producing R-M systems. To protect endogenous DNA from the REase, the paired (cognate) MTase catalyzes addition of a methyl group to one nucleotide in each strand of the recognition sequence, with the identities and positions varying from MTase to MTase [9]. AdoMet always serves as the methyl donor and is thus an essential cofactor for methylation [10]. The sensitivity of the REase of R-M systems to methylation on the recognition sequences usually prevents cleavage of endogenous DNA. However, while cleavage can be prevented by the cognate methylation, noncognate methylation occurring elsewhere in the recognition sequence may or may not prevent the cleavage [11].

1.3 Types of restriction modification systems

R-M systems are classified based on enzyme composition and cofactor requirements, recognition sequence symmetry, and cleavage position [3, 12]. Because my research defines a new subtype of R-M system, in which the REase and regulatory C protein are fused, it is appropriate to describe the various known types of R-M systems.

1.3.1 Type I Systems

Type I systems are considered the most complex R-M systems, as they consist of three polypeptides: R (Restriction), M (Modification) and S (Specificity). These form a complex that can both cleave and methylate DNA in an energy (ATP) dependent manner, and about half of bacterial genomes contain closely linked genes that are predicted to code for these three polypeptides, based on screening of the present database of complete genomic sequences [13]. Furthermore, because cleavage occurs at a considerable distance from the recognition site in most cases, discrete bands are difficult to visualize by gel electrophoresis [14]. Thus these enzymes have substantial biological significance, but have not yet found major biotechnological uses.

1.3.2 Type II Systems

Type II systems are believed to be the simplest and most prevalent R-M systems. As opposed to Type I systems, the Type II REase and MTase act independently without the need of a specificity protein, and each has its own simple catalytic requirement: the REase requires Mg2+ (or a similar divalent cation) and the MTase requires AdoMet [14]. Type IIP REases are generally active as homodimers, while most Type II MTases act as monomers in catalyzing the addition of methyl groups to the cognate DNA [14, 15]. Early on it was recognized that while typical Type II enzymes recognized palindromic sequences and cleaved symmetrically within them, the Type IIS enzymes cut outside their normally asymmetric sequences and differed in other interesting ways [16]. There are many subdivisions of Type II enzymes, classified based on their recognition and cleavage differences [3]. Specifically, some of the criteria are based on the sequence cleaved and others on the structure of the enzymes themselves, so not all subdivisions are mutually exclusive [3]. Type IIP designates the enzymes that recognize symmetric sequences (palindromes) [3]. Some new subclasses of Type II R-M systems involve fusion of components, such as between the REase and MTase [17-20].
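The notion of a symmetric (palindromic) recognition sequence can be made concrete with a short check: a site is palindromic when it equals its own reverse complement. The following is a minimal sketch in Python, not part of the thesis work; the function names are hypothetical, and the FokI site (GGATG, a commonly cited Type IIS enzyme) is included only as an illustrative asymmetric contrast.

```python
# Complement table for unambiguous DNA bases (illustrative helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same on both strands (5'->3')."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    # PvuII and BamHI sites are symmetric; the FokI site is asymmetric.
    for name, site in [("PvuII", "CAGCTG"), ("BamHI", "GGATCC"), ("FokI", "GGATG")]:
        print(name, site, is_palindromic(site))
```

Under this definition the PvuII site CAGCTG qualifies as a Type IIP target, whereas an asymmetric site such as GGATG does not.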

1.3.3 Type III and Type IV Systems

Type III MTases and REases form a complex that carries out both modification and cleavage [21]. Similar to Type II systems, Mg2+ and AdoMet are essential cofactors for Type III REase and MTase, respectively; and in the presence of such cofactors, a complex formed from REase and MTase competes internally for modifying and restricting at the same DNA position [22]. As a consequence, incomplete digestions are typical [14]. The Type IV REases cleave only modified DNA, which contains methylated, hydroxymethylated or glucosyl-hydroxymethylated bases [3]. However, their recognition sequences have usually not been well defined except for EcoK-McrBC, and cleavage occurs ~30 bp away from one of the sites [3]. The Escherichia coli McrBC enzyme, the best studied of the Type IV REases and the only one that is commercially available, requires two purine-methylcytosine/hydroxymethylcytosine sites separated by 40–3000 base pairs for cleavage [23].

2. Roles and Control of R-M systems

One major function of R-M systems is to protect bacterial cells from bacteriophage infection or invasion by foreign DNA [24]. In addition to being bacterial defense systems, R-M systems manifest themselves in a diverse range of functions such as stabilization of genomic islands, maintenance of bacterial fitness and nutrition, immigration control, recombination and genome rearrangement, evolution of genomes, enforcing methylation on the genome and so forth [25].

Lethal DNA damage would occur if the two R-M enzymatic activities (MTase and REase) were unbalanced [26]. This is particularly true when R-M genes first enter a new host cell that has completely unmethylated DNA [27]. Therefore, a timing delay between expression of the MTase and REase is expected to occur in Type IIP systems, and this has been documented in PvuII [28]. Specifically, there is a ~10-min delay between the appearance of MTase and REase transcripts and activities [28]. Documenting this delay improves our understanding of the mobility of R-M systems.

3. PvuII R-M system and its regulatory characteristics

3.1 Overview of PvuII

PvuII was discovered [29] and then cloned into E. coli from its original host Proteus vulgaris about three decades ago [30]. Since then, it has been subjected to many regulatory studies [31-34]. This system was also the first R-M system to have had both the REase [35, 36] and MTase [37] structures crystallographically determined. Because this study reports a REase-C protein fusion, it is important to discuss the structures of those two components.

3.2 Structure and function of PvuII restriction endonuclease

Figure 1. Complex formed by R.PvuII and its cognate DNA. In this view, the enzyme is in ribbons representation in purple, with the DNA strands in green and cyan. The amino termini of the two REase subunits are at the right. The image is structure 1EYU of the Protein Data Bank (managed by the Research Collaboratory for Structural Bioinformatics). The image is in the public domain.

X-ray crystallography has shown that the active PvuII endonuclease is a homodimeric protein, with the subunit interface region consisting of a pseudo three-helix bundle at the amino end [35]. Three regions have been identified in R.PvuII, namely the subunit interface region, the catalytic region and the DNA recognition region. The recognition sequence for R.PvuII cleavage is CAG↓CTG, and such cleavage is prevented by N4-methylcytosine (yielding CAGN4mCTG), generated by its cognate methyltransferase [27].
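As an illustration of what the CAG↓CTG specificity implies for a digest, the sketch below locates PvuII sites in a linear DNA sequence and lists the expected blunt-ended fragment lengths. It is a minimal sketch under stated assumptions: the function names are hypothetical, the input is treated as unmethylated linear DNA, and the toy sequences in the usage lines are invented for demonstration only.

```python
def find_sites(seq: str, site: str = "CAGCTG") -> list[int]:
    """Return 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def linear_fragments(seq: str, site: str = "CAGCTG", cut_offset: int = 3) -> list[int]:
    """Fragment lengths after complete digestion of a linear molecule.

    cut_offset=3 places the cut between CAG and CTG (blunt ends).
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(linear_fragments("AAAACAGCTGTTTT"))   # one site
    print(linear_fragments("CAGCTGAACAGCTG"))   # two sites
```

Comparing a predicted fragment list of this kind with the bands seen on a gel is the logic behind using a substrate of known sequence, such as lambda DNA, to assess specificity.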

3.3 C- protein and its regulatory roles

3.3.1 Overview

In addition to the MTase and REase genes, a subset of Type II R-M systems contains regulatory genes. The regulatory C (controller) gene was first discovered in the PvuII [38, 39] and BamHI [40] R-M systems. A milestone in characterizing the PvuII system was the identification of regulatory elements called "C-Boxes" between the pvuIIM and pvuIIR genes, which control the timing of REase and MTase expression [28, 39, 41]. C boxes are where the C protein binds to exert its effects [31]. While the location of the MTase gene varies among R-M systems, in those that have C proteins the C gene is typically upstream of the REase gene [31].

Figure 2. PvuII R-M system control region. Two transcription starts for pvuIICR are identified by rightward bent arrows: from the C-independent weak promoter (left) and C-dependent strong promoter (right) [38]. The two pvuIIM promoters are also shown (leftward bent arrows). Gray wavy lines represent the resulting mRNAs.

3.3.2 C protein-dependent regulatory circuit in PvuII

C proteins (encoded by the C gene), where tested, activate transcription of their own gene ('autogenous' activation). They are believed to be responsible for the delay in REase activity, since the REase gene typically does not have its own promoter [33] and is completely dependent on transcription from the upstream autogenously regulated C gene [42]. Thus when the R-M genes enter a new cell, and no C protein is present, MTase is expressed while C protein (and REase) are initially produced at very low levels. As C protein accumulates, the positive feedback loop results in a sharp increase in C and REase expression [33, 34]. The C protein acts as both an activator and a repressor, so it can prevent overexpression of the REase [43].
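The qualitative behavior of such an autogenous activation loop (a slow start followed by a sharp rise) can be illustrated with a toy model. The sketch below is not a model from this thesis or from the cited studies: the rate equation, the cooperative (squared) activation term standing in for dimer-dependent binding, and all parameter values are assumptions chosen only to show the shape of the response. Repression at high C levels and MTase expression are not modeled.

```python
def simulate_c_protein(basal=0.01, vmax=1.0, k=1.0, decay=0.1,
                       dt=0.1, t_end=100.0):
    """Euler integration of dC/dt = basal + vmax*C^2/(k^2 + C^2) - decay*C.

    All parameters are arbitrary illustrative values (hypothetical units).
    Returns the list of C levels at each time step, starting from C = 0.
    """
    steps = int(round(t_end / dt))
    c = 0.0
    trace = [c]
    for _ in range(steps):
        activation = vmax * c * c / (k * k + c * c)  # cooperative self-activation
        c += dt * (basal + activation - decay * c)
        trace.append(c)
    return trace


if __name__ == "__main__":
    trace = simulate_c_protein()
    # Print C at a few time points; index = time / dt.
    for t in (0, 10, 20, 30, 50, 100):
        print(t, round(trace[int(t / 0.1)], 3))
```

In this toy setting C stays near the basal level for an initial period and then climbs steeply once self-activation takes over; because the REase gene is co-transcribed with the C gene, its expression would be expected to follow the same delayed profile.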

3.3.3 C protein structure

Figure 3. Structure of C.AhdI [44]. In this view, the dimeric protein is in ribbons representation. The image is structure 1Y7Y of the Protein Data Bank (managed by the Research Collaboratory for Structural Bioinformatics). The image is in the public domain.

Studies in Type II R-M systems have indicated that C proteins are only active when they become homodimers [44, 45]. The dimerization of C proteins is required for DNA binding and, considering the relatively low stability of the dimer itself, this appears to be an important component of the genetic switch that delays transcription of the C gene, and consequently that of the endonuclease (R) gene transcribed from the same promoter [46]. The regulatory C protein of another R-M system named AhdI has been crystallized [44], and a high-resolution crystal structure of C.AhdI was described two years later by the same group of scientists [47]. The high-resolution structure of C.AhdI reveals a compact, single-domain homodimer and can be classified as an all-alpha protein: 65% of the residues are in a helical conformation with no beta-sheet present [44] (Figure 3).

4. CR fusion protein in Type II R-M systems

The PvuII R-M system is one of the best studied of the group that uses a positive feedback regulatory loop to delay restriction endonuclease (REase) expression with respect to DNA methyltransferase (MTase) expression [43], allowing protective methylation of a new host cell's DNA before the REase is produced. To better understand the variation in and evolution of this regulatory system, I searched for other R-M systems closely related to PvuII. This work is described under Results; in brief, a group of related systems had naturally-occurring fusions between the C and REase proteins. I provide here some background on the considerations underlying my studies on one of these fused systems. Gene fusion is a major contributor to the evolution of multi-domain bacterial proteins, and typically results in one long composite protein in one organism in place of two or more smaller split proteins in another organism [48, 49].

4.1 Identification of the CR fused Type II R-M systems

To search for R-M systems closely related to PvuII, the REase (R.PvuII) amino acid sequence was used as the search seed in TBLASTN [61]. This was done because the C proteins are fairly well conserved [31, 33, 50], and the MTase proteins have well-conserved motifs [24, 51], so using them as search seeds would likely give a higher background of unrelated R-M systems. In contrast, the generally poor conservation of REases implies that only closely related R-M systems would have similar REase sequences. One fused polypeptide with portions similar to both C.PvuII and R.PvuII was found in the bacterium Niabella soli.

4.1.1 Overview of Niabella soli

The genus Niabella was proposed by Kim et al. (2007) [52] for a bacterium isolated from soil. This genus was characterized as Gram-negative, aerobic, non-flagellated, flexirubin-pigment-producing bacteria that form short rods. Shortly after that, a dark yellow-colored bacterium, JS13-8T, was isolated from a soil sample from Jeju Island, Republic of Korea [53]. The cells were aerobic, Gram-negative, non-motile, short rods. Growth occurred at 15–35 °C (optimally at 30 °C). On the basis of the phylogenetic, physiological and chemotaxonomic data, strain JS13-8T was deemed to represent a novel species of the genus Niabella, for which the name Niabella soli sp. nov. was proposed [53]. Subsequent to our discovery of a fused system in N. soli, it was also detected by an automated sequence search by the curators of REBASE [2], which is a continuously updated R-M system database. We have adopted their nomenclature, NsoJS138I, for this system, following their entry on April 10, 2013. They performed no biochemical characterization of the R-M system.

4.1.2 Translational frameshifting as a possible mechanism for production of free C protein in such fused systems

C.NsoJS138I and R.NsoJS138I are clearly fused at the sequence level, as described in Results. However, it is possible that a certain amount of free NsoJS138I C or REase protein is produced via translational frameshifting or post-translational processing. Post-translational processing could involve proteolytic cleavage that yields free C and free REase polypeptides. Alternatively, free C protein (but not free REase) could result from ribosomal frameshifting during translation, which can occur when a ribosome encounters certain sequence patterns in the mRNA [54]. Translational frameshifting represents an alternative process of protein translation [55], and occurs much more frequently than was originally expected [56]. For instance, a study of ribosomal frameshifting on the sequence GCAAAA has shown that this pattern is associated with efficient -1 ribosomal frameshifting in Escherichia coli [57].
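The two sequence features relevant to this possibility (a slippery motif and a nearby stop codon in the -1 reading frame) can be scanned for computationally. The following is a minimal sketch, assuming the coding strand is supplied as DNA with the open reading frame starting at a known position; the function names are hypothetical, the default motif GCAAAAA is the one named in the Figure 7 legend, and the toy sequence in the usage line is invented rather than taken from NsoJS138I.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_in_minus1_frame(seq, from_pos, orf_start=0):
    """Index of the first stop codon at or after from_pos in the -1 frame.

    Codons in the -1 frame start at positions where
    (index - orf_start) % 3 == 2. Returns None if no stop is found.
    """
    seq = seq.upper()
    i = from_pos
    while (i - orf_start) % 3 != 2:
        i += 1
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOP_CODONS:
            return i
        i += 3
    return None


def frameshift_candidates(seq, motif="GCAAAAA", orf_start=0):
    """List (motif_position, minus1_stop_position) pairs for each motif hit."""
    seq = seq.upper()
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append((pos, first_stop_in_minus1_frame(seq, pos, orf_start)))
        pos = seq.find(motif, pos + 1)
    return hits


if __name__ == "__main__":
    # Toy ORF (frame 0: ATG GCA AAA AGT GAC C); the -1 frame contains TGA.
    print(frameshift_candidates("ATGGCAAAAAGTGACC"))
```

A short distance between the motif and the -1 frame stop would predict a truncated product only slightly longer than the portion translated before the slip, which is why free C protein, but not free REase, could arise this way.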

4.1.3 Novel demonstration of CR fusions in Type II R-M systems

Natural and synthetic fusions of the REase and MTase polypeptides have been observed, and found to be active [17-20]. However, this thesis focuses on naturally-occurring fusions between the REase and the regulatory C protein. These have been suggested to occur by automated annotation systems, such as REBASE, but have never been tested and shown to be active for either the REase or the C protein components.

Chapter 2

Materials and Methods

Gene synthesis

The sequence containing the complete R-M system of Niabella soli (1837 nt, from NCBI database; GenBank accession # NZ_AGSA01000028) was obtained from Genscript Inc. (Piscataway, NJ). Some modifications were made to optimize the distribution of restriction sites, but without changing the specified amino acids (Figure 4). The inferred NsoJS138I C-Box and promoter region (161 nt) was also obtained from Genscript, and for cloning purposes the restriction sites XmaI (at C gene end) and BamHI (at R gene end) were appropriately placed.

Cloning strategy

The R-M system Mru1279I (~2.4 kbp) was cloned into the high-copy vector pUC19, using NruI (at CR gene end) and BamHI sites (at M gene end). Genscript synthesized the complete NsoJS138I system, but could only clone it into a low-copy number vector, pCC1 (they normally use higher-copy pUC57). This presumably resulted from a frameshift error in the MTase gene that is due to an error in the requested sequence. To avoid the apparent toxicity, a truncated version was subcloned, consisting of only the fused CR gene of NsoJS138I and missing a portion of the COOH-end of the REase (so the MTase would not be required). The truncated NsoJS138ICR was cloned into the pACYCDuet-1 vector (Novagen®), with the N-terminus (C protein end) in-frame with the His-tag (using BamHI and SalI sites), and preceded by a T7 promoter. This plasmid, pJL100, is referred to for readability as "pNsoShort". Full-length NsoJS138ICR was also cloned into this vector, with the NsoJS138ICR COOH-terminus (REase end) in-frame with the His-tag (using the NcoI site), by transforming an E. coli strain that was already expressing the PvuII MTase [58]; this plasmid was named pJL200 ("pNso"). The truncated product would be ~1.5 kDa less than the full-length one.

The synthesized NsoJS138I "C-Box" region was digested with BamHI and XmaI and ligated into pBH403, which is a derivative of pKK232-8 and contains a promoterless lacZ gene between two bidirectional transcription terminators [59], making pJL300 ("pBoxLac"). These plasmids are illustrated in Figure 5. The oligonucleotide primers used for PCR amplification are shown below (all in the 5'→3' direction).

Primer set for cloning the complete Mru1279I R-M system:

ggtTCGCGActtccgggtctacacctcaa; ggtGGATCCagccctaaccagccgtaaat

Primer set for making the truncated NsoJS138ICR PCR product for pJL100:

aatGTCGACttatttgggattattaatatccttatcac; aatGGATCCgatgaacgaaccaaatgc

Primer set for making the full length NsoJS138ICR PCR product for pJL200:

cgtCCATGGacaaaagtcttatgccat; cgtCCATGGatgaacgaaccaaatgctta
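In the primers listed above, the upper-case portion marks the restriction site added for cloning. As a consistency check between the primer sequences and the cloning strategy, the sketch below extracts the upper-case segment of each primer and matches it against the standard recognition sequences of the enzymes named in the text. This is a minimal sketch; the function name and the "primer 1/primer 2" labels are hypothetical and do not indicate which primer is forward or reverse.

```python
import re

# Standard recognition sequences for the enzymes named in the cloning strategy.
SITES = {"NruI": "TCGCGA", "BamHI": "GGATCC", "SalI": "GTCGAC", "NcoI": "CCATGG"}

# Primer sequences copied from the Methods text; labels are arbitrary.
PRIMERS = {
    "Mru1279I primer 1": "ggtTCGCGActtccgggtctacacctcaa",
    "Mru1279I primer 2": "ggtGGATCCagccctaaccagccgtaaat",
    "pJL100 primer 1": "aatGTCGACttatttgggattattaatatccttatcac",
    "pJL100 primer 2": "aatGGATCCgatgaacgaaccaaatgc",
    "pJL200 primer 1": "cgtCCATGGacaaaagtcttatgccat",
    "pJL200 primer 2": "cgtCCATGGatgaacgaaccaaatgctta",
}


def embedded_site(primer):
    """Return (enzyme_name, site) for the upper-case segment of a primer.

    enzyme_name is None if the segment matches no entry in SITES;
    both values are None if the primer has no upper-case segment.
    """
    match = re.search(r"[A-Z]+", primer)
    if match is None:
        return None, None
    segment = match.group(0)
    for enzyme, site in SITES.items():
        if site == segment:
            return enzyme, segment
    return None, segment


if __name__ == "__main__":
    for label, primer in PRIMERS.items():
        print(label, embedded_site(primer))
```

The sites recovered this way (NruI and BamHI for Mru1279I, SalI and BamHI for pJL100, NcoI for pJL200) agree with the enzymes named in the cloning strategy.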


Figure 4. Sequence of synthesized NsoJS138I R-M system. The initiation codons of the CR and M genes are in green. The red arrow near the top indicates the position at which the C-REase gene is interrupted in the truncated clone (pJL100, pNsoShort).

Figure 5. Vector map of constructed plasmids.

(A) pJL100 ("pNsoShort"), truncated version of the C-REase fusion with N-terminal His tag;

(B) pJL200 ("pNso"), full-length version of the C-REase fusion with COOH-terminal His tag;

(C) pJL300 ("pBoxLac"), promoter-C-box region fused to promoterless lacZ reporter gene.

Protein expression & purification

Twenty ng each of the truncated and full-length versions of NsoJS138ICR DNA were transformed into a BL21 (DE3) E. coli strain (Invitrogen™) that has isopropylthio-β-D-galactoside (IPTG) inducible T7 RNA polymerase expression. Overnight cultures of cells in stationary phase were subcultured at a 1:20 dilution into 250 mL LB medium (as per the QIAexpress® protocol for His-tagged protein purification) at 37 °C. IPTG was added to a final concentration of 0.5 mM when the subculture cells reached mid-log phase (OD600 ~0.46). Cells were grown for another 2.5 h before being harvested by centrifugation and frozen at -80 °C. The QIAexpress® Ni-NTA Fast Start Kit was used to purify 6xHis-tagged protein (under native conditions). PMSF protease inhibitor was added (final concentration of 0.5 mM) to the lysis buffer immediately before purification of full-length NsoJS138ICR. Purified protein was added immediately to either Diluent B (NEB#B8002S, for protection of REase activity) or 2x SDS-PAGE sample buffer (1:1 solution), and stored at -20 °C. Protein concentration was determined by the Pierce 660 nm Assay (Thermo Scientific).

Western blots

Purified proteins were separated by SDS-PAGE (Novex® 10–20% Tris-Glycine gradient gel), and either stained with standard Coomassie blue or blotted onto PVDF membranes at 30 V for 2 h using an Xcell apparatus (Invitrogen). For signal detection, membranes were blocked by incubation at 4 °C overnight in 1% BSA-0.1% Tween-20 in PBS, followed by incubation with a 1:1,000 dilution of mouse anti-His tag monoclonal antibody (Millipore) for 2 h at 4 °C, followed by three 10-min washes. The blots were then incubated with horseradish peroxidase (HRP)-conjugated goat anti-mouse IgG (1:15,000, Invitrogen) for 2 h at room temperature. After three 10-min washes, protein bands were visualized by ECL Plus enhanced chemiluminescence (GE Healthcare) and images were captured using an Alpha Innotech FluorChem HD Imaging System. Minor adjustments of brightness and contrast were carried out to better visualize data, but in all cases the same adjustments were applied to the complete image panel as a whole. The pre-stained MW markers used were SeeBluePlus (Invitrogen).

Restriction activity assay

To assess the enzymatic activity of NsoJS138I REase, bacteriophage lambda DNA (NEB#N3011S) was used as substrate. Restriction enzyme PvuII (NEB#R0151S) was used as a standard control, as its digestion pattern on lambda DNA is already known. NsoJS138IR (2.36 µg) or 10 U PvuII was incubated with 1.5 µg of lambda DNA for 1 h at 27, 32, 37 or 42 °C, in each of four NEB standard buffers, and the DNA was resolved on 0.8% agarose gels. Empty pUC19 vector DNA (0.8 µg) was also used as substrate.

Assays for C protein activity

pJL300 was transformed into an IPTG-inducible Tn7 E. coli DE3 strain (Lac-) carrying the NsoJS138IC RM system. The LacZ assay was based on hydrolysis of o-nitrophenyl-β-D-galactoside (ONPG), with activity expressed in Miller units as modified by others [60]. Briefly, β-galactosidase activity and culture density were measured at 20–30 min intervals during exponential growth. The units for this assay were calculated by dividing the measured A420nm (released nitrophenol) by the time allowed for the reaction and by the volume of permeabilized cells used for the reaction. For plots vs. time, culture density (OD600nm) was also in the denominator, yielding standard Miller units. For plots vs. culture density, this term was omitted from the denominator, yielding modified Miller units (1000 × ΔA420nm min-1 ml-1). Specific activity was obtained by determining the slope of a plot of LacZ activity versus the culture density via linear regression.
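The unit calculations described above can be written out explicitly. The sketch below follows the definitions in the text (A420 divided by reaction time and cell volume, scaled by 1000, with OD600 in the denominator only for standard Miller units) and obtains specific activity as the least-squares slope of activity versus culture density. The function names are hypothetical and the numbers in the usage lines are made-up illustrative values, not data from this work.

```python
def modified_miller_units(a420, minutes, volume_ml):
    """1000 x A420 per minute per ml of permeabilized cells."""
    return 1000.0 * a420 / (minutes * volume_ml)


def standard_miller_units(a420, minutes, volume_ml, od600):
    """Standard Miller units: modified units normalized to culture density."""
    return modified_miller_units(a420, minutes, volume_ml) / od600


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys versus xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Illustrative values only: OD600 readings and matching modified units.
    od600 = [0.1, 0.2, 0.3]
    activity = [modified_miller_units(a, 10, 0.1) for a in (0.5, 1.0, 1.5)]
    print(activity)
    print(regression_slope(od600, activity))  # specific activity (slope)
```

Using the slope rather than a single time point gives a specific activity that does not depend on any one culture-density measurement.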

List of abbreviations used

HRP: horseradish peroxidase

IPTG: isopropylthio-β-D-galactoside

LacZ: β-galactosidase (product of lacZ gene)

MTase: modification DNA methyltransferase

OD: optical density

ONPG: o-nitrophenyl-β-D-galactoside

PMSF: phenyl-methyl-sulfonylfluoride

REase: restriction endonuclease

RM: restriction-modification

SDS: sodium dodecylsulfate

Chapter 3

Results

1. Identification of C-REase-fused R-M systems

We are interested in how regulation of RM systems varies and evolves. To study this, our lab periodically searches GenBank for sequences related to the PvuII RM system, because many of our studies have focused on that system. As noted above (section 1.1), REases are the most poorly-conserved component of RM systems, so using the R.PvuII amino acid sequence as the search seed ensures that only systems relatively closely related to PvuII will be identified. Sequence searching used TBLASTN [61], with default parameters.

A putative system, Mru1279I, was identified in the genome sequence of the thermophilic bacterium Meiothermus ruber, in which the REase had a long amino-terminal extension relative to R.PvuII that, on further analysis, bore strong resemblance to a C protein (Figure 6). However, studying the catalytic and regulatory effects of the fusion in the Meiothermus system would likely be problematic using clones in the mesophile E. coli: not only the catalytic functions but also DNA binding and subunit associations are adapted to work at much higher temperatures than E. coli can survive [62]. Using the translated sequence of the Meiothermus C-R fusion as the search seed, another hit was found in an organism called Niabella soli (Figure 6). The fact that N. soli is a mesophile makes it easier to perform in vivo studies in E. coli [53], and this type of R-M system with a C-R fusion protein is likely to have unusual regulatory properties that could shed light on the regulation of Type II R-M systems in general.

Figure 6. Alignment of CR fusion proteins orthologous to C.PvuII and R.PvuII. The PvuII system (top line) is unfused, and shown for comparison. Species sources are: Pvu (Proteus vulgaris), Mru (Meiothermus ruber) and Nso (Niabella soli). Annotations refer to (in order) the transcriptional activation, DNA recognition and dimerization interface portions of C protein, and the dimerization, catalytic, and DNA methylation recognizing portions of REase, based on knowledge of the PvuII system. Identities are shaded.

2. Testing for translational frameshifting (free NsoJS138I C protein production).

Because no structural or functional studies have previously been done on C-REase fused Type II R-M systems, it is possible that some C protein is produced separately. Translational frameshifting [63] happens much more frequently than was expected [54]. In particular, translational frameshifting in NsoJS138I is suggested by two features of the DNA sequence in the junction region (Figure 7A): one is a short sequence that has been associated with -1 translational frameshifts [55], and the other is a nearby stop codon in the -1 reading frame.

High-copy clones of the NsoJS138I R-M system were toxic (see Materials and Methods), presumably due to a frameshift error in the MTase gene. To address this, a truncated version of NsoJS138ICR was subcloned, removing part of the REase COOH end (34 aa; see Figure 4 from Materials and Methods). To test for translational frameshifting, we added a His6 tag to the amino or carboxyl end of the fusion protein, expressed the tagged proteins from a strong inducible promoter, partially purified cell extracts on affinity columns, and resolved the column eluates on SDS-polyacrylamide gels. Figure 7B shows the Coomassie-stained gels next to western blots probed with anti-His6 antiserum, while Figure 8 shows the amino-tagged protein isolated in the presence of protease inhibitor PMSF. Translational frameshifting would result in a ~9 kDa polypeptide in the extracts with amino-tagged fusion (the carboxyl-tagged fusion would only yield smaller protein in the case of proteolytic cleavage), and we see no evidence for that product. Nevertheless, we cannot rule out the possibility that frameshifting occurs in the native host (N. soli), or in E. coli under different growth conditions than those we used.

Figure 7. Test of CR fusion protein production. (A) The sequence spanning the C-REase junction has properties that might result in production of some free C protein (~9 kDa). GCAAAAA has been associated with -1 ribosomal frameshifts (see text for references), and this would result here in termination at a nearby TGA triplet. (B) Production of NsoJS138I C-REase fusion protein (~27.2 kDa), with an amino-terminal (upper) or carboxyl-terminal (lower) His6 tag, was induced using a T7 RNA polymerase-dependent promoter (see Materials and Methods). The upper panels show the results from clones having a small carboxyl-terminal deletion (done in case the REase activity was toxic; size ~25.6 kDa), while the lower panels show full-length clones. Centrifugally-clarified whole-cell extracts were passed over affinity columns to purify the His-tagged polypeptides, and resolved on duplicate 10-20% gradient acrylamide SDS gels. For the lower panels, the extracts were prepared in the presence of protease inhibitor. One gel of each pair was stained (left), the other was electroblotted and probed with anti-His-tag antiserum (see Materials and Methods). Loaded amounts of protein per lane were 2.0 µg (upper, in both lanes), and 3.4, 6.8 and 5.1 µg (lower, left to right).

Figure 8. Test of CR fusion protein production. Lanes, from left, included 4.5 µg protein, MW markers, a control (lysis buffer + PMSF), and 5.5 µg protein. Production of NsoJS138I C-REase fusion protein, with an amino-terminal His6 tag, was induced using a T7 RNA polymerase-dependent promoter (see Materials and Methods). The clone had a small carboxyl-terminal deletion (in case the REase activity proved to be toxic). Centrifugally-clarified whole-cell extracts, containing the protease inhibitor PMSF, were passed over affinity columns and resolved on a 10-20% gradient acrylamide SDS gel. The gel was blotted to PVDF, blocked, and probed with HRP-conjugated anti-His tag antibodies. The image on the left was captured under 365/302 nm dual-wavelength illumination to make the markers visible; the image on the right is from chemiluminescence alone.

3. Activity assay for assessing NsoJS138IR activity and specificity

The central question regarding these C-REase fusions is whether or not they are active. There are numerous examples of RM systems, identified through sequence comparisons, that do not produce catalytically active proteins [64, 65]. We focused on two of the fused RM systems, isolating the Meiothermus ruber Mru1279I genes by amplification from genomic DNA (not shown), and having the Niabella soli NsoJS138I genes synthesized. The full-length NsoJS138ICR fragment was ligated into pACYCDuet (His tag at the carboxyl terminus, the REase end) and introduced into an E. coli strain already expressing the PvuII MTase (Figure 5B and Figure 7B, lower panel). We were unable to detect REase activity from the M. ruber clones (not shown), possibly due to poor expression in E. coli and/or improper folding of the protein at the lower E. coli growth temperature (37 °C), even though cell extracts were tested at the optimum for M. ruber growth (60 °C) [66]. The M. ruber clones were not studied further.

In contrast, extracts from E. coli cultures carrying the N. soli genes gave detectable REase activity (Figures 9, 10) that indicated a specificity indistinguishable from that of PvuII REase. However, the NsoJS138I C-REase fusion exhibited much more stringent activity requirements than R.PvuII when the two were tested at four temperatures in each of four buffers (Figures 9, 10). For these studies, I used 10 units of PvuII from a commercial supplier; this is equivalent to ~20 ng of PvuII REase protein [67]. In comparison, 2.4 µg of NsoJS138I CR protein was used (~120x as much). R.PvuII was active in 15/16 tested conditions, while the fusion was active in 5/16 (Figure 9). NsoJS138I was inactive in all four buffers at 27 and 42 °C, and was active in three buffers at 32 °C and two buffers at 37 °C (Figures 9 and 10). Serial dilution indicated that, at 32 °C, NsoJS138I was most active in NEBuffer 3 (not shown). Differences from R.PvuII could be due to the presence of the fused C portion at the amino end of each subunit, to the sequence differences between the PvuII and NsoJS138I REase portions, or to a combination of the two factors. The C-terminal His6 tag might also play a role, though it has little effect on R.PvuII.
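The quantities in this comparison can be checked with a few lines of arithmetic. This is only a bookkeeping sketch: the numbers are those stated in the text, while the variable names are chosen here for illustration.

```python
# Protein per reaction, in nanograms, as stated in the text.
PVUII_NG = 20      # ~20 ng of R.PvuII in 10 units [67]
FUSION_NG = 2400   # 2.4 micrograms of CR.NsoJS138I

fold_excess = FUSION_NG / PVUII_NG

# Number of buffers (out of four) in which the fusion was active
# at each tested temperature (degrees C).
fusion_active_buffers = {27: 0, 32: 3, 37: 2, 42: 0}
BUFFERS_PER_TEMPERATURE = 4

active = sum(fusion_active_buffers.values())
tested = BUFFERS_PER_TEMPERATURE * len(fusion_active_buffers)

print(f"fusion/PvuII protein ratio: {fold_excess:.0f}x")
print(f"fusion active in {active}/{tested} conditions")
```

The roughly 120-fold excess of fusion protein means that the narrower activity range of the fusion cannot simply be attributed to a lower amount of enzyme in the reactions.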

Figure 9. Assessment of REase activity in CR.NsoJS138I. CR.NsoJS138I and, for comparison, commercial R.PvuII, were incubated for 1 h with DNA from bacteriophage λ. Four different reaction buffers were used, and in each buffer four temperatures were used. Reactions were resolved on agarose-TBE gels containing ethidium bromide (see Materials and Methods for details). Inset: Left-to-right: markers, uncut pUC19, pUC19 cut with CR.NsoJS138I or PvuII at 37 °C, pUC19 cut with CR.NsoJS138I at 30 °C.

Figure 10. Confirmation of specific digestion conditions. In some buffers, R.PvuII or CR.NsoJS138I was active or inactive at a single temperature (as shown in Figure 9). To confirm this temperature specificity, those two buffer-enzyme combinations were re-tested using 2 µg (upper lanes) or 1.5 µg (lower lanes) of bacteriophage λ DNA. M = size markers; U = uncut DNA. The image is a UV-illuminated agarose gel containing ethidium bromide.

4. In vivo test of CR.NsoJS138I for C protein activity

Detection of C protein activity was via LacZ assays, using a new pBoxLac plasmid (see Figure 5 in Materials and Methods). This assumes that, as in PvuII, the C protein activates expression from its own promoter (in this case boosting β-galactosidase activity) [31, 68-71]. The C protein operators, called C boxes, have recognizable sequences with symmetrical elements upstream of the C ORFs [41-43, 72]. Based on this, I examined this region of the NsoJS138I sequence for putative bacterial promoters [73-75], with the best candidate shown in Figure 11B. This 161 bp sequence was cloned upstream of a promoterless lacZ gene, and transformed into an E. coli strain that also carried ∆lacZ and the nsoJS138ICR gene under control of T7 RNA polymerase (Figure 5C and Figure 11A). In this strain, IPTG induction leads to production of T7 RNA polymerase, which results in production of CR.NsoJS138I (Figure 7B). If CR.NsoJS138I activates the putative promoter region, β-galactosidase (LacZ) activity will increase. To detect this, IPTG was first added to growing cultures with or without the promoter-lacZ fusion plasmid, and samples taken over time showed a clear induction when the fusion plasmid was present (Figure 11C). In addition, I grew cultures under conditions approximating steady state, where the IPTG (when present) was in the culture medium for at least 10 generations, and the slope of the activity vs. culture OD plot is a sensitive measure of expression. As shown in Figure 11D, I observed a 23-fold increase in LacZ activity in response to production of CR.NsoJS138I. These results indicate that the fusion is active as a C protein.
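The steady-state measurement described above reduces to comparing the slopes of two activity vs. culture density lines. The following is a minimal sketch of that calculation under stated assumptions: the function names are hypothetical, an ordinary least-squares fit is assumed, and the data points are made-up illustration values, not the measurements behind Figure 11D.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def fold_induction(od, activity_uninduced, activity_induced):
    """Ratio of induced to uninduced slopes of activity vs. culture density."""
    return least_squares_slope(od, activity_induced) / least_squares_slope(
        od, activity_uninduced
    )


# Made-up illustration values (NOT data from this study): culture densities
# and LacZ activities lying on straight lines with slopes 3 and 60.
od = [0.1, 0.2, 0.3, 0.4, 0.5]
uninduced = [1.0 + 3.0 * d for d in od]
induced = [1.0 + 60.0 * d for d in od]

if __name__ == "__main__":
    print(f"fold induction: {fold_induction(od, uninduced, induced):.1f}")
```

Using slopes rather than single time points makes the comparison insensitive to a constant background offset, which is one reason a good linear fit is taken as the sign of steady-state growth.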

Figure 11. Assessment of C activity in CR.NsoJS138I. A. Schematic design of experiment. Top line indicates IPTG-inducible gene for T7 RNA polymerase in the host strain’s chromosome, middle indicates a plasmid (called pNso for the figure) that carries the gene for 25CR.NsoJS138I linked to a T7 promoter, and the bottom indicates a plasmid (called pBox-Lac for the figure; this is actually PNsoJS138ICR-lacZ from Figure 5C) that carries the putative promoter and C box region from NsoJS138I linked to a promoterless gene for lacZ (β-galactosidase). B. Sequence of the putative promoter and C box region from NsoJS138I, showing the candidate C boxes (shaded) and promoter elements (-35 and -10 hexamers, or tTGaCA and tAtRaTg). This 161 nt sequence is what was included in pBoxLac. C. Time course of LacZ induction. Growing triplicate cultures of cells containing the indicated plasmids were treated at time = 0 with the inducer IPTG (which in these cells controls the gene for T7 RNA polymerase), and matched control cultures received no IPTG. LacZ activity was measured over time. The symbols indicate means of the triplicate cultures; standard errors are shown but mostly obscured by the symbols. D. Steady-state expression of lacZ. Triplicate cultures containing the plasmids indicated were grown for at least 10 generations in the presence or absence of IPTG, and LacZ activity was measured. In this case, activity is plotted vs. culture density, and so modified Miller units are used. Cultures approximating steady-state growth should give good linear fits, the slopes of which accurately measure relative expression levels. Symbols indicate means of the triplicate cultures; standard errors are shown but are in some cases obscured by the symbols.

Chapter 4

Discussion and Conclusion

While C-REase fusions have not previously been characterized, other types of REase fusions have. One class, for example, involves natural and synthetic fusions of the REase and MTase polypeptides [17-20]. This ability to form a variety of active fusions illustrates the remarkable flexibility and modularity of RM systems.

Formation of C-REase fusions

Ribosomes must maintain the translational reading frame in order to decode genetic information faithfully into polypeptides. However, certain signals embedded in mRNA segments represent a higher order of information content: when encountered by the ribosome, they can shift the reading frame and thus alter gene expression in a major way. Translational frameshifting happens at an unexpectedly high frequency, and might play vital roles in gene regulation, such as controlling transposition in bacteria (by affecting production of the transposase protein) [76]. However, it is not clearly known whether frameshifting occurs at a fixed frequency in response to the genetic signal. In other words, the signal sequence has not been proven to be the decisive factor, and certain external conditions (e.g., growth medium, temperature, pH and host cell) might significantly affect the rate at which it happens.

Although the present study found no evidence for translational frameshifting (see Figures 7 and 8), the question still merits further investigation for several reasons. First, growth conditions can be altered to test whether translational frameshifting occurs under other circumstances. Second, since the test was carried out with the carboxyl-truncated NsoJS138ICR, we cannot rule out the possibility, although it seems unlikely, that such frameshifting occurs in the full-length native form. Finally, since this study used a synthetic gene in E. coli as the host cell, it would be worthwhile to obtain the native host N. soli, if possible, and repeat some of the studies in it.

Implications of C-REase fusions

The occurrence of active translational fusions between REases and regulatory proteins has not previously been reported. This is the first study aimed at understanding how the fusion of regulatory C proteins and restriction endonucleases, which are separately active in most Type II R-M systems, could result in regulatory variation.

As opposed to MTase-REase fusions, both proteins in C-REase fusions (in the case of C and Type IIP REases) can only become active when they form dimers. Since this study has shown that the C-REase fusion is produced and that both parts are active, this arrangement might confer some advantage. There are three possible ways in which the C and REase portions of such a fusion could dimerize.


First, both subunit interfaces (C and REase portions) on one fusion polypeptide may dimerize at the same time with those on a second fusion molecule (Figure 12B). However, this has to occur without violating the symmetry rules, and thus a linker region of sufficient length and flexibility is essential. Although there are no additional amino acids in the C-REase junctions of Mru (CR.Mru1279I) and Nso (CR.NsoJS138I) (relative to PvuII; see Figure 6), it is still possible that both C and REase portions are able to dimerize simultaneously, if the carboxyl-terminal region of the C protein portion and/or the amino-terminal region of the REase portion are sufficiently flexible.

Second, a concatemeric chain might be formed with alternating interfaces (C-R•R-C•C-R…) (Figure 12D); this assumes that the two interfaces of one polypeptide can dimerize with two different partner polypeptides. Although it is not clear whether such chain formation would confer any structural or functional advantage, it could still occur at higher protein concentrations, or if the two interfaces have similar Kd values. However, the dimerization interface for R.PvuII is ~2300 Å² [35, 77], while that for a C protein (C.AhdI) is ~1400 Å² [44], suggesting that the REase portion may have a lower Kd than the C protein portion.

The third model, which might be the most interesting one, is that the two portions effectively act as internal (first-order) competitive inhibitors of one another’s dimerizations (Figure 12C). This model assumes that, at any given time, a given fusion dimer is joined through either its C portions or its REase portions, so that it yields either active C protein or active REase, but not both. If this is true, competitive dimerization not only means that the two portions cannot be active simultaneously, but might also have implications for the relative timing of MTase and REase appearance after the R-M system genes enter a new host cell. First, I consider effects on the appearance of REase activity. If the C interface is stronger than the R interface, active REase would only appear at later times, after a higher concentration of fused polypeptide had accumulated. On the other hand, a stronger REase interface (compared to the C interface) would result in the early appearance of small amounts of REase activity, which might require DNA repair. Second, I consider the effects on timing of gene expression. A stronger C interface (relative to REase) would result in an earlier and sharper induction threshold in the positive feedback loop, while a stronger REase interface may lengthen the time needed for the positive feedback loop to cross the threshold for high expression of the fusion gene, giving more time for protective methylation to occur (even if some DNA damage results from the low level of early REase activity). Either way, of the three interaction modes, this competitive dimerization model seems to provide the most obvious (and testable) hypotheses for possible selective advantages of forming C-REase fusions.
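One way to make the competitive dimerization model concrete is a simple equilibrium calculation. The sketch below is an illustration under strong assumptions that go beyond the text: dimerization through the C interface and through the REase interface is treated as strictly mutually exclusive, no chains form, the system is at equilibrium, and the Kd values and concentrations are arbitrary numbers in arbitrary units. With monomer concentration M, the two dimer species are M²/Kd_C and M²/Kd_R, and mass balance gives total = M + 2M²(1/Kd_C + 1/Kd_R).

```python
import math


def competitive_dimers(total, kd_c, kd_r):
    """Equilibrium monomer, C-joined dimer and REase-joined dimer concentrations.

    Assumes each fusion polypeptide is a free monomer, part of a dimer joined
    at the C interface, or part of a dimer joined at the REase interface.
    Solves 2*a*M**2 + M - total = 0 with a = 1/kd_c + 1/kd_r.
    """
    a = 1.0 / kd_c + 1.0 / kd_r
    monomer = (-1.0 + math.sqrt(1.0 + 8.0 * a * total)) / (4.0 * a)
    dimer_c = monomer ** 2 / kd_c  # active as C protein
    dimer_r = monomer ** 2 / kd_r  # active as REase
    return monomer, dimer_c, dimer_r


if __name__ == "__main__":
    # Hypothetical case with a stronger C interface (lower Kd) than REase interface.
    for total in (0.1, 1.0, 10.0):
        m, dc, dr = competitive_dimers(total, kd_c=0.1, kd_r=1.0)
        print(f"total={total:5.1f}  monomer={m:.3f}  C dimer={dc:.3f}  REase dimer={dr:.3f}")
```

In this simplified model the ratio of the two dimer species is fixed by the ratio of the Kd values, while the absolute amount of each grows with the total polypeptide concentration; the weaker interface therefore yields an appreciable amount of active dimer only after more fusion polypeptide has accumulated.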

The significance of this work lies not only in the identification and characterization of a novel Type II R-M system bearing a C-R fusion, but also in the perspective it provides on how structural variation of proteins can affect their functional regulation, thus contributing to the understanding of bacterial evolution.


Conclusions: R-M systems closely related to PvuII (as judged by similarity of the REase sequences) have diverse regulatory mechanisms. Most resemble PvuII in having a separate regulatory (C) protein, but some encode the C protein fused to the REase, and one of these fusion proteins, from the bacterium Niabella soli, is active both as a REase and as a C protein. Fusions between C proteins and REases have not previously been characterized. These results reinforce the evidence for modularity among RM system proteins, and raise important questions about the possible selective advantages of C-REase fusion, including implications of these fusions for RM system expression kinetics.

Figure 12. Possible interactions of C-REase fusion polypeptides. A. Unfused systems such as PvuII, where the C protein and REase form separate homodimers. B. Fused system in which the linker between C and REase regions of the polypeptide is long enough and flexible enough to allow simultaneous dimerization at both C and REase subunit interfaces. C. Fused system in which dimerization of the C portion is in competition with dimerization of the REase portion. D. Fused system in which concatemeric chains can form. See text for details.


References

1. Mann, M. B., Rao, R. N., Smith, H. O.: Cloning of restriction and modification genes in E. coli: The HhaII system from Haemophilus haemolyticus. Gene 1978 3:97-112

2. Roberts RJ, Vincze T, Posfai J, Macelis D: REBASE--a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 2010 Jan;38(Database issue):D234-6.

3. Roberts RJ, Belfort M, Bestor T, Bhagwat AS, Bickle TA, Bitinaite J, Blumenthal RM, Degtyarev S, Dryden DT, Dybvig K et al: A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res 2003, 31(7):1805-1812.

4. Roberts, RJ: Restriction endonucleases. CRC Crit. Rev. Biochem. 1976 4:123-64

5. Srivani Sistla and Desirazu N. Rao: S-Adenosyl-L-methionine–Dependent Restriction Enzymes. Critical Reviews in Biochemistry and Molecular Biology, 39:1–19, 2004

6. Kovall RA, Matthews BW: Type II restriction endonucleases: structural, functional and evolutionary relationships. Curr Opin Chem Biol 1999, 3(5):578-583.

7. Pawlak SD, Radlinska M, Chmiel AA, Bujnicki JM, Skowronek KJ: Inference of relationships in the 'twilight zone' of homology using a combination of bioinformatics and site-directed mutagenesis: a case study of restriction endonucleases Bsp6I and PvuII. Nucleic acids research 2005, 33(2):661-671.

8. Pingoud A, Fuxreiter M, Pingoud V, Wende W: Type II restriction endonucleases: structure and mechanism. Cell Mol Life Sci 2005, 62(6):685-707

9. Smith, H. O., Kelly, SV: Methylases of the type II restriction modification systems. In DNA Methylation: Biochemistry and Biological Significance, ed. A. Razin, H. Cedar, A. D. Riggs, pp. 39-71. New York: Springer-Verlag 1984

10. Chiang PK, Gordon RK, Tal J, Zeng GC, Doctor BP, Pardhasaradhi K, McCann PP: S-Adenosylmethionine and methylation. FASEB J. 1996 Mar;10(4):471-80.

11. McClelland, M., Nelson, M: The effect of site-specific methylation on restriction endonucleases and DNA modification methyltransferases-a review. Gene 1988 74:291-304

12. Yuan, R., Smith, HO: The restriction and modification DNA methylases: an overview. In DNA Methylation: Biochemistry and Biological Significance, ed. A. Razin, H. Cedar, A. D. Riggs, pp. 73-80. New York: Springer-Verlag 1984

13. Murray NE: Type I restriction systems: sophisticated molecular machines (a legacy of Bertani and Weigle). Microbiol Mol Biol Rev. 2000 Jun;64(2):412-34.

14. Wilson, G: "Restriction and Modification Systems," Annual Review of Genetics (1991), 25:585-627.

15. Bickle, TA: The ATP-dependent restriction endonucleases. In Nucleases, ed. S. M. Linn, R. J. Roberts, pp. 85-108. Cold Spring Harbor, NY: Cold Spring Harbor Lab. Press 1982

16. Szybalski, W., Kim, S. C., Hasan, N., Podhajska, AJ: Class-IIS restriction enzymes-a review. Gene. 1991

17. Zylicz-Stachula A, Bujnicki JM, Skowron PM: Cloning and analysis of a bifunctional methyltransferase/restriction endonuclease TspGWI, the prototype of a Thermus sp. enzyme family. BMC molecular biology 2009, 10:52.

18. Zylicz-Stachula A, Zolnierkiewicz O, Lubys A, Ramanauskaite D, Mitkaite G, Bujnicki JM, Skowron PM: Related bifunctional restriction endonuclease-methyltransferase triplets: TspDTI, Tth111II/TthHB27I and TsoI with distinct specificities. BMC molecular biology 2012, 13:13.

19. Mokrishcheva ML, Solonin AS, Nikitin DV: Fused eco29kIR- and M genes coding for a fully functional hybrid polypeptide as a model of molecular evolution of restriction-modification systems. BMC Evol Biol 2011, 11:35.

20. Shen BW, Xu D, Chan SH, Zheng Y, Zhu Z, Xu SY, Stoddard BL: Characterization and crystal structure of the type IIG restriction endonuclease RM.BpuSI. Nucleic acids research 2011, 39(18):8223-8236.

21. Richard J. Roberts: Restriction enzymes and their isoschizomers. Nucleic Acids Res. 1990 April 25; 18(Suppl): 2331–2365.

22. Bickle, TA: DNA restriction and modification systems. In Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, ed. F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, H. E. Umbarger, pp. 692-96. Washington, DC: Am. Soc. Microbiol. 1987

23. E A Raleigh, J Benner, F Bloom, H D Braymer, E DeCruz, K Dharmalingam, J Heitman, M Noyer Weidner, A Piekarowicz, P L Kretz, et al: Nomenclature relating to restriction of modified DNA in Escherichia coli. J Bacteriol. 1991 April; 173(8): 2707–2709.

24. Bujnicki JM, Radlinska M, Zaleski P, Piekarowicz A: Cloning of the Haemophilus influenzae Dam methyltransferase and analysis of its relationship to the Dam methyltransferase encoded by the HP1 phage. Acta Biochim Pol. 2001;48(4):969-83.

25. Kommireddy Vasu and Valakunja Nagaraja: Diverse Functions of Restriction-Modification Systems in Addition to Cellular Defense. Microbiol. Mol. Biol. Rev. 2013, 77(1):53. DOI: 10.1128/MMBR.00044-12.

26. Handa N, Ichige A, Kusano K, Kobayashi I: Cellular responses to postsegregational killing by restriction-modification genes. J. Bacteriol. 2000 182:2218–2229.

27. Rice MR, Blumenthal RM: Recognition of native DNA methylation by the PvuII restriction endonuclease. Nucleic Acids Res 2000, 28(16):3143-3150.

28. Mruk I, Blumenthal RM: Real-time kinetics of restriction-modification gene expression after entry into a new host cell. Nucleic Acids Res 2008, 36(8):2581-2593.

29. Gingeras TR, Greenough L, Schildkraut I, Roberts RJ: Two new restriction endonucleases from Proteus vulgaris. Nucleic Acids Res 1981, 9(18):4525-4536.

30. Blumenthal RM, Gregory SA, Cooperider JS: Cloning of a restriction-modification system from Proteus vulgaris and its use in analyzing a methylase-sensitive phenotype in Escherichia coli. J Bacteriol 1985, 164(2):501-509.

31. Tao T, Bourne JC, Blumenthal RM: A family of regulatory genes associated with type II restriction-modification systems. J Bacteriol 1991, 173(4):1367-1375.

32. Adams, G.M. and Blumenthal, RM: The PvuII DNA (cytosine-N4)-methyltransferase comprises two trypsin-defined domains, each of which binds a molecule of S-adenosyl-L-methionine. Biochemistry, 36, 8284–8292. 1997

33. Mruk I, Rajesh P, Blumenthal RM: Regulatory circuit based on autogenous activation-repression: roles of C-boxes and spacer sequences in control of the PvuII restriction-modification system. Nucleic Acids Res 2007, 35(20):6935-6952.

34. Williams K, Savageau MA, Blumenthal RM: A bistable hysteretic switch in an activator-repressor regulated restriction-modification system. Nucleic acids research 2013.

35. Cheng X, Balendiran K, Schildkraut I, Anderson JE: Structure of PvuII endonuclease with cognate DNA. The EMBO journal 1994, 13(17):3927-3935.

36. Horton JR, Nastri HG, Riggs PD, Cheng X: Asp34 of PvuII endonuclease is directly involved in DNA minor groove recognition and indirectly involved in catalysis. J Mol Biol. 1998 Dec 18;284(5):1491-504.

37. Gong W, O'Gara M, Blumenthal RM, Cheng X: Structure of pvu II DNA-(cytosine N4) methyltransferase, an example of domain permutation and protein fold assignment. Nucleic Acids Res 1997, 25(14):2702-2715.

38. Knowle D, Lintner RE, Touma YM, Blumenthal RM: Nature of the promoter activated by C.PvuII, an unusual regulatory protein conserved among restriction-modification systems. J Bacteriol. 2005 Jan;187(2):488-97.

39. Tao, T., and R. M. Blumenthal: Sequence and characterization of pvuIIR, the PvuII endonuclease gene, and of pvuIIC, its regulatory gene. J. Bacteriol. 1992 174:3395–3398.

40. Sohail A, Ives CL, Brooks JE: Purification and characterization of C.BamHI, a regulator of the BamHI restriction-modification system. Gene. 1995 May 19;157(1-2):227-8.

41. Bart A, Dankert J, van der Ende A: Operator sequences for the regulatory proteins of restriction modification systems. Mol Microbiol 1999, 31(4):1277-1278.

42. Vijesurier RM, Carlock L, Blumenthal RM, Dunbar JC: Role and mechanism of action of C.PvuII, a regulatory protein conserved among restriction-modification systems. J Bacteriol 2000, 182(2):477-487.

43. Mruk I, Blumenthal RM: Tuning the relative affinities for activating and repressing operators of a temporally regulated restriction-modification system. Nucleic Acids Res 2009, 37(3):983-998.

44. McGeehan JE, Streeter SD, Papapanagiotou I, Fox GC, Kneale GG: High-resolution crystal structure of the restriction-modification controller protein C.AhdI from Aeromonas hydrophila. J Mol Biol 2005, 346(3):689-701.

45. McGeehan JE, Streeter SD, Thresh SJ, Ball N, Ravelli RB, Kneale GG: Structural analysis of the genetic switch that regulates the expression of restriction-modification genes. Nucleic Acids Res 2008, 36(14):4778-4787.

46. S. D. Streeter, I. Papapanagiotou, J. E. McGeehan and G. G. Kneale: DNA footprinting and biophysical characterization of the controller protein C.AhdI suggests the basis of a genetic switch. Nucleic Acids Research, 2004, Vol. 32, No. 21 6445–6453 doi:10.1093/nar/gkh975

47. McGeehan, J.E., Streeter, S., Cooper, J.B., Mohammed, F., Fox, G.C. and Kneale, GG: Crystallization and preliminary X-ray analysis of the controller protein C.AhdI from Aeromonas hydrophilia. Acta Crystallogr. D Biol. Crystallogr. 2004 60, 323–325.


48. Sophie Pasek, Jean-Loup Risler and Pierre Brézellec: Gene fusion/fission is a major contributor to evolution of multi-domain bacterial proteins. Bioinformatics Vol. 22 no. 12 2006, pages 1418–1423

49. Sarah K. Kummerfeld and Sarah A. Teichmann: Relative rates of gene fusion and fission in multi-domain proteins. TRENDS in Genetics Vol.21 No.1 January 2005

50. Sorokin V, Severinov K, Gelfand MS: Systematic prediction of control proteins and their DNA binding sites. Nucleic Acids Res 2009, 37(2):441-451.

51. Malone T, Blumenthal RM, Cheng X: Structure-guided analysis reveals nine sequence motifs conserved among DNA amino-methyltransferases, and suggests a catalytic mechanism for these enzymes. J Mol Biol 1995, 253(4):618-632.

52. Kim YJ, Kim MK, Bui TP, Kim HB, Srinivasan S, Yang DC: Solibius ginsengiterrae gen. nov., sp. nov., isolated from soil of a ginseng field, and emended description of the genus Sediminibacterium and of Sediminibacterium salmoneum. Int J Syst Evol Microbiol. 2010 Dec;60(12).

53. Weon HY, Kim BY, Joa JH, Kwon SW, Kim WG, Koo BS: Niabella soli sp. nov., isolated from soil from Jeju Island, Korea. Int J Syst Evol Microbiol. 2008 Feb;58(Pt 2):467-9

54. Sharma V, Firth AE, Antonov I, Fayet O, Atkins JF, Borodovsky M, Baranov PV: A pilot study of bacterial genes with disrupted ORFs reveals a surprising profusion of protein sequence recoding mediated by ribosomal frameshifting and transcriptional realignment. Mol Biol Evol 2011, 28(11):3195-3211.

55. Ivanov IP, Atkins JF: Ribosomal frameshifting in decoding antizyme mRNAs from yeast and protists to humans: close to 300 cases reveal remarkable diversity despite underlying conservation. Nucleic acids research 2007, 35(6):1842-1858.

56. Mazauric MH, Licznar P, Prère MF, Canal I, Fayet O: Apical loop-internal loop RNA pseudoknots: a new type of stimulator of -1 translational frameshifting in bacteria. J Biol Chem. 2008 Jul 18;283(29):20421-32.

57. Gurvich OL, Baranov PV, Zhou J, Hammer AW, Gesteland RF, Atkins JF: Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli. EMBO J. 2003 22:5941–5950.

58. Nastri HG, Evans PD, Walker IH, Riggs PD: Catalytic and DNA binding properties of PvuII restriction endonuclease mutants. J Biol Chem. 1997 Oct 10;272(41):25761-7.

59. Paul L, Blumenthal RM, Matthews RG: Activation from a distance: Roles of Lrp and integration host factor in transcriptional activation of gltBDF. J Bacteriol. 2001 Jul;183(13):3910-8.

60. Platko JV, Willins DA, Calvo JM: The ilvIH operon of Escherichia coli is positively regulated. Journal of bacteriology 1990, 172(8):4563-4570.

61. Gertz EM, Yu YK, Agarwala R, Schaffer AA, Altschul SF: Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. BMC Biol 2006, 4:41.

62. Tenreiro S, Nobre MF, da Costa MS: Thermus silvanus sp. nov. and Thermus chliarophilus sp. nov., two new species related to Thermus ruber but with lower growth temperatures. Int J Syst Bacteriol. 1995 Oct;45(4):633-9.

63. Baranov PV, Hammer AW, Zhou J, Gesteland RF, Atkins JF: Transcriptional slippage in bacteria: distribution in sequenced genomes and utilization in IS element gene expression. Genome Biol. 2005 6:R25.

64. Naderer M, Brust JR, Knowle D, Blumenthal RM: Mobility of a restriction-modification system revealed by its genetic contexts in three hosts. J Bacteriol 2002, 184(9):2411-2419.

65. Aras RA, Takata T, Ando T, van der Ende A, Blaser MJ: Regulation of the HpyII restriction-modification system of Helicobacter pylori by gene deletion and horizontal reconstitution. Mol Microbiol 2001, 42(2):369-382.

66. Tindall BJ, Sikorski J, Lucas S, Goltsman E, Copeland A, Glavina Del Rio T, Nolan M, Tice H, Cheng JF, Han C et al: Complete genome sequence of Meiothermus ruber type strain (21). Stand Genomic Sci 2010, 3(1):26-36.

67. Dominguez MA Jr, Thornton KC, Melendez MG, Dupureur CM: Differential effects of isomeric incorporation of fluorophenylalanines into PvuII endonuclease. Proteins. 2001 Oct 1;45(1):55-61.

68. Bogdanova E, Djordjevic M, Papapanagiotou I, Heyduk T, Kneale G, Severinov K: Transcription regulation of the type II restriction-modification system AhdI. Nucleic Acids Res 2008, 36(5):1429-1442.

69. Cesnaviciene E, Mitkaite G, Stankevicius K, Janulaitis A, Lubys A: Esp1396I restriction-modification system: structural organization and mode of regulation. Nucleic Acids Res 2003, 31(2):743-749.

70. Ives CL, Nathan PD, Brooks JE: Regulation of the BamHI restriction-modification system by a small intergenic open reading frame, bamHIC, in both Escherichia coli and Bacillus subtilis. Journal of bacteriology 1992, 174(22):7194-7201.

71. Semenova E, Minakhin L, Bogdanova E, Nagornykh M, Vasilov A, Heyduk T, Solonin A, Zakharova M, Severinov K: Transcription regulation of the EcoRV restriction-modification system. Nucleic Acids Res 2005, 33(21):6942-6951.

72. Sorokin V, Severinov K, Gelfand MS: Systematic prediction of control proteins and their DNA binding sites. Nucleic Acids Res 2009, 37(2):441-451.

73. Davis SE, Mooney RA, Kanin EI, Grass J, Landick R, Ansari AZ: Mapping E. coli RNA polymerase and associated transcription factors and identifying promoters genome-wide. Methods Enzymol 2011, 498:449-471.

74. Mendoza-Vargas A, Olvera L, Olvera M, Grande R, Vega-Alvarado L, Taboada B, Jimenez-Jacinto V, Salgado H, Juarez K, Contreras-Moreira B et al: Genome-wide identification of transcription start sites, promoters and transcription factor binding sites in E. coli. PLoS One 2009, 4(10):e7526.

75. Shultzaberger RK, Chen Z, Lewis KA, Schneider TD: Anatomy of Escherichia coli sigma70 promoters. Nucleic acids research 2007, 35(3):771-788.

76. Polard P, Prere MF, Chandler M, Fayet O: Programmed translational frameshifting and initiation at an AUU codon in gene expression of bacterial insertion sequence IS911. J Mol Biol. 1991 222:465–477.

77. Athanasiadis A, Vlassi M, Kotsifaki D, Tucker PA, Wilson KS, Kokkinidis M: Crystal structure of PvuII endonuclease reveals extensive structural homologies to EcoRV. Nat Struct Biol 1994, 1(7):469-475.
