University of Connecticut OpenCommons@UConn

Doctoral Dissertations University of Connecticut Graduate School

8-5-2019 Characterizing Restriction-Modification Systems and DNA Methyltransferases in the Halobacteria Matthew Ouellette University of Connecticut - Storrs, [email protected]

Follow this and additional works at: https://opencommons.uconn.edu/dissertations

Recommended Citation Ouellette, Matthew, "Characterizing Restriction-Modification Systems and DNA Methyltransferases in the Halobacteria" (2019). Doctoral Dissertations. 2280. https://opencommons.uconn.edu/dissertations/2280 Characterizing Restriction-Modification Systems and DNA Methyltransferases in the Halobacteria Matthew Ouellette, Ph.D. University of Connecticut, 2019 Abstract The Halobacteria are a class of archaeal organisms which are obligate halophiles. Several studies have described the Halobacteria as highly recombinogenic, yet members even within the same geographic location cluster into distinct phylogroups, suggesting that barriers exist which limit recombination. Barriers to recombination in the Halobacteria include CRISPRs, glycosylation, and archaeosortases. Another possible barrier which could limit gene transfer might be restriction-modification (RM) systems, which consist of restriction endonucleases

(REases) and DNA methyltransferases (MTases) that both target the same sequence of DNA.

The REase cleaves the target sequence, whereas the MTase methylates the site and protects it

from cleavage. This dissertation examines the role of DNA methylation and RM systems in the

Halobacteria and their impact on halobacterial speciation and genetic recombination. Using the

model haloarchaeon, Haloferax volcanii, the genomic DNA methylation patterns (methylome) of

an archaeal organism was characterized for the first time. Further investigations via gene deletion

were used to identify the DNA methyltransferases responsible for the detected methylation

patterns in H. volcanii, and experiments were conducted to determine the impact of RM systems

on recombination via cell-to-cell mating. The distribution of MTase and RM system genes

throughout the Halobacteria was also examined using a bioinformatics approach. The results

indicated that the methylome of H. volcanii consists of two methylated motifs:

GCAm6BNNNNNNVTGC, methylated by the Type I RM system RmeRMS, and Cm4TAG,

methylated by the orphan MTase HVO_0794. Three other putative MTase genes (HVO_C0040,

HVO_A0079, and HVO_A0237) were also identified, but were not observed to contribute to the Matthew Ouellette, University of Connecticut, 2019 methylome. Results from mating experiments indicated that RM systems could potentially limit recombination via cell-to-cell mating in H. volcanii. Furthermore, RM system genes were observed to be patchily distributed among the Halobacteria, whereas orphan MTase gene families were more widespread and well-conserved. The results of these studies provide insight into the life cycle of RM systems in the Halobacteria and how they might contribute to halobacterial diversification and speciation. The RM deletion mutants of H. volcanii also have the potential to be useful in future work examining the role of RM systems and DNA methylation in the Halobacteria. Characterizing Restriction-Modification Systems and DNA Methyltransferases in the Halobacteria

Matthew Ouellette B.S., Post University, 2011

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Connecticut

i Copyright by: Matthew Ouellette

2019

ii APPROVAL PAGE

Doctor of Philosophy Dissertation

Characterizing Restriction-Modification Systems and DNA Methyltransferases in the

Halobacteria

Presented by:

Matthew Ouellette, B.S.

Major Advisor

______

R. Thane Papke, Ph.D.

Associate Advisor

______

Johann Peter Gogarten, Ph.D.

Associate Advisor

______

Daniel Gage, Ph.D.

Associate Advisor

______

Kenneth Noll, Ph.D.

Associate Advisor

______

Michael O’Neill, Ph.D.

University of Connecticut

2019

iii

This dissertation is dedicated to my father, Michael Ouellette, whose support and encouragement

was the greatest thing a son could ask for.

iv

Acknowledgements

I would like to thank my parents, Melanie and Michael Ouellette, and my brother Eric

Ouellette. My family has been the greatest blessing in my life, and this work would not have

been possible without their love and support. I would also like to thank my major advisor, Thane

Papke, whose guidance and creative ideas were essential to shaping my dissertation work and helping me to grow as a researcher. Thank you, also, to Andrea Makkay, Sean Pearson, Jess

Lajoie, Scott Chimileski, Nikhil Ram Mohan, and all current and former Papke Lab members, who provided a wonderful, friendly, and supportive work environment. I am also grateful to

Artemis Louyakis and Matt Fullmer of the Gogarten Lab for their support, friendship, and bioinformatics expertise. I am also thankful for the support of my associate advisors: Peter

Gogarten, Dan Gage, Ken Noll, and Mike O’Neill.

v

Table of Contents

Chapter 1. Introduction...... 1 1. Halobacteria ...... 1 1.1 Classification and of the Halobacteria ...... 1 1.2 A Brief History of Halobacteria Discoveries ...... 3 1.3 Habitats of Halobacteria ...... 4 1.4 Physiology of Halobacteria ...... 9 1.5 Genetics of Halobacteria ...... 13 1.6 Gene Transfer and Recombination in the Halobacteria ...... 16 2. DNA Methylation and Methyltransferases...... 20 2.1 What is DNA Methylation? ...... 20 2.2 What are DNA Methyltransferases? ...... 21 2.3 Strategies for Detecting DNA Methylation Patterns ...... 22 3. Restriction-Modification Systems ...... 24 3.1 What are Restriction-Modification Systems? ...... 24 3.2 A Brief History of RM System Discoveries in Bacteria ...... 25 3.3 Classes of RM Systems ...... 26 4. Orphan Methyltransferases ...... 28 4.1 What are Orphan Methyltransferases? ...... 28 4.2 DNA Adenine Methyltransferase (Dam) In Escherichia coli ...... 28 4.3 Cell-Cycle-Regulated Methyltransferase (CcrM) in Caulobacter crescentus ...... 30 4.4 Campylobacter Transformation System Methyltransferase (CtsM) in Campylobacter jejuni ...... 31 5. DNA Methylation and RM Systems in the ...... 32 5.1 RM Systems in Thermophilic Archaea ...... 32 5.2 RM Systems in Methanogenic Archaea ...... 35 5.3 RM Systems in the Halobacteria ...... 37 5.4 Domain-wide Investigation of Archaeal DNA Methylation and RM Systems ...... 38 6. Conclusion ...... 39 References ...... 41 Chapter 2. Dihydroxyacetone metabolism in Haloferax volcanii ...... 58

vi

Abstract ...... 59 Introduction ...... 59 Materials and Methods ...... 60 Strains and Growth Conditions ...... 60 PCR and DNA Isolation ...... 60 Gene Deletion in Hfx. volcanii ...... 60 Complementation of Deleted Genes ...... 62 DHA Growth Experiments ...... 62 Bioinformatics ...... 62 Results ...... 62 DHA Kinase is Patchily Distributed Among the Halobacteria ...... 62 Growth on DHA in Hfx. volcanii is Concentration Dependent ...... 62 DHA Kinase is Used in DHA Metabolism in Hfx. volcanii ...... 64 Glycerol Kinase is More Important than DHA Kinase ...... 64 Glycerol Kinase is Widely Distributed Among the Halobacteria ...... 65 Discussion ...... 66 Authors Contributions ...... 68 Acknowledgements ...... 68 References ...... 68 Chapter 3. Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006...... 70 Abstract ...... 71 Introduction ...... 72 Materials and Methods ...... 73 Strains and Growth Conditions ...... 73 Deletion of HVO_A0006 Gene ...... 73 DNA Preparation for PacBio Sequencing ...... 73 PacBio SMRT Sequencing ...... 73 Protein Domain Architecture Analysis and Multiple Alignment ...... 74 Results ...... 74 Characterization of DNA Methylation in Haloferax volcanii H26 ...... 74

vii

Deletion of HVO_A0006 Increases Specificity of m6A Motif and Reduces m6A Methylation ...... 75 HVO_A0006 is a Fragment Homologous to Larger, Multi-Domain RM Proteins ...... 75 HVO_A0006 contains a Conserved PD-(D/E)XK Nuclease Motif ...... 76 HVO_A0006 and HVO_A0237 are Flanked by Predicted Integrase and Transposase Genes ...... 76 Discussion ...... 76 Acknowledgements ...... 79 Supplementary Material ...... 79 References ...... 79 Chapter 4. Characterizing the DNA methyltransferases of Haloferax volcanii via bioinformatics, gene deletion, and SMRT sequencing ...... 82 Abstract ...... 83 1. Introduction ...... 83 2. Materials and Methods ...... 85 2.1 Strains and Growth Conditions ...... 85 2.2 Deletion of Annotation Restriction Modification Genes ...... 86 2.3 DNA Purification for Single-Molecule Real-Time Sequencing ...... 87 2.4 Single-Molecule Real-Time Sequencing ...... 87 2.5 Bioinformatics Analysis ...... 87 2.6 Haloferax volcanii Growth Experiments ...... 88 3. Results ...... 88 3.1 Bioinformatics Analysis Supports Identification of HVO_0794 as a Chromosomal 4mC CTAG MTase ...... 88 3.2 Bioinformatics Analysis Supports Identification of RmeM as a Type I 6mA MTase and RmeS as a Type I Specificity Subunit on the Chromosome ...... 92 3.3 Bioinformatics Analysis Supports Annotation of HVO_C0040 as a 5mC MTase and HVO_A0079 as a 6mA MTase ...... 94 3.4 Deletion of HVO_0794, HVO_A0006, and HVO_A0237 Eliminates 4mC Methylation and Does Not Effect 6mA Methylation ...... 97 3.5 Deletion of the rmeRMS Operon Abolishes 6mA Methylation ...... 97 3.6 Multi-RM Deletion Eliminates Detection of All DNA Methylation ...... 97 3.7 No Defect in Growth Occurs in the Multi-RM Deletion Compared to the Parental Strain ...... 98

viii

4. Discussion ...... 99 Supplementary Materials ...... 101 Acknowledgements ...... 101 Author Contributions ...... 101 Conflicts of Interest ...... 101 References ...... 101 Chapter 5. The Impact of Restriction-Modification Systems on Mating in Haloferax volcanii ...... 106 Abstract ...... 107 Introduction ...... 107 Materials and Methods ...... 111 Strains and Growth Conditions ...... 111 Deletion of trpA and hdrB Genes from ΔRM Strain ...... 112 Mating H53 and H98 Strains with RM ΔtrpA and RM ΔhdrB Strains ...... 113 Results ...... 114 Mating Efficiency is Lower When Mating RM-incompatible Strains ...... 114 Discussion ...... 116 References ...... 121 Chapter 6. The patchy distribution of restriction–modification system genes and the conservation of orphan methyltransferases in Halobacteria ...... 126 Abstract ...... 127 1. Introduction ...... 127 2. Materials and Methods ...... 129 2.1 Search Approach ...... 129 2.2 Reference Phylogeny ...... 129 2.3 Presence-absence Phylogeny ...... 130 2.4 Horizontal Gene Transfer Detection ...... 130 2.5 Data Analysis and Presentation ...... 130 3. Results ...... 131 3.1 RM-System Gene Distribution ...... 131 3.2 Horizontal Gene Transfer Explains Patchy Distribution ...... 136 4. Discussion ...... 139 5. Conclusions ...... 141

ix

Supplementary Materials ...... 141 Author Contributions ...... 141 Funding ...... 141 Acknowledgements ...... 141 Conflicts of Interest ...... 141 References ...... 141 Chapter 7. Conclusion ...... 146 References ...... 150 Appendix ...... 152 Requests to Re-Publish ...... 152

x

Chapter 1. Introduction 1. Halobacteria

1.1. Classification and Taxonomy of the Halobacteria

The Halobacteria comprise a group of archaeal organisms which need high salt

concentrations to survive. The Halobacteria require environments with at least 1.5 M salinity

(Thombre et al., 2016), but are able to live in environments at or near salt saturation (Fendrihan

et al., 2006). In the seventh edition of Bergey’s Manual of Determinative Bacteriology of 1957,

Halobacteria of the Halobacterium were classified as belonging to the bacterial order

Pseudomonadales, but based on their ribosomal RNA characteristics and the absence of murein,

these organisms were later identified as archaea (Magrum et al., 1978) and given their own

taxonomic class (Oren, 2012). This taxonomic class was named Halobacteria, and it contains the

orders , , and Natrialbales (Fendrihan et al., 2006; Gupta et al.,

2015). The order Halobacteriales contains the family , which was developed in the eighth edition of Bergey’s Manual of Determinative Bacteriology of 1974 to encompass the genera of Halobacterium and Halococcus, and which later would encompass as many as 36

different genera (Oren et al., 2009; Oren, 2012). A phylogenetic study by Walsh et al. (2004) used the RNA polymerase B’ gene (rpoB’) to analyze the phylogeny of 21 halobacterial , and observed that the DNA and protein trees formed two well-supported clades (clades I and II), with clade I consisting of genera such as Natrialba and Natronobacterium and clade II consisting of genera such as and Haloferax. These two clades were also supported by multi- locus sequence analysis (MLSA) by Papke et al. (2011) on 31 strains of Halobacteria, with little statistical support observed for relationships between other genera outside of these clades. In a

phylogenomic analysis by Gupta et al. (2015) using several highly-conserved proteins from over

1

100 halobacterial genomes, almost two thirds of halobacterial species were observed to group

separately into two distinct clades resembling clades I and II (clades A and B in this study), with each clade having unique signature molecular markers present only within members of those clades, suggesting unique evolutionary histories. Based on this result, it was proposed that these two clades should be recognized as separate orders from the Halobacteriales, which would now be organized as a polyphyletic group. Clade A would be named Natrialbales, containing the family Natrialbaceae, and Clade B would be named Haloferacales, containing the family

Haloferacaceae. Natrialbaceae would contain genera such as Natrialba, Natronococcus, and

Natronobacterium; whereas Haloferacaceae would consist of genera such as Haloferax,

Halorubrum, and Haloquadratum (Gupta et al., 2015). A later analysis by Gupta et al. (2016), which examined conserved signature insertions/deletions (CSIs) and conserved signature proteins (CSPs) of 129 sequenced halobacterial genomes, observed that several genera within the order Halobacteriales grouped into two clades (HB1 and HB2), and two distinct clades (HF1 and

HF2) were observed to occur within the order Haloferacales. Therefore, Halobacteriales was proposed to be divided into three families: clade HB1 would be named Haloarculaceae, which would include genera such as Haloarcula; clade HB2 would be named Halococcaceae, which would include genera such as Halococcus; and all other genera within Halobacteriales, such as

Halobacterium, would be classified into the polyphyletic family Halobacteriaceae. The analysis did not find strong phylogenetic and molecular evidence of the members of Halobacteriaceae being closely related to each other, and were grouped together only based on their dissimilarity to the other two families. Therefore, future work will need to focus on identifying reliable

genomic and molecular markers that can be used to accurately establish the phylogenetic

relationships of these remaining genera, although frequent horizontal gene transfer would make

2

finding reliable markers difficult. The order Haloferacales was proposed to be split into two families: clade HF1 would be named Haloferacaceae, which would include genera such as

Haloferax and Haloquadratum; and clade HF2 would be named Halorubraceae, which would include genera such as Halorubrum (Gupta et al., 2016).

1.2. A Brief History of Halobacteria Discoveries

The earliest evidence of halophilic microorganisms was the observation of a reddish color present in hypersaline environments (Grote and O'Malley, 2011). The use of sea salt as a preservative of fish in the nineteenth century resulted in the observation that the fish would develop a red coloration over time, leading to the conclusion that microbes were responsible for the coloration (Farlow, 1880; Grote and O'Malley, 2011). One of the earliest descriptions of halophilic prokaryotes was by Henrich Klebahn in 1919, in which he isolated an organism from salted cod (DasSarma et al., 2010). The isolate was described as consisting of rod-shaped cells which were Gram-negative and red in color. The isolated organism was named Bacillus halobius ruber (DasSarma et al., 2010). Another red, halophilic strain was isolated from salted cod by

Harrison and Kennedy (1922), and was named Pseudomonas salinara (Oren, 2012). Both

Bacillus halobius ruber and Pseudomonas salinara were later identified as the same species by

Helena Petter in 1931, who isolated the red halophile from both salted cod and sea salt and reclassified it as Bacterium halobium since it did not form spores (Houwink, 1956; Grote and

O'Malley, 2011). The genus name Halobacterium was later given to the species, along with other isolates, by Elazari-Volcani (1944), resulting in the species name Halobacterium halobium

(Houwink, 1956; Grote and O'Malley, 2011).

Other strains were later isolated and classified from various hypersaline environments, such as the Gram-negative, rod-shaped Halobacterium cutirubrum and Halobacterium

3

trapanicum from salterns in Trapani, Italy, which were later reclassified as Halobacterium salinarum and Halorubrum trapanicum (Grote and O'Malley, 2011). In a study by

Mullakhanbhai and Larsen (1975), a halophilic microbe from Dead Sea sediment was isolated which had a salt requirement of 1.7-2.5 M, relatively moderate compared to other Halobacteria discovered at the time. This organism was named Halobacterium volcanii, after Elazari-Volcani

(Mullakhanbhai and Larsen, 1975). The species was later reclassified as Haloferax volcanii

(Torreblanca et al., 1986), and it has become a very useful model organism for studying

Halobacteria in the laboratory (Allers and Ngo, 2003).

Until the classification of Domain Archaea by Carl Woese and George Fox (1977), the

Halobacteria were all assumed to be bacteria. However, a study by Magrum et al. (1978) on the

16S rRNA of Halobacterium halobium indicated that it did not group with the bacteria. Instead, it grouped with the archaea, indicating that H. halobium, and the other Halobacteria, are members of the Domain Archaea (Magrum et al., 1978). Therefore, the Halobacteria are also commonly referred to as , and some researchers have suggested that the taxonomic terminology should be officially changed to reflect this and avoid confusion (DasSarma and

DasSarma, 2008).

1.3. Habitats of Halobacteria

Halobacteria are found in many different hypersaline environments. Solar salterns are one type of well-studied hypersaline environment. The salterns are designed to precipitate salt from seawater via evaporation and consist of a series of ponds of salinities which increase in salt concentration up to saturation, making them ideal environments to study the effect of salinity on microbial diversity and ecology (Fernandez et al., 2014). Most studies on salterns have focused on the saturated crystallizer ponds, where the Halobacteria are observed to be the dominant

4

organisms (Anton et al., 2000; Pasic et al., 2005; Oh et al., 2010). The most dominant OTUs in these crystallizer ponds tend to belong to the genera Halorubrum or Haloquadratum. In a metagenomic study by Pasić et al. (2005) of a salt crystallizer pond in an Adriatic solar saltern, they observed that Halorubrum made up about 66% of the halobacterial community according to

16S rRNA analysis, whereas Haloquadratum was observed to make up over 40% of the archaeal communities in the Australian salt crystallizer ponds examined in a metagenomic analysis by Oh et al. (2010). One of the few eukaryotic organisms which can be found in these environments is the halophilic algae Dunaliella, which is the most dominant primary producer in these ecosystems (Oren, 2009b), while the bacteria most predominant in these environments typically belong to the halophilic Bacteroidetes genus Salinibacter (Anton et al., 2000; Anton et al.,

2002). While the crystallizer ponds of solar salterns are typically dominated by Halobacteria, the diversity varies depending on the salt concentration. In a study by Rodriguez-Valera et al.

(1985), the taxonomic distribution of populations from a solar saltern in Santa Pola, Alicante,

Spain, was examined in ponds ranging from 10% to over 50% salinity. The results indicated that, in ponds below 15% salinity, many taxa were observed which are typically found in seawater, such as Pseudomonas and Vibrio. However, above 15% salinity, halophilic taxa such as

Dunaliella and Halobacterium became more abundant, with only Halobacteria observed at ponds above 50% salinity (Rodriguez-Valera et al., 1985). In a metagenomic study by Benlloch et al.

(2002), 16S rRNA analyses were performed on samples obtained from a solar saltern (Bras del

Port) in Santa Pola in Alicante, Spain from ponds ranging between 4-37% salinity, with clone libraries focusing on ponds with 8%, 22%, and 32% salinity. The results indicated that, up 11% salinity, bacterial diversity was comparable to coastal seawater, with a low abundance of archaeal species. However, in intermediate- and high-salinity ponds, the bacterial communities

5

became more dominated by halophilic genera such as Salinibacter, whereas the Halobacteria

dominated the archaeal communities. Overall, the number of phylogenetic clusters decreased as

salinity increased, but a notable degree of microdiversity was still observed within those clusters

(Benlloch et al., 2002). In a metagenomic study by Fernández et al. (2014), the microbial

community compositions of samples acquired from ponds at Santa Pola with 19%, 33%, and

37% salinity were examined in comparison to the community structure of a sample from a 21%

salinity pond from Isla Cristina in Huelva, Spain. They observed that while the Halobacteria

were dominant in all samples, there was greater diversity in the 19% salinity pond from Santa

Pola. Only 46% of the 19% salinity sample consisted of OTUs from the phylum , with the rest of the community consisting of various bacterial phyla. In comparison, 84% of the community belonged to Euryarchaeota in the 21% Isla Cristina sample, as well as 89% in the

33% salinity Santa Pola sample. The 37% salinity Santa Pola sample was similar in composition to the 33% salinity sample, but consisted of only Euryarchaeota and Bacteroidetes. These results indicate that microbial diversity decreases as salinity increases, with Halobacteria dominating the most saline ponds (Fernandez et al., 2014).

Halobacteria are also found in natural hypersaline lakes and ponds. One of these lakes which has been well-studied is the Dead Sea, where Elazari-Volcani (1944) isolated various strains of Halobacteria and other halophilic microbes (Grote and O'Malley, 2011). The salinity of the surface water of the upper Dead Sea was ~34% as of 2015 (Jacob et al., 2017), but it has been increasing due to water from the Dead Sea being fed into nearby crystallizer ponds and source water from the Jordan River being diverted to other areas (Gavrieli et al., 1998). Like in solar salterns, Dunaliella species are present in the Dead Sea, such as Dunaliella parva, as the dominant primary producers of the community (Oren and Shilo, 1982). However, the salinity of

6

the Dead Sea in recent years has been measured at ~ 5.9 M (Rhodes et al., 2012), higher than the

upper limit for Dunaliella growth (~5.5 M) and much higher than its optimal salinity of ~2 M

(Chen et al., 2009). The most recent years during which the salinity was low enough in the

surface waters to allow for notable Dunaliella growth were 1980 and 1992 (Oren, 1993; Rhodes

et al., 2012). The Halobacteria population has remained successful throughout these changes,

although it has changed over time. A study by Rhodes et al. (2012) examined 16S rRNA libraries

from halobacterial blooms from the Dead Sea in 1992 and a residual 2007 population to

determine how well the 1992 population has persisted over time. They observed a number of

differences in the population, with the 1992 blooms being dominated by Halosarcina and

Natronococcus, while the 2007 residual population was dominated by Halorhabdus and

Natronomonas. These differences suggest that, while there are still traces of the 1992 blooms,

the residual population is not a reflection of the 1992 blooms and that the harsher interbloom

conditions have resulted in shifts in the population (Rhodes et al., 2012). A more recent study by

Jacob et al. (2017) examined the diversity of the Dead Sea population via high throughput

sequencing. The results indicated that the majority of the population (~52%) were Archaea, with

the most dominant OTUs belonging to the genera Halorhabdus and Natronomonas. However,

there was also a large fraction of Bacteria (~45%), consisting of genera such as Acinetobacter

and Bacillus (Jacob et al., 2017). The halobacterial genera observed to be dominant in the Dead

Sea differ from salt crystallizer ponds, which are typically dominated by Haloquadratum and

Halorubrum. These differences could be due to the Dead Sea being a natural environment, which has undergone various changes in its salinity and water chemistry over time, with the Jordan

River as its primary water source (Gavrieli et al., 1998). These conditions could support different

7

microbial communities than those found in the more artificial saltern environments which have

marine water as their primary water source.

Halobacteria have also been observed in the fluid inclusions of halite deposits. Many of these deposits were formed from the evaporation of ancient oceans (Lowenstein et al., 2001), and can range between 22,000 years old (Schubert et al., 2010a) to 419 million years old

(Satterfield et al., 2005). These crystals typically have fluid inclusions which were formed around the same time as the halite deposits, sometimes with various Halobacteria trapped inside which have been hypothesized to be as old as the halite (Sankaranarayanan et al., 2014; Jaakkola et al., 2016). Therefore, studying Halobacteria from ancient halite can provide unique insights

into the microbial ecology of ancient environments. In a study by Park et al. (2009),

halobacterial diversity was examined in DNA extracted from 23-million-year-old halite from

Spain, 121-million-year-old halite from Brazil, and 419-million-year-old halite from Michigan.

The 23-million-year-old sample was observed to consist of Haloarcula and Halorubrum DNA,

whereas the older samples consisted of Halobacterium and various unclassified groups of

Halobacteria. These results indicate that ancient hypersaline environments were composed

predominantly of Halobacterium and unknown Halobacteria, whereas Haloarcula and

Halorubrum have come to dominate more modern environments. In one unclassified group

closely related to the Halorubrum, the V2 region of the 16S rRNA sequences from the 121-

million-year old and 419-million-year samples contained a 55-bp fragment not present in the 23-

million-year-old sample or in modern Halobacteria, suggesting that a deletion of a part of the

16S rRNA occurred sometime between 121 and 23 million years ago in a dominant group of

Halobacteria which gave rise to the Halorubrum (Park et al., 2009). Living strains of

Halobacteria have also been isolated from ancient halite, such as Halobacterium and

8

Natronobacterium related strains isolated from 121-million-year-old halite by Vreeland et al.

(2007), which indicates that Halobacteria can survive for long periods within halite fluid

inclusions. However, concerns have been raised regarding the lack of independent replication of

some of these studies, as well as potential contamination of the ancient samples (Willerslev and

Cooper, 2004). A study on bacterial DNA isolated from frozen conditions also suggests that

ancient DNA can only last in the environment between ~400,000 to 1.5 million years under ideal

conditions (Willerslev et al., 2004). It is also currently unknown how the Halobacteria would be

able to survive for long periods in these extremely limited environments. A study by Fendrihan

et al. (2012) demonstrated that Halobacterium cells form into small spheres when exposed to

reduced water activity, similar in size and shape to viable particles of Halobacteria isolated from

ancient halite samples, suggesting that halobacterial cells may enter a dormant state that could

allow them to survive for extended periods isolated under extreme conditions. Many of these

Halobacteria have been observed to occur with remnants of Dunaliella cells (Schubert et al.,

2010b), and the glycerol produced by this halophilic alga has been hypothesized to be an

important carbon source for Halobacteria in these nutrient-limited environments (Jaakkola et al.,

2016).

1.4. Physiology of Halobacteria

One of the most notable features of the Halobacteria is that they are red, pink, or purple

in coloration, which results in hypersaline environments appearing reddish in color when these

organisms bloom (Oren et al., 1992; Oren and Dubinsky, 1994; Oren and Rodriguez-Valera,

2001). These colors are the result of carotenoid pigments such as C50 carotenoid and bacterioruberin produced by these organisms, which span the lipid bilayer of their membranes

(Kamekura, 1993; Rodrigo-Banos et al., 2015). These carotenoids help provide protection

9

against the harsh sunlight common to many habitats of the Halobacteria (Fendrihan et al., 2006).

In a study by Dundas and Larsen (1962), for example, a colorless mutant of H. salinarum was observed to grow more poorly when exposed to a high intensity tungsten light, which emits light at all visible wavelengths, indicating that the carotenoids help protect the cells from strong solar radiation. Carotenoid pigments from the Halobacteria have been observed to have absorbance maxima at 498 and 530 nm (Oren, 2009a), allowing them to absorb light within the emissions spectrum of tungsten light (Larrabee, 1959).

The Halobacteria have adapted to hypersaline environments through what is known as a

“salt-in” strategy, in which the cells maintain a high concentration of salt within their cytoplasm to match external salt concentrations and maintain osmotic balance (Oren, 1999). This is different from the “salt-out” strategy of other halophiles such as Dunaliella, which involves the production of a compatible solute such as glycerol to maintain equal osmotic pressure

(Borowitzka and Brown, 1974; Oren, 1999). The salt-in strategy of the Halobacteria typically requires that the cells maintain an acidic proteome so that the proteins maintain their solubility in the high-salt conditions within the cell (Fendrihan et al., 2006). The amino acid composition of halobacterial proteins was first analyzed quantitatively by Reistad (1970), in which the proteins of two types of Halobacteria, including Halobacterium, were compared to those of non- halophilic bacteria. The proteins from the halobacterial organisms were all observed to have a high percentage of acidic amino acids (aspartic and glutamic acid), indicating a highly acidic proteome (Reistad, 1970). The proteome of Halobacterium salinarum NRC-1 was analyzed computationally by Kennedy et al. (2001), in which they observed that the proteome had very few basic proteins and was highly acidic, with an isoelectric point of 4.2, which was more acidic than the other proteomes examined with the exception of Methanobacterium

10

thermoautotrophicum, an archaeon which has high internal concentration of potassium ions but

is not a halophile (Ciulla et al., 1994). Not all organisms that use a salt-in strategy have acidic proteomes, however. Members of the halophilic bacterial order Halanaerobiales, for example, use a salt-in strategy to adapt to hypersaline environments, but proteomic analyses have indicated that they do not have acidic proteomes (Elevi Bardavid and Oren, 2012; Oren, 2013). It is

possible, however, that organisms of this order use a combination of salt-in strategy and

compatible solutes, which might explain why their proteomes are less acidic than Halobacteria,

which use only salt-in. Although no evidence has been observed that Halanaerobiales produce

organic osmotic solutes, a sucrose phosphate gene has been identified in the genome of

Halothermotrix orenii (Mavromatis et al., 2009), which indicates that it might produce sucrose

as a compatible solute (Oren, 2013). Halanaerobiales also have a salt optimum of around 10-15% salinity (2.0-2.5 M)(Elevi Bardavid and Oren, 2012), lower than the optimum in many

Halobacteria, such as Haloferax mediterranei and Haloquadratum walsbyi which have salt optimums ranging between 17-36% (Oren and Hallsworth, 2014). Therefore, the internal salt concentration of Halanaerobiales cells might not be high enough to require an acidic proteome.

More research is needed to examine the unique properties of Halanaerobiales to determine how it

lives at high salinities without an acidic proteome.

Halobacteria can utilize a variety of carbon sources, such as glucose, fructose, galactose,

sucrose, xylose, glucuronate, and arabinose (Anderson et al., 2011). One important carbon source in these environments is glycerol, which becomes available in the external environment due to the leakage from organisms like Dunaliella which produce glycerol as a compatible solute

(Elevi Bardavid et al., 2008; Oren, 2017). As demonstrated in H. volcanii, the glycerol is then transported across the membrane and converted into glycerol 3-phosphate (G3P) by a glycerol

11

kinase (Sherwood et al., 2009), followed by conversion to dihydroxyacetone-phosphate (DHAP) by G3P dehydrogenase and incorporation of DHAP into the organism’s metabolic pathways

(Rawls et al., 2011). Another potentially important source of carbon for the Halobacteria is dihydroxyacetone (DHA), which has been hypothesized to be available either from leakage from

Dunaliella or as an overflow product of glycerol metabolism from Salinibacter. A study by Elevi

Bardavid and Oren (2008) detected via colorimetric assay the excretion of DHA by Salinibacter ruber, supporting the hypothesis that DHA could be a readily available carbon source of

Halobacteria. The DHA would then be transported into the cell and phosphorylated to DHAP by either a glycerol kinase or a DHA kinase (Anderson et al., 2011). Evidence of DHA metabolism has been observed in Haloquadratum walsbyi by Elevi Bardavid and Oren (2008), and Chapter 2 of this dissertation will present evidence of DHA metabolism in H. volcanii and nearly all other

Halobacteria.

Halobacteria are able to metabolize nitrogen via nitrate and nitrite reduction, in which nitrate is taken up and reduced to nitrite via a nitrate reductase, followed by reduction of nitrite or ammonium via a nitrite reductase (Bonete et al., 2008). These enzymes have been well- characterized in Haloferax mediterranei, which uses a ferredoxin dependent assimilatory nitrate reductase (Martínez-Espinosa et al., 2001a) and ferredoxin dependent assimilatory nitrite reductase (Martínez-Espinosa et al., 2001b). According to metagenomic data analyzed by

Fernández et al. (2014), the nitrogen cycle is more simplified in saltern environments, with fewer nitrate reductase and nitrite reductase sequences observed in their saltern samples.

Genes involved in the uptake of phosphorus sources such as phosphate and phosphonate have been identified in the Halobacteria, such as phoR, which is involved in detection of environmental phosphate, and phoB, which is involved in activating the phosphate regulon

12

involved in the transport and utilization of phosphate (Fernandez et al., 2014). However, in saltern environments, the high Mg+ concentrations limit the availability of inorganic phosphate

(Bolhuis et al., 2006; Fernandez et al., 2014). One potential source of phosphorus for the

Halobacteria is environmental DNA (eDNA). In a study by Chimileski et al. (2014), H. volcanii

grew well when supplemented with a carbon source, nitrogen source, and eDNA, indicating that

it is able to utilize the eDNA as a source of phosphorus. The current mechanism for uptake and

utilization of eDNA is unclear, although the extracellular, membrane-bound nuclease HVO_1477

is essential for eDNA catabolism. When the gene encoding for the nuclease is deleted from H.

volcanii, it is unable to grow on eDNA as a phosphorus source (Chimileski et al., 2014).

1.5. Genetics of Halobacteria

The genomes of most halobacterial species consist of multiple, GC-rich replicons. The

genome of H. salinarum NRC-1, the first halobacterial genome sequenced, was observed to have

a main chromosome of ~2 Mb with a GC-content of 68% and two plasmids of ~191 kb and ~365 kb with GC-contents ranging from 58-59% (Ng et al., 2000). The genomes of other sequenced halobacterial genomes have also been shown to have similarly high GC contents. Haloarcula marismortui contains a chromosome (~4.3 Mb) with a GC-content of 62% and 8 plasmids (~33-

410 kb) with GC-contents between 54-60% (Baliga et al., 2004). Natronomonas pharaonis has a chromosome (~2.6 Mb) with a GC-content of ~63% and 2 plasmids (~131 kb and ~23 kb) with

GC-contents of ~57% and ~61% (Falb et al., 2005). The chromosome of H. volcanii (~2.8 Mb) has a GC-content of ~67%, and its 4 plasmids (~6-636 kb) have GC-contents ranging between

55-67% (Hartman et al., 2010). The genome of H. walsbyi, however, has a lower GC-content than other halobacterial genomes, with both its chromosome (~3.1 Mb) and single plasmid (~47 kb) having GC-contents of ~48% (Bolhuis et al., 2006).

13

Many species of Halobacteria are polyploid, i.e., they contain multiple copies of their

chromosomes (Soppa, 2011; 2013). In H. volcanii, for example, each cell can have up to ~20 copies of their chromosome during exponential phase under favorable growth conditions

(Breuert et al., 2006). Polyploidy may have emerged in the Halobacteria as a nutrient storage strategy in order to horde extra phosphorus under nutrient-limiting conditions. In a study by

Zerulla et al. (2014), the chromosome copy number of H. volcanii cells was monitored in growth medium with and without a phosphorus source. They observed that chromosome copy number dropped from ~30 in preculture to ~2 in stationary phase after the cells were starved of phosphorus, indicating that the cells were cannibalizing their own chromosomes for phosphorus.

When the cells were given additional phosphorus after starvation, the chromosome copy numbers increased again to ~40 after 24 hours (Zerulla et al., 2014).

Unlike bacteria, the chromosomes of many halobacterial species have multiple origins of replication. A study by Coker et al. (2009) using whole-genome marker frequency analysis

(MFA) observed that the chromosome of H. salinarum NRC-1 contains 4 origins of replication.

A study by Norais et al. (2007) used various genetic, biochemical, and bioinformatics approaches on H. volcanii wild-type strain DS2 and identified at least 2 origins of replication on the main chromosome. A later study by Hawkins et al. (2013) on the replication profile of H. volcanii auxotrophic strain H26 indicated that the chromosome has 4 different origins of replication, with one coming from the replication origin of a plasmid that fused into the main chromosome. The chromosome of Haloferax mediterranei was observed to have 3 dominant origins of replication and 1 dormant origin according to replication profile analyses by Yang et al. (2015). Interestingly, in some species the origins of replication may not be essential for replication. Hawkins et al. (2013) deleted all of the origins of replication from H. volcanii H26

14

and observed that, not only were the strains viable, but they grew better than strains with origins

of replication. They observed that the origin-null strain could not grow without a functional radA

recombinase gene, indicating that the cells use homologous recombination in order to replicate

their chromosomes without origins of replication (Hawkins et al., 2013). However, this strategy

does not work for all halobacterial species, as Yang et al. (2015) were not able to obtain an

origin-null strain of H. mediterranei via gene deletion, indicating that having at least one replication origin is essential for DNA replication in H. mediterranei.

Genetic systems have been developed for a number of halobacterial systems, and the system in H. volcanii is one of the most well-developed and commonly-used systems in the

Halobacteria. A transformation system for H. volcanii was first described by Charlebois et al.

(1987), in which a polyethylene glycol (PEG)-mediated protocol had been developed from a

similar transformation protocol used in H. halobium. A similar PEG-mediated transformation

system was later described in an archaeal laboratory manual by Cline et al. (1995) which could effectively transform H. volcanii and other Halobacteria. The system was refined by Bitan-Banin et al. (2003), in which a uracil auxotrophic strain was produced which could be used alongside a plasmid containing the uracil biosynthesis gene pyrE2 as a marker gene to insert and delete genes in H. volcanii. Other auxotrophic strains were developed by Allers et al. (2004) which could be used with different selectable markers, such as the tryptophan biosynthesis gene trpA or the thymidine biosynthesis gene hdrB. A more streamlined approach of deletion strain development was later employed by Blaby et al. (2010) using the Infusion HD Cloning Kit from

Clontech to more rapidly construct deletion plasmids to be used for gene knockouts. This approach allowed for the production of 22 deletion strains, which were used to identify essential gene sets in H. volcanii (Blaby et al., 2010). This genetic system, along with access to a fully

15

sequenced genome (Hartman et al., 2010) and relative ease of growth compared to other strains

of Halobacteria, makes H. volcanii a good model system for studying the Halobacteria (Allers

and Ngo, 2003; Blaby et al., 2010).

1.6. Gene Transfer and Recombination in the Halobacteria

Halobacteria have been observed to be highly recombinogenic and engage in frequent

gene transfer (Papke et al., 2007; Fullmer et al., 2014). The origin of the Halobacteria has been

hypothesized to be the result of horizontal gene transfer (HGT) of bacterial genes into a

methanogen. A study by Nelson-Sathi et al. (2012) examined 10 halobacterial genomes and over

1000 bacterial reference genomes to determine how many genes were acquired by the common

ancestor of the Halobacteria. They identified over 1000 gene families which were of bacterial

origin that were transferred into the methanogenic common ancestor of the Halobacteria, with

functions such as carbon metabolism, membrane transporters, and the bacterial respiratory chain.

These results indicate that extensive HGT and recombination was essential in the early evolution

of the Halobacteria from the methanogens (Nelson-Sathi et al., 2012). However, the extent to which divergence of the Halobacteria from the methanogens was driven by gene transfer from bacteria was challenged by Groussin et al. (2015), who suggested that the methodology used in a follow-up study by Nelson-Sathi et al. (2015) overestimated the number of bacterial acquisitions

at the root of the Halobacteria and other archaeal lineages. A re-analysis of the data using a

maximum likelihood (ML) approach indicated that many of the bacterial genes were acquired

after the origin of the archaeal phyla, suggesting that gene transfer and acquisition from the

Bacteria has occurred gradually and continuously throughout the evolutionary history of the

Halobacteria and other archaeal lineages rather than mostly occurring early in their evolutionary

histories (Groussin et al., 2015). A more recent study of haloarchaeal genomes that identified

16

naturally occurring “chimeric” genes found similar results. Network analyses of archaeal proteins and gene families by Méheust et al. (2018) indicated that gene remodeling in 320 different composite genes emerged both early in the evolution of the Halobacteria and later in specific groups, including 126 which were derived from bacterial genes and were distinct from the bacterial genes identified by Nelson-Sathi et al. (2012) (Méheust et al., 2018). Overall, these studies indicate that not only did gene transfer occur early in the evolutionary history of the

Halobacteria, but that it is still ongoing and contributes to the diversity of genes in these organisms.

One process known to occur in the Halobacteria which facilitates genetic recombination is cell-to-cell mating. This process was studied by Rosenshine et al. (1989) in H. volcanii using two of its plasmids as cytoplasmic markers. They observed that the cells do not exchange cytoplasm based on the immobility of the indigenous plasmids, that genetic exchange was bidirectional, and that the cells appear to form cytoplasmic bridges in order to transfer DNA to each other (Rosenshine et al., 1989). The mating process was later observed to involve cell fusion into a heterodiploid state, during which genetic exchange and other cytoplasmic components can occur (Ortenberg et al., 1998; Naor et al., 2012). This process also appears to have a low species barrier, allowing for efficient mating to occur between different species. In a study by Naor et al. (2012), H. volcanii was mated with H. mediterranei, with the genes trpA and hdrB used as markers to track the formation of hybrids. They observed that both species were able to successfully mate with each other, with the resulting hybrids exchanging fragments as large as 310-530 kb. Although interspecies mating was observed to occur less frequently than within the species, comparison of the Haloferax interspecies mating efficiencies with those in the

Bacteria indicated that H. volcanii and H. mediterranei recombine across species at a greater

17

frequency than in the Bacteria, suggesting that there is a lower species barrier for genetic

exchange in the Halobacteria compared to the Bacteria, which raises questions regarding how the

species limit recombination and maintain distinctiveness from each other without fully homogenizing. However, the observation that mating efficiency was lower between species than within species indicates that there are barriers which decrease interspecies mating (Naor et al.,

2012).

Recombination has been demonstrated to drive evolution and diversity within populations of Halobacteria. In a study by Papke et al. (2007), Halorubrum isolates from adjacent saltern ponds in Santa Pola, Spain and from a hypersaline pond in Algeria were examined using multi-

locus sequence analysis (MLSA). The strains were observed to cluster into three major

phylogroups, with each phylogroup consisting mostly of strains from the same location. High sequence identity was observed within each phylogroup, and the different loci examined via

MLSA had randomly assorted alleles, indicating that frequent recombination within the phylogroups drives diversification of these populations (Papke et al., 2007). A study by Fullmer et al. (2014) performed both MLSA and average nucleotide identity (ANI) analyses on

Halorubrum isolates from a hypersaline lake in Iran to investigate the evolution of the

Halorubrum population. The results indicated that the strains cluster into two phylogroups, with isolates within each phylogroup sharing ANI greater than 98%. Frequent recombination was observed to occur within each phylogroup, but at notably lower rates between the phylogroups

(Fullmer et al., 2014).

The observation that recombination occurs at such a high rate within the Halobacteria, and that distinct phylogroups form even among populations in the same geographic location, suggests barriers to recombination exist which allow for speciation. However, little is known

18

about these potential barriers in the Halobacteria. One potential barrier might be CRISPR-Cas

systems. CRISPRs, or clustered regularly interspaced short palindromic repeats, are short,

repeated, spacer sequences acquired from previous invasive foreign elements which act as an

immune system for the host (Barrangou et al., 2007). These spacers provide the templates for

RNA sequences called crRNAs, which are used to identify foreign genetic elements and which

interact with various Cas proteins to target and degrade the invasive elements (Koonin et al.,

2017). CRISPR-Cas systems have been identified in Halobacteria such as H. volcanii and H.

mediterranei, which both have subtype I-B systems (Li et al., 2013; Maier et al., 2019). CRISPR

spacers have also been observed to be acquired during cell-to-cell mating from partner

chromosomes in H. volcanii, and CRIPSR-Cas has also been observed limit interspecies mating

between H. volcanii and H. mediterranei when one species is designed to be the target of the

other’s CRISPR-Cas system, indicating that these systems can limit gene transfer between

species but that they are not absolute barriers to mating (Turgeman-Grott et al., 2019). Surface

glycosylation of S-layer glycoproteins can also limit cell-to-cell mating. In a study by Shalev et

al. (2017), mating experiments were performed on strains of H. volcanii with surface

glycosylation genes aglB and agl15 deleted. These deletion mutants exhibited a dramatically

reduced mating efficiency when mating with each other compared to their parental strains,

indicating that proper glycosylation of S-layer glycoproteins is required for mating. Mating a deletion mutant with a parental strain also reduced mating efficiency, but not as drastically as mating two deletion mutants together, indicating that specific surface glycosylation patterns could limit mating preferences between different strains and species of Halobacteria, but might not act as an absolute barrier (Shalev et al., 2017). Cell-to-cell mating is also affected by modification of surface proteins by archaeosortases. In a study by Abdul Halim et al. (2013), a

19

strain of H. volcanii with the archaeosortase gene artA deleted exhibited a reduced mating

efficiency compared to the parental strains, but was not an absolute barrier to recombination.

Another possible barrier to recombination could be DNA methylation state and restriction-

modification (RM) systems, which are known to limit HGT in the bacteria (Thomas and Nielsen,

2005). Therefore, further examination of DNA methylation and RM systems could provide more

insight into species barriers in the Halobacteria.

2. DNA Methylation and Methyltransferases

2.1. What is DNA Methylation?

DNA methylation is an epigenetic process where the nucleotide bases of a DNA molecule are modified with methyl groups (-CH3). One of the earliest studies to detect evidence

of methylated DNA was performed by Rollin Hotchkiss (1948). In the study, Hotchkiss used

paper chromatography on nucleic acids obtained from calf thymus to separate out and analyze

the different nucleic acid bases, and noticed that a cytosine constituent migrated at a rate greater

than typical cytosine, similar to how 5-methyluracil (thymine) migrates from uracil. Hotchkiss

hypothesized that, based on this similarity, the constituent was 5-methylcytosine (m5C), a

cytosine base which contains a methyl group on the fifth carbon of the molecule. Hotchkiss also

proposed that this methylated base was a natural constituent in the DNA of the organism

(Hotchkiss, 1948). Several other methylated bases have since been discovered to occur in various

organisms (Korlach and Turner, 2012). Eukaryotic organisms predominantly use m5C

methylation; however, bacteria and archaea have been observed to use two other types of

methylated bases alongside m5C: N4-methylcytosine (m4C) and N6-methyladenine

(m6A)(Korlach and Turner, 2012).

20

2.2. What are DNA Methyltransferases?

The process of DNA methylation is catalyzed by enzymes known as DNA

methyltransferases (MTases). The first MTase was purified in Escherichia coli by Kühnlein et

al. (1969), and was named M.EcoBI. It was discovered after partial fractionation of cell extracts

of E. coli strain B. In vitro treatment of phage fd DNA with these fractions was observed to

increase the infectivity on E. coli B spheroplasts, indicating that the purified elements modified

the phage DNA in a way that made it resistant to restriction by the host. The enzyme was further

purified via chromatography on phosphocellulose and DEAE-cellulose, and was observed to

require only S-adenosylmethionine (AdoMet) as a cofactor to modify the DNA (Kuhnlein et al.,

1969). However, the type of modification performed by the enzyme was still unknown. A later

study by Kühnlein and Arber (1972) further examined the role of AdoMet as a co-factor of

M.EcoBI, and identified the type of modification. Radio-labeled AdoMet was used in in vitro treatments of phage fd DNA with purified M.EcoBI, and the results indicated that radio-label was transferred from AdoMet to the DNA. Paper chromatography of hydrolyzed bases from the radio-labeled DNA indicated that the radio-labeled bases migrated with m6A bases, providing evidence that the enzyme was using methyl groups from AdoMet to methylate adenine into m6A and confirming it to be a DNA methyltransferase (Kuhnlein and Arber, 1972). Since then, many other MTases have been discovered, and many common features have been identified.

MTases typically contain three major types of domains: an AdoMet-binding domain, which binds to the AdoMet co-factor to pick up the methyl group, a target recognition domain

(TRD) which recognizes and binds to a short sequence of DNA known as a target site, and a catalytic domain which catalyzes the transfer of the methyl group to a nucleotide base at the target site (Klimasauskas et al., 1989; Bujnicki, 2002; Bheemanaik et al., 2006). The type of

21

nucleotide targeted for methylation (either cytosine or adenine), as well as the type of methylation (either m5C, m4C, or m6A) depends on the type of MTase. Evidence indicates that m4C MTases and m6A MTases, also known as amino-MTases, are more structurally similar to each other than to m5C MTases. A study by Posfai et al. (1989) performed amino acid alignments on 27 MTase sequences, including 13 m5C MTases, and observed ten motifs (I-X) which were universally conserved in linear alignment in the 13 m5C MTase sequences which could be used as signature motifs to identify other m5C MTases. Based on presumed structural characteristics, motif I was predicted to be associated with the AdoMet binding domain

(Klimasauskas et al., 1989) and motif IV was predicted to be associated with the catalytic domain (Chen et al., 1991). This linear conservation, however, is not observed in amino-MTases.

A study by Malone et al. (1995) performed amino acid alignments of 33 m6A MTases and 9 m4C MTases guided by the structural framework of m5C MTases to see if similar motifs were present in amino-MTases. They observed that nine motifs were conserved among the amino acid sequences of these MTases, with motifs X, I, II, and III involved in AdoMet binding and motifs

IV, V, VI, VII, and VIII involved in catalytic activity. The linear order of these motifs, however, varies between different amino-MTases, resulting in three major groups of amino-MTases (α, β, and γ) based on motif order (Malone et al., 1995). Subsequent studies have confirmed the presence of these motifs in other amino-MTases, and have since identified three other groups of amino-MTases with different linear orders of these motifs (δ, ε, and ζ) (Bujnicki and Radlinska,

1999; Bujnicki, 2002).

2.3. Strategies for Detecting DNA Methylation Patterns

Many techniques have been used to identify various DNA methylation patterns in organisms, including genomic DNA methylation patterns (methylome). Early studies relied on

22

chromatographic methods to identify the specific types of methylation that occurred on the DNA, such as in a study by Dunn and Smith (1958) which identified m6A methylation in E. coli for the first time using paper chromatography of hydrolyzed nucleic acids. Later studies used methylation-sensitive restriction enzymes and microarrays to profile the methylation patterns of organisms such as Arabidopsis thaliana (Lippman et al., 2005), as well as immunoprecipitation of methylated DNA fragments using antibodies specific to m5C (Weber et al., 2005). However, these approaches are limited in their ability to detect genome-wide methylation patterns (Lister and Ecker, 2009). One technique which is more applicable to characterizing methylomes is bisulfite (BS) conversion, through which the DNA is treated with sodium bisulfate, resulting in the conversion of unmethylated cytosines into uracil (Frommer et al., 1992). Complementary synthesis of the treated DNA can then be performed, followed by sequencing to detect which cytosines remained unconverted and, therefore, determine which bases are methylated (Lister and Ecker, 2009). This technique, however, is limited to cytosine methylation. More recently, a sequencing technique was developed to detect genomic methylation patterns using single- molecule real-time (SMRT) sequencing from Pacific Biosciences, which sequences a DNA strand by observing DNA polymerases synthesize a new strand in real-time using fluorescent bases (Eid et al., 2009). By observing the kinetic signatures of the polymerases during base incorporation, this technique can be used to directly detect methylated bases in a genome

(Flusberg et al., 2010). Analysis of methylated motifs via SMRT sequencing can also be used to determine the target specificities of different MTases (Clark et al., 2012). One weakness of this approach is that m5C methylation is weakly detected by SMRT sequencing; however, this weakness can be overcome via Tet1-oxidation of the DNA, which improves detection of m5C methylation (Clark et al., 2013). SMRT sequencing is a powerful strategy for characterizing

23

methylomes. In a study by Blow et al. (2016), for example, SMRT sequencing was used to determine the methylomes of 230 different bacterial and archaeal species. They observed methylation in ~93% of the sequenced species, and using this data the authors were able to annotate the target sites of 620 MTases (Blow et al., 2016).

3. Restriction-Modification Systems

3.1. What are Restriction-Modification Systems?

In many bacteria and archaea, MTases exist as part of host defense systems called restriction-modification (RM) systems. These systems consist of a restriction endonuclease

(REase) and a MTase which both recognize the same DNA target sites. The REase typically recognizes and cleaves the target site when it is unmethylated, whereas the MTase will methylate the same target site and protect it from cleavage by the REase. This allows the host organism to target and digest foreign DNA from harmful sources such as phages, which lack proper methylation, while keeping the host genome protected from cleavage (Bickle and Kruger, 1993;

Tock and Dryden, 2005). Along with acting as defense systems, RM systems have also been observed to regulate gene transfer. In a study by Roer et al. (2015), the RM system EcoKI was observed to reduce conjugation of unmethylated plasmids which had EcoKI target sites in E. coli. A study on natural transformation in Helicobacter pylori demonstrated that DNA fragments which are imported are small in size, with a significant number of them containing endpoints which overlapped with RM target sites, suggesting that RM systems could be playing a role in limiting the size of DNA fragments imported during natural transformation (Lin et al., 2009). A multilocus sequence typing analysis of Haemophilus influenzae strains demonstrated that the strains cluster into well-defined clades, and several of the clades were observed to possess RM systems not present in the other clades, suggesting that RM systems might be providing a

24

restriction barrier limiting genetic exchange between the groups and thus driving group

formation (Erwin et al., 2008). A study by Budroni et al. (2011) on Neisseria meningitidis

genomes and populations indicated that the populations are organized into distinct clades, and

that unique RM systems were present in each clade, suggesting that RM systems were driving

population diversity among these strains. RM systems have also been characterized as selfish

genetic elements which function as addiction cassettes. This concept, as described by Ichizo

Kobayashi (2001), suggests that RM systems can be readily transferred to different hosts through

various mobile genetic elements, and that they are then maintained by the host due to post-

segregational killing which occurs when the RM system is lost. Since methylation activity

decays more readily than restriction activity, the loss of the RM system will result in cell death

due to cleavage of the exposed restriction sites. Therefore, the host cell is under selection to

maintain the RM system, resulting in the host becoming addicted to the system (Kobayashi,

2001; Ohno et al., 2008).

3.2. A Brief History of RM System Discoveries in Bacteria

Evidence of the existence of RM systems was first discovered in a study by Luria and

Human (1952), in which they infected mutants E. coli strain B and Shigella dysenteriae strain Sh with T1-T7 bacteriophages. They observed that, when the phages are used to infect certain mutants of strain B, those phages are released in a form that is modified and unable to infect other E. coli strain B mutants, but are able to infect S. dysenteriae normally (Luria and Human,

1952). A similar result was observed in a study by Bertani and Weigel (1953), in which λ bacteriophage were observed to be infect E. coli strain K-12 at a lower rate if the phage was first reproduced in E. coli strain C. Together, these studies suggested that bacterial hosts provided some kind of modification to the phage particle, and that when this modification was not present,

25

phage infection was restricted (Loenen, 2003). A later study by Arber and Dussoix (1962)

demonstrated that this modification occurred on the DNA of the phage by using radiolabeled 32P

λ phages to track the transfer of DNA during infection into E. coli strain K-12. A follow-up study (Dussoix and Arber, 1962) determined that restriction of phage DNA by the K-12 host occurred a short time after injection of the DNA. The genes responsible for restriction and modification of the phage DNA in E. coli K-12 (hsdR, hsdM, and hsdS) were later identified via mating experiments and complementation analyses with various restriction and modification mutants (Boyer, 1964; Boyer and Roulland-Dussoix, 1969; Loenen, 2003), and the first RM system enzymatic complex was isolated from E. coli K-12 by Meselson and Yuan (1968), which was named EcoKI (Loenen, 2003). The modification to the DNA was observed to be methylation based on studies of M.EcoBI, the first purified MTase (Kuhnlein et al., 1969; Kuhnlein and

Arber, 1972).

3.3. Classes of RM Systems

Many different kinds of RM systems have since been identified to exist in the Bacteria and Archaea, and a nomenclature was developed by Roberts et al. (2003) to provide a system of classification for RM systems. There are four major classifications of RM systems according to this nomenclature: Type I, Type II, Type III, and Type IV (Roberts et al., 2003). Type I RM systems consist of three different types of enzymes: REases, MTases, and site specificity subunits (Loenen et al., 2014). These enzymes act as subunits of a restriction-modification pentamer complex consisting of two REase (R) subunits, two MTase (M) subunits, and one site specificity subunit (S) which recognizes the target site for the complex (Liu et al., 2017). When the complex interacts with a target site that is methylated on one of the two strands

(hemimethylated), the M subunits methylate the target site on the other strand, whereas if both

26

strands are unmethylated the R subunits will use ATP-dependent DNA translocation to cleave

the DNA several bases upstream or downstream from the target site (Horiuchi and Zinder, 1972;

Studier and Bandyopadhyay, 1988; Dryden et al., 1993). When the Type I complex does not have active R subunits, it can methylate both hemimethylated and completely unmethylated target sites (Kelleher et al., 1991). Since the S subunits typically contain two tandem TRDs,

Type I systems usually recognize bipartite target sequences (e.g. AACN6GTGC in the Type I

RM system EcoKI)(Gough and Murray, 1983; Fuller-Pace and Murray, 1986). Type II RM

systems consist of independently functioning MTase and REases, each with their own TRDs

which typically recognize short, palindromic sequences (Ershova et al., 2015). The MTase will

methylate the target site, whereas the REase will cleave within or near the site (Pingoud et al.,

2014). Many different subtypes of Type II RM systems have been identified, such as Type IIG which consists of individual RM proteins which are capable of restriction and methylation activity (Janulaitis et al., 1992; Morgan et al., 2009). Type III RM systems are made up of REase

(Res) and MTase (Mod) subunits which work together as a trimer complex of two Mod subunits and one Res subunit (Rao et al., 2014). The Mod subunit includes the TRD of the complex, and typically recognizes short, asymmetric, non-palindromic sequences (Bachi et al., 1979;

Humbelin et al., 1988). The complex will methylate only one strand after binding, and will cleave unmethylated sites via ATP-driven translocation (Janscak et al., 2001). Type IV systems consist only of REases which cleave methylated target sites, and these modification-dependent

REases have been hypothesized to protect hosts from methylated phage DNA and to prevent the acquisition of foreign MTase genes which may belong to mobile, parasitic RM systems (Loenen and Raleigh, 2014).

27

4. Orphan Methyltransferases

4.1. What are Orphan Methyltransferases?

DNA MTases are also present in bacteria and archaea without cognate REases, which are

known as orphan MTases. Rather than providing a protective function to the host, orphan

MTases are typically involved in important cellular functions of their host organisms (Adhikari

and Curtis, 2016). As a consequence, orphan MTases are more well-conserved and prevalent among the Bacteria and Archaea than MTases associated with RM systems. In a study by

Seshasayee et al. (2012), a survey of MTase genes was performed on ~1000 bacterial genomes.

They observed that most MTases were orphans, and that orphan MTases are well-conserved within genera compared to poorly-conserved MTases associated with RM systems. Interestingly, evidence from atypical oligonucleotide composition indicated that both types of MTases may have been acquired horizontally across genera, and the presence of degraded REase genes near orphan MTases in many genera suggests that orphan MTases are the product of RM degradation following horizontal acquisition (Seshasayee et al., 2012). Blow et al. (2016) also observed that

orphan MTases were widespread and well-conserved among the 230 different bacterial and archaeal species that they analyzed via SMRT sequencing, with 57% conserved at the genus level (compared to only 9% of RM-associated MTases). The majority of these orphan MTases

(107/165) were also observed to group into 19 evolutionarily conserved orphan MTase families, with the 2 most highly-represented families consisting of orphan MTases with well-known regulatory functions (Blow et al., 2016).

4.2. DNA Adenine Methyltransferase (Dam) In Escherichia coli

One of the earliest characterized orphan MTases is DNA adenine MTase (Dam) in E.

coli, which methylates the motif GAm6TC (Hattman et al., 1978). The first mutants of Dam were

28

isolated in E. coli K-12 by Marinus and Morris (1973), in which they characterized 3 different

mutant strains which were deficient in adenine methylation. These mutants were all viable, with

no obvious detrimental phenotypical characteristics (Marinus and Morris, 1973). However, a

later study by the authors demonstrated that these Dam mutants had higher mutation rates than

their parental strains (Marinus and Morris, 1974), and a study by Boye et al. (1988) indicated

that Dam mutants have asynchronic chromosomal replication. These were the earliest

observations that Dam has an important role in DNA repair and replication. In E. coli, DNA

repair is coordinated via the MutHLS methyl-directed mismatch repair (MMR) system, which

relies on Dam methylation (Adhikari and Curtis, 2016). In this system, any mismatched bases in

a newly-synthesized DNA strand are recognized by MutS, after which MutL recruits

endonuclease MutH to cleave the new strand at the mismatched base to be degraded and re-

synthesized (Au et al., 1992; Li, 2008). MutH is dependent on Dam methylation, and will bind to

the nearest GAm6TC site when cleaving the mismatched base on the new, unmethylated strand

(Welsh et al., 1987). Therefore, Dam methylation is required for the MMR system to recognize

the old strand from the newly-synthesized strand. Dam methylation also controls DNA

replication in E. coli by regulating SeqA sequestration of the dnaA promoter and the origin of

replication (oriC)(Adhikari and Curtis, 2016). After DNA replication, the GATC sites in both the

dnaA promoter and oriC remain in a hemimethylated state slightly longer than the rest of the

chromosome (Campbell and Kleckner, 1990), resulting in preferential binding of SeqA to these

locations, since SeqA has a high binding affinity for hemimethylated sites (Kang et al., 1999).

This binding reduces dnaA expression and limits access to oriC, thus allowing for tight control of

chromosomal replication (Sanchez-Romero et al., 2010). Dam methylation has also been

observed to be involved in phase-variable gene regulation, such as by interfering with OxyR

29

binding to the promoter of antigen 43 (Waldron et al., 2002). Many homologs of Dam have been

identified in other Gammaproteobacteria, and these homologs have been observed to have similar roles in DNA repair and replication (Lobner-Olesen et al., 2005; Brezellec et al., 2006;

Adhikari and Curtis, 2016).

4.3. Cell-Cycle-Regulated Methyltransferase (CcrM) in Caulobacter crescentus

Another well-characterized orphan MTase is cell-cycle-regulated MTase (CcrM), a m6A

MTase in Caulobacter crescentus which methylates GAm6NTC (Adhikari and Curtis, 2016).

CcrM was first discovered in a study by Zweiger et al. (1994), in which they observed that, after

replication of DNA, methylation of target sites on the newly synthesized strand occurred only

just before cell division. When the identified ccrM gene was placed under the control of a

constitutive promoter, methylation occurred continuously, and resulted in cells with

morphological deformities and excessive initiation of DNA replication (Zweiger et al., 1994). A

later study by Stephens et al. (1996) demonstrated that ccrM deletion mutants of C. crescentus

cannot be obtained and that deficiencies in CcrM production result in a major decrease in viable

cells, indicating that CcrM is essential for cell viability. This was early evidence that CcrM is

involved in regulating the C. crescentus cell cycle. The cycle begins with a planktonic, motile

cell with a flagellum, called a swarmer cell, which prior to cell division ejects its flagellum and

grows a cell body protrusion called a stalk (Curtis and Brun, 2010). This stalked cell replicates

its chromosome and elongates, eventually dividing into a stalked cell and a swarmer cell (Curtis

and Brun, 2010; Adhikari and Curtis, 2016). CcrM regulates this process by controlling the

timing of the expression of global regulator CtrA (Reisenauer and Shapiro, 2002). The gene of this regulator has two promoters: P1, which is weak and expressed in early predivision, and P2, which is strong and expressed in late predivision (Domian et al., 1999). In the early predivisional

30

stage, P1 is hemimethylated at the GANTC site, which facilitates binding by regulator GcrA to

weakly express CtrA (Reisenauer and Shapiro, 2002; Fioravanti et al., 2013). As CtrA

accumulates in the cell, it binds to a half-site in P1 to inactivate weak expression, while binding

to and activating P2 to induce higher expression of CtrA in the late predivisional stage to

promote asymmetrical division of the cell into a swarmer and stalked cell (Domian et al., 1999).

This higher expression of CtrA also induces transcription of the ccrM gene, resulting in full

methylation of the chromosome and reduced activity at the P1 promoter (Stephens et al., 1995;

Adhikari and Curtis, 2016). CcrM is also involved in regulating the expression of other genes in

the genome, either directly or indirectly via CtrA expression, such as ftsZ and mipZ (Gonzalez

and Collier, 2013).

4.4. Campylobacter Transformation System Methyltransferase (CtsM) in Campylobacter

jejuni

One of the more recently characterized orphan MTases is CJJ81-176_0240 in

Campylobacter jejuni, a m6A MTase which was observed via SMRT sequencing to methylate the target site RAm6ATTY (Murray et al., 2012). Beauchamp et al. (2017) re-named this orphan

MTase Campylobacter transformation system MTase (CtsM) based on observations that it is

used by C. jejuni to select extracellular DNA for natural transformation. In their study,

Beauchamp et al. (2017) produced a mutant of C. jejuni with no functioning ctsM gene and

which did not methylate RAATTY motifs. When DNA extracted from this mutant was used to

naturally transform wild-type C. jejuni cells, the transformation efficiency was observed to be

~200-fold lower than when using wild-type, RAm6ATTY-methylated DNA. DNA from the

MTase mutant which was methylated in vitro by M.EcoRI at GAm6ATTC sites, a subset of

RAm6ATTY sites, was as efficient at transforming C. jejuni as wild-type DNA, and in vitro

31

methylation with M.EcoRI could also be used to produce E. coli plasmids which successfully

transformed in C. jejuni. A DNA binding and uptake assay indicated that unmethylated DNA

binds less efficiently to the cell wall of C. jejuni compared to RAm6ATTY-methylated DNA

(Beauchamp et al., 2017). Overall, these results indicate that RAm6ATTY methylation via CtsM

is used by C. jejuni to discriminate which DNA to take up for transformation, similar to DNA

uptake sequences (DUS) used in other naturally-transformable bacterial species such as

Haemophilus influenzae and Neisseria gonorrhoeae (Smith et al., 1999; Beauchamp et al.,

2017).

5. DNA Methylation and RM Systems in the Archaea

5.1. RM Systems in Thermophilic Archaea

Research on DNA methylation and RM systems in the Archaea has been less extensive as

research in the Bacteria. Many studies on RM systems in archaea have focused on hyperthermophilic species. The identification of the first REase in an archaeal organism occurred

in a study by McConnell et al. (1978) of the hyperthermophilic archaeon Thermoplasma

acidophilum. The enzyme, named ThaI, was isolated from cell extracts. The purified enzyme

was used to perform restriction assays on plasmid DNA, phage DNA, and purified T.

acidophilum DNA. Results indicated that the enzyme targets the site CGCG for cleavage on the

plasmid DNA. However, the T. acidophilum DNA was resistant to cleavage, indicating that it is

likely methylated by a cognate MTase (McConnell et al., 1978).

A similar study by Prangishvili et al. (1985) identified another archaeal REase from cell

extracts of the thermoacidophilic species Sulfolobus acidocaldarius, named SuaI. Restriction

analyses of the purified enzyme on plasmid DNA indicated that it targets GG/CC sites. The

enzyme, however, was not able to cleave purified S. acidocaldarius DNA, indicating that the

32

genomic DNA is methylated at the target sites (Prangishvili et al., 1985). A later study by

Grogan (2003) further examined the type of cytosine methylation performed by the SuaI RM

system in S. acidocaldarius. A restriction assay was performed on purified genomic DNA of S.

acidocaldarius using REase isoschizomers of SuaI which were sensitive to different types of

cytosine methylation (m4C or m5C), and the results indicated that the m4C-specific REase

isoschizomer could not cleave the S. acidocaldarius DNA. Furthermore, the methylation

sensitivity of the purified SuaI REase was tested on methylated plasmid and phage DNA, and the

results indicated that it was only sensitive to m4C-methylated DNA substrates. These results

indicated that the SuaI RM system methylates GGCC via m4C methylation (Grogan, 2003).

A study by Ishikawa et al. (2005) tested for the presence of REases in the

hyperthermophilic species Pyrococcus abyssi and Pyrococcus horikoshii. Using bioinformatics,

the researchers identified 32 candidate REase genes based on their annotation as nucleases or

proximity to putative MTase genes, and then cloned and expressed those genes in wheat-germ extracts to test their efficacy on cleaving DNA. Only two of these candidates exhibited REase activity: PhoI, which had been previously characterized in P. horikoshii, and PabI, a novel REase in P. abyssi which cleaves GTA/C. Examination of the genomic region containing PabI and its cognate methyltransferase indicate that it has a lower GC-content than the rest of the genome, suggesting that PabI may have been horizontally transferred into P. abyssi (Ishikawa et al.,

2005). The putative PabI MTase (M.PabI) was further examined in a study by Watanabe et al.

(2006). Bioinformatic analyses of the predicted amino acid sequence of M.PabI indicated that it was highly similar to Thermus aquaticus MTase M.TaqI, a Type II γ m6A MTase. BLASTp analyses indicated that the N-terminal region of M.PabI shares high similarity with the M subunits of Type I RM systems, and its C-terminal regions shared similarity with Type I S

33

subunits. Restriction enzyme assays and in vitro methylation assays indicated that M.PabI

methylates GTAm6C, confirming that the enzyme is the cognate MTase of the PabI RM system.

The hyperthermophilic properties of M.PabI also allowed the authors to characterize the

activation energy and thermodynamic parameters of the enzyme, a first for MTases (Watanabe et

al., 2006).

The full methylome of S. acidocaldarius was recently characterized in a study by

Couturier and Lindås (2018) using restriction digestion assays, dot blots, methylation-specific

antibodies, and SMRT sequencing. They confirmed the presence of M.SuaI GGCm4C

methylation via SMRT sequencing, which was observed by Grogan (2003), and they also

identified the presence of GAm6TC methylation via dot blot experiments (Couturier and Lindas,

2018). SMRT sequencing also identified two m6A motifs (AGAm6TCC and GGAm6TCY) which

both have GAm6TC as their core motif. Due to the low percentage of methylation of the GAm6TC

motifs compared to the GGCm4C motifs, the m6A motif was concluded to be the result of an

orphan MTase rather than a RM system like the m4C motif. The distribution of methylated

GATC motifs in the genome, especially in cell cycle regulated genes and near promoter regions,

were also investigated to determine the role that m6A methylation might have in the cell cycle

and transcription regulation. However, GATC motifs were observed in only half of the

investigated cell cycle regulated genes, and none of the investigated promoter regions contained

GATC motifs, suggesting that m6A methylation does not play a major role in cell cycle

regulation or transcriptional regulation in S. acidocaldarius. The role of m6A methylation in S.

acidocaldarius is still unclear, and the MTase responsible for this methylation was not

determined (Couturier and Lindas, 2018).

34

5.2. RM Systems in Methanogenic Archaea

RM systems have also been identified in methanogenic archaeal species. An early study

identified three novel REases in the methanogenic archaeon Methanococcus aeolicus (Schmid et

al., 1984). These REases were purified and isolated from cell extracts, and were named MaeI,

MaeII, and MaeIII. Restriction analyses on plasmid and phage DNA indicated that MaeI cleaves

C/TAG, MaeII targets A/CGT, and MaeIII cuts at the site /GTNAC. Based on the palindromic nature of these target sites, the REases were classified as belonging to Type II RM systems, but the cognate MTases were not determined (Schmid et al., 1984). An REase was also isolated from

cell extracts of the methanogenic species Methanococcus vanielii by Thomm et al. (1988). The

purified REase was named MvnI, and restriction analyses on phage and plasmid DNA indicated

that the enzyme recognizes the site CG/CG. The enzyme was also observed to operate at more

moderate temperatures compared to more hyperthermophilic isoschizomers ThaI and FnuDII,

and therefore MvnI was proposed as a more useful alternative to use in laboratory settings

(Thomm et al., 1988).

In the thermophilic, methanogenic species Methanobacterium wolfei, a Type II RM

system was isolated and characterized by Lunnen et al. (1989). The REase was partially purified

from cell extracts and then further tested on different plasmid and phage DNA to determine the

specificity of the enzyme. The results of the restriction assay indicated that the enzyme targets

the site GCN5/N2GC. The enzyme was named MwoI, and classified as a component of a Type II

RM system based on the palindromic structure of its recognition site. The genomic region

containing the candidate genes of the REase and its cognate MTase were then isolated and

cloned into E. coli. Cell extracts of the transformed E. coli strain exhibited REase activity,

indicating that the MwoI REase was being expressed, and plasmid extracts from the transformed

35

E. coli strain were observed to be resistant to cleavage by MwoI, indicating that the cognate

MTase was also being expressed and was methylating the plasmid DNA. The cognate MTase

was named M.MwoI (Lunnen et al., 1989).

Two studies by Nolling and de Vos were performed in 1992 which identified Type II RM systems in different strains of the methanogenic archaeon Methanothermobacter thermoformicicum (referred to as Methanobacterium themoformicicum at the time). The first study examined strain THF, in which the researchers identified a new Type II RM system on plasmid pFV1 (Nolling and de Vos, 1992a). During isolation of the plasmid, the DNA of the strain was observed to be resistant to cleavage by REases which target GGCC-specific sites, and subcloning of various pFV1 fragments into E. coli resulted in the isolation of two ORFs encoding a Type II RM system, which was named MthTI. Cell extractions of E. coli expressing the putative REase gene on phage DNA confirmed that the enzyme targets GGCC sites, and

DNA extracted from E. coli strains expressing the M.MthTI MTase was resistant to GGCC-

specific REases, demonstrating that MthTI targets GGCC sites. Analyses of the amino acid

sequences of the ORFs demonstrated that they have high sequence similarity to the MTase and

REase of the Type II GGCC-specific RM system NgoPII in the bacterial strain Neisseria gonorrhoeae, which suggests that horizontal gene transfer of GGCC-specific RM system families may have occurred between bacteria and archaea (Nolling and de Vos, 1992a).

In the second, follow-up study, the M. thermoformicicum strains Z-245 and FTF were examined (Nolling and de Vos, 1992b). Cell-free extracts from both strains were tested on phage and plasmid DNA for restriction enzyme activity, and distinct, matching banding patterns were observed to occur in both strains which indicated digestion at CTAG target sites. The RM systems were named MthZI for Z-245 and MthFI for FTF. Methylation assays on cell-free

36

extracts also indicated the presence of corresponding DNA methyltransferases in these two

strains which decreases restriction enzyme activity of MthZI and MthFI. The genes of these RM

systems were determined to be encoded on plasmids pFZ1 (Z-245) and pFZ2 (FTF), since extracts without the plasmids had no restriction activity. The isolated gene for M.MthZI was

identified on pFZ1 and was named mthZIM. Analysis of the amino acid sequence encoded by

this gene indicated homology with several m4C and m6A methyltransferases. These were the

fifth and sixth CTAG-specific RM systems discovered at the time, and two of the previous ones

(including MaeI) were also discovered in methanogens, indicating that CTAG-specific RM systems may be more common in archaea than in bacteria (Nolling and de Vos, 1992b).

5.3. RM Systems in the Halobacteria

Research on DNA methylation and RM systems in haloarchaea has been more limited. In an early study, evidence for two RM systems was observed in the halophilic archaeon

Halobacterium salinarum (referred to as Halobacterium cutirubrum)(Patterson and Pauling,

1985). Plating experiments were used to test the ability of H. salinarum to restrict or modify the halophage Hh3, with Halobacterium halobium used as the indicator host. The results indicated that H. salinarum has two RM systems, and that the loss of one or both of these systems resulted in four different RM phenotypes. However, the specific RM systems were not identified, nor were their target sites or methylation types (Patterson and Pauling, 1985).

A Type IV system has also been identified in the halophilic archaeon Haloferax volcanii.

In a study by Holmes et al. (1991), shuttle vectors which were cloned in E. coli were observed to have a low transformation efficiency in H. volcanii, which indicated a restriction barrier.

However, when the vectors were run through an E. coli dam- mutant, the transformation

efficiency increased three-fold, indicating that H. volcanii contained a methylation-dependent

37

restriction enzyme which targeted adenine (Holmes et al., 1991). When the genome of H.

volcanii was sequenced, a putative Type IV REase gene was identified called mrr, and this gene was proposed to be responsible for the decrease in transformation efficiency on methylated DNA in H. volcanii (Hartman et al., 2010). This hypothesis was tested in a study by Allers et al.

(2010), in which the mrr gene was deleted from a strain of H. volcanii. The transformation efficiency of this strain was then tested on plasmid DNA from dam+ and dam- strains of E. coli.

The results indicated that the Δmrr strain had an equal transformation efficiency on both

methylated and unmethylated plasmids, indicating that the mrr gene does encode a Type IV

REase in H. volcanii responsible for low transformation efficiency on methylated DNA (Allers et

al., 2010).

DNA methylation has also been observed by Chimileski et al. (2014) to affect

metabolism of eDNA in H. volcanii. In the study, H. volcanii was supplemented with DNA from

itself, herring sperm, and E. coli, and was observed to grow well only on its own DNA. This

result indicated that H. volcanii is selective in the eDNA it utilizes as a nutrient source. When H.

volcanii was supplemented with DNA from an E. coli dam-/dcm- mutant which doesn’t methylate

its DNA, H. volcanii was able to grow better than when supplemented with its own DNA,

indicating that the methylation state of the eDNA affects metabolism of the DNA. However, the

mechanism through which H. volcanii recognizes the methylation state of the eDNA utilized for

metabolism is currently unknown (Chimileski et al., 2014).

5.4. Domain-wide Investigation of Archaeal DNA Methylation and RM Systems

A more widespread study of DNA methylation in the Archaea was performed by

Lodwick et al. (1986), which examined Dam methylation in the archaea using Dam-specific

REases and testing recalcitrance of DNA to endonuclease activity. Many different archaeal

38

genera were tested, including Halobacterium, Halorubrum, Haloferax (both classified as

Halobacterium at the time of the study), Halococcus, Natronobacterium, Natronococcus,

Methanococcus, Methanogenium, Methanosarcina, , Methanobacterium,

Methanobrevibacter, Methanomicrobium, Methanospirillum, and Sulfolobus. Recalcitrance to

REase digestion was observed in several species tested, indicating the presence of Dam

methylation in these species. Archaeal species which were identified as Dam+ include:

Halorubrum saccharovorum, Halorubrum sodomense, Halorubrum trapanicum, Methanogenium

cariaci, Methanogenium marisnigri, Methanogenium thermophilicum, Methanolobus tindarius,

Methanobacterium bryantii, Methanobacterium sp. FR2, Methanobacterium jormicicum, and

Methanospirillum hungatei (Lodwick et al., 1986).

The MTases and RM systems of 13 archaeal species were more recently examined by

Blow et al. (2016) using SMRT sequencing and bioinformatics. They identified five classes of

orphan MTases present in four different classes of archaea: two groups (which methylate

GAm6TC and CAm6TG) in Thermoplasmata, one (which methylates GAm6TC) in Thermococci,

one (which methylates AGCm4T) in , and one (which methylates Cm4TAG) in

Halobacteria. The overall function of these orphan MTase families was not determined.

However, the CTAG family in Halobacteria was hypothesized to be involved in regulating DNA replication, since a high density of CTAG motifs was observed in to occur upstream of the origin of replication complex gene orthologs orc6/cdc1 in many haloarchaeal species (Blow et al.,

2016).

6. Conclusion

The Halobacteria are unique archaeal organisms which have been identified in a variety of hypersaline habitats, including solar salterns, hypersaline lakes, and the fluid inclusions of

39

ancient halite. They can utilize a variety of carbon, nitrogen, and phosphorus sources in these

environments, including extracellular DNA. Halobacteria have been observed to be highly

recombinogenic, and are able to exchange genetic information using low-species-barrier

processes such as cell-to-cell mating. Research has demonstrated that CRISPR-Cas, surface

glycosylation, and archaeosortases can limit recombination via cell-to-cell mating, but none of

these strategies act as absolute barriers to recombination in the Halobacteria. DNA methylation

and RM systems could be another potential recombination barrier.

DNA methylation and RM systems have been studied extensively in the Bacteria. Four

major types of RM systems have been identified, and they have been observed to protect their

hosts from harmful foreign DNA and regulate gene transfer. RM systems have also been observed to act as selfish genetic elements. Orphan DNA MTases, on the other hand, are involved in important cellular functions, such as regulating the initiation of DNA replication,

DNA mismatch repair, and even recognition of DNA for natural transformation.

Research into DNA methylation and RM systems in archaea, however, has mostly focused on identification and characterization of REases, and some MTases, in hyperthermophilic and methanogenic species, with little attention given to the Halobacteria.

Although some evidence suggests gene transfer of RM systems has occurred in the Archaea, research on the evolution of RM systems and orphan MTases in the Archaea has been limited, and little is known about their roles in their host organisms. The methylomes of archaeal organisms have also only recently begun to be examined in detail. This dissertation will examine

DNA methylation and RM systems in the Halobacteria in order to better understand the role of these systems in the Archaea and their impact on halobacterial speciation and genetic recombination. Chapter 3 will discuss the characterization of the methylome of a model

40

halobacterial species, H. volcanii. Next, in Chapter 4, the MTases of H. volcanii will be

examined, and the development of an RM null deletion strain will be described. Chapter 5 will

discuss the application of the H. volcanii RM null mutant in examining the impact of RM

systems on mating. Finally, Chapter 6 will explore the evolution and distribution of RM system

and orphan MTase genes throughout the Halobacteria.

References

Abdul Halim, M.F., Pfeiffer, F., Zou, J., Frisch, A., Haft, D., Wu, S., Tolić, N., Brewer, H., Payne, S.H., and Paša‐Tolić, L. (2013). Haloferax volcanii archaeosortase is required for motility, mating, and C‐terminal processing of the S‐layer glycoprotein. Molecular microbiology 88, 1164-1175.

Adhikari, S., and Curtis, P.D. (2016). DNA methyltransferases and epigenetic regulation in bacteria. FEMS Microbiol Rev 40, 575-591.

Allers, T., Barak, S., Liddell, S., Wardell, K., and Mevarech, M. (2010). Improved strains and plasmid vectors for conditional overexpression of His-tagged proteins in Haloferax volcanii. Appl Environ Microbiol 76, 1759-1769.

Allers, T., and Ngo, H.-P. (2003). "Genetic analysis of homologous recombination in Archaea: Haloferax volcanii as a model organism". Portland Press Limited).

Allers, T., Ngo, H.P., Mevarech, M., and Lloyd, R.G. (2004). Development of additional selectable markers for the halophilic archaeon Haloferax volcanii based on the leuB and trpA genes. Appl Environ Microbiol 70, 943-953.

Anderson, I., Scheuner, C., Goker, M., Mavromatis, K., Hooper, S.D., Porat, I., Klenk, H.P., Ivanova, N., and Kyrpides, N. (2011). Novel insights into the diversity of catabolic metabolism from ten haloarchaeal genomes. PLoS One 6, e20237.

Anton, J., Oren, A., Benlloch, S., Rodriguez-Valera, F., Amann, R., and Rossello-Mora, R. (2002). Salinibacter ruber gen. nov., sp. nov., a novel, extremely halophilic member of the Bacteria from saltern crystallizer ponds. Int J Syst Evol Microbiol 52, 485-491.

Anton, J., Rossello-Mora, R., Rodriguez-Valera, F., and Amann, R. (2000). Extremely halophilic bacteria in crystallizer ponds from solar salterns. Appl Environ Microbiol 66, 3052-3057.

Arber, W., and Dussoix, D. (1962). Host specificity of DNA produced by Escherichia coli. I. Host controlled modification of bacteriophage lambda. J Mol Biol 5, 18-36.

41

Au, K.G., Welsh, K., and Modrich, P. (1992). Initiation of methyl-directed mismatch repair. J Biol Chem 267, 12142-12148.

Bachi, B., Reiser, J., and Pirrotta, V. (1979). Methylation and cleavage sequences of the EcoP1 restriction-modification enzyme. J Mol Biol 128, 143-163.

Baliga, N.S., Bonneau, R., Facciotti, M.T., Pan, M., Glusman, G., Deutsch, E.W., Shannon, P., Chiu, Y., Weng, R.S., Gan, R.R., Hung, P., Date, S.V., Marcotte, E., Hood, L., and Ng, W.V. (2004). Genome sequence of Haloarcula marismortui: a halophilic archaeon from the Dead Sea. Genome Res 14, 2221-2234.

Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., Romero, D.A., and Horvath, P. (2007). CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709-1712.

Beauchamp, J.M., Leveque, R.M., Dawid, S., and Dirita, V.J. (2017). Methylation-dependent DNA discrimination in natural transformation of Campylobacter jejuni. Proc Natl Acad Sci U S A 114, E8053-E8061.

Benlloch, S., López‐López, A., Casamayor, E.O., Øvreås, L., Goddard, V., Daae, F.L., Smerdon, G., Massana, R., Joint, I., and Thingstad, F. (2002). Prokaryotic genetic diversity throughout the salinity gradient of a coastal solar saltern. Environmental Microbiology 4, 349-360.

Bertani, G., and Weigle, J.J. (1953). Host controlled variation in bacterial viruses. J Bacteriol 65, 113-121.

Bheemanaik, S., Reddy, Y.V., and Rao, D.N. (2006). Structure, function and mechanism of exocyclic DNA methyltransferases. Biochem J 399, 177-190.

Bickle, T.A., and Kruger, D.H. (1993). Biology of DNA restriction. Microbiol Rev 57, 434-450.

Bitan-Banin, G., Ortenberg, R., and Mevarech, M. (2003). Development of a gene knockout system for the halophilic archaeon Haloferax volcanii by use of the pyrE gene. J Bacteriol 185, 772-778.

Blaby, I.K., Phillips, G., Blaby-Haas, C.E., Gulig, K.S., El Yacoubi, B., and De Crecy-Lagard, V. (2010). Towards a systems approach in the genetic analysis of archaea: Accelerating mutant construction and phenotypic analysis in Haloferax volcanii. Archaea 2010, 426239.

Blow, M.J., Clark, T.A., Daum, C.G., Deutschbauer, A.M., Fomenkov, A., Fries, R., Froula, J., Kang, D.D., Malmstrom, R.R., Morgan, R.D., Posfai, J., Singh, K., Visel, A., Wetmore, K., Zhao, Z., Rubin, E.M., Korlach, J., Pennacchio, L.A., and Roberts, R.J. (2016). The epigenomic landscape of prokaryotes. PLoS Genet 12, e1005854.

42

Bolhuis, H., Palm, P., Wende, A., Falb, M., Rampp, M., Rodriguez-Valera, F., Pfeiffer, F., and Oesterhelt, D. (2006). The genome of the square archaeon Haloquadratum walsbyi : life at the limits of water activity. BMC Genomics 7, 169.

Bonete, M.J., Martinez-Espinosa, R.M., Pire, C., Zafrilla, B., and Richardson, D.J. (2008). Nitrogen metabolism in haloarchaea. Saline Systems 4, 9.

Borowitzka, L.J., and Brown, A.D. (1974). The salt relations of marine and halophilic species of the unicellular green alga, Dunaliella. Archives of Microbiology 96, 37-52.

Boye, E., Lobner-Olesen, A., and Skarstad, K. (1988). Timing of chromosomal replication in Escherichia coli. Biochim Biophys Acta 951, 359-364.

Boyer, H. (1964). Genetic control of restriction and modification in Escherichia coli. J Bacteriol 88, 1652-1660.

Boyer, H.W., and Roulland-Dussoix, D. (1969). A complementation analysis of the restriction and modification of DNA in Escherichia coli. J Mol Biol 41, 459-472.

Breuert, S., Allers, T., Spohn, G., and Soppa, J. (2006). Regulated polyploidy in halophilic archaea. PLoS One 1, e92.

Brezellec, P., Hoebeke, M., Hiet, M.S., Pasek, S., and Ferat, J.L. (2006). DomainSieve: a protein domain-based screen that led to the identification of dam-associated genes with potential link to DNA maintenance. Bioinformatics 22, 1935-1941.

Budroni, S., Siena, E., Dunning Hotopp, J.C., Seib, K.L., Serruto, D., Nofroni, C., Comanducci, M., Riley, D.R., Daugherty, S.C., Angiuoli, S.V., Covacci, A., Pizza, M., Rappuoli, R., Moxon, E.R., Tettelin, H., and Medini, D. (2011). Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. Proc Natl Acad Sci U S A 108, 4494-4499.

Bujnicki, J.M. (2002). Sequence permutations in the molecular evolution of DNA methyltransferases. BMC Evol Biol 2, 3.

Bujnicki, J.M., and Radlinska, M. (1999). Molecular evolution of DNA-(cytosine-N4) methyltransferases: evidence for their polyphyletic origin. Nucleic Acids Res 27, 4501- 4509.

Campbell, J.L., and Kleckner, N. (1990). E. coli oriC and the dnaA gene promoter are sequestered from dam methyltransferase following the passage of the chromosomal replication fork. Cell 62, 967-979.

Charlebois, R.L., Lam, W.L., Cline, S.W., and Doolittle, W.F. (1987). Characterization of pHV2 from Halobacterium volcanii and its use in demonstrating transformation of an archaebacterium. Proceedings of the National Academy of Sciences 84, 8530-8534.

43

Chen, H., Jiang, J.G., and Wu, G.H. (2009). Effects of salinity changes on the growth of Dunaliella salina and its isozyme activities of glycerol-3-phosphate dehydrogenase. J Agric Food Chem 57, 6178-6182.

Chen, L., Macmillan, A.M., Chang, W., Ezaz-Nikpay, K., Lane, W.S., and Verdine, G.L. (1991). Direct identification of the active-site nucleophile in a DNA (cytosine-5)- methyltransferase. Biochemistry 30, 11018-11025.

Chimileski, S., Dolas, K., Naor, A., Gophna, U., and Papke, R.T. (2014). Extracellular DNA metabolism in Haloferax volcanii. Front Microbiol 5, 57.

Ciulla, R., Clougherty, C., Belay, N., Krishnan, S., Zhou, C., Byrd, D., and Roberts, M. (1994). Halotolerance of Methanobacterium thermoautotrophicum delta H and Marburg. Journal of bacteriology 176, 3177-3187.

Clark, T.A., Lu, X., Luong, K., Dai, Q., Boitano, M., Turner, S.W., He, C., and Korlach, J. (2013). Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation. BMC Biol 11, 4.

Clark, T.A., Murray, I.A., Morgan, R.D., Kislyuk, A.O., Spittle, K.E., Boitano, M., Fomenkov, A., Roberts, R.J., and Korlach, J. (2012). Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucleic Acids Res 40, e29.

Cline, S., Pfeifer, F., and Doolittle, W. (1995). Transformation of halophilic archaea. Archaea: a Laboratory Manual, 197-204.

Coker, J.A., Dassarma, P., Capes, M., Wallace, T., Mcgarrity, K., Gessler, R., Liu, J., Xiang, H., Tatusov, R., Berquist, B.R., and Dassarma, S. (2009). Multiple replication origins of Halobacterium sp. strain NRC-1: properties of the conserved orc7-dependent oriC1. J Bacteriol 191, 5253-5261.

Couturier, M., and Lindas, A.C. (2018). The DNA methylome of the hyperthermoacidophilic crenarchaeon Sulfolobus acidocaldarius. Front Microbiol 9, 137.

Curtis, P.D., and Brun, Y.V. (2010). Getting in the loop: regulation of development in Caulobacter crescentus. Microbiol Mol Biol Rev 74, 13-41.

Dassarma, P., and Dassarma, S. (2008). On the origin of prokaryotic" species": the taxonomy of halophilic Archaea. Saline Systems 4, 5.

Dassarma, P., Klebahn, G., and Klebahn, H. (2010). Translation of Henrich Klebahn's 'Damaging agents of the klippfish - a contribution to the knowledge of the salt-loving organisms'. Saline Systems 6, 7.

44

Domian, I.J., Reisenauer, A., and Shapiro, L. (1999). Feedback control of a master bacterial cell- cycle regulator. Proc Natl Acad Sci U S A 96, 6648-6653.

Dryden, D.T., Cooper, L.P., and Murray, N.E. (1993). Purification and characterization of the methyltransferase from the type 1 restriction and modification system of Escherichia coli K12. J Biol Chem 268, 13228-13236.

Dundas, I.D., and Larsen, H. (1962). The physiological role of the carotenoid pigments of Halobacterium salinarium. Archiv für Mikrobiologie 44, 233-239.

Dunn, D.B., and Smith, J.D. (1958). The occurrence of 6-methylaminopurine in deoxyribonucleic acids. Biochem J 68, 627-636.

Dussoix, D., and Arber, W. (1962). Host specificity of DNA produced by Escherichia coli. II. Control over acceptance of DNA from infecting phage lambda. J Mol Biol 5, 37-49.

Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B., Bibillo, A., Bjornson, K., Chaudhuri, B., Christians, F., Cicero, R., Clark, S., Dalal, R., Dewinter, A., Dixon, J., Foquet, M., Gaertner, A., Hardenbol, P., Heiner, C., Hester, K., Holden, D., Kearns, G., Kong, X., Kuse, R., Lacroix, Y., Lin, S., Lundquist, P., Ma, C., Marks, P., Maxham, M., Murphy, D., Park, I., Pham, T., Phillips, M., Roy, J., Sebra, R., Shen, G., Sorenson, J., Tomaney, A., Travers, K., Trulson, M., Vieceli, J., Wegener, J., Wu, D., Yang, A., Zaccarin, D., Zhao, P., Zhong, F., Korlach, J., and Turner, S. (2009). Real-time DNA sequencing from single polymerase molecules. Science 323, 133-138.

Elevi Bardavid, R., Khristo, P., and Oren, A. (2008). Interrelationships between Dunaliella and halophilic prokaryotes in saltern crystallizer ponds. Extremophiles 12, 5-14.

Elevi Bardavid, R., and Oren, A. (2008). Dihydroxyacetone metabolism in Salinibacter ruber and in Haloquadratum walsbyi. Extremophiles 12, 125-131.

Elevi Bardavid, R., and Oren, A. (2012). The amino acid composition of proteins from anaerobic halophilic bacteria of the order Halanaerobiales. Extremophiles 16, 567-572.

Ershova, A.S., Rusinov, I.S., Spirin, S.A., Karyagina, A.S., and Alexeevski, A.V. (2015). Role of restriction-modification systems in prokaryotic evolution and ecology. Biochemistry (Mosc) 80, 1373-1386.

Erwin, A.L., Sandstedt, S.A., Bonthuis, P.J., Geelhood, J.L., Nelson, K.L., Unrath, W.C., Diggle, M.A., Theodore, M.J., Pleatman, C.R., Mothershed, E.A., Sacchi, C.T., Mayer, L.W., Gilsdorf, J.R., and Smith, A.L. (2008). Analysis of genetic relatedness of Haemophilus influenzae isolates by multilocus sequence typing. J Bacteriol 190, 1473-1483.

45

Falb, M., Pfeiffer, F., Palm, P., Rodewald, K., Hickmann, V., Tittor, J., and Oesterhelt, D. (2005). Living with two extremes: conclusions from the genome sequence of Natronomonas pharaonis. Genome Res 15, 1336-1343.

Farlow, W.G. (1880). On the nature of the peculiar reddening of salted codfish during the summer season. Washington, DC: US Commission of Fish and Fisheries, Government Printing Office.

Fendrihan, S., Dornmayr‐Pfaffenhuemer, M., Gerbl, F., Holzinger, A., Grösbacher, M., Briza, P., Erler, A., Gruber, C., Plätzer, K., and Stan‐Lotter, H. (2012). Spherical particles of halophilic archaea correlate with exposure to low water activity–implications for microbial survival in fluid inclusions of ancient halite. Geobiology 10, 424-433.

Fendrihan, S., Legat, A., Pfaffenhuemer, M., Gruber, C., Weidler, G., Gerbl, F., and Stan-Lotter, H. (2006). Extremely halophilic archaea and the issue of long-term microbial survival. Rev Environ Sci Biotechnol 5, 203-218.

Fernandez, A.B., Vera-Gargallo, B., Sanchez-Porro, C., Ghai, R., Papke, R.T., Rodriguez- Valera, F., and Ventosa, A. (2014). Comparison of prokaryotic community structure from Mediterranean and Atlantic saltern concentrator ponds by a metagenomic approach. Front Microbiol 5, 196.

Fioravanti, A., Fumeaux, C., Mohapatra, S.S., Bompard, C., Brilli, M., Frandi, A., Castric, V., Villeret, V., Viollier, P.H., and Biondi, E.G. (2013). DNA binding of the cell cycle transcriptional regulator GcrA depends on N6-adenosine methylation in Caulobacter crescentus and other Alphaproteobacteria. PLoS Genet 9, e1003541.

Flusberg, B.A., Webster, D.R., Lee, J.H., Travers, K.J., Olivares, E.C., Clark, T.A., Korlach, J., and Turner, S.W. (2010). Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 7, 461-465.

Frommer, M., Mcdonald, L.E., Millar, D.S., Collis, C.M., Watt, F., Grigg, G.W., Molloy, P.L., and Paul, C.L. (1992). A genomic sequencing protocol that yields a positive display of 5- methylcytosine residues in individual DNA strands. Proc Natl Acad Sci U S A 89, 1827- 1831.

Fuller-Pace, F.V., and Murray, N.E. (1986). Two DNA recognition domains of the specificity polypeptides of a family of type I restriction enzymes. Proc Natl Acad Sci U S A 83, 9368-9372.

Fullmer, M.S., Soucy, S.M., Swithers, K.S., Makkay, A.M., Wheeler, R., Ventosa, A., Gogarten, J.P., and Papke, R.T. (2014). Population and genomic analysis of the genus Halorubrum. Front Microbiol 5, 140.

46

Gavrieli, I., Beyth, M., and Yechieli, Y. (1998). The Dead Sea—A terminal lake in the Dead Sea rift: a short overview. Microbiology and Biogeochemistry of hypersaline environments, 121-127.

Gonzalez, D., and Collier, J. (2013). DNA methylation by CcrM activates the transcription of two genes required for the division of Caulobacter crescentus. Mol Microbiol 88, 203- 218.

Gough, J.A., and Murray, N.E. (1983). Sequence diversity among related genes for recognition of specific targets in DNA molecules. J Mol Biol 166, 1-19.

Grogan, D.W. (2003). Cytosine methylation by the SuaI restriction-modification system: implications for genetic fidelity in a hyperthermophilic archaeon. J Bacteriol 185, 4657- 4661.

Grote, M., and O'malley, M.A. (2011). Enlightening the life sciences: the history of halobacterial and microbial rhodopsin research. FEMS Microbiol Rev 35, 1082-1099.

Groussin, M., Boussau, B., Szöllõsi, G., Eme, L., Gouy, M., Brochier-Armanet, C., and Daubin, V. (2015). Gene acquisitions from bacteria at the origins of major archaeal clades are vastly overestimated. Molecular biology and evolution 33, 305-310.

Gupta, R.S., Naushad, S., and Baker, S. (2015). Phylogenomic analyses and molecular signatures for the class Halobacteria and its two major clades: a proposal for division of the class Halobacteria into an emended order Halobacteriales and two new orders, Haloferacales ord. nov. and Natrialbales ord. nov., containing the novel families Haloferacaceae fam. nov. and Natrialbaceae fam. nov. Int J Syst Evol Microbiol 65, 1050-1069.

Gupta, R.S., Naushad, S., Fabros, R., and Adeolu, M. (2016). A phylogenomic reappraisal of family-level divisions within the class Halobacteria: proposal to divide the order Halobacteriales into the families Halobacteriaceae, Haloarculaceae fam. nov., and Halococcaceae fam. nov., and the order Haloferacales into the families, Haloferacaceae and Halorubraceae fam nov. Antonie Van Leeuwenhoek 109, 565-587.

Harrison, F.C., and Kennedy, M.E. (1922). The red discolouration of cured codfish. Trans R Soc Can Sect III 16, 101-152.

Hartman, A.L., Norais, C., Badger, J.H., Delmas, S., Haldenby, S., Madupu, R., Robinson, J., Khouri, H., Ren, Q., Lowe, T.M., Maupin-Furlow, J., Pohlschroder, M., Daniels, C., Pfeiffer, F., Allers, T., and Eisen, J.A. (2010). The complete genome sequence of Haloferax volcanii DS2, a model archaeon. PLoS One 5, e9605.

Hattman, S., Brooks, J.E., and Masurekar, M. (1978). Sequence specificity of the P1 modification methylase (M.Eco P1) and the DNA methylase (M.Eco dam) controlled by the Escherichia coli dam gene. J Mol Biol 126, 367-380.

47

Hawkins, M., Malla, S., Blythe, M.J., Nieduszynski, C.A., and Allers, T. (2013). Accelerated growth in the absence of DNA replication origins. Nature 503, 544-547.

Holmes, M.L., Nuttall, S.D., and Dyall-Smith, M.L. (1991). Construction and use of halobacterial shuttle vectors and further studies on Haloferax DNA gyrase. J Bacteriol 173, 3807-3813.

Horiuchi, K., and Zinder, N.D. (1972). Cleavage of bacteriophage fl DNA by the restriction enzyme of Escherichia coli B. Proc Natl Acad Sci U S A 69, 3220-3224.

Hotchkiss, R.D. (1948). The quantitative separation of purines, pyrimidines, and nucleosides by paper chromatography. J Biol Chem 175, 315-332.

Houwink, A.L. (1956). Flagella, gas vacuoles and cell-wall structure in Halobacterium halobium; an electron microscope study. J Gen Microbiol 15, 146-150.

Humbelin, M., Suri, B., Rao, D.N., Hornby, D.P., Eberle, H., Pripfl, T., Kenel, S., and Bickle, T.A. (1988). Type III DNA restriction and modification systems EcoP1 and EcoP15. Nucleotide sequence of the EcoP1 operon, the EcoP15 mod gene and some EcoP1 mod mutants. J Mol Biol 200, 23-29.

Ishikawa, K., Watanabe, M., Kuroita, T., Uchiyama, I., Bujnicki, J.M., Kawakami, B., Tanokura, M., and Kobayashi, I. (2005). Discovery of a novel restriction endonuclease by genome comparison and application of a wheat-germ-based cell-free translation assay: PabI (5'- GTA/C) from the hyperthermophilic archaeon Pyrococcus abyssi. Nucleic Acids Res 33, e112.

Jaakkola, S.T., Ravantti, J.J., Oksanen, H.M., and Bamford, D.H. (2016). Buried alive: microbes from ancient halite. Trends in microbiology 24, 148-160.

Jacob, J.H., Hussein, E.I., Shakhatreh, M.a.K., and Cornelison, C.T. (2017). Microbial community analysis of the hypersaline water of the Dead Sea using high-throughput amplicon sequencing. MicrobiologyOpen 6.

Janscak, P., Sandmeier, U., Szczelkun, M.D., and Bickle, T.A. (2001). Subunit assembly and mode of DNA cleavage of the type III restriction endonucleases EcoP1I and EcoP15I. J Mol Biol 306, 417-431.

Janulaitis, A., Petrusyte, M., Maneliene, Z., Klimasauskas, S., and Butkus, V. (1992). Purification and properties of the Eco57I restriction endonuclease and methylase-- prototypes of a new class (type IV). Nucleic Acids Res 20, 6043-6049.

Kamekura, M. (1993). Lipids of extreme halophiles. The Biology of Halophilic Bacteria, 135- 161.

48

Kang, S., Lee, H., Han, J.S., and Hwang, D.S. (1999). Interaction of SeqA and Dam methylase on the hemimethylated origin of Escherichia coli chromosomal DNA replication. J Biol Chem 274, 11463-11468.

Kelleher, J.E., Daniel, A.S., and Murray, N.E. (1991). Mutations that confer de novo activity upon a maintenance methyltransferase. J Mol Biol 221, 431-440.

Kennedy, S.P., Ng, W.V., Salzberg, S.L., Hood, L., and Dassarma, S. (2001). Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res 11, 1641-1650.

Klimasauskas, S., Timinskas, A., Menkevicius, S., Butkiene, D., Butkus, V., and Janulaitis, A. (1989). Sequence motifs characteristic of DNA[cytosine-N4]methyltransferases: similarity to adenine and cytosine-C5 DNA-methylases. Nucleic Acids Res 17, 9823- 9832.

Kobayashi, I. (2001). Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res 29, 3742-3756.

Koonin, E.V., Makarova, K.S., and Zhang, F. (2017). Diversity, classification and evolution of CRISPR-Cas systems. Current opinion in microbiology 37, 67-78.

Korlach, J., and Turner, S.W. (2012). Going beyond five bases in DNA sequencing. Curr Opin Struct Biol 22, 251-261.

Kuhnlein, U., and Arber, W. (1972). Host specificity of DNA produced by Escherichia coli. XV. The role of nucleotide methylation in in vitro B-specific modification. J Mol Biol 63, 9- 19.

Kuhnlein, U., Linn, S., and Arber, W. (1969). Host specificity of DNA produced by Escherichia coli. XI. In vitro modification of phage fd replicative form. Proc Natl Acad Sci U S A 63, 556-562.

Larrabee, R.D. (1959). Spectral emissivity of tungsten. Journal of the Optical Society of America 49, 619-625.

Li, G.M. (2008). Mechanisms and functions of DNA mismatch repair. Cell Res 18, 85-98.

Li, M., Liu, H., Han, J., Liu, J., Wang, R., Zhao, D., Zhou, J., and Xiang, H. (2013). Characterization of CRISPR RNA biogenesis and Cas6 cleavage-mediated inhibition of a provirus in the haloarchaeon Haloferax mediterranei. Journal of bacteriology 195, 867- 875.

Lin, E.A., Zhang, X.S., Levine, S.M., Gill, S.R., Falush, D., and Blaser, M.J. (2009). Natural transformation of helicobacter pylori involves the integration of short DNA fragments interrupted by gaps of variable size. PLoS Pathog 5, e1000337.

49

Lippman, Z., Gendrel, A.V., Colot, V., and Martienssen, R. (2005). Profiling DNA methylation patterns using genomic tiling microarrays. Nat Methods 2, 219-224.

Lister, R., and Ecker, J.R. (2009). Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-966.

Liu, Y.P., Tang, Q., Zhang, J.Z., Tian, L.F., Gao, P., and Yan, X.X. (2017). Structural basis underlying complex assembly and conformational transition of the type I R-M system. Proc Natl Acad Sci U S A 114, 11151-11156.

Lobner-Olesen, A., Skovgaard, O., and Marinus, M.G. (2005). Dam methylation: coordinating cellular processes. Curr Opin Microbiol 8, 154-160.

Lodwick, D., Ross, H.N., Harris, J.E., Almond, J.W., and Grant, W.D. (1986). dam methylation in the archaebacteria. J Gen Microbiol 132, 3055-3059.

Loenen, W.A. (2003). Tracking EcoKI and DNA fifty years on: a golden story full of surprises. Nucleic Acids Res 31, 7059-7069.

Loenen, W.A., Dryden, D.T., Raleigh, E.A., and Wilson, G.G. (2014). Type I restriction enzymes and their relatives. Nucleic Acids Res 42, 20-44.

Loenen, W.A., and Raleigh, E.A. (2014). The other face of restriction: modification-dependent enzymes. Nucleic Acids Res 42, 56-69.

Lowenstein, T.K., Timofeeff, M.N., Brennan, S.T., Hardie, L.A., and Demicco, R.V. (2001). Oscillations in Phanerozoic seawater chemistry: Evidence from fluid inclusions. Science 294, 1086-1088.

Lunnen, K.D., Morgan, R.D., Timan, C.J., Krzycki, J.A., Reeve, J.N., and Wilson, G.G. (1989). Characterization and cloning of MwoI (GCN7GC), a new type-II restriction-modification system from Methanobacterium wolfei. Gene 77, 11-19.

Luria, S.E., and Human, M.L. (1952). A nonhereditary, host-induced variation of bacterial viruses. J Bacteriol 64, 557-569.

Magrum, L.J., Luehrsen, K.R., and Woese, C.R. (1978). Are extreme halophiles actually “bacteria”? Journal of Molecular Evolution 11, 1-8.

Maier, L.-K., Stachler, A.-E., Brendel, J., Stoll, B., Fischer, S., Haas, K.A., Schwarz, T.S., Alkhnbashi, O.S., Sharma, K., and Urlaub, H. (2019). The nuts and bolts of the Haloferax CRISPR-Cas system IB. RNA biology 16, 469-480.

Malone, T., Blumenthal, R.M., and Cheng, X. (1995). Structure-guided analysis reveals nine sequence motifs conserved among DNA amino-methyltransferases, and suggests a catalytic mechanism for these enzymes. J Mol Biol 253, 618-632.

50

Marinus, M.G., and Morris, N.R. (1973). Isolation of deoxyribonucleic acid methylase mutants of Escherichia coli K-12. J Bacteriol 114, 1143-1150.

Marinus, M.G., and Morris, N.R. (1974). Biological function for 6-methyladenine residues in the DNA of Escherichia coli K12. J Mol Biol 85, 309-322.

Martínez-Espinosa, R.M., Marhuenda-Egea, F.C., and Bonete, M.a.J. (2001a). Assimilatory nitrate reductase from the haloarchaeon Haloferax mediterranei: purification and characterisation. FEMS microbiology letters 204, 381-385.

Martínez-Espinosa, R.M., Marhuenda-Egea, F.C., and Bonete, M.J. (2001b). Purification and characterisation of a possible assimilatory nitrite reductase from the halophile archaeon Haloferax mediterranei. FEMS microbiology letters 196, 113-118.

Mavromatis, K., Ivanova, N., Anderson, I., Lykidis, A., Hooper, S.D., Sun, H., Kunin, V., Lapidus, A., Hugenholtz, P., and Patel, B. (2009). Genome analysis of the anaerobic thermohalophilic bacterium Halothermothrix orenii. PLoS One 4, e4192.

Mcconnell, D.J., Searcy, D.G., and Sutcliffe, J.G. (1978). A restriction enzyme Tha I from the thermophilic mycoplasma Thermoplasma acidophilum. Nucleic Acids Res 5, 1729-1739.

Méheust, R., Watson, A.K., Lapointe, F.-J., Papke, R.T., Lopez, P., and Bapteste, E. (2018). Hundreds of novel composite genes and chimeric genes with bacterial origins contributed to haloarchaeal evolution. Genome biology 19, 75.

Meselson, M., and Yuan, R. (1968). DNA restriction enzyme from E. coli. Nature 217, 1110- 1114.

Morgan, R.D., Dwinell, E.A., Bhatia, T.K., Lang, E.M., and Luyten, Y.A. (2009). The MmeI family: type II restriction-modification enzymes that employ single-strand modification for host protection. Nucleic Acids Res 37, 5208-5221.

Mullakhanbhai, M.F., and Larsen, H. (1975). Halobacterium volcanii spec. nov., a Dead Sea halobacterium with a moderate salt requirement. Arch Microbiol 104, 207-214.

Murray, I.A., Clark, T.A., Morgan, R.D., Boitano, M., Anton, B.P., Luong, K., Fomenkov, A., Turner, S.W., Korlach, J., and Roberts, R.J. (2012). The methylomes of six bacteria. Nucleic Acids Res 40, 11450-11462.

Naor, A., Lapierre, P., Mevarech, M., Papke, R.T., and Gophna, U. (2012). Low species barriers in halophilic archaea and the formation of recombinant hybrids. Curr Biol 22, 1444-1448.

Nelson-Sathi, S., Dagan, T., Landan, G., Janssen, A., Steel, M., Mcinerney, J.O., Deppenmeier, U., and Martin, W.F. (2012). Acquisition of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc Natl Acad Sci U S A 109, 20537-20542.

51

Nelson-Sathi, S., Sousa, F.L., Roettger, M., Lozada-Chávez, N., Thiergart, T., Janssen, A., Bryant, D., Landan, G., Schönheit, P., and Siebers, B. (2015). Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517, 77.

Ng, W.V., Kennedy, S.P., Mahairas, G.G., Berquist, B., Pan, M., Shukla, H.D., Lasky, S.R., Baliga, N.S., Thorsson, V., Sbrogna, J., Swartzell, S., Weir, D., Hall, J., Dahl, T.A., Welti, R., Goo, Y.A., Leithauser, B., Keller, K., Cruz, R., Danson, M.J., Hough, D.W., Maddocks, D.G., Jablonski, P.E., Krebs, M.P., Angevine, C.M., Dale, H., Isenbarger, T.A., Peck, R.F., Pohlschroder, M., Spudich, J.L., Jung, K.W., Alam, M., Freitas, T., Hou, S., Daniels, C.J., Dennis, P.P., Omer, A.D., Ebhardt, H., Lowe, T.M., Liang, P., Riley, M., Hood, L., and Dassarma, S. (2000). Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci U S A 97, 12176-12181.

Nolling, J., and De Vos, W.M. (1992a). Characterization of the archaeal, plasmid-encoded type II restriction-modification system MthTI from Methanobacterium thermoformicicum THF: homology to the bacterial NgoPII system from Neisseria gonorrhoeae. J Bacteriol 174, 5719-5726.

Nolling, J., and De Vos, W.M. (1992b). Identification of the CTAG-recognizing restriction- modification systems MthZI and MthFI from Methanobacterium thermoformicicum and characterization of the plasmid-encoded mthZIM gene. Nucleic Acids Res 20, 5047-5052.

Norais, C., Hawkins, M., Hartman, A.L., Eisen, J.A., Myllykallio, H., and Allers, T. (2007). Genetic and physical mapping of DNA replication origins in Haloferax volcanii. PLoS Genet 3, e77.

Oh, D., Porter, K., Russ, B., Burns, D., and Dyall-Smith, M. (2010). Diversity of Haloquadratum and other haloarchaea in three, geographically distant, Australian saltern crystallizer ponds. Extremophiles 14, 161-169.

Ohno, S., Handa, N., Watanabe-Matsui, M., Takahashi, N., and Kobayashi, I. (2008). Maintenance forced by a restriction-modification system can be modulated by a region in its modification enzyme not essential for methyltransferase activity. J Bacteriol 190, 2039-2049.

Oren, A. (1993). The Dead Sea—alive again. Experientia 49, 518-522.

Oren, A. (1999). Bioenergetic aspects of halophilism. Microbiol Mol Biol Rev 63, 334-348.

Oren, A. (2009a). Microbial diversity and microbial abundance in salt-saturated brines: why are the waters of hypersaline lakes red? Natural Resources and Environmental Issues 15, 49.

Oren, A. (2009b). Saltern evaporation ponds as model systems for the study of primary production processes under hypersaline conditions. Aquatic Microbial Ecology 56, 193- 204.

52

Oren, A. (2012). Taxonomy of the family Halobacteriaceae: a paradigm for changing concepts in prokaryote systematics. Int J Syst Evol Microbiol 62, 263-271.

Oren, A. (2013). Life at high salt concentrations, intracellular KCl concentrations, and acidic proteomes. Frontiers in microbiology 4, 315.

Oren, A. (2017). Glycerol metabolism in hypersaline environments. Environ Microbiol 19, 851- 863.

Oren, A., Arahal, D.R., and Ventosa, A. (2009). Emended descriptions of genera of the family Halobacteriaceae. Int J Syst Evol Microbiol 59, 637-642.

Oren, A., and Dubinsky, Z. (1994). On the red coloration of saltern crystallizer ponds. II. Additional evidence for the contribution of halobacterial pigments. International Journal of Salt Lake Research 3, 9-13.

Oren, A., and Hallsworth, J.E. (2014). Microbial weeds in hypersaline habitats: the enigma of the weed-like Haloferax mediterranei. FEMS microbiology letters 359, 134-142.

Oren, A., and Rodriguez-Valera, F. (2001). The contribution of halophilic Bacteria to the red coloration of saltern crystallizer ponds. FEMS Microbiol Ecol 36, 123-130.

Oren, A., and Shilo, M. (1982). Population dynamics of Dunaliella parva in the Dead Sea. Limnology and Oceanography 27, 201-211.

Oren, A., Stambler, N., and Dubinsky, Z. (1992). On the red coloration of saltern crystallizer ponds. International Journal of Salt Lake Research 1, 77-89.

Ortenberg, R., Tchlet, R., and Mevarech, M. (1998). A model for the genetic exchange system of the extremely halophilic archaeon Haloferax volcanii. Microbiology and biogeochemistry of hypersaline environments 5, 331.

Papke, R.T., White, E., Reddy, P., Weigel, G., Kamekura, M., Minegishi, H., Usami, R., and Ventosa, A. (2011). A multilocus sequence analysis approach to the phylogeny and taxonomy of the Halobacteriales. International journal of systematic and evolutionary microbiology 61, 2984-2995.

Papke, R.T., Zhaxybayeva, O., Feil, E.J., Sommerfeld, K., Muise, D., and Doolittle, W.F. (2007). Searching for species in haloarchaea. Proc Natl Acad Sci U S A 104, 14092-14097.

Park, J., Vreeland, R., Cho, B., Lowenstein, T., Timofeeff, M., and Rosenzweig, W. (2009). Haloarchaeal diversity in 23, 121 and 419 MYA salts. Geobiology 7, 515-523.

Pasic, L., Bartual, S.G., Ulrih, N.P., Grabnar, M., and Velikonja, B.H. (2005). Diversity of halophilic archaea in the crystallizers of an Adriatic solar saltern. FEMS Microbiol Ecol 54, 491-498.

53

Patterson, N.H., and Pauling, C. (1985). Evidence for two restriction-modification systems in Halobacterium cutirubrum. J Bacteriol 163, 783-784.

Pingoud, A., Wilson, G.G., and Wende, W. (2014). Type II restriction endonucleases--a historical perspective and more. Nucleic Acids Res 42, 7489-7527.

Posfai, J., Bhagwat, A.S., Posfai, G., and Roberts, R.J. (1989). Predictive motifs derived from cytosine methyltransferases. Nucleic Acids Res 17, 2421-2435.

Prangishvili, D.A., Vashakidze, R.P., Chelidze, M.G., and Gabriadze, I. (1985). A restriction endonuclease SuaI from the thermoacidophilic archaebacterium Sulfolobus acidocaldarius. FEBS Lett 192, 57-60.

Rao, D.N., Dryden, D.T., and Bheemanaik, S. (2014). Type III restriction-modification enzymes: a historical perspective. Nucleic Acids Res 42, 45-55.

Rawls, K.S., Martin, J.H., and Maupin-Furlow, J.A. (2011). Activity and transcriptional regulation of bacterial protein-like glycerol-3-phosphate dehydrogenase of the haloarchaea in Haloferax volcanii. J Bacteriol 193, 4469-4476.

Reisenauer, A., and Shapiro, L. (2002). DNA methylation affects the cell cycle transcription of the CtrA global regulator in Caulobacter. EMBO J 21, 4969-4977.

Reistad, R. (1970). On the composition and nature of the bulk protein of extremely halophilic bacteria. Archiv für Mikrobiologie 71, 353-360.

Rhodes, M.E., Oren, A., and House, C.H. (2012). Dynamics and persistence of Dead Sea microbial populations as shown by high-throughput sequencing of rRNA. Appl. Environ. Microbiol. 78, 2489-2492.

Roberts, R.J., Belfort, M., Bestor, T., Bhagwat, A.S., Bickle, T.A., Bitinaite, J., Blumenthal, R.M., Degtyarev, S., Dryden, D.T., Dybvig, K., Firman, K., Gromova, E.S., Gumport, R.I., Halford, S.E., Hattman, S., Heitman, J., Hornby, D.P., Janulaitis, A., Jeltsch, A., Josephsen, J., Kiss, A., Klaenhammer, T.R., Kobayashi, I., Kong, H., Kruger, D.H., Lacks, S., Marinus, M.G., Miyahara, M., Morgan, R.D., Murray, N.E., Nagaraja, V., Piekarowicz, A., Pingoud, A., Raleigh, E., Rao, D.N., Reich, N., Repin, V.E., Selker, E.U., Shaw, P.C., Stein, D.C., Stoddard, B.L., Szybalski, W., Trautner, T.A., Van Etten, J.L., Vitor, J.M., Wilson, G.G., and Xu, S.Y. (2003). A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res 31, 1805-1812.

Rodrigo-Banos, M., Garbayo, I., Vilchez, C., Bonete, M.J., and Martinez-Espinosa, R.M. (2015). Carotenoids from haloarchaea and their potential in biotechnology. Mar Drugs 13, 5508- 5532.

54

Rodriguez-Valera, F., Ventosa, A., Juez, G., and Imhoff, J.F. (1985). Variation of environmental features and microbial populations with salt concentrations in a multi-pond saltern. Microbial Ecology 11, 107-115.

Roer, L., Aarestrup, F.M., and Hasman, H. (2015). The EcoKI type I restriction-modification system in Escherichia coli affects but is not an absolute barrier for conjugation. J Bacteriol 197, 337-342.

Rosenshine, I., Tchelet, R., and Mevarech, M. (1989). The mechanism of DNA transfer in the mating system of an archaebacterium. Science 245, 1387-1389.

Sanchez-Romero, M.A., Busby, S.J., Dyer, N.P., Ott, S., Millard, A.D., and Grainger, D.C. (2010). Dynamic distribution of SeqA protein across the chromosome of Escherichia coli K-12. MBio 1.

Sankaranarayanan, K., Lowenstein, T.K., Timofeeff, M.N., Schubert, B.A., and Lum, J.K. (2014). Characterization of ancient DNA supports long-term survival of Haloarchaea. Astrobiology 14, 553-560.

Satterfield, C.L., Lowenstein, T.K., Vreeland, R.H., and Rosenzweig, W.D. (2005). Paleobrine temperatures, chemistries, and paleoenvironments of Silurian Salina Formation F-1 Salt, Michigan Basin, USA, from petrography and fluid inclusions in halite. Journal of Sedimentary Research 75, 534-546.

Schmid, K., Thomm, M., Laminet, A., Laue, F.G., Kessler, C., Stetter, K.O., and Schmitt, R. (1984). Three new restriction endonucleases MaeI, MaeII and MaeIII from Methanococcus aeolicus. Nucleic Acids Res 12, 2619-2628.

Schubert, B.A., Lowenstein, T.K., Timofeeff, M.N., and Parker, M.A. (2010a). Halophilic archaea cultured from ancient halite, Death Valley, California. Environmental microbiology 12, 440-454.

Schubert, B.A., Timofeeff, M.N., Lowenstein, T.K., and Polle, J.E. (2010b). Dunaliella cells in fluid inclusions in halite: significance for long-term survival of prokaryotes. Geomicrobiology Journal 27, 61-75.

Seshasayee, A.S., Singh, P., and Krishna, S. (2012). Context-dependent conservation of DNA methyltransferases in bacteria. Nucleic Acids Res 40, 7066-7073.

Shalev, Y., Turgeman-Grott, I., Tamir, A., Eichler, J., and Gophna, U. (2017). Cell surface glycosylation is required for efficient mating of Haloferax volcanii. Frontiers in microbiology 8, 1253.

Sherwood, K.E., Cano, D.J., and Maupin-Furlow, J.A. (2009). Glycerol-mediated repression of glucose metabolism and glycerol kinase as the sole route of glycerol catabolism in the haloarchaeon Haloferax volcanii. J Bacteriol 191, 4307-4315.

55

Smith, H.O., Gwinn, M.L., and Salzberg, S.L. (1999). DNA uptake signal sequences in naturally transformable bacteria. Res Microbiol 150, 603-616.

Soppa, J. (2011). Ploidy and gene conversion in Archaea. Biochem Soc Trans 39, 150-154.

Soppa, J. (2013). Evolutionary advantages of polyploidy in halophilic archaea. Biochem Soc Trans 41, 339-343.

Stephens, C., Reisenauer, A., Wright, R., and Shapiro, L. (1996). A cell cycle-regulated bacterial DNA methyltransferase is essential for viability. Proc Natl Acad Sci U S A 93, 1210- 1214.

Stephens, C.M., Zweiger, G., and Shapiro, L. (1995). Coordinate cell cycle control of a Caulobacter DNA methyltransferase and the flagellar genetic hierarchy. J Bacteriol 177, 1662-1669.

Studier, F.W., and Bandyopadhyay, P.K. (1988). Model for how type I restriction enzymes select cleavage sites in DNA. Proc Natl Acad Sci U S A 85, 4677-4681.

Thomas, C.M., and Nielsen, K.M. (2005). Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol 3, 711-721.

Thombre, R.S., Shinde, V.D., Oke, R.S., Dhar, S.K., and Shouche, Y.S. (2016). Biology and survival of extremely halophilic archaeon Haloarcula marismortui RR12 isolated from Mumbai salterns, India in response to salinity stress. Sci Rep 6, 25642.

Thomm, M., Frey, G., Bolton, B.J., Laue, F., Kessler, C., and Stetter, K.O. (1988). Mvn I: a restriction enzyme in the archaebacterium Methanococcus vannielii. FEMS microbiology letters 52, 229-233.

Tock, M.R., and Dryden, D.T. (2005). The biology of restriction and anti-restriction. Curr Opin Microbiol 8, 466-472.

Torreblanca, M., Rodriguez-Valera, F., Juez, G., Ventosa, A., Kamekura, M., and Kates, M. (1986). Classification of non-alkaliphilic halobacteria based on numerical taxonomy and polar lipid composition, and description of Haloarcula gen. nov. and Haloferax gen. nov. Systematic and Applied Microbiology 8, 89-99.

Turgeman-Grott, I., Joseph, S., Marton, S., Eizenshtein, K., Naor, A., Soucy, S.M., Stachler, A.- E., Shalev, Y., Zarkor, M., and Reshef, L. (2019). Pervasive acquisition of CRISPR memory driven by inter-species mating of archaea can limit gene transfer and influence speciation. Nature microbiology 4, 177.

Volcani, B. (1944). The microorganisms of the Dead Sea. Papers collected to commemorate the 70th anniversary of Dr. Chaim Weizmann 71, 85.

56

Vreeland, R., Jones, J., Monson, A., Rosenzweig, W., Lowenstein, T., Timofeeff, M., Satterfield, C., Cho, B., Park, J., and Wallace, A. (2007). Isolation of live Cretaceous (121–112 million years old) halophilic Archaea from primary salt crystals. Geomicrobiology Journal 24, 275-282.

Waldron, D.E., Owen, P., and Dorman, C.J. (2002). Competitive interaction of the OxyR DNA- binding protein and the Dam methylase at the antigen 43 gene regulatory region in Escherichia coli. Mol Microbiol 44, 509-520.

Walsh, D.A., Bapteste, E., Kamekura, M., and Doolittle, W.F. (2004). Evolution of the RNA polymerase B′ subunit gene (rpoB′) in Halobacteriales: a complementary molecular marker to the SSU rRNA gene. Molecular biology and evolution 21, 2340-2351.

Watanabe, M., Yuzawa, H., Handa, N., and Kobayashi, I. (2006). Hyperthermophilic DNA methyltransferase M.PabI from the archaeon Pyrococcus abyssi. Appl Environ Microbiol 72, 5367-5375.

Weber, M., Davies, J.J., Wittig, D., Oakeley, E.J., Haase, M., Lam, W.L., and Schubeler, D. (2005). Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 37, 853-862.

Welsh, K.M., Lu, A.L., Clark, S., and Modrich, P. (1987). Isolation and characterization of the Escherichia coli mutH gene product. J Biol Chem 262, 15624-15629.

Willerslev, E., and Cooper, A. (2004). Ancient DNA. Proceedings of the Royal Society B: Biological Sciences 272, 3-16.

Willerslev, E., Hansen, A.J., and Poinar, H.N. (2004). Isolation of nucleic acids and cultures from fossil ice and permafrost. Trends in Ecology & Evolution 19, 141-147.

Woese, C.R., and Fox, G.E. (1977). Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A 74, 5088-5090.

Yang, H., Wu, Z., Liu, J., Liu, X., Wang, L., Cai, S., and Xiang, H. (2015). Activation of a dormant replication origin is essential for Haloferax mediterranei lacking the primary origins. Nat Commun 6, 8321.

Zerulla, K., Chimileski, S., Nather, D., Gophna, U., Papke, R.T., and Soppa, J. (2014). DNA as a phosphate storage polymer and the alternative advantages of polyploidy for growth or survival. PLoS One 9, e94819.

Zweiger, G., Marczynski, G., and Shapiro, L. (1994). A Caulobacter DNA methyltransferase that functions only in the predivisional cell. J Mol Biol 235, 472-485.

57

Chapter 2. Dihydroxyacetone metabolism in Haloferax volcanii

Matthew Ouellette, Andrea M. Makkay, and R. Thane Papke

This chapter examines the ability of the model haloarchaeon, Haloferax volcanii, to

metabolize dihydroxyacetone. This research was published in Frontiers in Microbiology in 2013.

The purpose of this work was to learn how to grow and manage cultures of H. volcanii, as well as become more familiar with the genetic deletion system of the organism. The experience gained in working with H. volcanii in this chapter was helpful for studying DNA methylation and RM systems in later chapters using this organism. Regarding my contributions to this chapter, I designed the experiments and wrote the manuscript with assistance from Andrea M.

Makkay and R. Thane Papke. I also conducted the research, with assistance from Andrea M.

Makkay.

Citation:

Ouellette, M., Makkay, A.M., & Papke, R.T. (2013) Dihydroxyacetone metabolism in

Haloferax volcanii. Front. Microbiol., 4(376). doi: 10.3389/fmicb.2013.00376

http://www.frontiersin.org/extreme_microbiology/10.3389/fmicb.2013.00376/abstract

58

ORIGINAL RESEARCH ARTICLE published: 16 December 2013 doi: 10.3389/fmicb.2013.00376 Dihydroxyacetone metabolism in Haloferax volcanii

Matthew Ouellette , Andrea M. Makkay and R. Thane Papke*

Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA

Edited by: Dihydroxyacetone (DHA) is a ketose sugar that can be produced by oxidizing glycerol. DHA Aharon Oren, The Hebrew in the environment is taken up and phosphorylated to DHA-phosphate by glycerol kinase University of Jerusalem, Israel or DHA kinase. In hypersaline environments, it is hypothesized that DHA is produced Reviewed by: as an overflow product from glycerol utilization by organisms such as Salinibacter ruber. Jerry Eichler, Ben Gurion University of the Negev, Israel Previous research has demonstrated that the halobacterial species Haloquadratum walsbyi Maria-Jose Bonete, University of can use DHA as a carbon source, and putative DHA kinase genes were hypothesized to be Alicante, Spain involved in this process. However, DHA metabolism has not been demonstrated in other *Correspondence: halobacterial species, and the role of the DHA kinase genes was not confirmed. In this R. Thane Papke, Department study, we examined the metabolism of DHA in Haloferax volcanii because putative DHA Molecular and Cell Biology, University of Connecticut, 91 N. kinase genes were annotated in its genome, and it has an established genetic system Eagleville Rd. Unit 3125, Storrs, to assay growth of mutant knockouts. Experiments in which Hfx. volcanii was grown CT 06268, USA on DHA as the sole carbon source demonstrated growth, and that it is concentration e-mail: [email protected] dependent. Three annotated DHA kinase genes (HVO_1544, HVO_1545, and HVO_1546), which are homologous to the putative DHA kinase genes present in Hqm. walsbyi,as well as the glycerol kinase gene (HVO_1541), were deleted to examine the effect of these genes on the growth of Hfx. volcanii on DHA. Experiments demonstrated that the DHA kinase deletion mutant exhibited diminished, but not absence of growth on DHA compared to the parent strain. Deletion of the glycerol kinase gene also reduced growth on DHA, and did so more than deletion of the DHA kinase. The results indicate that Hfx. volcanii can metabolize DHA and that DHA kinase plays a role in this metabolism. However, the glycerol kinase appears to be the primary enzyme involved in this process. BLASTp analyses demonstrate that the DHA kinase genes are patchily distributed among the Halobacteria, whereas the glycerol kinase gene is widely distributed, suggesting a widespread capability for DHA metabolism.

Keywords: dihydroxyacetone metabolism, dihydroxyacetone kinase, glycerol kinase, archaea, Halobacteria, Haloarchaea

INTRODUCTION be phosphorylated and subsequently metabolized. Two types of Dihydroxyacetone (DHA) is a simple ketose sugar commonly kinases phosphorylate DHA: glycerol kinase and DHA kinase. used in sunless tanning lotions and sprays (Faurschou et al., Glycerol kinase is considered less specific, and it is capable of 2004). DHA can be used as a carbon source by many different phosphorylating both glycerol and DHA using ATP (Hayashi and bacteria, yeast, and protists, and there are a number of differ- Lin, 1967; Weinhouse and Benziman, 1976; Jin et al., 1982). DHA ent pathways in which it can be produced. In bacteria such kinase is more specific, and it is only able to phosphorylate DHA as Klebsiella pneumoniae, DHA is produced anaerobically via and its isomer, D-glyceraldehyde (Erni et al., 2006). There are two glycerol oxidation by an NAD-dependent glycerol dehydroge- major families of DHA kinases. The first consists of two subunits nase (Forage and Lin, 1982). Gluconobacter oxydans and related (DhaK and DhaL) and which are ATP-dependent. The DhaK sub- bacteria also use glycerol oxidation to produce DHA, but they unit binds to the DHA substrate, and the DhaL subunit binds to utilize a glycerol dehydrogenase that is pyrroloquinoline quinone ATP and transfers a phosphate group from ATP to DhaK-DHA (PQQ)-dependent and attached to the outer membrane. This (Daniel et al., 1995; Siebold et al., 2003). In the second family, the pathway releases the DHA directly into the surrounding envi- DHA kinases are made up of three subunits (DhaK, DhaL, and ronment, which makes the Gluconobacter bacteria useful for DhaM) and are phosphoenolpyruvate (PEP)-dependent. This industrial production of DHA (Deppenmeier et al., 2002). DHA family of DHA kinases uses the PEP:sugar phosphotransferase can also be produced by methylotrophic yeast such as Candida system (PTS) to transfer a phosphate group from PEP to the boidinii by first oxidizing methanol to formaldehyde, after which DhaM subunit, a multidomain protein with one domain pre- a pyrophosphate-dependent transketolase transfers a two-carbon dicted to be a member of the mannose (EIIAMan) family of the hydroxyethyl group to the formaldehyde to form DHA (Waites PTS (Gutknecht et al., 2001; Zurbriggen et al., 2008). The DhaM and Quayle, 1981). then transfers the phosphate group to DhaL, which picks up the Once DHA is obtained by a cell either via glycerol oxida- phosphate using an ADP cofactor bound to the subunit (Bachler tion or uptake from the surrounding environment, it can then et al., 2005). The phosphate is then transferred from DhaL to

www.frontiersin.org December 2013 | Volume 4 | Article 376 | 1 59 Ouellette et al. DHA metabolism in Halobacteria

the DhaK subunit, which phosphorylates the bound DHA sub- MATERIALS AND METHODS strate to DHA phosphate. The ATP-dependent family of DHA STRAINS AND GROWTH CONDITIONS kinases is present in eukaryotes and some bacteria, whereas the Strains and plasmids used in this study are listed in Table 1. PEP-dependent family of DHA kinases is present only in bacteria All Hfx. volcanii strains were grown in either Hv-YPC or Hv- and archaea (Erni et al., 2006). CA medium at 42◦C while shaking at 200 rpm. Hv-YPC and DHA has been hypothesized as a potential carbon source in Hv-CA media were produced using the formulas outlined in hypersaline environments for heterotrophic halobacterial species The Halohandbook (Dyall-Smith, 2009). Hv-min medium used (Elevi Bardavid et al., 2008). This hypothesis is supported by in growth experiments was modified from the formula in The previous studies on glycerol oxidation in Salinibacter ruber,a Halohandbook to exclude a carbon source (Hv-min -C). Media halophilic bacterium common in hypersaline environments. In were supplemented with uracil (50 µg/mL) and 5-fluoroorotic astudybySher et al. (2004), which examined the oxidation of acid (50 µg/mL) as needed. For growth on Petri plates, 2% agar radio-labeled glycerol by S. ruber, an unknown soluble product (w/v) was added to the media. consisting of 20% of the radioactivity from the added glycerol All Escherichia coli strains were grown in either S.O.C. media was observed to be excreted by the cells. This soluble product was or LB-media at 37◦C while shaking at 200 rpm. S.O.C. media was later analyzed in a study by Elevi Bardavid and Oren (2008) using provided by Clontech (Cat. # 636763) and New England BioLabs a colorimetric assay, and was identified as DHA; indicating that (Cat. # B9020S). LB medium was produced by adding 5 g NaCl, S. ruber could produce DHA in hypersaline environments as an 5 g tryptone, and 2.5 g of yeast extract to deionized water to a overflow product via glycerol oxidation. final volume of 500 mL and pH set to 7.0. LB was supplemented The ability of Haloquadratum walsbyi, a common halobacterial with ampicillin (100 µg/mL) as needed. When LB cell culture species, to metabolize DHA further supports the hypothesis that plates were produced, 1.5% agar (w/v) was added. LB plates were DHA is a carbon source in hypersaline environments. Hqm. wals- supplemented with 40 µL of X-gal (20 mg/mL) as needed. byi was first hypothesized to metabolize DHA after examination of the sequenced genome in a study Bolhuis et al. (2006) identi- PCR AND DNA ISOLATION fied an uptake system for DHA involving three genes (HQ2672A, All primers used in this study are listed in Table 2.DNAused HQ2673A, and HQ2674A) encoding the subunits of a puta- for plasmid construction and screening was amplified via PCR. tive PEP-dependent DHA kinase. The DHA kinase encoded by Reactions for PCR were assembled as 10 µL volumes and con- these genes was hypothesized to use a phosphate group from the tained the following reagents: 5.9 µLofdeionizedwater,2µLof PTS system to phosphorylate DHA to DHA phosphate, which 5x GC Phusion buffer (Thermo Scientific, Cat. # F-519), 1 µL could then be incorporated into the metabolism of the cell. Elevi of 100% DMSO (Thermo Scientific, Cat. # TS-20684), 0.4 µL Bardavid and Oren (2008) tested DHA metabolism in Hqm. of 10 mM dNTP (Promega, Cat. # U1511), 0.2 µLof10µM walsbyi by adding DHA to a cell culture of Hqm. walsbyi and mea- forward primer, 0.2 µLof10µM reverse primer, 0.2 µLoftem- suring the change in DHA concentration over time. A decrease plate DNA, and 0.1 µL of Phusion High-Fidelity DNA Polymerase in DHA concentration was observed, indicating that the DHA (Thermo Scientific, Cat. # F-530S). When needed, water was sub- was being taken up and metabolized by the Hqm. walsbyi stituted with 20% acetamide. The reactions were performed in a cultures. Mastercycler EP Gradient (Eppendorf) with the following cycle: Overall, the current evidence supports a model where halobac- a DNA melting step at 94◦C for 22 s, an annealing step at 58.1◦C terial species Hqm. walsbyi metabolizes DHA in hypersaline for35s,andanextensionstepat72◦C for 90 s. This cycle was environments produced by S. ruber; however, there is still lit- repeated 40 times, after which a final annealing step at 72◦Cfor tle known about DHA metabolism in Halobacteria. While DHA 5 min was performed. Template DNA included Hfx. volcanii DS2 metabolism has been observed to occur in Hqm. walsbyi,no genomic DNA (20 ng/µL), plasmid DNA listed in Table 1,and other halobacterial species has been shown to be able to metab- DNA from E. coli and Hfx. volcanii colonies. olize DHA. Additionally, the putative DHA kinase genes in Hqm. Gel electrophoresis was performed to separate and analyze walsbyi were never confirmed to be involved in DHA phosphory- the PCR products using 0.8% (w/v) agarose in 1 × TAE buffer lation and metabolism. In this study, we sought to elucidate our (40mMTrisacetate,2mMEDTA).Aftergelelectrophoresis,PCR understanding of halobacterial metabolism of DHA by examin- products were excised from the gel and purified using the Wizard ing DHA utilization in Haloferax volcanii, a halobacterial species SV Gel and PCR Clean-Up System (Promega). Plasmids from isolated from Dead Sea sediment (Mullakhanbhai and Larsen, E. coli strains were extracted and purified using the PureYield 1975). We used Hfx. volcanii because it has three putative PEP- Plasmid Miniprep System (Promega). Plasmids linearized via dependent DHA kinase genes that are homologous to Hqm. digestion with restriction enzymes (BamHI, HindIII, XhoI, or walsbyi (Anderson et al., 2011), and it has an established genetic XbaI) were also purified using the Wizard SV Gel and PCR system that can be used to delete genes and test their function Clean-Up System. (Bitan-Banin et al., 2003; Allers et al., 2004; Blaby et al., 2010). We also used DHA metabolism genes in Hfx. volcanii to search GENE DELETION IN Hfx. volanii the other sequenced halobacterial genomes to better understand Three Hfx. volcanii genes (dhaKLM; HVO_1544, HVO_1545, the distribution of these genes among the Halobacteria. Our data and HVO_1546), which encode homologs to the putative DHA provide important new insights into the metabolism of DHA in kinase genes in Hqm. walsbyi,andaglycerolkinasegene(glpK; halobacterial organisms. HVO_1541), were targeted for deletion in Hfx. volcanii strain H26

Frontiers in Microbiology | Extreme Microbiology December 2013 | Volume 4 | Article 376 | 2 60 Ouellette et al. DHA metabolism in Halobacteria

Table 1 | List of plasmids and strains used in this study.

Plasmid or Strain Description References pTA131 Cloning vector used for gene deletion in Hfx. volcanii. Contains lacZ cloning site, ampicillin Allers et al., 2004 resistance gene for screening in E. coli and pyrE2 gene for screening in Hfx. volcanii. pTA409 Cloning vector used for gene complementation in Hfx. volcanii. Contains lacZ cloning site, Holzle et al., 2008 ampicillin resistance gene for screening in E. coli and pyrE2 gene for screening in Hfx. volcanii. pdhaKLM Derivative of pTA131 used to delete dhaKLM in Hfx. volcanii. This study pglpK Derivative of pTA131 used to delete glpK in Hfx. volcanii. This study pdhaKLM Derivative of pTA409 used to complement dhaKLM in dhaKLM strain. This study pglpK Derivative of pTA409 used to complement glpK in glpK strain. This study HST08 An E. coli strain used for screening of constructed plasmids. Clontech, Cat. # 636763 dam−/dcm− An E. coli strain used to demethylate constructed plasmids. New England BioLabs, Cat. # C2925H H26 Uracil auxotrophic strain of Hfx. volcanii. Allers et al., 2004 dhaKLM Derivative strain of H26 with dhaKLM operon deleted. This study dhaKLM + pdhaKLM Derivative strain of dhaKLM with complementation of dhaKLM operon. This study glpK Derivative strain of H26 with glpK gene deleted. This study glpK + pglpK Derivative strain of glpK with complementation of glpK gene. This study dhaKLM glpK Derivative strain of H26 with dhaKLM operon and glpK gene deleted. This study dhaKLM glpK + pdhaKLM Derivative strain of dhaKLM glpK with complementation of dhaKLM operon. This study dhaKLM glpK + pglpK Derivative strain of dhaKLM glpK with complementation of glpK gene. This study

Table 2 | List of primers used in this study.

Primer name Description Sequence dhaKLM_FR1_F Used to amplify flanking regions of dhaKLM for insertion 5- CGG TAT CGA TAA GCT GCC CTA CGC ACC CTA CAT G -3 dhaKLM_FR1_R into pTA131 digested with HindIII and BamHI to delete 5- TAG AAC TAG TGG ATC GCC TTC GGC TAC CCG CTC AT -3 dhaKLM_FR2_F the operon. 5- GGA ATT CTA CCA GGC TCT GCG CTG AAC CGG CCG AA -3 dhaKLM_FR2_R 5- GCC TGG TAG AAT TCC GAC TCA CCG TCC CTC ACG TT -3 dhaKLMF Used to amplify dhaKLM and native promoter for 5- TAG AAC TAG TGG ATC AGG CGG TCG CGC GTT TCC GT -3 dhaKLMR insertion into pTA409 digested with BamHI and XhoI to 5- CGG GCC CCC CCT CGA ATC AGT TCA GCT TCC GGT AGT CGC G -3 complement the operon. glpK_FR1F Used to amplify flanking regions of glpK for insertion into 5- CGG GCC CCC CCT CGA TCG ACG ACC AGG CGT -3 glpK_FR1R pTA131 digested with XhoI and XbaI to delete gene 5- TGG CGG CCG CTC TAG ACG ATG ACA ACG ATG T -3 glpK_FR2F [external primers based on designs from Sherwood et al. 5- GCC TGG GCA GAT CTC AAC ACG TGT TCG AAG -3 (2009)]. glpK_FR2R 5- GAG ATC TGC CCA GGC TTC TAA CCA ACC TCG ATA CG -3 glpKF Used to amplify glpK and native promoter for insertion 5- CGG GCC CCC CCT CGA CGC ACA ACT GAC GAA CGG GA -3 glpKR into pTA409 digested with BamHI and XhoI to 5- TAG AAC TAG TGG ATC TTA TTC CTC CCG TGC CCA GTC -3 complement gene. using the In-Fusion HD Cloning Kit (Clontech). The strategy # 636763), according to the directions of the provider, and were for gene deletion was based on the methodology outlined in a plated on LB-amp plates with X-gal. White colonies were screened study by Blaby et al. (2010) with a few modifications. Flanking via colony PCR using the external primers of the target gene regions of the targeted genes were developed to be between 800 flanking regions. Confirmed deletion plasmids (listed in Table 1) and 1000 bp in length. The 15-bp linker used to combine the were subcloned in dam−/dcm− Competent E. coli (New England flanking regions was altered to so that EcoRI and BstOI sites BioLabs, Cat. # C2925H) to produce demethylated plasmids for were included for the dhaKLM deletion linker and BglI and BstOI transformation of Hfx. volcanii. Hfx. volcanii H26 colonies were sites were included for the glpK deletion linker. The pTA131 was screened for deleted genes via PCR using the external primers linearized with HindIII and BamHI for the dhaKLM deletion of the target gene flanking regions. The size of PCR products of and XhoI and XbaI for the glpK deletion. Constructed plasmids screened cells were compared to those produced with wild-type were transformed into Stellar Competent Cells (Clontech, Cat. DNA (Figure 1). Smaller product size indicated that the gene had

www.frontiersin.org December 2013 | Volume 4 | Article 376 | 3 61 Ouellette et al. DHA metabolism in Halobacteria

a blank. Three wells of each culture were treated with 10 µLof either 0.1 M DHA (final concentration of 5 mM DHA), 0.05 M DHA (final concentration of 2.5 mM DHA), 0.02 M DHA (final concentration of 1 mM DHA), or deionized water (negative con- trol). The 96-well plate was then placed into a Multiscan FC plate reader (Fisher Scientific), which incubated the plate at 42◦Cwhile shaking it at low speed. The plate reader measured the OD620 of each well every hour for 72 h.

BIOINFORMATICS The amino acid sequences of the Hfx. volcanii putative DHA kinase gene dhaK (HVO_1546) and glycerol kinase gene glpK were used to perform BLASTp (http://blast.ncbi.nlm.nih.gov/ Blast.cgi) searches of the NCBI database to determine other halobacterial species with DHA kinase and glycerol kinase genes. FIGURE 1 | PCR analysis of H26, dhaKLM, glpK,anddhaKLM The amino acid sequences were retrieved from the NCBI database glpK. Analysis examined presence or absence of the dhaKLM operon (dhaK GI number 292655696; glpK GI number 292655691). The (lanes 2–5) and glpK (lanes 6–9). Lane 1 contained exACTGene Mid Range search was restricted to the Halobacteriales (taxid 2235) with Plus DNA Ladder (Fisher Scientific). Lanes 2 and 6 contained amplicons an E-value cut-off of 1e-20. Reciprocal BLASTp was performed from H26. Lanes 3 and 7 contained amplicons from dhaKLM. Lanes 4 and 8 contained amplicons from glpK. Lanes 5 and 9 contained amplicons to analyze only orthologous genes. The halobacterial genomes from dhaKLM glpK. queriedinthisBLASTpsearcharelistedinTable 3.

RESULTS been deleted. The Hfx. volcanii H26 deletion strains produced by DHA KINASE IS PATCHILY DISTRIBUTED AMONG THE HALOBACTERIA this process are listed in Table 1. Three DHA kinase genes (HQ2672A, HQ2673A, and HQ2674A) have been annotated in the genome of Hqm. walsbyi (Bolhuis COMPLEMENTATION OF DELETED GENES et al., 2006), a halobacterial species which is able to metabolize The dhaKLM and glpK genes deleted in Hfx. volcanii H26 were external DHA (Elevi Bardavid and Oren, 2008). Homologs of resuscitated by constructing complementation plasmids. Primers these three genes are also annotated in Hfx. volcanii (HVO_1544, were designed which amplified the upstream native promoter HVO_1545, and HVO_1546). In order to determine the and the coding region of the targeted genes in Hfx. volcanii. prevalence of DHA kinase genes among the Halobacteria, the The primers were also designed to have 15 bp of homology with Hfx. volcanii dhaK gene (HVO_1546) was used to perform a pTA409. Restriction digestion of pTA409 was performed using BLASTp search against the database of Halobacteria genomes BamHI and XhoI to linearize the plasmid. After the linearized available on NCBI. The search yielded significant hits among pTA409 and gene fragments were gel-purified, the DNA frag- 31 different halobacterial species (Table 4). Except for Haloferax ments were combined together using the In-Fusion HD Cloning larsenii and Haloferax elongans,allqueriedHaloferax species Kit according to the instructions of the provider. The constructed yielded significant hits in the BLASTp search. Species from the plasmids were cloned, screened, and demethylated as described Halobiforma, Halococcus, Halorubrum,andNatronococcus genera in the above gene deletion protocol. Purified constructed plas- also yielded significant hits, but not all queried species from these mids (listed in Table 1) were then transformed into the Hfx. genera produced results. All representatives from the genera volcanii H26 deletion strains using the PEG mediated transfor- Haladaptatus, Halalkalicoccus, Halarchaeum, Haloquadratum, mation of Haloarchaea protocol from The Halohandbook.PCR Halosarcina,andSalinarchaeum yielded significant hits. was used to confirm transformation success. The Hfx. volcanii Halobacteria genera that did not yield significant hits in the complementation strains produced by this process are listed in BLASTp search (E-value cut-off of 1e-20) include Haloarcula, Table 1. Halobacterium, , Halogeometricum, Halogranum, Halomicrobium, Halopiger, Haloplanus, Halorhabdus, DHA GROWTH EXPERIMENTS Halosimplex, Halostagnicola, Haloterrigena, Halovivax, Natrialba, Hfx. volcanii strains listed in Table 1 were grown to late- Natrinema, Natronobacterium, Natronolimnobius, Natronomonas, exponential phase (OD600 =∼0.6 − 0.8) in Hv-YPC medium. and Natronorubrum. The cell cultures were then centrifuged at 3220 RCF for 15 min and resuspended in Hv-min -C media supplemented with uracil. GROWTH ON DHA IN Hfx. volcanii IS CONCENTRATION DEPENDENT Centrifugation was repeated a total of three times to wash the cells Although putative DHA kinase genes are present in Hfx. volcanii, of residual Hv-YPC media. During the final resuspension of the no previous research has demonstrated that Hfx. volcanii is able cells in Hv-min -C media, the cell cultures were diluted to OD600 to grow on DHA as a carbon source. Therefore, experiments were ∼0.01. Each cell culture was then distributed into the wells of performed to test the growth of Hfx. volcanii strain H26 on 5 mM, a 96-well plate, with each well receiving 190 µL of cell culture. 2.5 mM, and 1 mM DHA. The results indicated that H26 was Also, 200 µL of Hv-min -C was added to the plate to be used as capable of growth on DHA as the sole carbon source. The cell

Frontiers in Microbiology | Extreme Microbiology December 2013 | Volume 4 | Article 376 | 4 62 Ouellette et al. DHA metabolism in Halobacteria

Table 3 | List of halobacterial genomes queried in BLASTp search.

Queried halobacterial genomes

Haladaptatus paucihalophilus Haloferax larsenii JCM 13917 Halorubrum aidingense JCM 13560 Natrialba asiatica DSM 12278 DX253

Halalkalicoccus jeotgali B3 Haloferax lucentense DSM 14919 Halorubrum tebenquichense DSM Natrialba chahannaoensis JCM 10990 14210

Halarchaeum acidiphilum Haloferax denitrificans ATCC 35960 Halorubrum terrestre JCM 10247 Natrialba hulunbeirensis JCM 10989 MH1-52-1

Haloarcula amylolytica JCM Haloferax elongans ATCC BAA-1513 Halorubrum arcis JCM 13916 Natrialba magadii ATCC 43099 13557

Haloarcula argentinensis Haloferax gibbonsii ATCC 33959 Halorubrum californiensis DSM 19288 Natrialba taiwanensis DSM 12281 DSM 12282

Haloarcula californiae ATCC Haloferax mediterranei ATCC 33500 Halorubrum coriense DSM 10284 Natrinema altunense JCM 12890 33799

Haloarcula hispanica ATCC Haloferax mucosum ATCC Halorubrum distributum JCM 9100 Natrinema gari JCM 14663 33960 BAA-1512

Haloarcula japonica DSM Haloferax prahovense DSM 18310 Halorubrum ezzemoulense DSM Natrinema pallidum DSM 3751 6131 17463

Haloarcula marismortui ATCC Haloferax sulfurifontis ATCC Halorubrum hochstenium ATCC Natrinema pellirubrum DSM 15624 43049 BAA-897 700873

Haloarcula sinaiiensis ATCC Haloferax volcanii DS2 Halorubrum kocurii JCM 14978 Natrinema versiforme JCM 10478 33800

Haloarcula vallismortis ATCC Haloferax sp. ATCC BAA-644 Halorubrum lacusprofundi ATCC Natrinema sp. CX2021 29715 49239

Haloarcula sp. AS7094 Haloferax sp. ATCC BAA-645 Halorubrum lipolyticum DSM 21995 Natrinema sp. J7-1

Halobacterium salinarum Haloferax sp. ATCC BAA-646 Halorubrum litoreum JCM 13561 Natrinema sp. J7-2 NRC-1

Halobacterium sp. DL1 Haloferax sp. BAB2207 Halorubrum saccharovorum DSM Natronobacterium gregoryi SP2 1137

Halobacterium sp. GN101 Halogeometricum borinquense Halorubrum sp. T3 Natronobacterium sp. AS-7091 DSM 11551

Halobaculum gomorrense Halogranum salarium B-1 Halosarcina pallida JCM 14848 Natronococcus amylolyticus DSM JCM 9908 10524

Halobiforma lacisalsi AJ5 Halomicrobium katesii DSM 19301 Halosimplex carlsbadense 2-9-1 Natronococcus jeotgali DSM 18795

Halobiforma nitratireducens Halomicrobium mukohataei DSM Halostagnicola larsenii XH-48 Natronococcus occultus SP4 JCM 10879 12286

Halococcus hamelinensis Halopiger xanaduensis SH-6 Haloterrigena limicola JCM 13563 Natronolimnobius innermongolicus 100A6 JCM 12255

Halococcus morrhuae DSM Halopiger sp. IIH2 Haloterrigena salina JCM 13891 Natronomonas moolapensis 8.8.11 1307

Halococcus saccharolyticus Halopiger sp. IIH3 Haloterrigena thermotolerans DSM Natronomonas pharaonis DSM 2160 DSM 5350 11522

(Continued)

www.frontiersin.org December 2013 | Volume 4 | Article 376 | 5 63 Ouellette et al. DHA metabolism in Halobacteria

Table 3 | Continued

Queried halobacterial genomes

Halococcus salifodinae DSM Haloplanus natans DSM 1798 Haloterrigena turkmenica DSM 5511 Natronorubrum bangense JCM 8989 10635

Halococcus thailandensis Haloquadratum walsbyi DSM 16790 Halovivax asiaticus JCM 14624 Natronorubrum sulfidifaciens JCM JCM 13552 14089

Halococcus sp. 197A Halorhabdus tiamatea SARL4B Halovivax ruber XH-70 Natronorubrum tibetense GA33

Haloferax alexandrinus JCM Halorhabdus utahensis DSM 12940 Natrialba aegyptia DSM 13077 Salinarchaeum sp. Harcht-Bsk1 10717

Table 4 | Results of BLASTp search using dhaK (Performed on July 29, 2013).

Species name GI number E-value Species name GI number E-value

Haloferax volcanii DS2 292655696 0.0 Natronococcus amylolyticus DSM 10524 491710546 1e-169 Haloferax sp. BAB2207 493648700 0.0 Halarchaeum acidiphilum MH1-52-1 519064717 2e-169 Haloferax alexandrinus JCM 10717 445742333 0.0 Halogranum salarium B-1 496767283 3e-165 Haloferax sulfurifontis ATCC BAA-897 494484188 0.0 Halorubrum lipolyticum DSM 21995 495278338 7e-165 Haloferax lucentense DSM 14919 490164612 0.0 Halorubrum sp. T3 515912844 2e-164 Haloferax denitrificans ATCC 35960 491112269 0.0 Halorubrum kocurii JCM 14978 496125287 3e-164 Haloferax mediterranei ATCC 33500 389847061 0.0 Halococcus hamelinensis 100A6 494968649 2e-162 Haloferax sp. ATCC BAA-644 445718309 0.0 Halosarcina pallida JCM 14848 495659148 2e-160 Haloferax sp. ATCC BAA-645 445712370 0.0 Halorubrum lacusprofundi ATCC 49239 222479879 6e-158 Haloferax sp. ATCC BAA-646 495849737 0.0 Halococcus saccharolyticus DSM 5350 492981238 3e-157 Haloferax gibbonsii ATCC 33959 491118466 0.0 Halorubrum aidingense JCM 13560 495274943 3e-157 Haloferax prahovense DSM 18310 445719493 0.0 Haloquadratum walsbyi DSM 16790 110668578 2e-154 Haloferax mucosum ATCC BAA-1512 495592772 0.0 Halorubrum coriense DSM 10284 493055434 6e-154 Haladaptatus paucihalophilus DX253 495255891 0.0 Salinarchaeum sp. Harcht-Bsk1 495690630 2e-150 Halobiforma lacisalsi AJ5 494236904 9e-180 Halalkalicoccus jeotgali B3 300710867 3e-145 Natronococcus jeotgali DSM 18795 495699224 2e-178 density at which H26 reached stationary phase was also depen- this growth deficiency (Figure 3). However, the dhaKLM was dent on the initial concentration of DHA provided to the cells still capable of growth on DHA, exhibiting a 33% decrease in (Figure 2). H26 cells grown in medium supplemented with 1 mM growth compared to H26. These results indicate that the dhaKLM DHA reached stationary phase at the lowest cell density, whereas genesareusedbyHfx. volcanii in DHA metabolism, most likely cells grown with the highest tested concentration of 5 mM DHA for the phosphorylation of DHA to DHA phosphate, and that reached stationary phase at the highest cell density. These data the genes are apparently not essential. Since it is still capable of indicate that growth of Hfx. volcanii on DHA as a carbon source growth on DHA there must be additional genes involved in the is concentration dependent. phosphorylation step.

DHA KINASE IS USED IN DHA METABOLISM IN Hfx. volcanii GLYCEROL KINASE IS MORE IMPORTANT THAN DHA KINASE Evidence indicates that Hfx. volcanii,likeHqm. walsbyi, can use In other organisms, glycerol kinase is also capable of phosphory- DHA as a carbon source. Although both organisms have DHA lating DHA (Hayashi and Lin, 1967; Weinhouse and Benziman, kinase genes, no previous studies demonstrated these putative 1976; Jin et al., 1982). Therefore, the other gene involved DHA DHA kinase genes have a role in DHA metabolism. In order metabolism in Hfx. volcanii was hypothesized to be the glycerol to determine that DHA metabolism in Hfx. volcanii utilizes kinase gene glpK (HVO_1542). In order to test this hypothesis, the annotated DHA kinase, the operon dhaKLM (HVO_1544— the glpK gene was deleted in H26. The deletion strain (glpK), HVO_1546) was deleted in Hfx. volcanii strain H26. The growth and its complementation strain (glpK + pglpK), were both of this deletion strain (dhaKLM) on 5 mM DHA was then tested grown on 5 mM DHA along with the parent strain H26. The in comparison to the parent strain H26 as well as a comple- results indicate that the deletion of glpK caused a reduction in mentation strain (dhaKLM +pdhaKLM). The results indicate growth on DHA even greater than deletion of dhaKLM, and that that the deletion of dhaKLM causes a reduction in growth on complementation of the glpK gene restores growth to normal lev- DHA, and that complementation of the deleted genes negates els (Figure 4). In comparison to the parent strain H26, glpK

Frontiers in Microbiology | Extreme Microbiology December 2013 | Volume 4 | Article 376 | 6 64 Ouellette et al. DHA metabolism in Halobacteria

FIGURE 2 | Cell density of H. volcanii H26 at stationary phase vs. FIGURE 4 | Cell density of H26, glpK,andglpK + pglpK at concentration of DHA. Cell density is represented by the average optical stationary phase when grown on 5 mM DHA. Cell density is represented density (OD620) reading of three cell culture replicates. Error bars depict the by the average optical density (OD620) reading of three cell culture standard deviation of the averages. The depicted line represents the line of replicates. Error bars depict the standard deviation of the averages. ANOVA best fit for the data. ANOVA single factor, p < 0.001. single factor, p < 0.001.

The results indicate that the deletion of both kinases abolishes growth on DHA, and that complementation with glycerol kinase restores growth to a greater degree than complementation with DHA kinase (Figure 5). The dhaKLM glpK strain did not exhibit any growth, remaining at the initial OD620 of 0.0035. The dhaKLM glpK + pdhaKLM strain was able to grow on DHA, but demonstrated an 84% decrease compared to the H26 parent strain. The dhaKLM glpK + pglpK was also capable of lim- ited growth on DHA, but demonstrated a 39% growth decrease from H26 and a 390% growth increase compared with dhaKLM glpK + pdhaKLM. Overall, these data confirm that glycerol kinase is more important for DHA metabolism in Hfx. volcanii than DHA kinase.

GLYCEROL KINASE IS WIDELY DISTRIBUTED AMONG THE

FIGURE 3 | Cell density of H26, dhaKLM,anddhaKLM + pdhaKLM HALOBACTERIA at stationary phase when grown on 5 mM DHA. Cell density is Since growth experiments indicated that glycerol kinase has represented by the average optical density (OD620) reading of three cell a significant role in DHA metabolism, the presence of this culture replicates. Error bars depict the standard deviation of the averages. gene in halobacterial species could potentially be a determinant ANOVA single factor, p < 0.05. of DHA metabolism in those species. Although the distribu- tion of glpK homologs has been examined in previous studies (Sherwood et al., 2009; Anderson et al., 2011), a greater num- strain demonstrated an 83% decrease in growth. This decrease ber of halobacterial genomes have become available since those is far greater than the 33% decrease exhibited by the dhaKLM studies. Therefore, the glpK gene in Hfx. volcanii was used to deletion mutant. These results indicate that the glpK gene is perform a BLASTp search against the halobacterial genomes used by Hfx. volcanii in DHA metabolism, and that its role is available on NCBI. The search yielded 90 significant hits among potentially greater than that of the dhaKLM operon. 82 different species of Halobacteria (Table 5), indicating a much In order to further test the roles of the DHA kinase and glyc- wider distribution of glycerol kinase compared to DHA kinase erol kinase in DHA metabolism in Hfx. volcanii,thedhaKLM among the Halobacteria. Six species yielded more than one sig- operon and glpK gene were both deleted in H26. This double nificant hit: Halogeometricum borinquense (3 hits), Haladaptatus deletion mutant (dhaKLM glpK), along with a DHA kinase paucihalophilus (3 hits), Haloferax prahovense (2 hits), Haloferax complementation strain (dhaKLM glpK + pdhaKLM), a glyc- mucosum (2 hits), Haloferax gibbonsii (2 hits), and Natronomonas erol kinase complementation strain (dhaKLM glpK + pglpK), moolapensis (2 hits). The multiple hits indicate the presence and the parent strain H26, were then grown on 5 mM DHA. of glpK paralogs in these species. Only 18 of the 100 queried

www.frontiersin.org December 2013 | Volume 4 | Article 376 | 7 65 Ouellette et al. DHA metabolism in Halobacteria

Dunaliella parva, a halophilic alga that is the most prominent photosynthetic organism in the Dead Sea and is able to produce DHA (Ben-Amotz and Avron, 1974; Oren and Shilo, 1982). Elevi Bardavid and Oren (2008) hypothesized that the Dunaliella cell membrane could be permeable to DHA, allowing excess DHA produced by the cells to leak into the external environment. If D. parva produces a significant amount of DHA overflow, the sub- strate would be readily available for Hfx. volcanii to utilize as a source of carbon. When Elevi Bardavid and Oren (2008) demonstrated that Hqm. walsbyi could utilize DHA as a carbon source, they hypoth- esized that the organism used a system involving a PEP-dependent DHA kinase to phosphorylate DHA to DHA kinase, based on genomic analysis from Bolhuis et al. (2006). However, their study did not demonstrate a direct connection between the putative DHA kinase and DHA metabolism. In our model halobacterial

FIGURE 5 | Cell density of H26, dhaKLM glpK, dhaKLM glpK + organism, Hfx. volcanii, we have demonstrated that DHA kinase pdhaKLM,anddhaKLM glpK + pglpK at stationary phase when is involved in metabolism of DHA. When the DHA kinase operon grown on 5 mM DHA. Cell density is represented by the average optical dhaKLM is deleted, growth of Hfx. volcanii on DHA is impeded, density (OD620) reading of three cell culture replicates. Error bars depict the and complementation of the deleted genes with the dhaKLM < × −9 standard deviation of the averages. ANOVA single factor, p 1 10 . operon restores growth. The growth of Hfx. volcanii is not com- pletely abolished, however, and further analysis using a strain wherein the glycerol kinase gene glpK has been deleted indicates halobacterial species did not yield significant hits: Haloarcula that Hfx. volcanii also uses glycerol kinase for DHA metabolism. sp. AS7094, Halobacterium sp. DL1, Halobacterium sp. GN101, Deletion of the glpK gene reduces growth on DHA more dra- Halobaculum gomorrense, Halococcus sp. 197A, Halopiger sp. IIH2, matically than the dhaKLM deletion,indicatingthattheroleof Halopiger sp. IIH3, Haloplanus natans, Halorubrum ezzemoulense, glycerol kinase is more pronounced in DHA metabolism than that Halosarcina pallida, Halostagnicola larsenii, Halovivax asiaticus, of DHA kinase for Hfx. volcanii. This enzyme primacy is further Halovivax ruber, Natrinema sp. CX2021, Natrinema sp. J7-1, supported by the observation that, in the double deletion mutant Natronobacterium gregoryi, Natronobacterium sp. AS-7091,and dhaKLM glpK, complementation with glpK restores growth Natronomonas pharaonis. It should be noted, however, that only better than complementation with dhaKLM. the genomes of Halovivax ruber, Natronobacterium gregoryi,and The primacy of the glycerol kinase in DHA metabolism is Natronomonas pharaonis are completely sequenced, whereas the unexpected, since DHA kinase is usually the primary enzyme other genomes without significant hits are incomplete, leaving involved in DHA phosphorylation in other organisms due to open the possibility that these species might have glpK homologs. the lower affinity of glycerol kinase for DHA. In Klebsiella pneu- −3 With the exception of Halosarcina pallida, which has an incom- moniae, the glycerol kinase has a Km of 1 × 10 MforDHA, −5 pletely sequenced genome, all halobacterial species that yielded whereas the DHA kinase has a Km of 1 × 10 M(Jin et al., −4 significant hits in the dhaK BLASTp search also yielded significant 1982). The glycerol kinase in E. coli has a Km of 5 × 10 Mfor hits in the glpK BLASTp search. DHA (Hayashi and Lin, 1967), but the DHA kinase has a Km of 4.5 × 10−7 M(Gutknecht et al., 2001). One possible explana- DISCUSSION tion for the primacy of the glycerol kinase in Hfx. volcanii DHA Previously, Hqm. walsbyi was the only halobacterial species metabolism is the glycerol kinase might have a higher affinity than known to be able to utilize DHA as a carbon source (Elevi DHA kinase for DHA. Another possible explanation might be Bardavid and Oren, 2008). In this study, we have identified Hfx. differences in expression of the kinases. DHA kinase might be volcanii as the second halobacterial species known to be capable expressed at lower levels than glycerol kinase early in the Hfx. of metabolizing DHA. When DHA was added to growth medium volcanii growth cycle, which would cause the glycerol kinase to as the sole carbon source, Hfx. volcanii was capable of growth. be the primary DHA phosphorylating enzyme despite a possi- This growth was variable based on the concentration of DHA ble lower affinity for DHA. Later in the growth cycle, however, present in the growth medium. The ability of Hfx. volcanii to Hfx. volcanii may increase expression of DHA kinase, leading to metabolize DHA suggests that the substrate could be an impor- the higher affinity enzyme becoming the new primary enzyme tant carbon source in the Dead Sea environment where Hfx. for DHA phosphorylation. Growth experiments of dhaKLM volcanii naturally lives. Elevi Bardavid and Oren (2008) have sug- glpK + pdhaKLM, in which the strain was grown beyond 72 h gested that Salinibacter might be a source of DHA in hypersaline on 5 mM DHA, support this hypothesis, since growth of the environments, since it can produce DHA as an overflow prod- strain on DHA increased significantly after 80 h, and actually sur- uct. However, Salinibacter has not been identified in the Dead passed dhaKLM glpK + pglpK after 96 h (data not shown). Sea, making it an unlikely candidate for DHA producer. The In-depth analysis into the enzymatic activity and kinetic constants DHA could potentially be produced as an overflow product from of these enzymes toward DHA, as well as their expression levels,

Frontiers in Microbiology | Extreme Microbiology December 2013 | Volume 4 | Article 376 | 8 66 Ouellette et al. DHA metabolism in Halobacteria

Table 5 | Results of BLASTp search using glpK (Performed on September 17, 2013).

Species name GI number E-value Species name GI number E-value

Haloferax volcanii DS2 292655691 0.0 Haloferax mucosum ATCC BAA-1512 445745541 0.0 Haloferax sp. BAB2207 432200129 0.0 Natrialba hulunbeirensis JCM 10989 445640226 0.0 Haloferax lucentense DSM 14919 445722906 0.0 Halarchaeum acidiphilum MH1-52-1 543417579 0.0 Haloferax alexandrinus JCM 10717 445742338 0.0 Halorubrum californiensis DSM 19288 445688091 0.0 Haloferax sp. ATCC BAA-646 445709004 0.0 Halorubrum lipolyticum DSM 21995 445813038 0.0 Haloferax sp. ATCC BAA-645 445712375 0.0 Halorubrum lacusprofundi ATCC 49239 222479549 0.0 Haloferax sp. ATCC BAA-644 445718304 0.0 Salinarchaeum sp. Harcht-Bsk1 510882182 0.0 Haloferax sulfurifontis ATCC BAA-897 445746251 0.0 Natrialba chahannaoensis JCM 10990 445643664 0.0 Haloferax denitrificans ATCC 35960 445749875 0.0 Halorubrum hochstenium ATCC 700873 445701406 0.0 Haloferax prahovense DSM 18310 445719488 0.0 Halomicrobium mukohataei DSM 12286 257388556 0.0 Haloferax elongans ATCC BAA-1513 445734605 0.0 Halorubrum tebenquichense DSM 14210 445687222 0.0 Haloferax larsenii JCM 13917 445729767 0.0 Haloarcula amylolytica JCM 13557 445772086 0.0 Haloferax gibbonsii ATCC 33959 445726194 0.0 Halomicrobium katesii DSM 19301 517069632 0.0 Haloferax mediterranei ATCC 33500 389847056 0.0 Halosimplex carlsbadense 2-9-1 445671661 0.0 Haloferax mucosum ATCC BAA-1512 445747425 0.0 Haloarcula vallismortis ATCC 29715 445755712 0.0 Halogeometricum borinquense DSM 11551 313125210 0.0 Haloarcula argentinensis DSM 12282 445773756 0.0 Halogeometricum borinquense DSM 11551 313126426 0.0 Halorubrum litoreum JCM 13561 445813470 0.0 Halobiforma nitratireducens JCM 10879 445784518 0.0 Haloarcula marismortui ATCC 43049 55377424 0.0 Natrinema pallidum DSM 3751 445622526 0.0 Haloarcula sinaiiensis ATCC 33800 445762583 0.0 Haladaptatus paucihalophilus DX253 320548735 0.0 Haloarcula californiae ATCC 33799 445763060 0.0 Haloterrigena salina JCM 13891 445666802 0.0 Natronolimnobius innermongolicus JCM 12255 445597617 0.0 Haloterrigena thermotolerans DSM 11522 445659630 0.0 Natronorubrum tibetense GA33 445585740 0.0 Natrinema pellirubrum DSM 15624 433590333 0.0 Haloarcula japonica DSM 6131 445778554 0.0 Halococcus morrhuae DSM 1307 445795889 0.0 Natrialba aegyptia DSM 13077 445651647 0.0 Haloterrigena limicola JCM 13563 445665007 0.0 Natrialba taiwanensis DSM 12281 445642534 0.0 Halococcus salifodinae DSM 8989 445798601 0.0 Haloquadratum walsbyi DSM 16790 110667688 0.0 Halococcus hamelinensis 100A6 445790305 0.0 Natrialba magadii ATCC 43099 289580614 0.0 Natrinema sp. J7-2 397773488 0.0 Natrialba asiatica DSM 12278 445650101 0.0 Natrinema altunense JCM 12890 445633695 0.0 Natronomonas moolapensis 8.8.11 452208319 0.0 Natronorubrum sulfidifaciens JCM 14089 445594250 0.0 Haloferax gibbonsii ATCC 33959 445728401 0.0 Natrinema gari JCM 14663 445628815 0.0 Natronococcus jeotgali DSM 18795 445603927 0.0 Halococcus thailandensis JCM 13552 445801492 0.0 Halorubrum saccharovorum DSM 1137 445683831 0.0 Natrinema versiforme JCM 10478 445613765 0.0 Halorhabdus utahensis DSM 12940 257052548 0.0 Halobiforma lacisalsi AJ5 445778236 0.0 Halogranum salarium B-1 399240308 0.0 Haladaptatus paucihalophilus DX253 320549923 0.0 Halorubrum arcis JCM 13916 445822264 0.0 Haloterrigena turkmenica DSM 5511 284166225 0.0 Halorubrum terrestre JCM 10247 445683460 0.0 Haladaptatus paucihalophilus DX253 516847391 0.0 Halorubrum distributum JCM 9100 445698917 0.0 Halalkalicoccus jeotgali B3 300711495 0.0 Haloarcula hispanica ATCC 33960 344211542 0.0 Natronococcus occultus SP4 435847946 0.0 Halorubrum kocurii JCM 14978 445806839 0.0 Halopiger xanaduensis SH-6 336253699 0.0 Halorhabdus tiamatea SARL4B 529078002 0.0 Natronococcus amylolyticus DSM 10524 445599450 0.0 Halorubrum sp. T3 515912305 0.0 Halogeometricum borinquense DSM 11551 445572938 0.0 Halorubrum aidingense JCM 13560 445818937 0.0 Natronorubrum bangense JCM 10635 445597786 0.0 Halorubrum coriense DSM 10284 445694991 0.0 Haloferax prahovense DSM 18310 445713901 0.0 Natronomonas moolapensis 8.8.11 452206238 0.0 Halococcus saccharolyticus DSM 5350 445793423 0.0 Halobacterium salinarum NRC-1 15790841 0.0

would enhance understanding on glycerol kinase primacy in Hfx. Since our data indicate that the dhaKLM genes in Hfx. volcanii volcanii DHA metabolism. are involved in DHA metabolism, the homologs of these genes in DHA metabolism among the Halobacteria may extend beyond other halobacterial species likely also have this function, allow- Hfx. volcanii and Hqm. walsbyi. Our BLASTp results for dhaK ing those species to utilize DHA. Halobacterial species without indicate that 29 other halobacterial species have a DHA kinase DHA kinase might also be capable of utilizing DHA if they pos- gene homologous to dhaK in Hfx. volcanii and Hqm. walsbyi. sess a glpK gene, since our results indicate that glycerol kinase

www.frontiersin.org December 2013 | Volume 4 | Article 376 | 9 67 Ouellette et al. DHA metabolism in Halobacteria

also plays a role in DHA metabolism. BLASTp results for glpK Ben-Amotz, A., and Avron, M. (1974). Isolation, characterization, and par- indicate that 82 halobacterial species have homologs, and 51 of tial purification of a reduced nicotinamide adenine dinucleotide phosphate- these species do not have dhaKLM homologs. We suspect that dependent dihydroxyacetone reductase from the halophilic alga Dunaliella parva. Plant Physiol. 53, 628–631. doi: 10.1104/pp.53.4.628 these species are also able to metabolize DHA. Eighteen halobac- Bitan-Banin, G., Ortenberg, R., and Mevarech, M. (2003). Development of terial species are missing DHA and glycerol kinase genes, sug- a gene knockout system for the halophilic archaeon Haloferax volcanii by gesting that they cannot metabolize DHA. However, only three of use of the pyrE gene. J. Bacteriol. 185, 772–778. doi: 10.1128/JB.185.3.772- those genomes, Halovivax ruber, Natronobacterium gregoryi,and 778.2003 Natronomonas pharaonis,arenotindraftform,leavingopenthe Blaby, I. K., Phillips, G., Blaby-Haas, C. E., Gulig, K. S., El Yacoubi, B., and de Crecy- Lagard, V. (2010). Towards a systems approach in the genetic analysis of archaea: possibility for a near universal distribution of DHA metabolism in accelerating mutant construction and phenotypic analysis in Haloferax volcanii. Halobacteria. Archaea 2010, 426239. doi: 10.1155/2010/426239 The broad taxonomic distribution of DHA and glycerol kinase Bolhuis, H., Palm, P., Wende, A., Falb, M., Rampp, M., Rodriguez-Valera, F., et al. genes among the Halobacteria suggests two interwoven hypothe- (2006). The genome of the square archaeon Haloquadratum walsbyi:lifeatthe ses: (i) DHA is a common carbon source in hypersaline envi- limits of water activity. BMC Genomics 7:169. doi: 10.1186/1471-2164-7-169 Daniel, R., Stuertz, K., and Gottschalk, G. (1995). Biochemical and molecular ronments and (ii) DHA metabolism is widespread among the characterization of the oxidative branch of glycerol utilization by Citrobacter Halobacteria. A study by Elevi Bardavid and Oren (2008) detailed freundii. J. Bacteriol. 177, 4392–4401. the conversion by the halophilic bacterium S. ruber of glycerol Deppenmeier, U., Hoffmeister, M., and Prust, C. (2002). Biochemistry and biotech- to DHA, which was then used as a growth substrate by Hqm. nological applications of Gluconobacter strains. Appl. Microbiol. Biotechnol. 60, 233–242. doi: 10.1007/s00253-002-1114-5 walsbyi. They speculated that DHA could be a common car- Dyall-Smith, M. (2009). Protocols for Haloarchaeal Genetics ©Ver. 7.1.Available bon source due to incomplete oxidation of glycerol, and from online at: http://www.haloarchaea.com/resources/halohandbook it being an intermediate of glycerol synthesis in Dunaliella.Our Elevi Bardavid, R., Khristo, P., and Oren, A. (2008). Interrelationships data demonstrating the extensive incidence of DHA and glycerol between Dunaliella and halophilic prokaryotes in saltern crystallizer ponds. kinase genes provides support for their hypothesis that DHA is a Extremophiles 12, 5–14. doi: 10.1007/s00792-006-0053-y Elevi Bardavid, R., and Oren, A. (2008). Dihydroxyacetone metabolism in common carbon source, and extends it to include that many if not Salinibacter ruber and in Haloquadratum walsbyi. Extremophiles 12, 125–131. most Halobacteria are capable of metabolizing it. However, future doi: 10.1007/s00792-007-0114-x research on DHA production and turnover rates, and analysis on Erni, B., Siebold, C., Christen, S., Srinivas, A., Oberholzer, A., and Baumann, U. strains we predict to have DHA metabolism is necessary to elu- (2006). Small substrate, big surprise: fold, function and phylogeny of dihy- cidate the significance of this substrate to hypersaline ecosystems droxyacetone kinases. Cell. Mol. Life Sci. 63, 890–900. doi: 10.1007/s00018-005- 5518-0 and Halobacteria. Faurschou, A., Janjua, N. R., and Wulf, H. C. (2004). Sun protection effect of dihy- droxyacetone. Arch. Dermatol. 140, 886–887. doi: 10.1001/archderm.140.7.886 AUTHORS CONTRIBUTIONS Forage, R. G., and Lin, E. C. (1982). DHA system mediating aerobic and anaerobic dissimilation of glycerol in Klebsiella pneumoniae NCIB 418. J. Bacteriol. 151, R. Thane Papke, Andrea M. Makkay, and Matthew Ouellette 591–599. conceived the researched and wrote the manuscript. Andrea M. Gutknecht, R., Beutler, R., Garcia-Alles, L. F., Baumann, U., and Erni, B. Makkay and Matthew Ouellette performed the research. (2001). The dihydroxyacetone kinase of Escherichia coli utilizes a phospho- protein instead of ATP as phosphoryl donor. EMBO J. 20, 2480–2486. doi: 10.1093/emboj/20.10.2480 ACKNOWLEDGMENTS Hayashi, S. I., and Lin, E. C. (1967). Purification and properties of glycerol kinase We would like to thank Dr. Thorsten Allers from the University of from Escherichia coli. J. Biol. Chem. 242, 1030–1035. Nottingham for supplying us with Haloferax volcanii strains and Holzle, A., Fischer, S., Heyer, R., Schutz, S., Zacharias, M., Walther, P., et al. (2008). Maturation of the 5S rRNA 5 end is catalyzed in vitro by the plasmids, and Dr. Aharon Oren from the Hebrew University of endonuclease tRNase Z in the archaeon H. volcanii. RNA 14, 928–937. doi: Jerusalem for supplying us with dihydroxyacetone. This research 10.1261/rna.933208 was supported by the National Science Foundation (award num- Jin, R. Z., Forage, R. G., and Lin, E. C. (1982). Glycerol kinase as a substitute for bers, DEB0919290 and DEB0830024) and NASA Astrobiology: dihydroxyacetone kinase in a mutant of Klebsiella pneumoniae. J. Bacteriol. 152, Exobiology and Evolutionary Biology Program Element (Grant 1303–1307. Mullakhanbhai, M. F., and Larsen, H. (1975). Halobacterium volcanii spec. nov., a Number NNX12AD70G). Dead Sea halobacterium with a moderate salt requirement. Arch. Microbiol. 104, 207–214. doi: 10.1007/BF00447326 Oren, A., and Shilo, M. (1982). Population dynamics of Dunaliella parva in the REFERENCES Dead Sea. Limnol. Oceanogr. 27, 201–211. doi: 10.4319/lo.1982.27.2.0201 Allers, T., Ngo, H. P., Mevarech, M., and Lloyd, R. G. (2004). Development of Sher, J., Elevi, R., Mana, L., and Oren, A. (2004). Glycerol metabolism in the additional selectable markers for the halophilic archaeon Haloferax volcanii extremely halophilic bacterium Salinibacter ruber. FEMS Microbiol. Lett. 232, based on the leuB and trpA genes. Appl. Environ. Microbiol. 70, 943–953. doi: 211–215. doi: 10.1016/S0378-1097(04)00077-1 10.1128/AEM.70.2.943-953.2004 Sherwood, K. E., Cano, D. J., and Maupin-Furlow, J. A. (2009). Glycerol-mediated Anderson, I., Scheuner, C., Goker, M., Mavromatis, K., Hooper, S. D., Porat, I., repression of glucose metabolism and glycerol kinase as the sole route of glycerol et al. (2011). Novel insights into the diversity of catabolic metabolism from catabolism in the haloarchaeon Haloferax volcanii. J. Bacteriol. 191, 4307–4315. ten haloarchaeal genomes. PLoS ONE 6:e20237. doi: 10.1371/journal.pone. doi: 10.1128/JB.00131-09 0020237 Siebold, C., Arnold, I., Garcia-Alles, L. F., Baumann, U., and Erni, B. (2003). Bachler, C., Flukiger-Bruhwiler, K., Schneider, P., Bahler, P., and Erni, Crystal structure of the Citrobacter freundii dihydroxyacetone kinase reveals an B. (2005). From ATP as substrate to ADP as coenzyme: functional eight-stranded alpha-helical barrel ATP-binding domain. J. Biol. Chem. 278, evolution of the nucleotide binding subunit of dihydroxyace- 48236–48244. doi: 10.1074/jbc.M305942200 tone kinases. J. Biol. Chem. 280, 18321–18325. doi: 10.1074/jbc. Waites, M. J., and Quayle, J. R. (1981). The interrelation transketolase and M500279200 dihydroxyacetone synthase activities in the methylotrophic yeast Candida

Frontiers in Microbiology | Extreme Microbiology December 2013 | Volume 4 | Article 376 | 10 68 Ouellette et al. DHA metabolism in Halobacteria

boidinii. J. Gen. Microbiol. 124, 309–316. doi: 10.1099/00221287-124- Received: 16 October 2013; paper pending published: 01 November 2013; accepted: 21 2-309 November 2013; published online: 16 December 2013. Weinhouse, H., and Benziman, M. (1976). Phosphorylation of glycerol and dihy- Citation: Ouellette M, Makkay AM and Papke RT (2013) Dihydroxyacetone droxyacetone in Acetobacter xylinum and its possible regulatory role. J. Bacteriol. metabolism in Haloferax volcanii. Front. Microbiol. 4:376. doi: 10.3389/fmicb. 127, 747–754. 2013.00376 Zurbriggen, A., Jeckelmann, J. M., Christen, S., Bieniossek, C., Baumann, U., and This article was submitted to Extreme Microbiology, a section of the journal Frontiers Erni, B. (2008). X-ray structures of the three Lactococcus lactis dihydroxyacetone in Microbiology. kinase subunits and of a transient intersubunit complex. J. Biol. Chem. 283, Copyright © 2013 Ouellette, Makkay and Papke. This is an open-access article dis- 35789–35796. doi: 10.1074/jbc.M804893200 tributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original Conflict of Interest Statement: The authors declare that the research was con- author(s) or licensor are credited and that the original publication in this jour- ducted in the absence of any commercial or financial relationships that could be nal is cited, in accordance with accepted academic practice. No use, distribution or construed as a potential conflict of interest. reproduction is permitted which does not comply with these terms.

www.frontiersin.org December 2013 | Volume 4 | Article 376 | 11 69

Chapter 3. Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein

HVO_A0006

Matthew Ouellette, Laura Jackson, Scott Chimileski, and R. Thane Papke

In this chapter, the genomic methylation patterns (methylome) of Haloferax volcanii are characterized and the restriction-modification (RM) gene HVO_A0006 is examined. This research was published in Frontiers in Microbiology in 2015. The purpose of this work was to sequence the first archaeal methylome, at the time, and use the methylome data to better understand DNA methylation and RM systems in a model halobacterial organism. This work was also performed to characterize the annotated adenine methyltransferase HVO_A0006 in H. volcanii and determine what role, if any, it has in DNA methylation. Regarding my contributions to this chapter, I designed the experiments and trained Laura Jackson to perform the research, with assistance from R. Thane Papke. I worked with Scott Chimileski to perform the bioinformatic analyses, and Scott Chimileski designed the figures. Laura Jackson and I wrote the manuscript, with assistance from Scott Chimileski and R. Thane Papke.

Citation:

Ouellette, M., Jackson, L., Chimileski, S., & Papke, R.T. (2015) Genome-wide DNA

methylation analysis of Haloferax volcanii H26 and identification of DNA

methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006. Front.

Microbiol., 6(251). doi: 10.3389/fmicb.2015.00251

http://journal.frontiersin.org/article/10.3389/fmicb.2015.00251/abstract

70

ORIGINAL RESEARCH published: 08 April 2015 doi: 10.3389/fmicb.2015.00251

Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006

Matthew Ouellette †, Laura Jackson †, Scott Chimileski and R. Thane Papke * Edited by: Jesse Dillon, Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA California State University, Long Beach, USA Reviewed by: Restriction-modification (RM) systems have evolved to protect the cell from invading Antonio Ventosa, DNAs and are composed of two enzymes: a DNA methyltransferase and a restriction University of Sevilla, Spain endonuclease. Although RM systems are present in both archaeal and bacterial Ichizo Kobyashi, University of Tokyo, Japan genomes, DNA methylation in archaea has not been well defined. In order to characterize *Correspondence: the function of RM systems in archaeal species, we have made use of the model R. Thane Papke, haloarchaeon Haloferax volcanii. A genomic DNA methylation analysis of H. volcanii Department Molecular and Cell Biology, University of Connecticut, 91 strain H26 was performed using PacBio single molecule real-time (SMRT) sequencing. N. Eagleville Rd. Unit 3125, Storrs, This analysis was also performed on a strain of H. volcanii in which an annotated CT 06268, USA DNA methyltransferase gene HVO_A0006 was deleted from the genome. Sequence [email protected] analysis of H26 revealed two motifs which are modified in the genome: Cm4TAG and †These authors have contributed m6 equally to this work. GCA BN6VTGC. Analysis of the 1HVO_A0006 strain indicated that it exhibited reduced adenine methylation compared to the parental strain and altered the detected adenine Specialty section: motif. However, protein domain architecture analysis and amino acid alignments revealed This article was submitted to Extreme Microbiology, a section of the journal that HVO_A0006 is homologous only to the N-terminal endonuclease region of Type IIG Frontiers in Microbiology RM proteins and contains a PD-(D/E)XK nuclease motif, suggesting that HVO_A0006 is Received: 15 November 2014 a PD-(D/E)XK nuclease family protein. Further bioinformatic analysis of the HVO_A0006 Accepted: 13 March 2015 gene demonstrated that the gene is rare among the Halobacteria. It is surrounded by two Published: 08 April 2015 HVO_A0006 Citation: transposition genes suggesting that is a fragment of a Type IIG RM gene, Ouellette M, Jackson L, Chimileski S which has likely been acquired through gene transfer, and affects restriction-modification and Papke RT (2015) Genome-wide activity by interacting with another RM system component(s). Here, we present the first DNA methylation analysis of Haloferax volcanii H26 and identification of DNA genome-wide characterization of DNA methylation in an archaeal species and examine methyltransferase related PD-(D/E)XK the function of a DNA methyltransferase related gene HVO_A0006. nuclease family protein HVO_A0006. Front. Microbiol. 6:251. Keywords: DNA methylation, restriction modification system, haloarchaea, Haloferax volcanii, Halobacteria, doi: 10.3389/fmicb.2015.00251 methylome, PD-(D/E)XK nuclease, methylation

Frontiers in Microbiology | www.frontiersin.org 1 April 2015 | Volume 6 | Article 251 71 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

Introduction consist of genes that target methylated recognition sequences (Roberts et al., 2003; Loenen and Raleigh, 2014). Restriction-modification (RM) systems in bacteria and archaea Although RM systems and DNA methylation have been provide individuals the ability to recognize self from non-self extensively studied in bacteria, few studies have examined DNA and function as host defense mechanisms, protecting these systems in archaea. Most research on archaeal RM sys- their genomes from foreign DNA invasion (Arber and Dus- tems has focused on the activity of restriction endonucleases soix, 1962; Meselson et al., 1972; Vasu and Nagaraja, 2013). and methyltransferases in hyperthermophilic organisms, such These systems are composed of a pair of enzymes: a restriction as those belonging to the genus Pyrococcus (Chinen et al., endonuclease and its cognate methyltransferase, which recog- 2000; Ishikawa et al., 2005; Watanabe et al., 2006). Studies nize identical short DNA sequences known as the recognition have also examined cytosine methylation in Sulfolobus acidocal- sequence. Endonucleases typically cleave dsDNA by hydrolyz- darius (Grogan, 2003), the structure of a type I S subunit in ing the phosphodiester bonds at one location after detection of Methanococcus jannaschii (Kim et al., 2005), and the activity of an unmethylated recognition sequence (Loenen et al., 2014b). a type II methyltransferase in a virus infecting Natrialba magadii Methyltransferases catalyze a reaction that transfers a methyl (Baranyi et al., 2000). However, research on RM systems in other group from a cofactor, S-adenosyl-L-methoinine (AdoMet), and archaeal organisms, and the overall role of these systems, has been modifies a specific nucleotide base (Wilson and Murray, 1991; limited. Blumenthal and Cheng, 2002). Three separate types of methy- An organism that could prove useful as a model for archaeal 6 lated bases have been identified: N6 methyl-adenine (m A), C5 RM systems and DNA methylation is Haloferax volcanii DS2 5 4 methyl-cytosine (m C), and M4 methyl-cytosine (m C) (Wion (Kuo et al., 1997; Allers and Ngo, 2003; Allers and Mevarech, and Casadesus, 2006). In nearly all cases, the addition of a methyl 2005; Leigh et al., 2011), an archaeal species of the class Halobac- group to a base in the recognition sequence by the cognate teria first isolated from the Dead Sea (Mullakhanbhai and Larsen, methyltransferase prevents cleavage by the restriction endonucle- 1975). H. volcanii is useful because it is easy to grow in the lab and ase, thus protecting the host DNA from being digested (Kuhnlein has an advanced genetic system (Cline et al., 1989; Allers et al., and Arber, 1972). RM systems also play roles in recombination, 2004, 2010; Blaby et al., 2010). Also, the genome of wild-type where they mediate integration of horizontally transferred DNA strain DS2 has been fully-sequenced (Hartman et al., 2010). into the host genome (Alm et al., 1999; Nobusato et al., 2000) Previous research indicated that H. volcanii DS2 uses DNA and have been implicated in genetic isolation and subsequent methylation to identify its own DNA from foreign sources. A speciation (Jeltsch, 2003). study on DNA extracted from H. volcanii demonstrated that RM systems are categorized into four types based on required it is resistant to digestion from restriction endonucleases, such cofactors and mechanism of activity. Type I RM systems func- as XbaI, which recognize motifs containing CTAG (Charlebois tion as pentamer complexes consisting of two restriction endonu- et al., 1987). These results indicated that H. volcanii DNA was clease (R) subunits, two methyltransferase (M) subunits, and a methylated at CTAG tetranucleotide regions; it has since been sequence specificity (S) subunit. The target DNA sequences are hypothesized that a putative Type II CTAG methyltransferase detected by the two tandem target recognition domains (TRDs) HVO_0794 is responsible for this methylation (Hartman et al., of the S subunit, and each TRD recognizes one half of the 2010). Another study demonstrated that transformation effi- two-part target sites (Loenen et al., 2014a). When the complex ciency in H. volcanii is greater when using unmethylated DNA encounters an unmethylated recognition sequence, the complex from a dam− E. coli strain (Holmes et al., 1991). The difference in binds the DNA and cleaves it at an unpredictable distance from transformation was hypothesized to be the result of cleavage from the binding site in an ATP-dependent manner (Murray, 2000). the putative type IV Mrr restriction endonuclease HVO_0682 Type II RM systems, by contrast, have restriction endonucleases (Hartman et al., 2010). This hypothesis was confirmed in another and methyltransferases that act independently. Type II restric- study in which the HVO_0682 gene was deleted, resulting in tion endonucleases recognize unmethylated sequences (Pingoud higher transformation efficiency when adding methylated DNA et al., 2014). The same recognition sequences are also targeted for (Allers et al., 2010). Overall, this evidence supports the hypothe- modification by the equivalent type II methyltransferases (Mur- sis that archaeal organisms such as the archaeon H. volcanii use phy et al., 2013). There are also different subgroups of Type II RM systems to identify and defend against foreign DNA. systems, which function differently but primarily as indepen- Another potential role for DNA methylation in archaea was dent RM proteins (Roberts et al., 2003). Type IIG RM proteins, uncovered in a recent study on extracellular DNA (eDNA) for example, are fusion proteins containing three domains: an metabolism in H. volcanii (Chimileski et al., 2014). H. volcanii N-terminal restriction endonuclease, a central methyltransferase, was provided with eDNA from different species as a growth sub- and C-terminal site specificity (Roberts et al., 2003, 2014; Morgan strate and was able to grow using its own DNA as a phosphorus et al., 2009; Shen et al., 2011). Type III RM systems are two- source, but not with herring sperm DNA or methylated E. coli subunit enzymes consisting of a methyltransferase (mod) subunit DNA. However, H. volcanii was able to grow on unmethylated and a restriction endonuclease (res) subunit (Bickle and Kruger, E. coli DNA isolated from a dam−/dcm− mutant strain (i.e., 1993; Rao et al., 2014). The recognition sequences of a Type III without methyltransferase genes). Therefore, methylation may RM system are asymmetrical and only modified on one strand of also be used as a means for cells to recognize self and non-self DNA, with cleavage requiring two inversely-oriented, unmethy- when exploiting eDNA for nutritional purposes and possibly for lated recognition sequences (Meisel et al., 1992). Type IV systems natural transformation. Determining the methylation patterns

Frontiers in Microbiology | www.frontiersin.org 2 April 2015 | Volume 6 | Article 251 72 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii of haloarchaeal DNA may shed light on the phenomenon of Deletion of HVO_A0006 Gene discriminatory eDNA metabolism. The annotated adenine methyltransferase gene HVO_A0006 Recently, a new DNA sequencing technique has been devel- (accession number ADE01899) was selected for deletion in oped which can detect methylated bases, known as single H. volcanii strain H53. The gene deletion strategy used in this molecule, real-time (SMRT) sequencing. This process determines study was modified from the methodology previously devel- the sequence of DNA as a new strand is synthesized in real time, oped (Blaby et al., 2010) and uses the In-Fusion HD Cloning and the kinetic signals of incorporated bases can also be mea- Kit (Clontech). The primer sequences used in this study are sured (Flusberg et al., 2010). Unique kinetic signals produced at listed in Table 2. H. volcanii strain H53 was then transformed modified bases during strand synthesis are used to detect methy- with p1HVO_A0006 using the PEG-mediated transformation of lation patterns of sequenced DNA. This process has been used haloarchaea protocol from Cline et al. (1989), Bitan-Banin et al. to sequence the genomic methylation patterns, or methylomes, (2003), Allers et al. (2004), and Blaby et al. (2010). The trans- of several different bacterial species (Fang et al., 2012; Murray formed cells were then plated on Hv-CA agar media without et al., 2012; Lluch-Senar et al., 2013; Furuta et al., 2014; Krebes uracil and incubated for 5–7 days at 42◦C. Colony PCR was per- et al., 2014). In this study, we used SMRT sequencing to char- formed using forward and reverse M13 primers to screen for acterize the methylomes of H. volcanii H26, a laboratory strain pop-ins. The positive colonies from the pop-in screen were plated derived from wild-type strain DS2, and a derivative strain in on Hv-CA plates with 50 µg mL−1 5-FOA and 50 µg mL−1 uracil which the gene HVO_A0006 was deleted. HVO_A0006 is anno- to obtain pop-outs. Final pop-outs were confirmed through PCR tated in the H. volcanii genome as an adenine methyltransferase (see primers in Table 2) as visualized through gel electrophoresis and is located on the native replicon pHV4 (Hartman et al., 2010). (Figure 1). It is predicted to encode a protein that is 219 amino acids in length and with a molecular weight of 24.794 kDa. HVO_A0006 DNA Preparation for PacBio Sequencing 1 was selected for further characterization because, despite being To extract DNA from H. volcanii strains H26 and HVO_A0006 annotated as a methyltransferase, it is not recognized as an RM for PacBio SMRT sequencing, 5 mL of cell cultures in log phase protein by the RM database REBASE (Roberts et al., 2014). were pelleted and resuspended in 5 mL DNA buffer (10 mM Tris- HCl, pH 8.0) to lyse the cells. Lysates were treated with 100 µg mL−1 RNase A and incubated overnight at 42◦C to degrade RNA. Materials and Methods Proteinase K 10 mg mL−1 was added to the lysates to a final concentration of 50 µg mL−1 and incubated at 37◦C for 1 h to Strains and Growth Conditions degrade protein, followed by ethanol precipitation. Three rounds Descriptions of all strains and plasmids used in this study are pro- of phenol/chloroform extraction were then performed to purify vided in Table 1. E. coli strains were grown in Lysogeny Broth the DNA (until no interphase was visible). The top aqueous layer −1 (LB; Ampicillin was added at 100 µg mL when necessary). H. from each tube was removed and a final ethanol precipitation was volcanii strains were grown in Hv-YPC (rich medium) or Hv- performed as described above. The 260/280 ratio and 260/230 CA (selective rich medium) developed by Allers et al. (2004) ratio for each DNA sample were measured for purity (H26: per instructions in the Halohandbook (Dyall-Smith, 2008). Uracil 260/280 = 1.83, 260/230 = 2.28; 1HVO_A0006: 260/280 = 1.78, −1 −1 (50 µg mL ), tryptophan (50 µg mL ), and 5-fluoroorotic 260/230 = 2.27). acid (at 50 µg mL−1) were added to the media as needed for 1pyrE2 and 1trpA strains. All H. volcanii cultures were PacBio SMRT Sequencing incubated at 42◦C and shaken at 200 rpm unless otherwise The methylation patterns of DNA extracted from H. volcanii specified. H26 and the 1HVO_A0006 strain were sequenced using PacBio

TABLE 1 | Strains and plasmids used in this study.

Strain/plasmid name Description Source

STRAINS Escherichia coli HST08 E. coli cloning strain Clontech, Cat. # 636763 Escherichia coli dam-/dcm- Used for producing unmethylated plasmids for Haloferax transformations Clontech Cat # C2925H Haloferax volcanii DS2 Wild-type Mullakhanbhai and Larsen, 1975 Haloferax volcanii H26 1pyrE2; laboratory strain derived from DS2 Bitan-Banin et al., 2003 Haloferax volcanii H53 1pyrE2/1trpA; derived from H26 Allers et al., 2004 Haloferax volcanii H26 HVO_A0006 deletion strain; derived from H53 This study 1HVO_A0006 PLASMIDS pTA131 Vector used to create p1HVO_A0006 gene deletion. Contains ampicillin resistance for Allers et al., 2004 selectivity, lacZ cloning site for blue-white screening, and pyrE2 marker for H. volcanii screening p1HVO_A0006 pTA131 vector construct with flanking region insert used to knockout HVO_A0006 This study

Frontiers in Microbiology | www.frontiersin.org 3 April 2015 | Volume 6 | Article 251 73 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

TABLE 2 | Oligonucleotide primers used in this study.

Primer name Primer sequence Primer description

HVO_A0006 FRIF 5′-CGG GCC CCC CCT CGA GTC AAG CAG TAC CTC AAC Used to amplify the flanking regions of HVO_A0006, with XhoI and XbaI ACG GAA CA-3′ HVO_A0006 FR1R 5′-ATT CGA TAT CAA GCT GTC CTC AAG GAC GGC CTG CA-3′ HVO_A0006 FR2F 5′-GAC GCG TTG ATA TCC CGA AGA ATC CAG TTG CTG TCT GTT G-3′ HVO_A0006 FR2R 5′-GGA TAT CAA CGC GTC GGC ATT ATG CAA TTC-3′

M13F 5′-GTA AAA CGA CGG CCA GT-3′ Primers used to amplify the flanking regions of HVO_A0006 inserted into the M13R 5′-AGG AAA CAG CTA TGA CCA T-3′ multiple cloning site of pTA131. Used for the screening of recombinant plasmids

“RS_Modification_and_Motif_Analysis.1” program in SMRT Portal under default parameter settings, with the H. volcanii DS2 genome used as a reference (Hartman et al., 2010). The “motifs.gff” output files for both strains are available in Sup- plementary Data (Supplementary Data Sheet 1 for H26 and Supplementary Data Sheet 2 for 1HVO_A0006).

Protein Domain Architecture Analysis and Multiple Alignment Homologs of the HVO_A0006 protein were identified through a blastp search (Altschul et al., 1990) of the non-redundant protein database (based on an E-value threshold for inferring homology of 1e-4). Structural Classification of Proteins (SCOP) database superfamilies (Gough and Chothia, 2002) and sequence features shown in domain architecture diagrams were detected with InterProScan (Quevillon et al., 2005). Multiple alignments were generated with Clustal Omega (Sievers et al., 2011). Sec- ondary structure predictions for the multiple alignment were made using PRALINE (Heringa, 1999) and PSIPRED (Jones, 1999). PD-(D/E)XK motifs were identified with the PD-EXK web server (Laganeckas et al., 2011).

Results

FIGURE 1 | PCR confirmation of HVO_A0006 deletion strain. The template DNA amplified was from H. volcanii DS2 (Lane 1), p1HVO_A0006 Characterization of DNA Methylation in Haloferax (lane 2; as a positive control), unsuccessful pop-outs (lanes 3, 4, and 10) and volcanii H26 successful pop-out colonies (lanes 5–9). A Mid-Range DNA ladder (Fisher In order to characterize the methylome of H. volcanii, the genome Scientific) is shown in Lane 11. of laboratory strain H26 (Table 1) was sequenced via SMRT sequencing. The data from the SMRT sequencing analysis for H26 is summarized in Table 3. SMRT sequencing of H. vol- SMRT sequencing. The prepared samples were processed by canii H26 identified two modification motifs for H26: one in the Keck Sequencing facility of the Yale School of Medicine for which cytosine was methylated (m4C) and another in which ade- analysis using PacBio SMRT sequencing. The SMRT method nine was methylated (m6A). The m4C motif was identified as is described in detail in the PacBio manual “Detecting DNA Cm4TAG and the m6A sequence motif was identified as GCAm6 Base Modifications: SMRT Analysis of Microbial Methylomes” BN6VTGC. These two motifs are also the same as those pre- (http://www.pacb.com/pdf/TN_Detecting_DNA_Base_Modifica dicted for H. volcanii DS2, the parental strain of H26, in REBASE tions.pdf). Using an estimated input DNA size range of ∼4000 kb, (Roberts et al., 2014). The CTAG motif was identified 1342 times 500–800 bp libraries were prepared for each strain and were run in the H26 genome, and methylation was detected at 374 of these in one SMRT cell, yielding ∼60x coverage for H26 and ∼80x motifs (28% methylation). The GCABN6VTGC motif occurred coverage for 1HVO_A0006. Analysis of the methylated 410 times in the H26 genome, and 316 of the copies were detected bases and motifs in each strain was performed using the to have methylated bases (77%).

Frontiers in Microbiology | www.frontiersin.org 4 April 2015 | Volume 6 | Article 251 74 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

TABLE 3 | DNA methylation patterns detected for H. volcanii H26 and CTAG was higher in 1HVO_A0006 (62.7) than in H26 (48.4). 1HVO_A0006. It is also important to note that SMRT sequencing only reports Strain H26 1HVO_A0006 coverage of a motif above a certain modification quality value (QV) score. For both the adenine and cytosine motifs, the Motif GCAm6 Cm4TAG GCAm6 Cm4TAG knockout strain QV was higher overall with an average value BN6VTGC BGN5VTGC of 60.9 as compared to 49.7 for the H26 strain. It is highly plausible that the cytosine motifs were equally methylated but Methylated 3 1 3 1 some of the methylated bases did not reach the QV thresh- position old and were therefore not counted. Therefore, the discrepancy 6 4 6 4 Methylation type m A m C m A m C in methylated m4C sites is unlikely to be caused by the dele- Number of 316 374 141 662 tion of the HVO_A0006 gene. Overall, the results are conclu- methylated motifs sive that deletion of HVO_A0006 reduces m6A methylation in Number of motifs 410 1342 160 1342 in genome H. volcanii. Percent of 77 28 88 49 methylated motifs Mean modification 57.0 49.7 70.9 60.0 HVO_A0006 is a Fragment Homologous to QV* Larger, Multi-domain RM Proteins Mean motif 30.7 48.4 42.9 62.7 A blastp search indicated that HVO_A0006 is a rare pro- coverage tein among the sequenced Halobacteria. The only significant

*Modification QV refers to level of confidence that a base is methylated. A QV of 30 or hit within this group was an annotated adenine specific DNA higher is considered significant. methyltransferase domain protein belonging to Halorubrum sp. AJ67 (Table 4). Out of the top 10 significant hits only a few belonged to archaeal species and the rest were bacterial (Table 4). Deletion of HVO_A0006 Increases Specificity of The blastp analysis also found homologs experimentally charac- m6A Motif and Reduces m6A Methylation terized as methyltransferases from Borrelia burgdorferi and Heli- In order to determine the effect of HVO_A0006 on genome cobacter pylori (Rego et al., 2011; Krebes et al., 2014). However, wide methylation, the chromosome of strain H53 with the those homologs are much larger than HVO_A0006, spanning HVO_A0006 gene deleted (1HVO_A0006) was sequenced via between 700 and 1000 amino acids (Figure 2). A reciprocal blastp SMRT sequencing. The data from the SMRT sequencing analysis of those hits against the H. volcanii DS2 and Halorubrum sp. is summarized in Table 3 for 1HVO_A0006. In 1HVO_A0006, AJ67 genomes revealed there are additional homologs. One is the Cm4TAG motif was identified like in H26, although the H. volcanii gene HVO_A0237 (accession number ADE02204), sequencing coverage was much higher and more methylated also annotated as an adenine methyltransferase, listed in REBASE motifs were detected. However, the m6A motif differed sig- as HvoDSORF237P, and located on pHV4 231 genes down- nificantly for the knockout strain. First, the methylated m6A stream from HVO_A0006. Another identified homolog is a small motif in the 1HVO_A0006 strain was more specific than in annotated adenine methyltransferase in Halorubrum sp. AJ67. its H26 counterpart, with one of the unspecified nucleotides in In REBASE, homologs of HVO_A0006 are predicted to belong the H26 sequence (N) being identified as a guanine (G) in the to the Type IIG subgroup. A multiple sequence alignment of 1HVO_A0006 strain. The resulting methylation motif from the HVO_A0006 (Figure 2) and several of its homologs indicate that m6 6 1HVO_A0006 strain is GCA BGN5VTGC. Thus, the m A the shared sequence identity occurs only in the N-terminal, puta- sequence became more specific with the deletion of HVO_A0006. tive endonuclease region. The homologs of HVO_A0006 con- Secondly, the total number of detected motifs, and the num- tain known methyltransferase features in their central regions, ber of detected methylated m6A motifs in comparison to the including the s-adenosyl-L-methionine-dependent methyltrans- reference genome also decreased when the HVO_A0006 gene ferse superfamily domain (SSF53335) within the SCOP database was deleted. In 1HVO_A0006, the total number of detected and predicted N6 adenine specific DNA methyltransferase sig- GCABGN5VTGC motifs was 160, 61% less than H26, and natures (IPR002296; Figure 2A), but these do not overlap m6 the number of methylated GCA BGN5VTGC sequences was with HVO_A0006 in the alignment. The N-terminal region 141, a decrease of 55% from H26. This decrease was due to of HVO_A0237 contains this methyltransferase domain, but it GCABHN5VTGC sequences not being included in the motif total aligns with the central region of the other sequences, not with for 1HVO_A0006, since those sites were not methylated in that HVO_A0006. The smaller annotated adenine methyltransferase strain. This should also explain why the percentage of methy- in Halorubrum sp. AJ67 also did not align with HVO_A0006, lated m6A motifs is higher in 1HVO_A0006 (88%) compared or with the larger annotated methyltransferase in the same to H26 (77%) even though 1HVO_A0006 has fewer methylated Halorubrum strain, but instead aligned with the C-terminal m6A motifs. Differences were also observed in the number of region of the other homologs in Figure 2 (see also Figure S1). methylated m4C sites detected, with 1HVO_A0006 exhibiting Overall, these results indicate that HVO_A0006 is not an ade- an increase in the number of methylated Cm4TAG sequences nine methyltransferase, as it was annotated, but is instead a frag- compared to H26. However, this change was likely the result ment homologous to the N terminus of Type IIG RM fusion of a sequence coverage artifact, since the motif coverage for proteins.

Frontiers in Microbiology | www.frontiersin.org 5 April 2015 | Volume 6 | Article 251 75 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

TABLE 4 | Top 10 blastp hits for H. volcanii HVO_A0006.

Species name Accession Annotated function E-value Predicted amino Reciprocal blastp number acid length E-value

Halorubrum sp. AJ67 CDK39740 adenine specific DNA methyltransferase 7e-120 734 2e-123 domain protein Peptococcaceae bacterium KFD42170 DNA methyltransferase 3e-59 1040 1e-62 SCADC1_2_3 Unclassified Atribacteria WP_018206408 hypothetical protein 3e-56 1041 9e-60 Aerophobetes bacterium WP_029964223 DNA methyltransferase, partial 2e-55 1024 7e-59 SCGC AAA255-F10 Smithella sp. SCADC KFO67113 DNA methyltransferase 1e-54 1049 4e-58 Unclassified Aminicenantes WP_020261071 hypothetical protein 3e-54 1034 1e-57 Aminicenantes bacterium WP_020260848 hypothetical protein, partial 3e-54 980 6e-59 SCGC AAA252-G21 Atribacteria bacterium WP_029955983 DNA methyltransferase 4e-54 1037 1e-57 SCGC AAA255-N14 Melioribacter roseus P3M-2 WP_014855429 adenine specific DNA methyltransferase 7e-54 1042 2e-57 Cloacimonetes bacterium WP_024562979 DNA methyltransferase 1e-52 1041 5e-56 JGI OTU-1

HVO_A0006 contains a Conserved PD-(D/E)XK methyltransferases, with no emphasis on overall genomic methy- Nuclease Motif lation (Baranyi et al., 2000; Chinen et al., 2000; Grogan, 2003; Analysis of the HVO_A0006 sequence for endonuclease signa- Ishikawa et al., 2005; Kim et al., 2005; Watanabe et al., 2006). tures revealed that it contains a PD-(D/E)XK nuclease motif In this study, we characterized genome-wide DNA methylation 60 75 77 2+ (PD -X14-E AK ), a Mg -dependent catalytic motif com- patterns of an archaeal organism. We used H. volcanii due to mon to many restriction endonucleases (Kosinski et al., 2005). its developed genetic system and ease of growth, which allowed The active sites of this motif (D60,E75, and K77) are conserved us to examine the effect of deleting one of the seven annotated in all homologs in the multiple sequence alignment which align methyltransferases in the H. volcanii genome. with HVO_A0006 (Figure 2B). Analysis of these sequences in The SMRT sequencing analysis detected two motifs, which are the alignment using the PD-EXK web server (Laganeckas et al., modified throughout the H. volcanii H26 genome: a Cm4TAG m6 2011) predicted each sequence belonged to a PD-(D/E)XK fam- motif and a GCA BN6VTGC motif. The presence of methy- ily with a probability of 1.0. Secondary structural analysis also lated CTAG motifs is consistent with observations from previous demonstrated that the region containing the PD-(D/E)XK motif restriction digest experiments, and the putative Type II CTAG in the aligned sequences all share an αβββαβ core structure com- methyltransferase HVO_0794 may be responsible for modify- m6 mon to this domain family (Venclovas et al., 1994; Kinch et al., ing these motifs (Charlebois et al., 1987). The GCA BN6VTGC 2005). Overall, this evidence indicates that HVO_A0006 is a motif resembles the type of sequence targeted by Type I RM sys- PD-(D/E)XK nuclease family protein. tems, which typically consist of two partial sequences separated by a gap of unspecified nucleotides (Loenen et al., 2014a). The HVO_A0006 and HVO_A0237 are Flanked by only Type I RM system predicted in H. volcanii, according to Predicted Integrase and Transposase Genes the RM database REBASE, is the operon HVO_2269-2271 named The gene neighborhoods of HVO_A0006 and HVO_A0237 rmeRMS (Roberts et al., 2014). Therefore, we predict that the were examined. Analysis of the surrounding genomic region of Type I complex encoded by the rmeRMS operon is responsible for HVO_A0006 (Figure 3A) revealed that the gene is flanked by at least some of the modifications detected by the SMRT sequenc- three transposition genes: a XerC/D-like integrase and two puta- ing. Further studies would need to be performed to confirm the tive transposases. A putative transposase was also found directly role of these putative RM systems in methylating the identified upstream of HVO_A0237 (Figure 3B). This predicted protein motifs, as well as determining the role of the other annotated (HVO_A0238) is the same length and 100% identical to the ISH5 methyltransferase genes in the H. volcanii genome. transposase adjacent to HVO_A0006: HVO_A0007 (Figure 3). The gene HVO_A0006 was selected for investigation via gene knockout in this study because in genomic analysis it Discussion was annotated as an adenosine specific methyltransferase (Hart- man et al., 2010). The results indicate that the HVO_A0006 Previous research on DNA methylation has focused primarily on gene has an effect on m6A methylation in H. volcanii accord- eukaryotic and bacterial organisms. The few studies which have ing to the motif analysis produced by SMRT sequencing. This examined DNA methylation and RM systems in archaeal organ- is evidenced by the observed change in adenine motif recog- m6 isms have concentrated on a few restriction endonucleases and nition between the H26 strain, which was GCA BN6VTGC,

Frontiers in Microbiology | www.frontiersin.org 6 April 2015 | Volume 6 | Article 251 76 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

FIGURE 2 | Domain architecture and multiple alignment of Type IIG and H. pylori and several annotated methyltransferases from archaeal homologs. (A) The position of predicted domains present within species. The common regions of homology between all proteins are shown HVO_A0006 (accession number ADE01899) homologs is shown, including in gray. Detected s-adenosyl-L-methionine-dependent experimentally characterized DNA methyltransferases from B. burgdorferi (Continued)

Frontiers in Microbiology | www.frontiersin.org 7 April 2015 | Volume 6 | Article 251 77 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

FIGURE 2 | Continued DNA methlytransferase signatures (blue) and PD-(D/E)XK signatures (purple) methyltransferase superfamily domains (SSF53335) are shown in light blue, from part A are highlighted. Conserved secondary structure predictions are along with N6 adenine specific DNA methlytransferase signatures shown above: as red boxes for α-helices and yellow arrows for β-sheets. (IPR002296; dark blue). The conserved structural core (αβββαβ) of a Components of the conserved αβββαβ core of the predicted PD-(D/E)XK PD-(D/E)XK nuclease domain is shown in purple. All sequences shown nuclease domain shown in part A are outlined in black. Amino acid shading contained a PD-(D/E)XK motif with a confidence score of 1.0 (Laganeckas represents Clustal sequence similarity. All sequences other than HVO_A0006 et al., 2011). (B) Multiple alignment of HVO_A0006 homologs. Predicted are truncated (see Figure S1 for entire alignment).

FIGURE 3 | Diagram of the genomic neighborhood for with annotated functional predictions below. Gene sizes and intergenic H. volcanii HVO_A0006 (A) and HVO_A0237 (B). Genes are spaces are shown in nucleotides. Both regions shown are found on depicted along with two upstream and downstream flanking genes, replicon pHV4.

m6 and the HVO_A0006 knockout, which was GCA BGN5VTGC. HVO_A0006, we detect a helical region in the C-terminal region Our conclusion is also supported by the reduction in the total of the protein. This region is likely part of the helical connec- amount of motifs recognized, with the knockout strain exhibit- tor domain that links the restriction endonuclease and methyl- ing a 61% decrease in the total number of recognized motifs transferase domains in Type IIG RM proteins (Shen et al., 2011). compared to the parental strain and a 55% reduction in the The Type IIG RM systems are hypothesized to be evolutionarily number of methylated motifs. These data appear to support related to the Type I systems, and this helical connector domain the genome annotation of HVO_A0006 as an adenine methyl- is predicted to be homologous to the N-terminal helical region transferase. However, the amino acid alignment and domain of the Type I M subunits, with a similar function of molecular architecture analysis of the protein and its homologs reveals communication between the methyltransferase and restriction that HVO_A0006 is only homologous to the N-terminal region endonuclease domains (Nakonieczna et al., 2009). Therefore, it is of characterized and predicted Type IIG RM fusion proteins, possible that HVO_A0006 uses its partial helical domain to inter- which typically contain a restriction endonuclease domain. Anal- act with the Type I RmeRMS complex and acts as an R subunit, ysis of the conserved region in HVO_A0006 and its homologs thus altering the restriction and modification activity of the com- revealed the presence of a PD-(D/E)XK nuclease motif with con- plex and resulting in the different methylation patterns seen in served active sites and secondary structure. While methyltrans- the parental strain of H. volcanii compared to the HVO_A0006 ferase domains were detected in homologs of HVO_A0006, they knockout. However, since the RmeRMS system already has an R do not correspond to the region of shared sequence identity subunit encoded in the operon, any possible interaction between with HVO_A0006. Therefore, HVO_A0006 likely functions as HVO_A0006 and the Type I complex would be complicated by a nuclease, not a complete methyltransferase as its annotation competition with the native subunit. suggests. Another possible candidate protein for interaction with We hypothesize that HVO_A0006 influences adenine methy- HVO_A0006 is HVO_A0237 which are both located on the lation patterns through interaction with another RM protein or native plasmid pHV4. The multiple alignment analysis (Figure 2) system in H. volcanii. One possible candidate for interaction is indicates that HVO_A0237 is also a homolog of a Type IIG sys- the Type I RM system encoded by rmeRMS. In Type I RM sys- tem, but it is missing an N-terminal restriction endonuclease tems, the M subunits are known to interact with each other to domain. This absent region might be supplied by HVO_A0006, coordinate restriction and modification activity in the Type I which interacts with HVO_A0237 via the helical connector complex. This interaction primarily occurs via N-terminal heli- region to form a complete Type IIG RM enzyme, and forms cal regions in the M subunits, which coordinate to detect adenine a split protein complex, rather than a fused one as is typical methylation when the complex binds to the target sequence and of these systems. Previous analysis has demonstrated that the mediate methylation if the site is hemimethylated (maintenance activity of a Type IIG protein is coordinated by the interac- methylation), or cleavage if the site is unmethylated (Kelleher tion between the restriction endonuclease domain and the other et al., 1991). Mutation of this N-terminal region alters the activ- two regions of the enzyme (Shen et al., 2011). Therefore, dele- ity of the Type I complex to de novo methylation (modification tion of the HVO_A0006 gene might compromise the integrity of unmethylated target sites) rather than maintenance methyla- of the required Type IIG protein-protein interactions and thus tion, indicating that changes to molecular communication in the prevent methylation by the remaining methyltransferase encoded Type I complex can alter RM activity (Kelleher et al., 1991). In by HVO_A0237, resulting in the change in adenine methylation

Frontiers in Microbiology | www.frontiersin.org 8 April 2015 | Volume 6 | Article 251 78 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii observed in the deletion mutant. Complicating this proposed to each other, as well as new independent RM proteins via mechanism is that motifs modified by Type IIG systems do not fragmentation. typically resemble Type I motifs. For example, the motif modified Using SMRT sequencing, this study was able to characterize by the H. pylori homolog is GGWTAAm6 (Krebes et al., 2014). the methylome of the archaeal organism H. volcanii. We were However, some Type IIG proteins in Campylobacter jejuni have also able to demonstrate that an annotated methyltransferase been reported to use S subunits similar to those found in Type I S gene, HVO_A0006, affects adenine methylation patterns in the RM systems (Furuta et al., 2010). Further studies will be needed H. volcanii genome, although it is likely a PD-(D/E)XK nuclease to characterize the interactions between HVO_A0006 and other family protein that interacts with another RM protein or system. RM systems. Future SMRT sequencing studies utilizing a methyltransferase Based on a multiple alignment and the presence of inte- null mutant could help provide more precise characterization of grase and transposition genes upstream and downstream of the methyltransferase genes and provide a more detailed picture of gene, we propose that HVO_A0006 is likely the result of a gene DNA methylation in H. volcanii and the role of RM systems in transfer event of a Type IIG RM in which a gene fragment the organism. (the N-terminal restriction endonuclease region) of the whole gene was acquired. The other regions of the Type IIG RM gene Acknowledgments were likely also acquired or were the result of internal genomic rearrangements from a once intact gene in H. volcanii, since We would like to thank Thorsten Allers from the University HVO_A0237 is missing an N-terminal restriction endonucle- of Nottingham, UK for the Haloferax volcanii strains and plas- ase region, but is homologous to the same Type IIG RM genes mids. This work was supported by the National Science foun- as HVO_A0006. A similar event also appears to have occurred dation (award numbers DEB-0910290 and DEB-0830024) and with a Type IIG RM gene in Halorubrum sp. AJ67, with the the NASA Astrobiology: Exobiology and Evolutionary Biology C-terminal site specificity region fragmenting from the restric- Program grant number NNX12AD70G. tion endonuclease and methyltransferase regions. This is consis- tent with previous findings of gene transfer that have resulted in Supplementary Material gene fragmentation (Kobayashi et al., 1999; Chan et al., 2009). These RM systems are highly mobile, and gene transfer events The Supplementary Material for this article can be found of RM domains and subunits likely results in the formation online at: http://www.frontiersin.org/journal/10.3389/fmicb. of new fusion proteins, when the domains are transferred next 2015.00251/abstract

References Bitan-Banin, G., Ortenberg, R., and Mevarech, M. (2003). Development of a gene knockout system for the halophilic archaeon Haloferax volcanii by Allers, T., Barak, S., Liddell, S., Wardell, K., and Mevarech, M. (2010). Improved use of the pyrE gene. J. Bacteriol. 185, 772–778. doi: 10.1128/JB.185.3.772- strains and plasmid vectors for conditional overexpression of His-tagged pro- 778.2003 teins in Haloferax volcanii. Appl. Environ. Microbiol. 76, 1759–1769. doi: Blaby, I. K., Phillips, G., Blaby-Haas, C. E., Gulig, K. S., El Yacoubi, B., and De 10.1128/AEM.02670-09 Crecy-Lagard, V. (2010). Towards a systems approach in the genetic analysis of Allers, T., and Mevarech, M. (2005). Archaeal genetics - the third way. Nat. Rev. archaea: accelerating mutant construction and phenotypic analysis in Haloferax Genet. 6, 58–73. doi: 10.1038/nrg1504 volcanii. Archaea 2010:426239. doi: 10.1155/2010/426239 Allers, T., and Ngo, H. P. (2003). Genetic analysis of homologous recombination Blumenthal, R. M., and Cheng, X. (2002). “Restriction-modification systems,” in in Archaea: Haloferax volcanii as a model organism. Biochem. Soc. Trans. 31, Modern Microbial Genetics, 2nd Edn., eds U. N. Streips and R. E. Yasbin (New 706–709. doi: 10.1042/BST0310706 York, NY: John Wiley & Sons, Inc.), 177–217. Allers, T., Ngo, H. P., Mevarech, M., and Lloyd, R. G. (2004). Development of Chan, C. X., Beiko, R. G., Darling, A. E., and Ragan, M. A. (2009). Lateral transfer additional selectable markers for the halophilic archaeon Haloferax volcanii of genes and gene fragments in prokaryotes. Genome Biol. Evol. 1, 429–438. doi: based on the leuB and trpA genes. Appl. Environ. Microbiol. 70, 943–953. doi: 10.1093/gbe/evp044 10.1128/AEM.70.2.943-953.2004 Charlebois, R. L., Lam, W. L., Cline, S. W., and Doolittle, W. F. (1987). Charac- Alm, R. A., Ling, L. S., Moir, D. T., King, B. L., Brown, E. D., Doig, P. C., et al. terization of pHV2 from Halobacterium volcanii and its use in demonstrat- (1999). Genomic-sequence comparison of two unrelated isolates of the human ing transformation of an archaebacterium. Proc. Natl. Acad. Sci. U.S.A. 84, gastric pathogen Helicobacter pylori. Nature 397, 176–180. doi: 10.1038/16495 8530–8534. doi: 10.1073/pnas.84.23.8530 Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic Chimileski, S., Dolas, K., Naor, A., Gophna, U., and Papke, R. T. (2014). Extra- local alignment search tool. J. Mol. Biol. 215, 403–410. doi: 10.1016/S0022- cellular DNA metabolism in Haloferax volcanii. Front. Microbiol. 5:57. doi: 2836(05)80360-2 10.3389/fmicb.2014.00057 Arber, W., and Dussoix, D. (1962). Host specificity of DNA produced by Chinen, A., Uchiyama, I., and Kobayashi, I. (2000). Comparison between Pyro- Escherichia coli. I. Host controlled modification of bacteriophage lambda. J. coccus horikoshii and Pyrococcus abyssi genome sequences reveals linkage of Mol. Biol. 5, 18–36. doi: 10.1016/S0022-2836(62)80058-8 restriction-modification genes with large genome polymorphisms. Gene 259, Baranyi, U., Klein, R., Lubitz, W., Kruger, D. H., and Witte, A. (2000). The archaeal 109–121. doi: 10.1016/S0378-1119(00)00459-5 halophilic virus-encoded Dam-like methyltransferase M. phiCh1-I methylates Cline, S. W., Lam, W. L., Charlebois, R. L., Schalkwyk, L. C., and Doolittle, adenine residues and complements dam mutants in the low salt environ- W. F. (1989). Transformation methods for halophilic archaebacteria. Can. J. ment of Escherichia coli. Mol. Microbiol. 35, 1168–1179. doi: 10.1046/j.1365- Microbiol. 35, 148–152. doi: 10.1139/m89-022 2958.2000.01786.x Dyall-Smith, M. (ed.). (2008). The Halohandbook: Protocols for Haloar- Bickle, T. A., and Kruger, D. H. (1993). Biology of DNA restriction. Microbiol. Rev. chaeal Genetics. Available online at: http://www.haloarchaea.com/resources/ 57, 434–450. halohandbook/index.html

Frontiers in Microbiology | www.frontiersin.org 9 April 2015 | Volume 6 | Article 251 79 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

Fang, G., Munera, D., Friedman, D. I., Mandlik, A., Chao, M. C., Banerjee, O., et al. Laganeckas, M., Margelevicius, M., and Venclovas, C. (2011). Identification of new (2012). Genome-wide mapping of methylated adenine residues in pathogenic homologs of PD-(D/E)XK nucleases by support vector machines trained on Escherichia coli using single-molecule real-time sequencing. Nat. Biotechnol. 30, data derived from profile-profile alignments. Nucleic Acids Res. 39, 1187–1196. 1232–1239. doi: 10.1038/nbt.2432 doi: 10.1093/nar/gkq958 Flusberg, B. A., Webster, D. R., Lee, J. H., Travers, K. J., Olivares, E. C., Clark, T. Leigh, J. A., Albers, S. V., Atomi, H., and Allers, T. (2011). Model organisms A., et al. (2010). Direct detection of DNA methylation during single-molecule, for genetics in the domain Archaea: methanogens, halophiles, Thermococ- real-time sequencing. Nat. Methods 7, 461–465. doi: 10.1038/nmeth.1459 cales and Sulfolobales. FEMS Microbiol. Rev. 35, 577–608. doi: 10.1111/j.1574- Furuta, Y., Abe, K., and Kobayashi, I. (2010). Genome comparison and con- 6976.2011.00265.x text analysis reveals putative mobile forms of restriction-modification sys- Lluch-Senar, M., Luong, K., Llorens-Rico, V., Delgado, J., Fang, G., Spittle, K., tems and related rearrangements. Nucleic Acids Res. 38, 2428–2443. doi: et al. (2013). Comprehensive methylome characterization of Mycoplasma gen- 10.1093/nar/gkp1226 italium and Mycoplasma pneumoniae at single-base resolution. PLoS Genet. Furuta, Y., Namba-Fukuyo, H., Shibata, T. F., Nishiyama, T., Shigenobu, S., Suzuki, 9:e1003191. doi: 10.1371/journal.pgen.1003191 Y., et al. (2014). Methylome diversification through changes in DNA methyl- Loenen, W. A., Dryden, D. T., Raleigh, E. A., and Wilson, G. G. (2014a). Type transferase sequence specificity. PLoS Genet. 10:e1004272. doi: 10.1371/jour- I restriction enzymes and their relatives. Nucleic Acids Res. 42, 20–44. doi: nal.pgen.1004272 10.1093/nar/gkt847 Gough, J., and Chothia, C. (2002). SUPERFAMILY: HMMs representing all pro- Loenen, W. A., Dryden, D. T., Raleigh, E. A., Wilson, G. G., and Murray, N. teins of known structure. SCOP sequence searches, alignments and genome E. (2014b). Highlights of the DNA cutters: a short history of the restriction assignments. Nucleic Acids Res. 30, 268–272. doi: 10.1093/nar/30.1.268 enzymes. Nucleic Acids Res. 42, 3–19. doi: 10.1093/nar/gkt990 Grogan, D. W. (2003). Cytosine methylation by the SuaI restriction-modification Loenen, W. A., and Raleigh, E. A. (2014). The other face of restric- system: implications for genetic fidelity in a hyperthermophilic archaeon. tion: modification-dependent enzymes. Nucleic Acids Res. 42, 56–69. doi: J. Bacteriol. 185, 4657–4661. doi: 10.1128/JB.185.15.4657-4661.2003 10.1093/nar/gkt747 Hartman, A. L., Norais, C., Badger, J. H., Delmas, S., Haldenby, S., Madupu, R., Meisel, A., Bickle, T. A., Kruger, D. H., and Schroeder, C. (1992). Type III restric- et al. (2010). The complete genome sequence of Haloferax volcanii DS2, a model tion enzymes need two inversely oriented recognition sites for DNA cleavage. archaeon. PLoS ONE 5:e9605. doi: 10.1371/journal.pone.0009605 Nature 355, 467–469. doi: 10.1038/355467a0 Heringa, J. (1999). Two strategies for sequence comparison: profle-preprocessed Meselson, M., Yuan, R., and Heywood, J. (1972). Restriction and and secondary structure-induced multiple alignment. Comput. Chem. 23, modification of DNA. Annu. Rev. Biochem. 41, 447–466. doi: 341–364. doi: 10.1016/S0097-8485(99)00012-1 10.1146/annurev.bi.41.070172.002311 Holmes, M. L., Nuttall, S. D., and Dyall-Smith, M. L. (1991). Construction and use Morgan, R. D., Dwinell, E. A., Bhatia, T. K., Lang, E. M., and Luyten, Y. A. (2009). of halobacterial shuttle vectors and further studies on Haloferax DNA gyrase. The MmeI family: type II restriction-modification enzymes that employ single- J. Bacteriol. 173, 3807–3813. strand modification for host protection. Nucleic Acids Res. 37, 5208–5221. doi: Ishikawa, K., Watanabe, M., Kuroita, T., Uchiyama, I., Bujnicki, J. M., Kawakami, 10.1093/nar/gkp534 B., et al. (2005). Discovery of a novel restriction endonuclease by genome com- Mullakhanbhai, M. F., and Larsen, H. (1975). Halobacterium volcanii spec. nov., parison and application of a wheat-germ-based cell-free translation assay: PabI a Dead Sea halobacterium with a moderate salt requirement. Arch. Microbiol. (5’-GTA/C) from the hyperthermophilic archaeon Pyrococcus abyssi. Nucleic 104, 207–214. doi: 10.1007/BF00447326 Acids Res. 33:e112. doi: 10.1093/nar/gni113 Murphy, J., Mahony, J., Ainsworth, S., Nauta, A., and Van Sinderen, D. (2013). Jeltsch, A. (2003). Maintenance of species identity and controlling speciation of Bacteriophage orphan DNA methyltransferases: insights from their bacterial bacteria: a new function for restriction/modification systems? Gene 317, 13–16. origin, function, and occurrence. Appl. Environ. Microbiol. 79, 7547–7555. doi: doi: 10.1016/S0378-1119(03)00652-8 10.1128/AEM.02229-13 Jones, D. T. (1999). Protein secondary structure prediction based on Murray, I. A., Clark, T. A., Morgan, R. D., Boitano, M., Anton, B. P., Luong, position-specific scoring matrices. J. Mol. Biol. 292, 195–202. doi: K., et al. (2012). The methylomes of six bacteria. Nucleic Acids Res. 40, 10.1006/jmbi.1999.3091 11450–11462. doi: 10.1093/nar/gks891 Kelleher, J. E., Daniel, A. S., and Murray, N. E. (1991). Mutations that confer de Murray, N. E. (2000). Type I restriction systems: sophisticated molecular machines novo activity upon a maintenance methyltransferase J. Mol. Biol. 221, 431–440. (a legacy of Bertani and Weigle). Microbiol. Mol. Biol. Rev. 64, 412–434. doi: doi: 10.1016/0022-2836(91)80064-2 10.1128/MMBR.64.2.412-434.2000 Kim, J. S., Degiovanni, A., Jancarik, J., Adams, P. D., Yokota, H., Kim, R., et al. Nakonieczna, J., Kaczorowski, T., Obarska-Kosinska, A., and Bujnicki, J. M. (2009). (2005). Crystal structure of DNA sequence specificity subunit of a type I Functional analysis of MmeI from methanol utilizer Methylophilus methylotro- restriction-modification enzyme and its functional implications. Proc. Natl. phus, a subtype IIC restriction-modification enzyme related to type I enzymes. Acad. Sci. U.S.A. 102, 3248–3253. doi: 10.1073/pnas.0409851102 Appl. Environ. Microbiol. 75, 212–223. doi: 10.1128/AEM.01322-08 Kinch, L. N., Ginalski, K., Rychlewski, L., and Grishin, N. V. (2005). Identifica- Nobusato, A., Uchiyama, I., Ohashi, S., and Kobayashi, I. (2000). Insertion with tion of novel restriction endonuclease-like fold families among hypothetical long target duplication: a mechanism for gene mobility suggested from compar- proteins. Nucleic Acids Res. 33, 3598–3605. doi: 10.1093/nar/gki676 ison of two related bacterial genomes. Gene 259, 99–108. doi: 10.1016/S0378- Kobayashi, I., Nobusato, A., Kobayashi-Takahashi, N., and Uchiyama, I. (1999). 1119(00)00456-X Shaping the genome–restriction-modification systems as mobile genetic ele- Pingoud, A., Wilson, G. G., and Wende, W. (2014). Type II restriction ments. Curr. Opin. Genet. Dev. 9, 649–656. doi: 10.1016/S0959-437X(99) endonucleases–a historical perspective and more. Nucleic Acids Res. 42, 00026-X 7489–7527. doi: 10.1093/nar/gku447 Kosinski, J., Feder, M., and Bujnicki, J. M. (2005). The PD-(D/E)XK superfam- Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., ily revisited: identification of new members among proteins involved in DNA et al. (2005). InterProScan: protein domains identifier. Nucleic Acids Res. 33, metabolism and functional predictions for domains of (hitherto) unknown W116–W120. doi: 10.1093/nar/gki442 function. BMC Bioinformatics 6:172. doi: 10.1186/1471-2105-6-172 Rao, D. N., Dryden, D. T., and Bheemanaik, S. (2014). Type III restriction- Krebes, J., Morgan, R. D., Bunk, B., Sproer, C., Luong, K., Parusel, R., et al. (2014). modification enzymes: a historical perspective. Nucleic Acids Res. 42, 45–55. The complex methylome of the human gastric pathogen Helicobacter pylori. doi: 10.1093/nar/gkt616 Nucleic Acids Res. 42, 2415–2432. doi: 10.1093/nar/gkt1201 Rego, R. O., Bestor, A., and Rosa, P. A. (2011). Defining the plasmid-borne Kuhnlein, U., and Arber, W. (1972). Host specificity of DNA produced by restriction-modification systems of the Lyme disease spirochete Borrelia Escherichia coli. XV. The role of nucleotide methylation in in vitro B-specific burgdorferi. J. Bacteriol. 193, 1161–1171. doi: 10.1128/JB.01176-10 modification. J. Mol. Biol. 63, 9–19. doi: 10.1016/0022-2836(72)90518-9 Roberts, R. J., Belfort, M., Bestor, T., Bhagwat, A. S., Bickle, T. A., Bitinaite, J., Kuo, Y., Thompson, D. K., Jean, A., Charlebois, R. L., and Daniels, C. (1997). Char- et al. (2003). A nomenclature for restriction enzymes, DNA methyltransferases, acterization of two heat shock genes from Haloferax volcanii: a model system homing endonucleases and their genes. Nucleic Acids Res. 31, 1805–1812. doi: for transcription regulation in the Archaea. J. Bacteriol. 179, 6318–6324. 10.1093/nar/gkg274

Frontiers in Microbiology | www.frontiersin.org 10 April 2015 | Volume 6 | Article 251 80 Ouellette et al. Methylome of the model haloarchaeon Haloferax volcanii

Roberts, R. J., Vincze, T., Posfai, J., and Macelis, D. (2014). REBASE-a database for Wilson, G. G., and Murray, N. E. (1991). Restriction and modification systems. DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Annu. Rev. Genet. 25, 585–627. doi: 10.1146/annurev.ge.25.120191.003101 Res. 43, D298–D299. doi: 10.1093/nar/gku1046 Wion, D., and Casadesus, J. (2006). N6-methyl-adenine: an epigenetic signal for Shen, B. W., Xu, D., Chan, S. H., Zheng, Y., Zhu, Z., Xu, S. Y., et al. (2011). DNA-protein interactions. Nat. Rev. Microbiol. 4, 183–192. doi: 10.1038/nrmi- Characterization and crystal structure of the type IIG restriction endonuclease cro1350 RM.BpuSI. Nucleic Acids Res. 39, 8223–8236. doi: 10.1093/nar/gkr543 Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., et al. (2011). Conflict of Interest Statement: The Review Editor Antonio Ventosa declares that, Fast, scalable generation of high-quality protein multiple sequence alignments despite having published with author Thane Papke, the review process was handled using Clustal Omega. Mol. Syst. Biol. 7:539. doi: 10.1038/msb.2011.75 objectively and no conflict of interest exists. The authors declare that the research Vasu, K., and Nagaraja, V. (2013). Diverse functions of restriction-modification was conducted in the absence of any commercial or financial relationships that systems in addition to cellular defense. Microbiol. Mol. Biol. Rev. 77, 53–72. doi: could be construed as a potential conflict of interest. 10.1128/MMBR.00044-12 Venclovas, C., Timinskas, A., and Siksnys, V. (1994). Five-stranded β-sheet sand- Copyright © 2015 Ouellette, Jackson, Chimileski and Papke. This is an open-access wiched with two α-helices: a structural link between restriction endonucleases article distributed under the terms of the Creative Commons Attribution License (CC EcoRI and EcoRV. Proteins 20, 279–282. doi: 10.1002/prot.340200308 BY). The use, distribution or reproduction in other forums is permitted, provided the Watanabe, M., Yuzawa, H., Handa, N., and Kobayashi, I. (2006). Hyperther- original author(s) or licensor are credited and that the original publication in this mophilic DNA methyltransferase M.PabI from the archaeon Pyrococcus abyssi. journal is cited, in accordance with accepted academic practice. No use, distribution Appl. Environ. Microbiol. 72, 5367–5375. doi: 10.1128/AEM.00433-06 or reproduction is permitted which does not comply with these terms.

Frontiers in Microbiology | www.frontiersin.org 11 April 2015 | Volume 6 | Article 251 81

Chapter 4. Characterizing the DNA methyltransferases of Haloferax volcanii via bioinformatics, gene deletion, and SMRT sequencing

Matthew Ouellette, J. Peter Gogarten, Jessica Lajoie, Andrea M. Makkay, and R. Thane Papke

This chapter examines the putative DNA methyltransferase (MTase) genes in Haloferax volcanii, and their role in methylating the genome of the organism. This research was published in Genes in 2018. The purpose of this work was to determine which MTases were responsible for the genomic methylation patterns in H. volcanii observed in Chapter 3, which would provide a clearer picture of MTases and restriction-modification (RM) systems in a model halobacterial organism. This work was also performed in order to construct a deletion mutant of H. volcanii without RM genes, which would be useful in future studies on DNA methylation and RM systems in H. volcanii. Regarding my contributions, I designed the experiments with the assistance of R. Thane Papke and J. Peter Gogarten and performed the research with the assistance of Jessica Lajoie and J. Peter Gogarten. I analyzed the results with the assistance of

Andrea M. Makkay, R. Thane Papke, and J. Peter Gogarten, and I wrote the manuscript with the assistance of R. Thane Papke and J. Peter Gogarten.

Citation:

Ouellette, M., Gogarten, J. P., Lajoie, J., Makkay, A. M., & Papke, R. T. (2018) Characterizing

the DNA methyltransferases of Haloferax volcanii via bioinformatics, gene deletion, and

SMRT sequencing. Genes, 9(129). doi: 10.3390/genes9030129

http://www.mdpi.com/2073-4425/9/3/129

82

G C A T T A C G G C A T genes

Article Characterizing the DNA Methyltransferases of Haloferax volcanii via Bioinformatics, Gene Deletion, and SMRT Sequencing

Matthew Ouellette, J. Peter Gogarten ID , Jessica Lajoie, Andrea M. Makkay and R. Thane Papke *

Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06268, USA; [email protected] (M.O.); [email protected] (J.P.G.); [email protected] (J.L.); [email protected] (A.M.M.) * Correspondence: [email protected]; Tel.: +1-860-486-7963

Received: 24 January 2018; Accepted: 19 February 2018; Published: 27 February 2018

Abstract: DNA methyltransferases (MTases), which catalyze the methylation of adenine and cytosine bases in DNA, can occur in bacteria and archaea alongside cognate restriction endonucleases (REases) in restriction-modification (RM) systems or independently as orphan MTases. Although DNA methylation and MTases have been well-characterized in bacteria, research into archaeal MTases has been limited. A previous study examined the genomic DNA methylation patterns (methylome) of the halophilic archaeon Haloferax volcanii, a model archaeal system which can be easily manipulated in laboratory settings, via single-molecule real-time (SMRT) sequencing and deletion of a putative MTase gene (HVO_A0006). In this follow-up study, we deleted other putative MTase genes in H. volcanii and sequenced the methylomes of the resulting deletion mutants via SMRT sequencing to characterize the genes responsible for DNA methylation. The results indicate that deletion of putative RM genes HVO_0794, HVO_A0006, and HVO_A0237 in a single strain abolished methylation of the sole cytosine motif in the genome (Cm4TAG). Amino acid alignments demonstrated that HVO_0794 shares homology with characterized cytosine CTAG MTases in other organisms, indicating that this MTase is responsible for Cm4TAG methylation in H. volcanii. The CTAG motif has high density at only one of the origins of replication, and there is no relative increase in CTAG motif frequency in the genome of H. volcanii, indicating that CTAG methylation might not have effectively taken over the role of regulating DNA replication and mismatch repair in the organism as previously predicted. Deletion of the putative Type I RM operon rmeRMS (HVO_2269-2271) resulted in abolished methylation of the adenine motif m6 in the genome (GCA BN6VTGC). Alignments of the MTase (HVO_2270) and site specificity subunit (HVO_2271) demonstrate homology with other characterized Type I MTases and site specificity subunits, indicating that the rmeRMS operon is responsible for adenine methylation in H. volcanii. Together with HVO_0794, these genes appear to be responsible for all detected methylation in H. volcanii, even though other putative MTases (HVO_C0040, HVO_A0079) share homology with characterized MTases in other organisms. We also report the construction of a multi-RM deletion mutant (∆RM), with multiple RM genes deleted and with no methylation detected via SMRT sequencing, which we anticipate will be useful for future studies on DNA methylation in H. volcanii.

Keywords: haloarchaea; Halobacteria; Haloferax volcanii; DNA methylation; methylation; methylome; restriction-modification system; CTAG methylation; GATC methylation

1. Introduction In bacteria and archaea, DNA methylation by DNA methyltransferases (MTases) has many roles. MTases are commonly associated with restriction-modification (RM) systems, in which the MTase functions alongside a cognate restriction endonuclease (REase). The REase will target the same sites

Genes 2018, 9, 129; doi:10.3390/genes9030129 www.mdpi.com/journal/genes 83 Genes 2018, 9, 129 2 of 23 of DNA as the MTase and cleave those that are unmethylated, whereas methylated motifs will be disregarded. RM systems function in self-recognition, allowing the host to differentiate between its own methylated DNA and potentially harmful foreign unmethylated DNA, which can then be recognized and digested by the REase [1,2]. RM systems have also been characterized as Toxin/Antitoxin systems that lead to addiction through post-segregational killing in case the methylation activity decays in a cell from which the RM system has been lost so that unmethylated restriction sites become exposed to the remaining restriction enzyme activity [3,4]. DNA methylation occurs at adenine or cytosine bases, resulting in one of three possible types of methylation: N6-methyladenine (6mA), N4-methylcytosine (4mC), or C5-methylcytosine (5mC) [5]. Methylation is catalyzed by DNA MTases, which interact with the cofactor S-adenosyl methionine (AdoMet) to transfer a methyl group to a nucleotide base of a DNA molecule. MTases typically consist of three major domains: an AdoMet binding domain which interacts with AdoMet to obtain the methyl group, a target recognition domain (TRD) which recognizes a short sequence of DNA to be targeted for methylation, and a catalytic domain which transfers the methyl group from AdoMet to the targeted nucleotide [6]. MTases can be categorized based on the type of methylation they perform (6mA, 4mC, and 5mC), with the 6mA and 4mC MTases being more similar to each other than to 5mC MTases [7,8]. MTases have been further categorized into subtypes based on the order of several conserved motifs that make up the primary domains of the MTases. The 6mA and 4mC MTases can be classified into six categories (α, β, γ, δ, ε, and ζ) based on the N-terminal to C-terminal order of conserved motifs X, I–III (AdoMet binding motifs), IV-VIII (catalytic motifs), and the TRD [8,9]. The occurrence of signature AdoMet binding motif DPPY and catalytic motif FXGXG (abbreviated FGG) can also be used to categorize these MTases [10]. The 5mC MTases have a different set of motifs that can be used to identify them [11,12]. There are four major types of characterized RM systems [13,14]. Type I RM systems consist of pentamer complexes with two REase (R) subunits, two MTase (M) subunits, and one site specificity (S) subunit containing two tandem TRDs which recognize bipartite target sites. When the complex comes across a target site, it will either methylate the site if it is methylated on one strand (hemimethylated) or will cleave the DNA several bases upstream or downstream from the site if it is unmethylated on both strands [15,16]. Type II RM systems include MTases and REases which operate independently and target the same sites of DNA [17]. Many subgroups of Type II RM systems have been categorized, such as Type IIG which consists of independent RM enzymes capable of both MTase and REase activity [18]. In Type III RM systems, a REase subunit (Res) and MTase subunit (Mod) work together in a two-component complex, with the Mod subunit containing the TRD which recognizes the target site of the system [19]. Type IV RM systems consist only of REases, and these REases cleave methylated target sites instead of unmethylated sites [20]. MTases can also occur independently in bacteria and archaea without cognate REases. These MTases, known as orphan MTases, typically provide important functions for their host organisms [21]. In Escherichia coli, for example, the orphan adenine MTase Dam is involved in coordinating timing of DNA replication by methylating GATC sites at the origin of replication which are also targets of binding for SeqA in a hemimethylated state [22]. When SeqA binds to hemimethylated GATC sites of the origin after replication has occurred, DnaA is prevented from binding to the origin and re-initiating DNA replication [23,24]. Dam methylation is also important in the methyl-directed mismatch repair (MMR) system in E. coli, in which the complex binds to a closely located methylated GATC site on the old strand in order to target and cleave the mismatched base on the new strand [25–27]. In Caulobacter crescentus, the orphan adenine MTase CcrM is involved in regulating the expression of genes like ctrA, which are essential for cell cycle regulation [28,29]. Orphan MTases can also protect the host from parasitic RM systems by mimicking the methylation of the invader, such as orphan cytosine MTase Dcm in E. coli which methylates the same sites as RM system EcoRII and prevents degradation of the genome by the invading system [30]. Orphan MTases are more common among the bacteria and more well-conserved within a genus than RM-associated MTases, likely due to orphan MTases performing important roles within their hosts [31,32].

84 Genes 2018, 9, 129 3 of 23

DNA methylation has been well-studied in bacterial organisms. However, research has not been as extensive in the archaea, which have focused on characterizing methylation and a few RM systems primarily in thermophilic organisms [33–36]. A previous study [37] examined the genomic methylation patterns (methylome) of the halophilic archaeal organism Haloferax volcanii, a member of Class Halobacteria which are often referred to as haloarchaea, as a model for examining DNA methylation and RM systems in archaea, due to its well-established genetic system which allows it to be easily manipulated in lab settings [38,39]. In the study, the methylome of H. volcanii was sequenced via single-molecule real-time (SMRT) sequencing developed by Pacific Biosciences (PacBio) [40]. H. volcanii was observed to have two types of motifs methylated throughout its genome: Cm4TAG m6 and GCA BN6VTGC. The study also demonstrated that deletion of one of the putative RM genes (HVO_A0006) resulted in an alteration in the adenine motif, which was surprising considering that the gene is not a MTase but instead encodes an REase family protein [37]. In this follow-up study, we aim to characterize the MTases of H. volcanii through bioinformatics and gene deletions of the various predicted RM genes in the genome and sequence the methylomes of the deletion mutants via SMRT sequencing. We will also describe the production of an RM null mutant without a methylated genome, which we anticipate will be useful in future research of DNA methylation in the archaea.

2. Materials and Methods

2.1. Strains and Growth Conditions All strains and plasmids used in this study are listed and described in Table 1. Strains of H. volcanii were grown at 42 ◦C while shaking at 200 rpm using either rich medium (Hv-YPC) or selective rich medium (Hv-Ca) developed by Allers et al. [41] and outlined in the Halohandbook [42]. For ∆pyrE2 strains, media was supplemented with uracil (50 µg/mL) and 5-fluoroorotic acid (50 µg/mL) as needed. Strains of E. coli were grown at 37 ◦C while shaking at 200 rpm in either Lysogeny Broth (LB) or S.O.C. medium (Clontech, Mountain View, CA, USA). Ampicillin (100 µg/mL) and X-gal (20 µg/mL) were added to the media when needed.

Table 1. Strains and plasmids used in this study.

Strain/Plasmid Name Description Source E. coli HST08 Cloning strain of E. coli Clontech, Cat. # 636763 H. volcanii DS2 Wild-type strain Mullakhanbhai and Larsen [43] H. volcanii H26 ∆pyrE2; uracil auxotroph derived from DS2 Bitan-Banin et al. [44] H. volcanii H1206 ∆pyrE2/∆mrr; derived from H26 Allers et al. [45] H. volcanii ∆rmeRMS rmeRMS deletion strain; derived from H1206 This study H. volcanii ∆HVO_0794 Deletion strain of HVO_0794, HVO_A0006, and HVO_A0237; This study ∆HVO_A0006 ∆HVO_A0237 Derived from H1206 Deletion strain of HVO_0794, rmeRMS, HVO_A0006, HVO_A0074, H. volcanii ∆RM This study HVO_A0079, and HVO_A0237; derived from H1206 Vector used for gene deletion. Contains lacZ cloning site for pTA131 blue-white screening, ampR ampicillin resistance gene for Allers, Ngo, Mevarech and Lloyd [41] selectivity in E. coli, and pyrE2 for screening in H. volcanii. Derivative of pTA131 with flanking regions of HVO_A0006 p∆HVO_A0006 Ouellette, Jackson, Chimileski and Papke [37] inserted into lacZ cloning site for gene deletion Derivative of pTA131 with flanking regions of HVO_0794 inserted p∆HVO_0794 This study into lacZ cloning site for gene deletion Derivative of pTA131 with flanking regions of rmeRMS operon p∆rmeRMS This study inserted into lacZ cloning site for gene deletion Derivative of pTA131 with flanking regions of HVO_A0074 p∆HVO_A0074 This study inserted into lacZ cloning site for gene deletion Derivative of pTA131 with flanking regions of HVO_A0079 p∆HVO_A0079 This study inserted into lacZ cloning site for gene deletion Derivative of pTA131 with flanking regions of HVO_A0237 p∆HVO_A0237 This study inserted into lacZ cloning site for gene deletion

85 Genes 2018, 9, 129 4 of 23

2.2. Deletion of Annotated Restriction Modification Genes Putative RM genes in H. volcanii were identified from New England BioLabs Restriction Enzyme Database (REBASE) [10] and National Center for Biotechnology Information (NCBI) (Table 2). These genes were deleted in H. volcanii strain H1206 utilizing a method developed by Blaby et al. [39] that uses the In-Fusion HD Cloning Kit (Clontech). Primers were designed to construct deletion plasmids of putative RM genes and are listed in Table 3. These deletion plasmids were then used to transform H. volcanii H1206 and its derivatives via the polyethylene glycol (PEG)-mediated transformation protocol outlined in the Halohandbook [42]. Transformed cell cultures were plated on Hv-Ca and incubated at 42 ◦C for 5–7 days. Pop-ins were detected via a colony PCR screen using external deletion primers and visualized via gel electrophoresis. Confirmed pop-ins were then plated on Hv-Ca with 50 µg/mL 5-fluoroorotic acid (5-FOA) and 50 µg/mL uracil to pop-out genes of interest. Successful pop-outs were identified via PCR screen as performed for detecting pop-ins. Final deletion strains obtained though this method are listed in Table 1.

Table 2. List of restriction-modification (RM) genes annotated in Haloferax volcanii DS2.

Putative RM Location in the Gene Locus Tag Gene Symbol Gene Size (bp) Notes Classification Genome Type IV restriction HVO_0682 mrr Type IV 1005 Chromosome endonuclease Putative 4mC CTAG HVO_0794 zim Type II 1095 Chromosome methyltransferase 2223, Operon which contains a HVO_2269-2271 rmeRMS Type I 1395, Chromosome putative Type I RM system 1233 with 6mA methyltransferase Putative 5mC GTCGAC HVO_C0040 - Type II 1221 pHV1 methyltransferase Putative restriction HVO_A0006 - Type IIG 660 pHV4 endonuclease fragment of HVO_A0237 [37] Putative Type IV HVO_A0074 - Type IV 3315 pHV4 restriction endonuclease Putative 6mA Type IIG HVO_A0079 - Type IIG 3267 pHV4 RM protein Putative 6mA HVO_A0237 - Type IIG 2199 pHV4 methyltransferase and target recognition protein

Table 3. List of primers used in this study.

Primer Name Primer Sequence Primer Description HVO_A0006 FR1F 5’- CGG GCC CCC CCT CGA GTC AAG CAG TAC CTC AAC ACG GAA CA -3’ Used to amplify the flanking regions of HVO_A0006 FR1R 5’- ATT CGA TAT CAA GCT GTC CTC AAG GAC GGC CTG CA -3’ HVO_A0006 for insertion into pTA131 HVO_A0006 FR2F 5’- GAC GCG TTG ATA TCC CGA AGA ATC CAG TTG CTG TCT GTT G -3’ linearized with XhoI and XbaI HVO_A0006 FR2R 5’- GGA TAT CAA CGC GTC GGC ATT ATG CAA TTC -3’ (Primer designs from Ouellette et al. [37]) HVO_0794 FR1F 5’- GCT TGA TAT CGA ATT CCC CGC GAG AAA GAC GAG AAG -3’ Used to amplify the flanking regions of HVO_0794 FR1R 5’- GCC TGG TAG AAT TCC CGT ACG GAC GTA TTT CCC CCG A -3’ HVO_0794 for insertion into pTA131 HVO_0794 FR2F 5’- GGA ATT CTA CCA GGC CGA CGA CGA CCG ACT GAG GTC -3’ linearized with EcoRI and BamHI HVO_0794 FR2R 5’- TAG AAC TAG TGG ATC CGA ACG GCA GCA CCC GCG A -3’ rmeRMS FR1F 5’- CGG GCC CCC CCT CGA GTC GGT GTT TCG CAG GTC ATT C -3’ Used to amplify the flanking regions of rmeRMS FR1R 5’- GGG CGC CAT CCA GGC TAC TCA CTA TAT TTC ACT CGG GGT A -3’ the rmeRMS operon for insertion into rmeRMS FR2F 5’- GCC TGG ATG GCG CCC CTC ACC TAT TCA CAA AGA GAG GAA -3’ pTA131 linearized with XhoI and ClaI rmeRMS FR2R 5’- ATA TCA AGC TTA TCG ATT GCC GGG TTT CCT GTT ATT TT CT -3’ HVO_A0074 FR1F 5’ GCT TGA TAT CGA ATT CTG CTC GTC GTG GTA CTT GTC -3’ Used to amplify the flanking regions of HVO_A0074 FR1R 5’- CGG TAC CGA CAT GTT ATC TCA ATG CAG CGC TTC TC -3’ HVO_A0074 for insertion into pTA131 HVO_A0074 FR2F 5’- AAC ATG TCG GTA CCG TTG AGG ACT GGG AGC GTA TC -3’ linearized with EcoRI and XbaI HVO_A0074 FR2R 5’- TGG CGG CCG CTC TAG TTG AAG GTC TGT GTC GCA TC -3’ HVO_A0079 FR1F 5’- GCG AAT TGG GTA CCG GCC CCG ACC TGC CTT GG -3’ Used to amplify the flanking regions of HVO_A0079 FR1R 5’- GCC TGG TAG AAT TCC CCG TGT TCG GTT AAG CGG A -3’ HVO_A0079 for insertion into pTA131 HVO_A0079 FR2F 5’- GGA ATT CTA CCA GGC AAT GGG ATC TGA CGA AGG AGG -3’ linearized with ApaI and EcoRV HVO_A0079 FR2R 5’- CTG CAG GAA TTC GAT CAT AAA GGT CTT CTC AGC GGT T -3’

86 Genes 2018, 9, 129 5 of 23

Table 3. Cont.

Primer Name Primer Sequence Primer Description HVO_A0237 FR1F 5’- CGG GCC CCC CCT CGA GGT TCG CGC TCT TGC TCA GGT -3’ Used to amplify the flanking regions of HVO_A0237 FR1R 5’- GGG ATC CAA AGC TTG AGG CGT TGC TGA CAT TAT ATC GAA G -3’ HVO_A0237 for insertion into pTA131 HVO_A0237 FR2F 5’- CAA GCT TTG GAT CCC GCC TTT CTG CTG GCG AGT TTC C -3’ linearized with XhoI and XbaI HVO_A0237 FR2R 5’- TGG CGG CCG CTC TAG AAT ATC GCG CAG CTC TAT CGG G -3’ M13(-21) F 5’- GTA AAA CGA CGG CCA GT -3’ Used for amplifying the multiple cloning M13 R 5’- AGG AAA CAG CTA TGA CCA T -3’ site of pTA131 for screening

2.3. DNA Purification for Single-Molecule Real-Time Sequencing In order to extract DNA from the H. volcanii deletion mutants for SMRT sequencing, 40 mL of cell cultures in late log to early stationary phase (optical density (OD600) = ~0.8–1) were pelleted and lysed by resuspension in 10 mM Tris-HCl buffer (pH 8.0). The lysates were then treated with proteinase K (50 µg/mL final concentration) and incubated overnight at 37 ◦C to hydrolyze the proteins, after which the DNA was extracted via ethanol precipitation. Performing three rounds of phenol-chloroform and two rounds of chloroform extractions purified the DNA further. A final ethanol precipitation was then performed on the remaining DNA, and the samples were purified of RNA via Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA). The 260/280 ratio, 260/230 ratio, and DNA concentration of each sample was quantified via Nanodrop and Qubit dsDNA BR assay (Invitrogen, Eugene, OR, USA).

2.4. Single-Molecule Real-Time Sequencing The DNA samples extracted from the H. volcanii deletion mutants were analyzed via PacBio SMRT sequencing in order to determine the methylomes of the strains. The samples were submitted to the Keck Sequencing Facility of the Yale School of Medicine for SMRT sequencing analysis. A detailed outline of the SMRT sequencing strategy can be found in the PacBio manual “Detecting DNA Base Modifications: SMRT Analysis of Microbial Methylomes” [46]. Libraries of 0.25 to 3 kb were constructed for each strain using an estimated input size of 4 Mb, and were each sequenced in one SMRT cell, resulting in coverage of ~150x for ∆RM, ~400x for ∆HVO_0794 ∆HVO_A0006 ∆HVO_A0237, and ~120x for ∆rmeRMS. The SMRT Portal program “RS_Modification_and_Motif_Analysis.1” was used under default settings to determine the modified bases and motifs in ∆RM. The modified bases and motifs in the other strains were identified using the same SMRT Portal program, but with the ∆RM analysis results used as a control. All analyses used the H. volcanii DS2 genome as the reference sequence [47].

2.5. Bioinformatics Analysis Homologs of the putative RM proteins in H. volcanii DS2 were discovered via protein BLAST (blastp) [48] and position-specific iterative BLAST (PSI-BLAST) of the non-redundant protein database on NCBI as well as translated nucleotide BLAST (tblastn) of the NCBI Halobacteria genome database (taxid 183963) (E-value cutoff of 1e−4). Homologs were also identified using the REBASE database of RM genes [10]. Alignments of identified homologs were performed using Clustal X2 [49]. Protein domain architecture and sequence features, including identification of Structural Classification of Proteins (SCOP) superfamilies, were analyzed using InterProScan [50]. Homologs to the CTAG modification methyltransferase in H. volcanii DS2 (ADE02643) in completely sequenced halobacterial genomes were identified using the NCBI's blast site for microbial genomes, selecting completely sequenced halobacterial genomes and the tblastn search algorithm. The list of completely sequenced genomes did not completely correspond to the genomes searched through the NCBI's web interface; therefore, the absence of a homolog in a genome was confirmed through a targeted tblastn search. The one additional homologous gene identified in this step was added to the phylogenetic analysis. Matching nucleotide sequences were retrieved, translated into protein and aligned using muscle [51] as implemented in Seaview [52], and used for phylogenetic reconstruction using PhyML [53] with the following parameters: LG substitution model, 100 bootstrap samples, 4 substitution rate categories with estimated Gamma distribution parameter, and estimated

87 Genes 2018, 9, 129 6 of 23 fraction of invariant sites, and a tree topology search using both Nearest Neighbor Interchange and Subtree Pruning and Regrafting. CTAG and GATC frequency and cumulative occurrence of these motifs were calculated with an in house Perl (Practical Extraction and Report Language) script.

2.6. Haloferax volcanii Growth Experiments Haloferax volcanii strains H26 and ∆RM (Table 1) were grown in Hv-YPC medium to mid-log phase (OD600 = ~0.6–0.8). The cell cultures were then diluted in Hv-YPC to an OD600 of ~0.01 and were each distributed into 24 wells of a 96-well plate, with each well receiving 200 µL of culture. One well on the plate received 200 µL of Hv-YPC to be used as a blank reading. The 96 well plate was then covered with sealing tape and inserted into a Multiscan FC plate reader (Fisher Scientific, Waltham, MA, USA), which ◦ recorded the OD620 of each well every hour for 72 h while incubating and shaking the plate at 42 C.

3. Results

3.1. Bioinformatics Analysis Supports Identification of HVO_0794 as a Chromosomal 4mC CTAG MTase The putative 4mC CTAG MTase HVO_0794 was analyzed bioinformatically. A blastp analysis identified a homolog to the enzyme in Methanothermobacter thermautotrophicus named M.MthZI (GenBank CAA48447) which has been experimentally characterized as a 4mC CTAG MTase [33]. Two other homologs were also identified via blastp that were also experimentally characterized via unpublished work according to REBASE: M.BfaI (GenBank ADQ20483) in Bacteroides fragilis and M.MjaI (GenBank AAB98988) in Methanocaldococcus jannaschii DSM 2661. These homologs are similar in size to HVO_0794, ranging between 303 to 364 amino acids in length. Also, these enzymes are classified as Type II, subtype β 4mC MTases on REBASE, as is HVO_0794. A multiple sequence alignment (Figure 1) of HVO_0794 with these homologs and homolog M.HsaR1I (GenBank CAP14114) from Halobacterium salinarium R1 indicate significant sequence similarity is shared in the N-terminal and central regions of the amino acid sequences. This region of sequence similarity belongs to the S-adenosyl-L-methionine-dependent methyltransferase superfamily domain SSF53335 identified by InterProScan in the SCOP database. Signature N4-methyltransferase motifs PR00508 from the protein motif database PRINTS were also identified in the region via InterProScan (data not shown). A closer examination of the alignment revealed the presence of motifs I-X identified in M.MthZI and other 4mC MTases by Bujnicki and Radlinska [9]. These motifs are present in the alignment in the order of N-III-IV-V-VI-VII-VIII-VIII’-IX-X-I-II-C which is indicative of subtype β MTases [8]. The signature AdoMet binding motif DPPY and catalytic motif FGG are also fully conserved in the alignment (DPPY conserved here as SPPY). The FGG motif also occurs before the DPPY motif in the alignment, a motif order observed in subtype β MTases according to REBASE [10]. Overall, these results support the identification of HVO_0794 as a 4mC CTAG MTase of the subtype. In a search of all completely sequenced halobacterial genomes available on 12 June 2017, homologs that group with HVO_0794 with high statistical support were identified in 37 (88%) of the completely sequenced genomes. Genomes with the homolog present are included in Figure 2. Homologs that grouped with the H. volcanii CTAG enzyme with high support were absent in Halorubrum lacusprofundi ATCC 49239, Halorubrum trapanicum, Haloquadratum walsbyi C23, Haloquadratum walsbyi DSM 16790, and Halopenitus persicus. The GATC motif is methylated in many organisms and this methylation was shown to play a role in regulating the start of replication and in mismatch repair of newly synthesized DNA strands [23–27]. CTAG and GATC motifs occur throughout the H. volcanii genome. In contrast to other Halobacteria, both motifs show localized areas of higher concentrations within the genome. The H. volcanii chromosome possesses several origins of replication [54], one of these (oriC2) is associated with an increased concentration of CTAG and GATC motifs (Figure 3).

88 Genes 2018, 9, 129 7 of 23

In E. coli and other organisms where GATC methylation facilitates recognition of the newly synthesized DNA strand during mismatch repair, the GATC motif occurs with higher frequency as compared to the CTAG motif (22 times in E. coli, 46 times in H. trapanicum, see Table 4). In H. volcanii this ratio is only 2.8 (see Figure 2 and Table 4). Ratios below 5 were found in other Haloferax species, Haloarcula sp., Natronomonas pharaonic, Halobacterium hubeiense strain JI20-1, and Halobacterium sp. DL1; whereas Halobacterium salinarum R1 has a ratio above 20 (Table 4). In Haloferax spp. this drop in relative GATC frequency is due to a dropin frequency of the GATC (Table 4). The CTAG motif actually occurs less frequently in H. volcanii (0.24 times per 1000 nucleotides) than in other halobacterial chromosomes (average ± standard deviation in all completely sequenced chromosomes is 0.42/kb (±0.19). Genes 2018, 9, x FOR PEER REVIEW 7 of 23

Figure 1. Amino acid alignment of HVO_0794 homologs. The multiple sequence alignment includes Figure 1. Amino acid alignment of HVO_0794 homologs. The multiple sequence alignment includes HVO_0794 (Haloferax volcanii DS2; GenBank ADE02643), M.HsaR1I (Halobacterium salinarum R1; HVO_0794GenBank (Haloferax CAP14114), volcanii M.MthZIDS2; GenBank(Methanothermobacter ADE02643), thermautotrophicus M.HsaR1I; GenBank (Halobacterium CAA48447), salinarum R1; GenBank CAP14114),M.MjaI (Methanocaldococcus M.MthZI ( Methanothermobacterjannaschii DSM 2661; GenBank thermautotrophicus AAB98988) and M.BfaI; GenBank (Bacteroides CAA48447), fragilis; M.MjaI (MethanocaldococcusGenBank ADQ20483). jannaschii IdentifiedDSM 2661; N4-cytosine GenBank methyltransferase AAB98988) motifs and M.BfaII-X [9] are (Bacteroides highlighted in fragilis blue ; GenBank (representing S-adenosyl methionine (AdoMet) binding motifs), purple (representing DNA binding ADQ20483). Identified N4-cytosine methyltransferase motifs I-X [9] are highlighted in blue (representing motif), and yellow (representing catalytic motifs). Red boxes are used to identify the signature DPPY S-adenosyland methionine FGG motifs. (AdoMet) The bindingSCOP motifs),superfamily purple domain (representing S-adenosyl- DNAL-methionine-dependent binding motif), and yellow (representingmethyltransferase catalytic motifs). domain RedSSF53335 boxes is highlighted are used in to green identify throughout the signature the alignment. DPPY Clustal and X2 FGG motifs. The SCOP superfamilyshading and marking domain of aminoS-adenosyl- acids is includedL-methionine-dependent in the alignment. methyltransferase domain SSF53335

is highlightedIn a insearch green of throughout all completely the sequenced alignment. halo Clustalbacterial X2 genomes shading available and marking on 12 ofJune amino 2017, acids is includedhomologs in the alignment.that group with HVO_0794 with high statistical support were identified in 37 (88%) of the completely sequenced genomes. Genomes with the homolog present are included in Figure 2. Table 4.HomologsCTAG andthat GATCgrouped motif with frequenciesthe H. volcanii in CTAG completely enzyme sequenced with high support halobacterial were absent chromosomes. in Data forHalorubrumEscherichia lacusprofundi coli K12 are ATCC given for49239, comparison. Halorubrum trapanicum, Haloquadratum walsbyi C23, Haloquadratum walsbyi DSM 16790, and Halopenitus persicus.

Accession Number, Organism and Match to E. coli Total CTAG Total GATC CTAG/kb GATC/kb GATC/CTAG Chromosome Number Dam $ NC 013967.1 Haloferax volcanii DS2 671 1851 0.24 0.65 2.8 NZ CP007551.1 Haloferax mediterranei 1130 1472 0.38 0.50 1.3 ATCC 33500 NZ CP011947.1 Haloferax gibbonsii strain ARA6 556 1510 0.19 0.51 2.7 NC 017941.2 Haloferax mediterranei 1142 1500 0.39 0.51 1.3 ATCC 33500 NC 023013.1 Haloarcula hispanica N601 chr.1 1497 7523 0.50 2.50 5.0 NC 023010.2 Haloarcula hispanica N601 chr.2 340 1675 0.94 4.61 4.9 NZ CP010529.1 Haloarcula sp. CBA1115 1849 9333 0.54 2.73 5.0 NC 006396.1 Haloarcula marismortui ATCC 1816 6564 0.58 2.10 3.6 43049 chr.I NC 006397.1 Haloarcula marismortui ATCC 274 1011 0.95 3.51 3.7 43049 chr.II NC 015948.1 Haloarcula hispanica ATCC 1493 7462 0.50 2.49 5.0 33960 chr.I

89 Genes 2018, 9, 129 8 of 23

Table 4. Cont.

Accession Number, Organism and Match to E. coli Total CTAG Total GATC CTAG/kb GATC/kb GATC/CTAG Chromosome Number Dam $ NC 015943.1 Haloarcula hispanica ATCC 479 2210 0.98 4.52 4.6 33960 chr.II NZ LN831302.1 Halobacterium hubeiense 795 1820 0.32 0.72 2.3 strain JI20-1 NZ CP007060.1 Halobacterium sp. DL1 1168 4634 0.41 1.63 4.0 NC 002607.1 Halobacterium sp. NRC-1 551 11047 0.27 5.48 20.0 NC 010364.1 Halobacterium salinarum R1 537 10991 0.27 5.49 20.5 NC 012029.1 Halorubrum lacusprofundi ATCC 756 25016 0.28 9.15 33.1 + 49239 chr.1 NC 012028.1 Halorubrum lacusprofundi ATCC 389 3306 0.74 6.29 8.5 + 49239 chr.2 NC 007426.1 Natronomonas pharaonis 1016 1839 0.39 0.71 1.8 DSM 2160 NC 008212.1 Haloquadratum walsbyi 2290 14449 0.73 4.61 6.3 DSM 16790 NC 017459.1 Haloquadratum walsbyi C23 2281 14681 0.72 4.66 6.4 NC 014729.1 Halogeometricum borinquense 1085 8407 0.38 2.98 7.7 DSM 11551 CP024845.1 Halophilic archaeon True-ADL 1786 23542 0.54 7.07 13.2 NC 021921.1 Halorhabdus tiamatea SARL4B 892 27010 0.32 9.59 30.3 + NC 013158.1 Halorhabdus utahensis DSM 12940 964 32101 0.31 10.30 33.3 NC 013202.1 Halomicrobium mukohataei 918 27978 0.30 8.99 30.5 DSM 12286 NC 013743.1 Haloterrigena turkmenica 1347 37472 0.35 9.64 27.8 DSM 5511 NC 013922.1 Natrialba magadii ATCC 43099 1592 25139 0.42 6.70 15.8 NC 014297.1 Halalkalicoccus jeotgali B3 1106 29489 0.39 10.50 26.7 NC 015666.1 Halopiger xanaduensis SH-6 1090 33560 0.30 9.15 30.8 NC 018224.1 Natrinema sp. J7-2 1393 33801 0.38 9.14 24.3 NC 019792.1 Natronobacterium gregoryi SP2 2330 31628 0.62 8.35 13.6 NC 019962.1 Natrinema pellirubrum DSM 15624 1384 36667 0.37 9.67 26.5 NC 019964.1 Halovivax ruber XH-70 1256 30664 0.39 9.51 24.4 NC 019974.1 Natronococcus occultus SP4 1534 42563 0.38 10.61 27.7 NC 020388.1 Natronomonas moolapensis 8.8.11 1088 23003 0.37 7.90 21.1 NC 021313.1 Salinarchaeum sp. Harcht-Bsk1 948 33056 0.29 10.15 34.9 NZ AP017558.1 Halopenitus persicus DNA 695 31995 0.23 10.78 46.0 + CBA1233 NZ AP017569.1 Halorubrum trapanicum DNA 426 19948 0.15 7.03 46.8 + CBA1232 NZ CP007055.1 Halostagnicola larsenii XH-48 1094 24861 0.39 8.91 22.7 NZ CP008874.1 Halanaeroarchaeum 596 15696 0.29 7.53 26.3 sulfurireducens HSR2 NZ CP011564.1 Halanaeroarchaeum 637 16067 0.30 7.55 25.2 sulfurireducens M27-SA2 NZ CP016070.1 Halodesulfurarchaeum 639 18453 0.32 9.36 28.9 formicicum HTSR1 NZ CP016804.1 Halodesulfurarchaeum 696 19038 0.33 9.13 27.4 formicicum HSR6 NZ CP019067.1 Halorientalis sp. IM1011 1046 25987 0.31 7.68 24.8 + NZ CP019285.1 Halobiforma lacisalsi AJ5 1235 40138 0.30 9.64 32.5 NZ CP019327.1 Haloterrigena daqingensis JX313 1124 25935 0.33 7.63 23.1 NZ CP019893.1 Natrialbaceae archaeon 1177 35113 0.30 8.93 29.8 + JW/NM-HA 15 Mean value per chromosome 0.42 6.34 18.4 Standard Deviation 0.19 3.35 12.7 NC 000913.3 Escherichia coli str K-12 885 19124 0.19 4.12 21.6 + substr MG1655 $: Presence of a match to the E. coli DNA adenine methyltransferase in a translated nucleotide BLAST (tblastn) search with an E-value < 10−25 are indicated by +.

90 Genes 2018, 9, 129 9 of 23 Genes 2018, 9, x FOR PEER REVIEW 8 of 23

FigureFigure 2. Maximum 2. Maximum likelihood likelihood phylogeny phylogeny of of homologs ofof the theHaloferax Haloferax volcanii volcaniiDS2 DS2 CTAG CTAG MTase MTase identified in completely sequenced haloarchaeal genomes. Numbers give non-parametric bootstrap identified in completely sequenced haloarchaeal genomes. Numbers give non-parametric bootstrap support values. The phylogeny was rooted using more divergent haloarchaeal and methanomicrobial support values. The phylogeny was rooted using more divergent haloarchaeal and methanomicrobial homologs. Genomes with a chromosome wide GATC to CTAG ratio below five are given in red, homologs.those with Genomes a GATC with to CTAG a chromosome ratio between wide 5 and GATC 14 are to given CTAG in orange. ratio below In addition, five are those given with in a red, ratio those withabove a GATC 20 are to given CTAG in black.ratio Notebetween that only5 and few 14 groups are gi areven well in supported,orange. In including addition, the thoseHaloferax withand a ratio aboveHaloarcula 20 are givengenera. in black. Note that only few groups are well supported, including the Haloferax and Haloarcula genera. The GATC motif is methylated in many organisms and this methylation was shown to play a role in regulating the start of replication and in mismatch repair of newly synthesized DNA strands [23–27]. CTAG and GATC motifs occur throughout the H. volcanii genome. In contrast to other Halobacteria, both motifs show localized areas of higher concentrations within the genome. The H. volcanii chromosome possesses several origins of replication [54], one of these (oriC2) is associated with an increased concentration of CTAG and GATC motifs (Figure 3).

91

Genes 2018, 9, 129 10 of 23

Genes 2018, 9, x FOR PEER REVIEW 9 of 23

Figure 3. Cumulative occurrence (y-axis) of GATC (orange) and CTAG (blue) motifs along the Figure 3. CumulativeHaloferax occurrencevolcanii DS2 genome (y-axis) (x-axis). of The GATC location (orange) of the origins and of replication CTAG (blue) identified motifs in Hawkins along the Haloferax volcanii DS2 genomeet al. [54] (x are-axis). indicated The on locationtop. of the origins of replication identified in Hawkins et al. [54] are indicated onIntop. E. coli and other organisms where GATC methylation facilitates recognition of the newly synthesized DNA strand during mismatch repair, the GATC motif occurs with higher frequency as 3.2. Bioinformaticscompared Analysis to the SupportsCTAG motifIdentification (22 times in E. coli of, 46 RmeM times in asH. trapanicum a Type I, 6mAsee Table MTase 4). In H. and volcanii RmeS as a Type I Specificity Subunitthis onratio the is only Chromosome 2.8 (see Figure 2 and Table 4). Ratios below 5 were found in other Haloferax species, Haloarcula sp., Natronomonas pharaonic, Halobacterium hubeiense strain JI20-1, and Halobacterium sp. DL1; whereas Halobacterium salinarum R1 has a ratio above 20 (Table 4). In Haloferax spp. this drop in The putativerelative Type GATC I frequency 6mA MTase is due to RmeM a dropin andfrequency its cognateof the GATC specificity (Table 4). The subunit CTAG motif RmeS were also analyzed via bioinformatics.actually occurs less Tblastnfrequently ofin H. the volcanii RmeM (0.24 sequencetimes per 1000 against nucleotides) the databasethan in other of Halobacteria genomes in NCBIhalobacterial (taxid 183963)chromosomes showed (average this± standard MTase deviation to be in relativelyall completely rare sequenced in this chromosomes Order, as we retrieved is 0.42/kb (±0.19). significant hits in 19 out of 181 (10.5%) genomes of Halobacteria, and 3 out of 42 (7.1%) of the fully sequenced genomes.Table Blastp 4. CTAGanalysis and GATC motif of RmeM frequencies indicated in completely thatsequenced it is halobacterial homologous chromosomes. to M.EcoKI (GenBank Data for Escherichia coli K12 are given for comparison. P08957), a well-characterized Type I 6mA MTase in E. coli [55,56]. A homolog to RmeM was also Accession Number, Organism and Chromosome Total Total CTAG GATC GATC Match to E. identified in Bacillus cereus ATCCNumber 10987 (M.BceSVI;CTAG GenBank GATC AAS39772),/kb /kb which/CTAG coli has Dam been $ characterized NC 013967.1 Haloferax volcanii DS2 671 1851 0.24 0.65 2.8 via SMRT sequencingNZ CP007551.1 [57 Haloferax], as well mediterranei as ATCC in Methanoregula 33500 1130 1472 boonei 0.386A8 (Mboo_1031; 0.50 1.3 GenBank ABS55549) NZ CP011947.1 Haloferax gibbonsii strain ARA6 556 1510 0.19 0.51 2.7 which was characterizedNC 017941.2 Haloferax via unpublished mediterranei ATCC 33500 research 1142 according 1500 0.39 to REBASE. 0.51 1.3 Theseenzymes are 477 to NC 023013.1 Haloarcula hispanica N601 chr.1 1497 7523 0.50 2.50 5.0 529 amino acids inNC length, 023010.2 Haloarcula which hispanica is similar N601 chr.2 to RmeM 340 in 1675 size. 0.94 The classification 4.61 4.9 on REBASE for these enzymes also matchesNZ CP010529.1 RmeM Haloarcula (Type sp. CBA1115 I, subtype γ 1849MTase). 9333 A 0.54 multiple 2.73 sequence 5.0 alignment (Figure 4) NC 006396.1 Haloarcula marismortui ATCC 43049 1816 6564 0.58 2.10 3.6 of these identified homologs,chr.I along with a homolog identified in Caldanaerobacter subterraneus subsp. tengcongensis MB4 (Tte_1547; GenBank AAM24756) and in Halobacterium salinarum NRC-1 (M.HspNI;

GenBank AAG18733) indicated sequence similarity is shared throughout the alignment. The same SCOP superfamily domain SSF53335 present in the HVO_0794 homologs was also identified in these homologs and spans most of the alignment. InterProScan also identified PFAM protein family database domain PF12161, an N-terminal domain present in Type I MTases which affects the affinity of the MTase for hemimethylated DNA [58]. Another PFAM domain was also detected: PF02384, which is a 6mA MTase domain found in Type I MTase enzymes. The catalytic signature motif FGG is conserved (AGG in this alignment), although the third residue of the motif is not well-conserved in M.BceSVI and M.EcoKI. The AdoMet binding signature motif DPPY is also conserved in the alignment as NPP(Y/F). Both of these signature motifs are poorly conserved in M.HspNI. The order of these signature motifs in the alignment, with FGG occurring before DPPY, is typical of subtype γ MTases according to REBASE [10]. These data indicate that RmeM is a Type I, subtype γ MTase.

92 Genes 2018, 9, x FOR PEER REVIEW 11 of 23

identified in Bacillus cereus ATCC 10987 (M.BceSVI; GenBank AAS39772), which has been characterized via SMRT sequencing [57], as well as in Methanoregula boonei 6A8 (Mboo_1031; GenBank ABS55549) which was characterized via unpublished research according to REBASE. These enzymes are 477 to 529 amino acids in length, which is similar to RmeM in size. The classification on REBASE for these enzymes also matches RmeM (Type I, subtype γ MTase). A multiple sequence alignment (Figure 4) of these identified homologs, along with a homolog identified in Caldanaerobacter subterraneus subsp. tengcongensis MB4 (Tte_1547; GenBank AAM24756) and in Halobacterium salinarum NRC-1 (M.HspNI; GenBank AAG18733) indicated sequence similarity is shared throughout the alignment. The same SCOP superfamily domain SSF53335 present in the HVO_0794 homologs was also identified in these homologs and spans most of the alignment. InterProScan also identified PFAM protein family database domain PF12161, an N-terminal domain present in Type I MTases which affects the affinity of the MTase for hemimethylated DNA [58]. Another PFAM domain was also detected: PF02384, which is a 6mA MTase domain found in Type I MTase enzymes. The catalytic signature motif FGG is conserved (AGG in this alignment), although the third residue of the motif is not well-conserved in M.BceSVI and M.EcoKI. The AdoMet binding signature motif DPPY is also conserved in the alignment as NPP(Y/F). Both of these signature motifs are poorly conserved in M.HspNI. The order of these signature motifs in the alignment, with FGG occurring Genes before2018, 9 ,DPPY, 129 is typical of subtype γ MTases according to REBASE [10]. These data indicate that 11 of 23 RmeM is a Type I, subtype γ MTase.

FigureFigure 4. Amino4. Amino acid acid alignment alignment of RmeM homologs. homologs. The The multiple multiple sequence sequence alignment alignment includes includes RmeM (Haloferax volcanii DS2; GenBank ADE02452), M.HspNI (Halobacterium salinarum NRC-1; RmeM (Haloferax volcanii DS2; GenBank ADE02452), M.HspNI (Halobacterium salinarum NRC-1; GenBank GenBank AAG18733), Mboo_1031 (Methanoregula boonei 6A8; GenBank ABS55549), Tte_1547 AAG18733), Mboo_1031 (Methanoregula boonei 6A8; GenBank ABS55549), Tte_1547 (Caldanaerobacter (Caldanaerobacter subterraneus subsp. tengcongensis MB4; AAM24756), M.BceSVI (Bacillus cereus ATCC subterraneus10987; GenBank subsp. tengcongensisAAS39772), andMB4; M.EcoKI AAM24756), (Escherichia M.BceSVI coli K-12; (GenBankBacillus cereusP08957).ATCC The PFAM 10987; N6 GenBank AAS39772),adenine-specific and M.EcoKI DNA methyltran (Escherichiasferase coli K-12; N-terminal GenBank domain P08957). PF12161 The is PFAMhighlighted N6 adenine-specific in yellow, and DNA methyltransferase N-terminal domain PF12161 is highlighted in yellow, and the PFAM DNA methylase, adenine specific domain PF02384 is highlighted in blue. Red boxes are used to identify the signature DPPY and FGG motifs. The SCOP superfamily domain S-adenosyl-L-methionine-dependent methyltransferase domain SSF53335 is highlighted in green throughout the alignment. Clustal X2 shading and marking of amino acids is included in the alignment.

Tblastn of the RmeS sequence against the NCBI database of Halobacteria genomes (taxid 183963) resulted in significant hits in 26 out of 181 (14.4%) genomes of Halobacteria, and 7 out of 42 (16.7%) of the fully sequenced genomes. Further analysis of RmeS via blastp and PSI-BLAST resulted in the identification of a homolog in E. coli K-12 called S.EcoKI (GenBank AAC77304), a well-characterized site specificity subunit that belongs to the same Type I RM system as M.EcoKI [59]. Blastp analysis also indicated that RmeS is homologous to Tte_1545 (GenBank AAM24754), a Type I site specificity subunit in Caldanaerobacter subterraneus subsp. tengcongensis MB4 which corresponds to the same RM system as Tte_1547 and has been structurally analyzed [60]. RmeS also shares homology with S.BceSVI (GenBank AAS39773), the site specificity subunit which belongs to the same Type I RM system as M.BceSVI in Bacillus cereus ATCC 10987 [57]. RmeS was also observed to be homologous to the putative cognate site specificity subunit of M.HspNI in Halobacterium salinarum NRC-1 (S.HspNI; GenBank AAG18734) as well as the putative site specificity subunit of Mboo_1031 in Methanoregula boonei 6A8 (Mboo_1032; GenBank ABS55550). These homologs range from 398 to 476 amino acids in length, similar to RmeS which is 410 amino acids long, and were all classified as Type I specificity subunits on REBASE. A multiple sequence alignment of these enzymes (Figure 5) did not show high sequence conservation

93 Genes 2018, 9, 129 12 of 23 among the homologs. However, InterProScan revealed that these homologs all shared the same SCOP superfamily DNA methylase specificity domain SSF116734. This superfamily domain was observed to occur twice in similar regions of the homologs in the alignment: one was more N-terminal in its location and the other was more C-terminal. Within these regions, the PFAM restriction endonuclease, type I, HsdS domains PF01420 were observed to occur (not shown in the alignment), which correspond to the two target recognition domains of Type I site-specificity subunits [61]. In summary, these results indicate that RmeS is the cognate Type I site-specificity subunit of RmeM. Genes 2018, 9, x FOR PEER REVIEW 13 of 23

Figure 5. Amino acid alignment of RmeS homologs. The multiple sequence alignment includes RmeS Figure(Haloferax 5. Amino volcanii acid alignmentDS2; GenBank ofRmeS ADE04051), homologs. S.HspNI The (Halobacterium multiple sequence salinarum alignment NRC-1; GenBank includes RmeS (HaloferaxAAG18734), volcanii Mboo_1032DS2; GenBank (Methanoregula ADE04051), boonei 6A8; S.HspNI ABS55550), (Halobacterium Tte1545 (Caldanaerobacter salinarum NRC-1; subterraneus GenBank AAG18734),subsp. tengcongensis Mboo_1032 MB4; (Methanoregula GenBank AAM24754), boonei 6A8; S.BceSVI ABS55550), (Bacillus Tte1545 cereus ( CaldanaerobacterATCC 10987; GenBank subterraneus subsp.AAS39773),tengcongensis and S.EcoKIMB4; GenBank(Escherichia AAM24754),coli K-12; GenBank S.BceSVI AAG18734). (Bacillus The cereus first ATCCSCOP superfamily 10987; GenBank AAS39773),domain andDNA S.EcoKI methylase (Escherichia specificity domain coli K-12; SSF116734 GenBank is highlighted AAG18734). in yellow, The and first the SCOP second superfamily one domainis highlighted DNA methylase in green. specificity Clustal X2 domainshading and SSF116734 marking isofhighlighted amino acids is in incl yellow,uded in and the the alignment. second one is highlighted in green. Clustal X2 shading and marking of amino acids is included in the alignment. 3.3. Bioinformatics Analysis Supports Annotation of HVO_C0040 as a 5mC MTase and HVO_A0079 as a 6mA MTase 3.3. Bioinformatics Analysis Supports Annotation of HVO_C0040 as a 5mC MTase and HVO_A0079 as a 6mA MTaseThe other two putative MTase genes in H. volcanii DS2, which are located on plasmids, were also examined bioinformatically. Putative Type II 5mC MTase HVO_C0040 is located on Theextrachromosomal other two putative plasmid MTase pHV1 genes and in isH. flanked volcanii DS2,by an which upstream are located IS4 family on plasmids, transposase were also examined(HVO_C0039). bioinformatically. Tblastn of Putative the HVO_C0040 Type II 5mCsequence MTase against HVO_C0040 the NCBI is database located onof extrachromosomalHalobacteria plasmidgenomes pHV1 (taxid and 183963) is flanked resulted by anin significant upstream hits IS4 in family 87 out transposaseof 181 (48.1%) (HVO_C0039).of halobacterial genomes, Tblastn of the HVO_C0040and 13 (31%) sequence of the against fully sequenced the NCBI genomes. database A ofblastp Halobacteria analysis of genomesHVO_C0040 (taxid revealed 183963) that resultedit is in homologous to M.HgiDII (GenBank CAA38941) in Herpetosiphon aurantiacus, which has been significant hits in 87 out of 181 (48.1%) of halobacterial genomes, and 13 (31%) of the fully sequenced experimentally characterized as a 5mC MTase recognizing GTCGAC [62]. HVO_C0040 was also genomes.observed A blastp to share analysis homology of with HVO_C0040 M.BbrUII (GenBank revealed ABE95799), that it is homologous which has been to characterized M.HgiDII (GenBank in CAA38941)Bifidobacterium in Herpetosiphon breve UCC2003 aurantiacus as a 5mC MTase, which [63]. has Two been other experimentallyhomologs identified characterized via blastp include as a 5mC MTasethe recognizing putative 5mC GTCGAC MTase in [ 62halobacterial]. HVO_C0040 species was Halorhabdus also observed tiamatea to SARL4B share homology (M.Hti4BORF752P; with M.BbrUII GenBank CCQ33914). Putative 5mC MTase in Acinetobacter baumannii MAR002 (M.AbaMAR002ORF10745P; GenBank KGF60346) was also identified as a homolog via blastp. These sequences range from 347 to 415 amino acids in length, which is similar to the 406-amino acid length of HVO_C0040. These sequences are also all annotated as Type II 5mC MTases on REBASE. An amino acid alignment of these homologs (Figure 6) indicates94 that sequence similarity is shared in many

Genes 2018, 9, 129 13 of 23

(GenBank ABE95799), which has been characterized in Bifidobacterium breve UCC2003 as a 5mC MTase [63]. Two other homologs identified via blastp include the putative 5mC MTase in halobacterial species Halorhabdus tiamatea SARL4B (M.Hti4BORF752P; GenBank CCQ33914). Putative 5mC MTase in Acinetobacter baumannii MAR002 (M.AbaMAR002ORF10745P; GenBank KGF60346) was also identified as a homolog via blastp. These sequences range from 347 to 415 amino acids in length, which is similar to the 406-amino acid length of HVO_C0040. These sequences are also all annotated as Type II 5mC

MTasesGenes on 2018 REBASE., 9, x FOR PEER An REVIEW amino acid alignment of these homologs (Figure 6) indicates that14 of sequence 23 similarity is shared in many regions of the amino acid sequences. Many of these regions where significantregions sequence of the amino similarity acid sequences. is observed Many are of identified these regions as signature where significant 5mC MTase sequence motifs. similarity Three is regions of sequenceobserved similarity, are identified for example,as signature are 5mC identified MTase bymot InterProScanifs. Three regions asPRINTS of sequence cytosine-specific similarity, for DNA example, are identified by InterProScan as PRINTS cytosine-specific DNA MTase signature domains MTase signature domains PR00105. Other regions match the 5mC conserved signature motifs identified PR00105. Other regions match the 5mC conserved signature motifs identified by Posfai et al. [11], by Posfai et al. [11], such as FGG, PC, ENV, QRR, and YGN (conserved here as (R/L)GN). Each sequence such as FGG, PC, ENV, QRR, and YGN (conserved here as (R/L)GN). Each sequence also belongs to also belongsthe SCOP to S the-adenosyl- SCOPLS-methionine-dependent-adenosyl-L-methionine-dependent MTase superfamily MTase domain superfamily SSF53335 identified domain SSF53335by identifiedInterProScan. by InterProScan. Overall, these Overall, data support these data the annotation support the of annotationHVO_C0040 ofas HVO_C0040a 5mC MTase. as a 5mC MTase.

Figure 6. Amino acid alignment of HVO_C0040 homologs. The multiple sequence alignment includes FigureHVO_C0040 6. Amino acid(Haloferax alignment volcanii of DS2; HVO_C0040 GenBank ADE05226), homologs. M.Hti4BORF752P The multiple sequence (Halorhabdus alignment tiamatea includes HVO_C0040SARL4B; (Haloferax GenBank volcanii CCQ33914),DS2; GenBank M.AbaMAR002ORF10745P ADE05226), M.Hti4BORF752P (Acinetobacter (Halorhabdus baumannii tiamateaMAR002;SARL4B; GenBankGenBank CCQ33914), KGF60346), M.AbaMAR002ORF10745P M.BbrUII (Bifidobacterium breve (Acinetobacter UCC2003; baumannii GenBank ABE95799),MAR002; GenBank and M.HgiDII KGF60346), M.BbrUII(Herpetosiphon (Bifidobacterium aurantiacus breve; GenBankUCC2003; CAA38941). GenBank The ABE95799), protein motif and database M.HgiDII PRINTS (Herpetosiphon cytosine-specific aurantiacus ; GenBankDNA CAA38941).methyltransferase The signature protein motifdomains database PR00105 PRINTSare highlighted cytosine-specific in yellow. Red DNA boxes methyltransferase are used to signatureidentify domains signature PR00105 FGG, PC, are ENV, highlighted QRR, and YGN in yellow. motifs described Red boxes in Posfai are used et al. to[11]. identify The SCOP signature FGG,superfamily PC, ENV, domain QRR, andS-adenosyl- YGN motifsL-methionine-dependent described in Posfai methyl ettransferase al. [11]. domain The SCOP SSF53335 superfamily is highlighted in green throughout the alignment. Clustal X2 shading and marking of amino acids is domain S-adenosyl-L-methionine-dependent methyltransferase domain SSF53335 is highlighted in green included in the alignment. throughout the alignment. Clustal X2 shading and marking of amino acids is included in the alignment. Putative Type IIG 6mA MTase HVO_A0079 is located on extrachromosomal plasmid pHV4 and Putativeis flanked Type by a IIGdownstream 6mA MTase IS4 family HVO_A0079 transposase is located(HVO_A0080). on extrachromosomal Tblastn of the HVO_A0079 plasmid pHV4 and issequence flanked against by a downstream the NCBI database IS4 family of halobacterial transposase genomes (HVO_A0080). (taxid 183963) Tblastn resulted of inthe significant HVO_A0079 sequencehits in against 101 out theof 181 NCBI (59.1%) database genomes of of halobacterial Halobacteria, and genomes 21 (50%) (taxid of the 183963) fully sequenced resulted genomes, in significant indicating a higher prevalence in the Order than the RmeM or RmeS homologs, but not as high as hits in 101 out of 181 (59.1%) genomes of Halobacteria, and 21 (50%) of the fully sequenced genomes, HVO_0794 homologs. Blastp analysis of putative Type IIG 6mA MTase HVO_A0079 identified a indicatingnumber a of higher homologs prevalence to the protein, in the including Order than RM.Aco12261II the RmeM (GenBank or RmeS ADE57453), homologs, a Type but IIG not 6mA as high as HVO_0794MTase in homologs. Aminobacterium Blastp colombiense analysis DSM of putative12261 which Type has IIGbeen 6mA identified MTase via HVO_A0079SMRT sequencing identified as a numbertargeting of homologs the motif to theCCRGA protein,m6G [32]. including HVO_A0079 RM.Aco12261II was also observed (GenBank to ADE57453), share homology a Type with IIG 6mA MTaseRM.Fla104114II in Aminobacterium (GenBank colombiense BAV07385),DSM a Type 12261 IIG 6mA which MTase has characterized been identified via unpublished via SMRT SMRT sequencing as targetingsequencing the data motif on CCRGA REBASE.m6 BlastpG[32 ].also HVO_A0079 determined wasthat HVO_A0079 also observed shared to share homology homology with with RM.Fla104114IIputative Type (GenBank IIG 6mA BAV07385), MTases in aHalorubrum Type IIG 6mAcaliforniensis MTase characterizedDSM19288 (C463_0072; via unpublished GenBank SMRT ELZ48543) and in Halophilic archaeon DL31 (RM.HarDL31ORF105P; GenBank AEN07377). These proteins range between 1117 to 1185 amino acids in length, which is similar to the 1088 amino acid length of HVO_A0079. These homologs are also all annotated on REBASE as Type IIG 6mA subtype α RM proteins, with the exception of C463_0072,95 which is not present in REBASE. A multiple

Genes 2018, 9, 129 14 of 23 sequencing data on REBASE. Blastp also determined that HVO_A0079 shared homology with putative Type IIG 6mA MTases in Halorubrum californiensis DSM19288 (C463_0072; GenBank ELZ48543) and in Halophilic archaeon DL31 (RM.HarDL31ORF105P; GenBank AEN07377). These proteins range between 1117 to 1185 amino acids in length, which is similar to the 1088 amino acid length of HVO_A0079. These homologs are also all annotated on REBASE as Type IIG 6mA subtype α RM proteins, with the exception of C463_0072, which is not present in REBASE. A multiple sequence alignment of the amino acid sequences of these homologs (Figure 7) indicated significant sequence conservation in the central region ofGenes the 2018 alignment., 9, x FOR PEER Three REVIEW large sections of this central region were observed via15 InterProScanof 23 to belong to the SCOP S-adenosyl-L-methionine-dependent methyltransferase superfamily domain sequence alignment of the amino acid sequences of these homologs (Figure 7) indicated significant SSF53335.sequence InterProScan conservation also in identified the central three region regions of the alignment. of the alignment Three large whichsections belong of this central to the PRINTS adenine-specificregion were DNA observed MTase via signature InterProScan domains to belong PR00507, to the SCOP as wellS-adenosyl- as a PFAML-methionine-dependent Eco57I domain PF07669, a domainmethyltransferase observed in well-characterized superfamily domain SSF53335. Type IIG InterProScan RM protein also Eco57I identified [64 three]. A closerregions analysisof the of the alignmentalignment revealed which the presence belong to the of FGGPRINTS (conserved adenine-specific as AGG) DNA andMTase DPPY signature (conserved domains PR00507, as NPPY) as signature well as a PFAM Eco57I domain PF07669, a domain observed in well-characterized Type IIG RM motifs inprotein the order Eco57I N-FGG-NPPY-C, [64]. A closer analysis which of the a wouldlignment follow revealed the the presence motif order of FGG N-FGG-TRD-DPPY-C (conserved as observedAGG) in subtype and DPPYα MTases(conserved according as NPPY) signature to REBASE motifs [in10 the]. However,order N-FGG-NPPY-C, no significant which would similarity was observedfollow in the the N-terminal motif order N-FGG-TRD-DPPY-C region of these proteins, observed whichin subtype is whereα MTases the according restriction to REBASE endonuclease domain is[10]. typically However, located no significant in Type similarity IIG RM was proteins. observed These in theresults N-terminal overall region support of these theproteins, annotation of which is where the restriction endonuclease domain is typically located in Type IIG RM proteins. HVO_A0079 as a Type II 6mA subtype α MTase. These results overall support the annotation of HVO_A0079 as a Type II 6mA subtype α MTase.

Figure 7. Amino acid alignment of HVO_A0079 homologs. The multiple sequence alignment includes Figure 7. HVO_A0079Amino acid (Haloferax alignment volcanii of HVO_A0079DS2; GenBank ADE01706), homologs. C463_0072 The multiple (Halorubrum sequence californiensis alignment DSM includes HVO_A007919288; (Haloferax GenBank ELZ48543), volcanii DS2; RM.HarDL31ORF105P GenBank ADE01706), (Halophilic C463_0072 archaeon DL31; (Halorubrum GenBank AEN07377), californiensis DSM 19288; GenBankRM.Aco12261II ELZ48543), (Aminobacterium RM.HarDL31ORF105P colombiense DSM 12261; (Halophilic GenBank archaeon ADE57453),DL31; and RM.Fla104114II GenBank AEN07377), RM.Aco12261II(Filimonas (Aminobacterium lacunae 104114; GenBank colombiense BAV07385).DSM The 12261; PRINTS GenBank adenine-specific ADE57453), DNA methyltransferase and RM.Fla104114II signature domains PR00507 are highlighted in yellow. Red boxes signify signature FGG and DPPY (Filimonas lacunae 104114; GenBank BAV07385). The PRINTS adenine-specific DNA methyltransferase motifs. The SCOP superfamily domain S-adenosyl-L-methionine-dependent methyltransferase signaturedomain domains SSF53335 PR00507 is highlighted are highlighted throughout inthe yellow. alignment. Red Clustal boxes X2 shading signify and signature marking of FGG amino and DPPY motifs. Theacids SCOP is included superfamily in the alignment. domain S-adenosyl-L-methionine-dependent methyltransferase domain SSF53335 is highlighted throughout the alignment. Clustal X2 shading and marking of amino acids is

included in the alignment.

96 Genes 2018, 9, 129 15 of 23

3.4. Deletion of HVO_0794, HVO_A0006, and HVO_A0237 Eliminates 4mC Methylation and Does Not Effect 6mA Methylation In order to better understand the roles of HVO_0794, HVO_A0006, and HVO_A0237 in DNA methylation, the three genes were deleted in mrr deletion strain H1206, producing a triple deletion mutant (∆HVO_0794 ∆HVO_A0006 ∆HVO_A0237). The genome of this deletion mutant was sequenced via SMRT sequencing to determine the methylome and the results are listed in Table 5. In this strain, the Cm4TAG motif that is modified in the parental strain H26 [37] is no longer detected as methylated. Also, m6 the 6mA motif GCA BN6VTGC is modified in the triple deletion mutant, with 100% of the 410 motifs in the genome identified as methylated. Between studies, there was also a difference in the percent of motifs detected as methylated in H26 compared to the triple deletion mutant. In H26, only 316 of the 410 GCABN6VTGC motifs (~77%) were detected as methylated in [37], whereas in this study all 410 motifs are modified in ∆HVO_0794 ∆HVO_A0006 ∆HVO_A0237. This discrepancy is likely the result of a difference in sequence coverage, since the mean motif coverage and QV scores (confidence scores) were greater in the triple deletion mutant compared to H26. In ∆HVO_0794 ∆HVO_A0006 m6 ∆HVO_A0237, the mean motif coverage for GCA BGN5VTGC was 130.4, a ~325% increase from the mean motif coverage of 30.7 in H26. The mean modification QV score for the 6mA motif was 213.0 in the triple deletion mutant, an increase of ~274% from the H26 mean QV score of 57.0. Therefore, it is likely that the motifs were methylated completely in both strains, but that some of those motifs were not detected as modified in H26 due to the lower coverage and mean QV scores. Overall, these results indicate that deletion of HVO_0794, HVO_A0006, and HVO_A0237 abolishes methylation of Cm4TAG, m6 but has no effect on methylation of GCA BN6VTGC.

Table 5. DNA methylation patterns detected in H. volcanii RM deletion mutants

∆HVO_0794 ∆HVO_A0006 ∆HVO_A0237 ∆rmeRMS ∆RM Motif GCAm6BNNNNNNVTGC Cm4TAG GCAm6BNNNNNNVTGC Cm4TAG GCAm6BNNNNNNVTGC Cm4TAG Methylated position 3 1 3 1 3 1 Methylation type 6mA 4mC 6mA 4mC 6mA 4mC Number of 410 0 0 1199 0 0 methylated motifs Number of motifs 410 1342 410 1342 410 1342 in genome Percent of 100 0 0 89 0 0 methylated motifs Mean modification 213.0 - - 104.1 - - QV score Mean motif coverage 130.4 - - 113.0 - -

3.5. Deletion of the rmeRMS Operon Abolishes 6mA Methylation The putative Type I operon rmeRMS was deleted in H. volcanii H1206, and sequenced via SMRT sequencing, in order to determine the role of the operon in DNA methylation. The results of the SMRT m6 analysis for this strain (∆rmeRMS) are listed in Table 5. In ∆rmeRMS, the 6mA motif GCA BN6VTGC is not detected as modified as it is in H26, and no other 6mA methylation is present [37]. Modification of Cm4TAG is still detected in the deletion strain. In ∆rmeRMS, 1199 of the 1342 CTAG motifs in the genome (~89%) are detected as methylated, These results are better than in Ouellette et al. [37] due to the better sequence coverage in ∆rmeRMS compared to H26, thus providing better detection of the methylated motifs. The mean motif coverage for Cm4TAG in this deletion mutant was 113.0. The mean modification QV score for the 4mC motif in ∆rmeRMS was 104.1. Overall, these results indicate that m6 deletion of the rmeRMS operon eliminates methylation of the GCA BN6VTGC motif.

3.6. Multi-RM Deletion Eliminates Detection of All DNA Methylation A multi-RM deletion mutant, with all putative RM genes except for HVO_C0040 deleted from the strain, was also analyzed using SMRT sequencing to determine if the deletion of these genes resulted in

97 Genes 2018, 9, 129 16 of 23 Genes 2018, 9, x FOR PEER REVIEW 17 of 23 eliminationstrain (∆RM of) are DNA listed methylation in Table 5. in TheH. volcanii6mA motif. The GCA resultsm6BN of6 theVTGC SMRT identified analysis in for H26 this is strainnot detected (∆RM) m6 m4 areas modified listed in Tablein this5. strain The 6mA [37]. motif Also, GCA the 4mCBN 6motifVTGC C identifiedTAG is also in H26 not is detected not detected as methylated. as modified No in thisother strain motifs [37 are]. Also, detected the 4mC as modified motif C m4inTAG this strain. is also These not detected results asindicate methylated. that all No DNA other methylation motifs are detectedthat can asbe modified detected in by this SMRT strain. sequencing These results has indicate been eliminated that all DNA in methylationthe multi-RM that deletion can be detectedmutant. byAlthough SMRT sequencingthe remaining has RM been gene eliminated in this strain in the (HVO_C0040 multi-RM deletion) encodes mutant. a MTase Although predicted the to remaining perform RM5mC gene methylation in this strain which (HVO_C0040 is difficult )to encodes detect via a MTase SMRT predicted sequencing to perform without 5mC Tet treatment methylation [65], which motifs is difficultof this type to detect of methylation via SMRT sequencing can still be without weakly Tet detected treatment without [65], motifsTet treatment. of this type Since of methylation even weak candetection still be of weaklymotifs was detected not observed without in Tet this treatment. strain, the Since results even indicate weak that detection HVO_C0040 of motifs is not was active not observedas an MTase. in this strain, the results indicate that HVO_C0040 is not active as an MTase.

3.7. No Defect in Growth Occurs in the Multi-RMMulti-RM DeletionDeletion ComparedCompared toto thethe ParentalParental StrainStrain − InIn E. coli coli dam dam− mutants,mutants, the the lack lack of methylation of methylation results results in growth in growth defects defects compared compared to the to wild- the wild-typetype strain strain [66]. [In66 ].order In order to determine to determine if the if lack the lack of RM of RMgenes genes resulted resulted in a in deficiency a deficiency of growth of growth in inthe the ΔRM∆RM strainstrain compared compared to to the the H26 H26 parental parental strain strain,, both both strains strains were were grown grown in Hv-YPC medium (Figure(Figure 88).). TheThe resultsresults indicate indicate that that no no significant significant difference difference in in growth. growth. Both Both strains strains entered entered log log phase phase at ~6at ~6 h, andhours, although and although∆RM initially ΔRM hadinitially a slightly had a higher slightly OD higher620 when OD it620 entered when it log entered phase, log this phase, difference this disappeareddifference disappeared after ~20 h after of growth, ~20 hours and of both growth, cultures and reached both cultures stationary reached phase stationary at ~36 h withphase similar at ~36 ODhours620 withvalues similar (Supplementary OD620 values Figure (Supplementary S1). The final Figure OD S1).620 at The for final H26 OD after620 72at for h was H26 0.335, after whereas72 hours forwas∆ 0.335,RM the whereas final OD for620 ΔwasRM 0.334.the final The OD difference620 was 0.334. between The thesedifference two averagesbetween wasthese not two significant averages basedwas not on significant the standard based error on values the standard and analysis error of va variancelues and (ANOVA) analysis single of variance factor statistical(ANOVA) analysis. single Overall,factor statistical these results analysis. indicate Overall, that therethese isresults no detectable indicate defectthat there in growth is no detectable in the ∆RM defectstrain in comparedgrowth in tothe the ΔRM H26 strain strain. compared to the H26 strain.

Figure 8. Cell density of H26 and ΔRM at stationary phase when grown on Hv-YPC, represented by Figure 8. Cell density of H26 and ∆RM at stationary phase when grown on Hv-YPC, represented by the average optical density (OD620) reading of 24 cell culture replicates after 72 hours of growth. Error the average optical density (OD620) reading of 24 cell culture replicates after 72 h of growth. Error bars bars indicate the standard error of the mean. Analysis of variance (ANOVA) single factor, p = 0.98. indicate the standard error of the mean. Analysis of variance (ANOVA) single factor, p = 0.98.

4. Discussion In a previous study on DNA methylation in H. volcanii H26 [37], two motifs were identified as modified throughout the genome of the organism: the 4mC motif Cm4TAG and the 6mA motif

98 Genes 2018, 9, 129 17 of 23

4. Discussion In a previous study on DNA methylation in H. volcanii H26 [37], two motifs were identified as modified throughout the genome of the organism: the 4mC motif Cm4TAG and the 6mA motif m6 GCA BN6VTGC. These motifs were predicted to be methylated by a putative Type II 4mC MTase encoded by HVO_0794 and a putative 6mA MTase belonging to a Type I RM system encoded by the operon HVO_2269-2271 (rmeRMS), respectively. However, there are several annotated RM genes with no predicted motif recognition. In this follow-up study, we demonstrated through successive deletions of annotated RM genes that the Cm4TAG motif is methylated by the Type II MTase HVO_0794; m6 the GCA BN6VTGC motif is methylated by the Type I RM system RmeRMS; and that the other annotated MTases do not methylate under the conditions tested. In mutants with HVO_0794 deleted from the genome, the SMRT sequencing analyses did not detect methylation of Cm4TAG or any other type of 4mC methylation, indicating that the MTase encoded by this gene is responsible for CTAG methylation since removal of this gene abolishes methylation of the motif. This result confirms predictions from previous studies [47,67] that suggested HVO_0794 is a CTAG MTase. Our bioinformatics analysis also supports the identification of this MTase as responsible for 4mC methylation, since the amino acid sequence has high similarity to previously characterized Type II 4mC CTAG MTases such as M.MthZI [33]. Although HVO_A0006 and HVO_A0237 were also deleted in the same strain as HVO_0794, neither of these genes have high similarity to 4mC CTAG MTases, and deletion of HVO_A0006 in a previous study [37] indicated that it does not affect cytosine methylation, ruling out these genes as candidates for Cm4TAG methylation. Based on our search of REBASE and NCBI, no cognate REase is encoded in the genome of H. volcanii, and deletion of the gene was not lethal as would be expected if there was a cognate REase, suggesting that HVO_0794 is an orphan MTase. The observation of an orphan CTAG MTase in H. volcanii was not unexpected based on previous work by Blow et al. [32], who found that predicted Type II CTAG orphan MTase gene families are common in the Halobacteria, occurring in 78% of halobacterial species. Several of the halobacterial species examined by Blow et al. [32] which contained the CTAG MTase family also had a high CTAG motif density at their origins of replication, suggesting that this gene family may play a role in regulating DNA replication in the Halobacteria. Our analysis of H. volcanii showed a higher CTAG motif density surrounding oriC2, but not in regions near the other two origins; however, the oriC2 region also showed the enrichment of GATC motifs was even more pronounced (see Figure 3). It remains to be established, if the stretches with higher CTAG and GATC motif density have a selected function in H. volcanii, or if they reflect gene acquisition from a donor with different compositional bias. H. volcanii has a lower GATC to CTAG ratio than organisms with homologs to the E.coli DNA adenine methyltransferase (Dam) which recognizes the GATC motif, and aids DNA repair via a methyl-directed mismatch repair system [21]. However, the decrease in the GATC to CTAG ratio in H. volcanii is due exclusively due to a drop in the frequency of the GATC motif, and not to an increase in the CTAG frequency. The role of CTAG methylation may not be of major importance for H. volcanii, as no growth defect was observed to occur in the ∆RM strain compared to the parental H26 strain. Considering that H. volcanii does not require origins of replication in order to grow efficiently [54], it is not too surprising that eliminating the putative role of HVO_0794 in regulating the origins of replication does not affect growth. m6 The absence of GCA BN6VTGC methylation in deletion mutants without the rmeRMS operon indicated that these genes are responsible for 6mA methylation in H. volcanii. Our bioinformatics analysis also indicates that RmeRMS is a Type I RM system, since both the MTase subunit RmeM and specificity subunit RmeS are homologous to well-characterized Type I 6mA MTases and specificity m6 subunits such as M.EcoKI and S.EcoKI [56]. The motif GCA BN6VTGC resembles the type of sequences targeted by Type I systems, which are typically bipartite sequences with a gap of unspecified nucleotides in the middle [15]. Therefore, the observation that rmeRMS is a Type I RM system supports the identification of this operon as responsible for 6mA methylation in H. volcanii. Previous work by

99 Genes 2018, 9, 129 18 of 23

Ouellette et al. [37] had suggested that the RM gene HVO_A0006 might have a role in 6mA methylation, since SMRT sequencing of a HVO_A0006 deletion mutant identified an alteration in the 6mA motif m6 m6 (GCA BGN5VTGC instead of GCA BN6VTGC). However, our SMRT sequencing analysis of a deletion mutant without HVO_A0006 (∆HVO_0794 ∆HVO_A0006 ∆HVO_A0237) did not demonstrate any difference in the 6mA motif compared to the H26 parental strain. This difference is likely due to better sequence coverage in our data (~400x coverage for ∆HVO_0794 ∆HVO_A0006 ∆HVO_A0237 compared to ~80x coverage for ∆HVO_A0006), allowing our analysis to identify more motifs as modified in the genome compared to the previous study [37]. This result, along with the observation that deletion of rmeRMS alone abolished detection of 6mA methylation, suggests that RmeRMS is solely responsible for adenine methylation in H. volcanii. The presence of a restriction-subunit encoding gene (rmeR) indicates that the system can also cleave DNA at unmethylated target motifs, acting as a fully functional RM system. It is possible that this system functions in protecting H. volcanii from foreign DNA similar to RM systems in other organisms [68]. This defense system, in combination with clustered regularly interspaced short palindromic repeats (CRISPR-Cas system) [69,70], is likely advantageous to H. volcanii considering that haloarchaeoviruses are highly abundant in hypersaline environments [71,72]. RmeRMS may also be involved in regulating gene transfer, which in H. volcanii can occur within species as well as between species [73]. In E. coli, for example, the Type I RM system EcoKI has been demonstrated to reduce uptake via conjugation of unmethylated plasmids with EcoKI target sites [74]. A study by Lin et al. [75] indicated that RM systems could limit the size of DNA fragments that can recombine in Helicobacter pylori. Correlation between RM system occurrence and phylogenetic clusters was observed in Haemophilus influenzae, suggesting that RM systems are acting as barriers to genetic exchange between phylogenetic groups [76]. RM systems have also been hypothesized to drive population dynamics and diversification in Neisseria meningitidis [77]. Our results also indicate that homologs to RmeRMS, as well as the other predicted RM genes, do not occur as frequently in haloarchaeal species compared to the CTAG orphan MTase family genes; the RmeRMS system could possibly limit gene transfer that occurs with other individuals in the environment which lack the system, thus acting as a barrier to recombination for H. volcanii. Our SMRT sequencing analyses indicate that deletion of HVO_0794 and rmeRMS is sufficient to eliminate detection of methylation in H. volcanii, indicating that the other predicted MTase genes in the organism (HVO_C0040, HVO_A0079, and HVO_A0237) do not contribute to methylation. The reason for the apparent inactivity of these genes is unclear, considering that our bioinformatics analyses indicate that these genes share homology with characterized MTase genes in other organisms. Inactive RM genes have been observed to occur in other organisms, such as those belonging to the MmeI RM gene family [78]. These inactive genes can be readily reactivated, and were hypothesized to exist in a population to confer a selective advantage to individuals when the population undergoes disruption from foreign parasitic DNA [78]. However, these MmeI RM genes were inactivated as a result of disruptive mutations which do not appear to be present in the predicted RM genes in H. volcanii. It is possible that these genes may still be active in H. volcanii but are only expressed under conditions not tested. However, a blastn search of the H. volcanii DS2 transcriptome data from Babski et al. [79] (sequence read archive (SRA) accession number SRP076059) using these three genes as queries suggested that they are expressed, although the search results do not indicate if functional protein products of these genes are produced. Interestingly, our results indicate that HVO_C0040 and HVO_A0079 are flanked by transposase genes similarly to HVO_A0237 [37]. Perhaps these genes are mobile genetic elements, as is the case with many RM genes [80], and they became non-functional when transferred into H. volcanii. Nevertheless, HVO_C0040, HVO_A0079, and HVO_A0237 do not appear to contribute to the methylome of H. volcanii under standard growing conditions. A possible exception to this list is HVO_C0040, the only remaining putative MTase gene in our ∆RM strain, which our bioinformatics analysis indicates is a 5mC MTase. Methylation patterns produced from 5mC MTases are typically difficult to detect with SMRT sequencing in the absence of Tet treatment [40,65].

100 Genes 2018, 9, 129 19 of 23

However, 5mC methylation usually produces some signal via SMRT sequencing, yet we did not detect it in any of our multiple analyses including of the null mutant, leading us to think it is not methylating. We also report in this study the construction of a MTase null mutant in H. volcanii. This strain (∆RM) has all putative RM genes deleted from the genome with the exception of HVO_C0040, and our SMRT sequencing analysis indicates that this strain has no genomic methylation. We anticipate that this strain will be useful for future studies that examine the impact of RM systems and DNA methylation on cellular processes in H. volcanii, in which the ∆RM strain can be compared to the parental strain H26 that has all the RM genes intact. This strain could also be useful for characterizing putative MTase genes in other halobacterial strains via gene knock-in and SMRT sequencing to determine the target sites for methylation. We expect that this strain will be a useful tool in the quest to better understand DNA methylation and RM systems in the Halobacteria and other archaeal organisms.

Supplementary Materials: The following are available online at www.mdpi.com/2073-4425/9/3/129/s1. Figure S1. Growth curves of H26 and ∆RM when grown on Hv-YPC, represented by the average optical density (OD620) readings of 24 cell culture replicates taken each hour for 72 h of growth. Acknowledgments: We would like to thank Laura Jackson, Victoria Hull, and Matthew Coxe for their assistance in designing primers for deleting H. volcanii RM genes. We would also like to thank Thorsten Allers from the University of Nottingham for providing us with H. volcanii strains and plasmids, and Guilin Wang at the Yale Center for Genomic Analysis for advice and consultation on PacBio SMRT sequencing technology and methylation detection. This research was funded by: a joint grant from the National Science Foundation and the US/Israel Binational Science Foundation, award number NSF 1716046; the U.S./Israel Binational Science Foundation, award number BSF 2013061; and the National Aeronautics and Space Administration, grant number NNX15AM09G. Author Contributions: M.O., R.T.P., and J.P.G. conceived and designed the experiments; M.O., J.L., and J.P.G. performed the experiments; M.O., R.T.P., A.M.M., and J.P.G. analyzed the data; M.O., R.T.P., and J.P.G. wrote the paper. Conflicts of Interest: The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

References

1. Bickle, T.A.; Krüger, D.H. Biology of DNA restriction. Microbiol. Rev. 1993, 57, 434–450. [PubMed] 2. Tock, M.R.; Dryden, D.T. The biology of restriction and anti-restriction. Curr. Opin. Microbiol. 2005, 8, 466–472. [CrossRef][PubMed] 3. Kobayashi, I. Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res. 2001, 29, 3742–3756. [CrossRef][PubMed] 4. Ohno, S.; Handa, N.; Watanabe-Matsui, M.; Takahashi, N.; Kobayashi, I. Maintenance forced by a restriction-modification system can be modulated by a region in its modification enzyme not essential for methyltransferase activity. J. Bacteriol. 2008, 190, 2039–2049. [CrossRef][PubMed] 5. Korlach, J.; Turner, S.W. Going beyond five bases in DNA sequencing. Curr. Opin. Struct Biol. 2012, 22, 251–261. [CrossRef][PubMed] 6. Bheemanaik, S.; Reddy, Y.V.; Rao, D.N. Structure, function and mechanism of exocyclic DNA methyltransferases. Biochem. J. 2006, 399, 177–190. [CrossRef][PubMed] 7. Malone, T.; Blumenthal, R.M.; Cheng, X. Structure-guided analysis reveals nine sequence motifs conserved among DNA amino-methyltransferases, and suggests a catalytic mechanism for these enzymes. J. Mol. Biol. 1995, 253, 618–632. [CrossRef][PubMed] 8. Bujnicki, J.M. Sequence permutations in the molecular evolution of DNA methyltransferases. BMC Evol. Biol. 2002, 2, 3. [CrossRef] 9. Bujnicki, J.M.; Radlinska, M. Molecular evolution of DNA-(cytosine-N4) methyltransferases: Evidence for their polyphyletic origin. Nucleic Acids Res. 1999, 27, 4501–4509. [CrossRef][PubMed] 10. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE—A database for DNA restriction and modification: Enzymes, genes and genomes. Nucleic Acids Res. 2010, 38, D234–D236. [CrossRef][PubMed] 11. Posfai, J.; Bhagwat, A.S.; Posfai, G.; Roberts, R.J. Predictive motifs derived from cytosine methyltransferases. Nucleic Acids Res. 1989, 17, 2421–2435. [CrossRef][PubMed]

101 Genes 2018, 9, 129 20 of 23

12. Militello, K.T.; Simon, R.D.; Qureshi, M.; Maines, R.; VanHorne, M.L.; Hennick, S.M.; Jayakar, S.K.; Pounder, S. Conservation of Dcm-mediated cytosine DNA methylation in Escherichia coli. FEMS Microbiol. Lett. 2012, 328, 78–85. [CrossRef][PubMed] 13. Roberts, R.J.; Belfort, M.; Bestor, T.; Bhagwat, A.S.; Bickle, T.A.; Bitinaite, J.; Blumenthal, R.M.; Degtyarev, S.; Dryden, D.T.; Dybvig, K.; et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 2003, 31, 1805–1812. [CrossRef][PubMed] 14. Ershova, A.S.; Rusinov, I.S.; Spirin, S.A.; Karyagina, A.S.; Alexeevski, A.V. Role of restriction-modification systems in prokaryotic evolution and ecology. Biochemistry 2015, 80, 1373–1386. [CrossRef][PubMed] 15. Loenen, W.A.; Dryden, D.T.; Raleigh, E.A.; Wilson, G.G. Type I restriction enzymes and their relatives. Nucleic Acids Res. 2014, 42, 20–44. [CrossRef][PubMed] 16. Liu, Y.P.; Tang, Q.; Zhang, J.Z.; Tian, L.F.; Gao, P.; Yan, X.X. Structural basis underlying complex assembly and conformational transition of the type I R-M system. Proc. Natl. Acad. Sci. USA 2017, 114, 11151–11156. [CrossRef][PubMed] 17. Pingoud, A.; Wilson, G.G.; Wende, W. Type II restriction endonucleases—A historical perspective and more. Nucleic Acids Res. 2014, 42, 7489–7527. [CrossRef][PubMed] 18. Morgan, R.D.; Bhatia, T.K.; Lovasco, L.; Davis, T.B. MmeI: A minimal type II restriction-modification system that only modifies one DNA strand for host protection. Nucleic Acids Res. 2008, 36, 6558–6570. [CrossRef] [PubMed] 19. Rao, D.N.; Dryden, D.T.; Bheemanaik, S. Type III restriction-modification enzymes: A historical perspective. Nucleic Acids Res. 2014, 42, 45–55. [CrossRef][PubMed] 20. Loenen, W.A.; Raleigh, E.A. The other face of restriction: Modification-dependent enzymes. Nucleic Acids Res. 2014, 42, 56–69. [CrossRef][PubMed] 21. Adhikari, S.; Curtis, P.D. DNA methyltransferases and epigenetic regulation in bacteria. FEMS Microbiol. Rev. 2016, 40, 575–591. [CrossRef][PubMed] 22. Sánchez-Romero, M.A.; Busby, S.J.; Dyer, N.P.; Ott, S.; Millard, A.D.; Grainger, D.C. Dynamic distribution of SeqA protein across the chromosome of Escherichia coli K-12. MBio 2010, 1.[CrossRef][PubMed] 23. Kang, S.; Lee, H.; Han, J.S.; Hwang, D.S. Interaction of SeqA and Dam methylase on the hemimethylated origin of Escherichia coli chromosomal DNA replication. J. Biol. Chem. 1999, 274, 11463–11468. [CrossRef] [PubMed] 24. Waldminghaus, T.; Skarstad, K. The Escherichia coli SeqA protein. Plasmid 2009, 61, 141–150. [CrossRef] [PubMed] 25. Welsh, K.M.; Lu, A.L.; Clark, S.; Modrich, P. Isolation and characterization of the Escherichia coli mutH gene product. J. Biol. Chem. 1987, 262, 15624–15629. [PubMed] 26. Au, K.G.; Welsh, K.; Modrich, P. Initiation of methyl-directed mismatch repair. J. Biol. Chem. 1992, 267, 12142–12148. [PubMed] 27. Putnam, C.D. Evolution of the methyl directed mismatch repair system in Escherichia coli. DNA Repair 2016, 38, 32–41. [CrossRef][PubMed] 28. Zweiger, G.; Marczynski, G.; Shapiro, L. A Caulobacter DNA methyltransferase that functions only in the predivisional cell. J. Mol. Biol. 1994, 235, 472–485. [CrossRef][PubMed] 29. Domian, I.J.; Reisenauer, A.; Shapiro, L. Feedback control of a master bacterial cell-cycle regulator. Proc. Natl. Acad. Sci. USA 1999, 96, 6648–6653. [CrossRef][PubMed] 30. Takahashi, N.; Naito, Y.; Handa, N.; Kobayashi, I. A DNA methyltransferase can protect the genome from postdisturbance attack by a restriction-modification gene complex. J. Bacteriol. 2002, 184, 6100–6108. [CrossRef][PubMed] 31. Seshasayee, A.S.; Singh, P.; Krishna, S. Context-dependent conservation of DNA methyltransferases in bacteria. Nucleic Acids Res. 2012, 40, 7066–7073. [CrossRef][PubMed] 32. Blow, M.J.; Clark, T.A.; Daum, C.G.; Deutschbauer, A.M.; Fomenkov, A.; Fries, R.; Froula, J.; Kang, D.D.; Malmstrom, R.R.; Morgan, R.D.; et al. The epigenomic landscape of prokaryotes. PLoS Genet. 2016, 12, e1005854. [CrossRef][PubMed] 33. Nolling, J.; de Vos, W.M. Identification of the CTAG-recognizing restriction-modification systems MthZI and MthFI from Methanobacterium thermoformicicum and characterization of the plasmid-encoded mthZIM gene. Nucleic Acids Res. 1992, 20, 5047–5052. [CrossRef][PubMed]

102 Genes 2018, 9, 129 21 of 23

34. Grogan, D.W. Cytosine methylation by the suai restriction-modification system: Implications for genetic fidelity in a hyperthermophilic archaeon. J. Bacteriol. 2003, 185, 4657–4661. [CrossRef][PubMed] 35. Ishikawa, K.; Watanabe, M.; Kuroita, T.; Uchiyama, I.; Bujnicki, J.M.; Kawakami, B.; Tanokura, M.; Kobayashi, I. Discovery of a novel restriction endonuclease by genome comparison and application of a wheat-germ-based cell-free translation assay: PabI (50-GTA/C) from the hyperthermophilic archaeon Pyrococcus abyssi. Nucleic Acids Res. 2005, 33, e112. [CrossRef][PubMed] 36. Watanabe, M.; Yuzawa, H.; Handa, N.; Kobayashi, I. Hyperthermophilic DNA methyltransferase M.PabI from the archaeon Pyrococcus abyssi. Appl. Environ. Microbiol. 2006, 72, 5367–5375. [CrossRef][PubMed] 37. Ouellette, M.; Jackson, L.; Chimileski, S.; Papke, R.T. Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006. Front. Microbiol. 2015, 6, 251. [CrossRef][PubMed] 38. Allers, T.; Mevarech, M. Archaeal genetics—The third way. Nat. Rev. Genet. 2005, 6, 58–73. [CrossRef] [PubMed] 39. Blaby, I.K.; Phillips, G.; Blaby-Haas, C.E.; Gulig, K.S.; El Yacoubi, B.; de Crecy-Lagard, V. Towards a systems approach in the genetic analysis of archaea: Accelerating mutant construction and phenotypic analysis in Haloferax volcanii. Archaea 2010, 2010, 426239. [CrossRef][PubMed] 40. Clark, T.A.; Murray, I.A.; Morgan, R.D.; Kislyuk, A.O.; Spittle, K.E.; Boitano, M.; Fomenkov, A.; Roberts, R.J.; Korlach, J. Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucleic Acids Res. 2012, 40, e29. [CrossRef][PubMed] 41. Allers, T.; Ngo, H.P.; Mevarech, M.; Lloyd, R.G. Development of additional selectable markers for the halophilic archaeon Haloferax volcanii based on the leuB and trpA genes. Appl. Environ. Microbiol. 2004, 70, 943–953. [CrossRef][PubMed] 42. Dyall-Smith, M. The Halohandbook: Protocols for Haloarchaeal Genetics Ver. 7.1. Available online: http://www.haloarchaea.com/resources/halohandbook/index.html (accessed on 6 September 2013). 43. Mullakhanbhai, M.F.; Larsen, H. Halobacterium volcanii spec. Nov., a dead sea halobacterium with a moderate salt requirement. Arch. Microbiol. 1975, 104, 207–214. [CrossRef][PubMed] 44. Bitan-Banin, G.; Ortenberg, R.; Mevarech, M. Development of a gene knockout system for the halophilic archaeon Haloferax volcanii by use of the pyrE gene. J. Bacteriol. 2003, 185, 772–778. [CrossRef][PubMed] 45. Allers, T.; Barak, S.; Liddell, S.; Wardell, K.; Mevarech, M. Improved strains and plasmid vectors for conditional overexpression of his-tagged proteins in Haloferax volcanii. Appl. Environ. Microbiol. 2010, 76, 1759–1769. [CrossRef][PubMed] 46. Detecting DNA Base Modifications: SMRT Analysis of Microbial Methylomes. Available online: https://github. com/PacificBiosciences/Bioinformatics-Training/wiki/Methylome-Analysis-Technical-Note (accessed on 16 June 2015). 47. Hartman, A.L.; Norais, C.; Badger, J.H.; Delmas, S.; Haldenby, S.; Madupu, R.; Robinson, J.; Khouri, H.; Ren, Q.; Lowe, T.M.; et al. The complete genome sequence of Haloferax volcanii DS2, a model archaeon. PLoS ONE 2010, 5, e9605. [CrossRef][PubMed] 48. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [CrossRef] 49. Larkin, M.A.; Blackshields, G.; Brown, N.P.; Chenna, R.; McGettigan, P.A.; McWilliam, H.; Valentin, F.; Wallace, I.M.; Wilm, A.; Lopez, R.; et al. Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23, 2947–2948. [CrossRef][PubMed] 50. Quevillon, E.; Silventoinen, V.; Pillai, S.; Harte, N.; Mulder, N.; Apweiler, R.; Lopez, R. InterProScan: Protein domains identifier. Nucleic Acids Res. 2005, 33, W116–W120. [CrossRef][PubMed] 51. Edgar, R.C. Muscle: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797. [CrossRef][PubMed] 52. Gouy, M.; Guindon, S.; Gascuel, O. Seaview version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 2010, 27, 221–224. [CrossRef][PubMed] 53. Guindon, S.; Dufayard, J.F.; Lefort, V.; Anisimova, M.; Hordijk, W.; Gascuel, O. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst. Biol. 2010, 59, 307–321. [CrossRef][PubMed] 54. Hawkins, M.; Malla, S.; Blythe, M.J.; Nieduszynski, C.A.; Allers, T. Accelerated growth in the absence of DNA replication origins. Nature 2013, 503, 544–547. [CrossRef][PubMed]

103 Genes 2018, 9, 129 22 of 23

55. Haberman, A.; Heywood, J.; Meselson, M. DNA modification methylase activity of Escherichia coli restriction endonucleases K and P. Proc. Natl. Acad. Sci. USA 1972, 69, 3138–3141. [CrossRef][PubMed] 56. Kennaway, C.K.; Obarska-Kosinska, A.; White, J.H.; Tuszynska, I.; Cooper, L.P.; Bujnicki, J.M.; Trinick, J.; Dryden, D.T. The structure of M.EcoKI type I DNA methyltransferase with a DNA mimic antirestriction protein. Nucleic Acids Res. 2009, 37, 762–770. [CrossRef][PubMed] 57. Murray, I.A.; Clark, T.A.; Morgan, R.D.; Boitano, M.; Anton, B.P.; Luong, K.; Fomenkov, A.; Turner, S.W.; Korlach, J.; Roberts, R.J. The methylomes of six bacteria. Nucleic Acids Res. 2012, 40, 11450–11462. [CrossRef] [PubMed] 58. Kelleher, J.E.; Daniel, A.S.; Murray, N.E. Mutations that confer de novo activity upon a maintenance methyltransferase. J. Mol. Biol. 1991, 221, 431–440. [CrossRef] 59. O’Neill, M.; Powell, L.M.; Murray, N.E. Target recognition by EcoKI: The recognition domain is robust and restriction-deficiency commonly results from the proteolytic control of enzyme activity. J. Mol. Biol. 2001, 307, 951–963. [CrossRef][PubMed] 60. Gao, P.; Tang, Q.; An, X.; Yan, X.; Liang, D. Structure of HsdS subunit from Thermoanaerobacter. tengcongensis sheds lights on mechanism of dynamic opening and closing of type I methyltransferase. PLoS ONE 2011, 6, e17346. [CrossRef][PubMed] 61. Janscak, P.; Bickle, T.A. The DNA recognition subunit of the type IB restriction-modification enzyme EcoAI tolerates circular permutions of its polypeptide chain. J. Mol. Biol. 1998, 284, 937–948. [CrossRef][PubMed] 62. Dusterhoft, A.; Kroger, M. Cloning, sequence and characterization of m5C-methyltransferase-encoding gene, hgiDIIM (GTCGAC), from Herpetosiphon. giganteus strain Hpa2. Gene 1991, 106, 87–92. [CrossRef] 63. O'Connell Motherway, M.; O'Driscoll, J.; Fitzgerald, G.F.; Van Sinderen, D. Overcoming the restriction barrier to plasmid transformation and targeted mutagenesis in Bifidobacterium breve UCC2003. Microb. Biotechnol. 2009, 2, 321–332. [CrossRef][PubMed] 64. Janulaitis, A.; Vaisvila, R.; Timinskas, A.; Klimasauskas, S.; Butkus, V. Cloning and sequence analysis of the genes coding for Eco57I type IV restriction-modification enzymes. Nucleic Acids Res. 1992, 20, 6051–6056. [CrossRef][PubMed] 65. Clark, T.A.; Lu, X.; Luong, K.; Dai, Q.; Boitano, M.; Turner, S.W.; He, C.; Korlach, J. Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation. BMC Biol. 2013, 11, 4. [CrossRef] [PubMed] 66. Westphal, L.L.; Sauvey, P.; Champion, M.M.; Ehrenreich, I.M.; Finkel, S.E. Genomewide Dam methylation in Escherichia coli during long-term stationary phase. mSystems 2016, 1, e00130-16. [CrossRef][PubMed] 67. Charlebois, R.L.; Lam, W.L.; Cline, S.W.; Doolittle, W.F. Characterization of pHV2 from Halobacterium volcanii and its use in demonstrating transformation of an archaebacterium. Proc. Natl. Acad. Sci. USA 1987, 84, 8530–8534. [CrossRef][PubMed] 68. Stern, A.; Sorek, R. The phage-host arms race: Shaping the evolution of microbes. Bioessays. 2011, 33, 43–51. [CrossRef][PubMed] 69. Maier, L.K.; Dyall-Smith, M.; Marchfelder, A. The adaptive immune system of Haloferax volcanii. Life 2015, 5, 521–537. [CrossRef][PubMed] 70. Stachler, A.E.; Turgeman-Grott, I.; Shtifman-Segal, E.; Allers, T.; Marchfelder, A.; Gophna, U. High tolerance to self-targeting of the genome by the endogenous CRISPR-Cas system in an archaeon. Nucleic Acids Res. 2017, 45, 5208–5216. [CrossRef][PubMed] 71. García-Heredia, I.; Martín-Cuadrado, A.B.; Mojica, F.J.; Santos, F.; Mira, A.; Antón, J.; Rodríguez-Valera, F. Reconstructing viral genomes from the environment using fosmid clones: The case of haloviruses. PLoS ONE 2012, 7, e33802. [CrossRef][PubMed] 72. Luk, A.W.; Williams, T.J.; Erdmann, S.; Papke, R.T.; Cavicchioli, R. Viruses of haloarchaea. Life (Basel) 2014, 4, 681–715. [CrossRef][PubMed] 73. Naor, A.; Lapierre, P.; Mevarech, M.; Papke, R.T.; Gophna, U. Low species barriers in halophilic archaea and the formation of recombinant hybrids. Curr. Biol. 2012, 22, 1444–1448. [CrossRef][PubMed] 74. Roer, L.; Aarestrup, F.M.; Hasman, H. The EcoKI type I restriction-modification system in Escherichia coli affects but is not an absolute barrier for conjugation. J. Bacteriol. 2015, 197, 337–342. [CrossRef][PubMed] 75. Lin, E.A.; Zhang, X.S.; Levine, S.M.; Gill, S.R.; Falush, D.; Blaser, M.J. Natural transformation of Helicobacter pylori involves the integration of short DNA fragments interrupted by gaps of variable size. PLoS Pathog. 2009, 5, e1000337. [CrossRef][PubMed]

104 Genes 2018, 9, 129 23 of 23

76. Erwin, A.L.; Sandstedt, S.A.; Bonthuis, P.J.; Geelhood, J.L.; Nelson, K.L.; Unrath, W.C.; Diggle, M.A.; Theodore, M.J.; Pleatman, C.R.; Mothershed, E.A.; et al. Analysis of genetic relatedness of Haemophilus influenzae isolates by multilocus sequence typing. J. Bacteriol. 2008, 190, 1473–1483. [CrossRef][PubMed] 77. Budroni, S.; Siena, E.; Dunning Hotopp, J.C.; Seib, K.L.; Serruto, D.; Nofroni, C.; Comanducci, M.; Riley, D.R.; Daugherty, S.C.; Angiuoli, S.V.; et al. Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. Proc. Natl. Acad. Sci. USA 2011, 108, 4494–4499. [CrossRef][PubMed] 78. Morgan, R.D.; Dwinell, E.A.; Bhatia, T.K.; Lang, E.M.; Luyten, Y.A. The MmeI family: Type II restriction-modification enzymes that employ single-strand modification for host protection. Nucleic Acids Res. 2009, 37, 5208–5221. [CrossRef][PubMed] 79. Babski, J.; Haas, K.A.; Nather-Schindler, D.; Pfeiffer, F.; Forstner, K.U.; Hammelmann, M.; Hilker, R.; Becker, A.; Sharma, C.M.; Marchfelder, A.; et al. Genome-wide identification of transcriptional start sites in the haloarchaeon Haloferax volcanii based on differential RNA-seq (dRNA-seq). BMC Genomics 2016, 17, 629. [CrossRef][PubMed] 80. Furuta, Y.; Abe, K.; Kobayashi, I. Genome comparison and context analysis reveals putative mobile forms of restriction-modification systems and related rearrangements. Nucleic Acids Res. 2010, 38, 2428–2443. [CrossRef][PubMed]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

105

Chapter 5. The Impact of Restriction-Modification Systems on Mating in Haloferax volcanii

Matthew Ouellette, Andrea M. Makkay, Artemis S. Louyakis, and R. Thane Papke

This chapter examines the efficiency of cell-to-cell mating between Haloferax volcanii strains with restriction-modification (RM) genes and strains without RM genes. The purpose of this research was to determine whether differences in RM systems could limit mating between different H. volcanii strains, which would suggest RM systems could act as barriers to recombination in a model halobacterial organism. Regarding my contributions to this chapter, I designed the experiments and conducted the research with assistance from Andrea M. Makkay. I also analyzed the data with the assistance of Artemis S. Louyakis, and wrote the manuscript with assistance from Artemis S. Louyakis and R. Thane Papke.

106

Abstract

Halobacteria have been observed to be highly recombinogenic, frequently exchanging

genetic material. Several barriers to recombination in the Halobacteria have been examined, such

as CRISPR-Cas, glycosylation, and archaeosortases, but these are low barriers that do not drastically reduce recombination. Another potential barrier could be restriction-modification

(RM) systems, which cleave DNA that is not properly methylated, thus limiting the exchange of genetic material between cells which do not have compatible RM systems. In order to examine the role of RM systems on limiting recombination in the Halobacteria, the impact of RM systems on cell-to-cell mating in Haloferax volcanii, a well-characterized method of genetic exchange and recombination in a halobacterial species, was examined. Strains which possessed all naturally-occurring RM system genes in H. volcanii (RM+) and strains without these RM systems (ΔRM) were mated together to compare the efficiency of mating between RM- compatible strains and RM-incompatible strains. The results indicated that mating RM- incompatible strains together resulted in a decrease in mating efficiency compared to mating

RM-compatible strains together, suggesting that RM systems limit mating in H. volcanii, but do not act as absolute barriers to recombination. Therefore, RM systems could be low barriers to recombination in the Halobacteria, with RM-incompatible strains exchanging genetic material at a lower frequency than those with compatible RM systems, similar to other low recombination barriers in the Halobacteria.

Introduction

In Bacteria and Archaea, distantly related organisms can exchange genetic material through horizontal gene transfer and recombination. Many different strategies exist to allow for lateral transfer of genetic material (Blakely, 2015). One method is natural transformation, where

107

extracellular DNA is acquired via natural competence systems (Chen and Dubnau, 2004). Gene

transfer can also occur via transduction, where bacteriophages transfer DNA between host cells

(Touchon et al., 2017). Cells can also transfer genetic material via conjugation, where cells come

into contact with each other and transfer plasmid DNA and integrative conjugal elements

between each other, usually through specific structures such as a type IV pilus (Banuelos-

Vazquez et al., 2017). Newly acquired genetic material which is not self-replicating is

incorporated into the host’s genome via homologous recombination, in which the DNA is

integrated at homologous sites in the genome (Rocha et al., 2005). Although gene transfer can

occur between species of distant lineages, barriers to exchange do exist which can prevent

transfer between certain species. Some of these barriers are physiological in nature (Thomas and

Nielsen, 2005), such as surface exclusion which limits conjugation by preventing pilus formation and DNA transfer between the cells (Arutyunov and Frost, 2013), host range limitations of transferred plasmids (Hulter et al., 2017), or the lack of DNA uptake signals in eDNA which prevents its use by naturally competent cells (Smith et al., 1999; Spencer-Smith et al., 2016).

Barriers to recombination can result in horizontal gene transfer being more likely to occur among

more closely related strains and species over transfer events between more distantly-related species (Andam and Gogarten, 2011).

Restriction-modification (RM) systems can also potentially act as barriers to gene transfer. RM systems consist of a restriction endonuclease (REase) and a DNA methyltransferase

(MTase) which both recognize the same target sequence of DNA. The MTase will methylate a base at the target site, whereas the REase will cleave the site if it is not methylated (Ershova et al., 2015). These systems act as defense mechanisms for their host organisms, in which potentially harmful foreign DNA which is not properly methylated is cleaved by the REase while

108

the host’s own genome is protected due to methylation (Bickle and Kruger, 1993; Tock and

Dryden, 2005). The ability of these systems to restrict foreign DNA could allow them limit genetic exchange between species, thus potentially driving the diversification of microbial

populations (Erwin et al., 2008; Budroni et al., 2011). Studies have demonstrated that RM

systems can limit conjugal transfer of plasmids (Roer et al., 2015), and that they can limit the size of recombinant DNA fragments that are obtained via natural transformation (Lin et al.,

2009).

Genetic recombination has been observed to occur frequently in several representatives of the halophilic archaeal class Halobacteria. In a study by Papke et al. (2007), Halorubrum strains isolated from saltern ponds in Spain and a hypersaline lake in Algeria were observed to cluster into three major phylogroups, with sequence diversification being driven primarily by recombination within the phylogroups rather than mutations. In a study by Fullmer et al. (2014),

Halorubrum isolates from the same hypersaline lake in Iran were observed to cluster into distinct

phylogroups, with each group sharing an average nucleotide identity (ANI) of greater than 98%.

Recombination was also observed to occur frequently within the phylogroups, but at a lower rate

between the phylogroups (Fullmer et al., 2014). The observation that recombination is more frequent between closely-related members of a phylogroup, and that interspecies mating is less frequent than intraspecies mating, suggests that there are barriers to recombination which limit genetic exchange, eventually resulting in the diversification of haloarchaeal populations.

One mechanism of gene transfer in the Halobacteria is cell-to-cell mating, which has been characterized in Haloferax volcanii (Rosenshine et al., 1989). In this process, the cells come together and fuse into a heterodiploid state which contains the genetic material of both parental cells. This state allows for gene transfer and recombination between the parental cells.

109

After genetic exchange, the cells will separate into hybrids of the parental strains (Rosenshine et

al., 1989; Ortenberg et al., 1998; Naor et al., 2012). This process appears to have a low species

barrier. In a study by Naor et al. (2012), H. volcanii was mated with the closely-related species

Haloferax mediterranei, resulting in successful hybrids. However, the interspecies mating

efficiency was observed to be lower than intraspecies mating events, suggesting that barriers to

recombination exist which limit interspecies mating events between H. volcanii and H.

mediterranei. One barrier to mating is CRISPR-Cas systems, which consist of short, repeated,

spacer sequences acquired from foreign genetic elements that act as immunity systems for the

host (Barrangou et al., 2007). These spacer sequences are used to produce RNAs known as

crRNAs, which interact with Cas proteins to target and degrade invasive, foreign elements

(Koonin et al., 2017). CRISPR-Cas systems have been identified in halobacterial species such as

H. volcanii and H. mediterranei (Li et al., 2013; Maier et al., 2019), and research has indicated

that these systems can limit interspecies mating between H. volcanii and H. mediterranei when

the chromosome of one species is designed to be targeted by the partner species, although they

do not act as total barriers to recombination (Turgeman-Grott et al., 2019). The glycosylation of surface glycoproteins has also been observed to affect mating in H. volcanii. A study by Shalev et al. (2017) tested the mating efficiency of H. volcanii mutant with the glycosylation genes aglB

and agl15, and observed a dramatic decrease in mating efficiency when the deletion mutants

were mated together, indicating that proper glycosylation is required for mating. However,

mating the deletion mutant with their parental strains resulted in a less notable decrease in

mating, suggesting that glycosylation limits mating between strains with different glycosylation

patterns, but is likely not an absolute barrier to recombination (Shalev et al., 2017). Modification

of surface proteins by archaeosortases has also been observed to affect mating. A study by Abdul

110

Halim et al. (2013) examined the mating efficiency of a H. volcanii strain with the

archaeosortase gene artA deleted, and observed that mating decreased in the deletion mutant

compared to the parental strain, but was not a total barrier to recombination. Overall, CRISPR-

Cas, glycosylation, and archaeosortases act as low barriers to mating, but do not act as total

barriers to recombination.

Another possible barrier to mating in the H. volcanii might be RM systems. Studies have characterized a few of these systems in H. volcanii, including a Type I RM system which targets the motif GCABN6VTGC (Ouellette et al., 2015; Ouellette et al., 2018). A Type IV REase

known as Mrr has also been characterized in H. volcanii, and has been observed to reduce

transformation efficiency on GATC-methylated plasmids (Holmes et al., 1991; Allers et al.,

2010). However, the overall role of these systems on cell-to-cell mating and recombination has

not been examined in detail. In this study, derivatives of the H. volcanii RM system deletion

strain from Ouellette et al. (2018) were used in mating experiments to determine the impact of

RM systems on cell-to-cell mating in H. volcanii.

Materials and Methods

Strains and Growth Conditions

All strains and plasmids used in this study are recorded in Table 1. Haloferax volcanii

strains were grown in either rich undefined medium (Hv-YPC) or selective undefined medium

(Hv-Ca) developed by Allers et al. (2004) and listed in the Halohandbook (Dyall-Smith, 2009).

Media was supplemented with uracil (50 µg/mL) and 5-fluoroorotic acid (5-FOA) (50 µg/mL) as

needed to grow ΔpyrE2 strains. For ΔtrpA strains, the media was supplemented with tryptophan

(50 µg/mL) as needed, whereas thymidine (40 µg/mL) and hypoxanthine (40 µg/mL) were

supplemented as needed when growing ΔhdrB strains. The strains were grown at 42 °C while

111

shaking at 200 rpm. Escherichia coli strains were grown at 37 °C while shaking at 200 rpm in

lysogeny broth (LB), with ampicillin (100 µg/mL) added to the medium as needed.

Table 1. List of plasmids and strains used in this study.

Strain/Plasmid Name Description Source

H. volcanii H53 ΔpyrE2 ΔtrpA uracil and (Allers et al., 2004) tryptophan auxotrophic strain of wild-type H. volcanii H. volcanii H98 ΔpyrE2 ΔhdrB uracil, (Allers et al., 2004) thymidine, and hypoxanthine auxotrophic strain of wild- type H. volcanii H. volcanii ΔRM Deletion strain of RM genes (Ouellette et al., 2018) HVO_0794, rmeRMS, HVO_A0006, HVO_A0074, HVO_A0079, and HVO_A0237 derived from a ΔpyrE2 uracil auxotrophic strain of H. volcanii H. volcanii RM ΔtrpA ΔtrpA tryptophan auxotrophic This study strain derived from ΔRM H. volcanii RM ΔhdrB ΔhdrB thymidine and This study hypoxanthine auxotrophic strain derived from ΔRM E. coli HST08 E. coli cloning strain Clontech, Cat. # 636763

pTA95 Plasmid used to delete trpA (Allers et al., 2004) gene pTA155 Plasmid used to delete hdrB (Allers et al., 2004) gene

Deletion of trpA and hdrB Genes from ΔRM Strain

Plasmids pTA95 and pTA155 were used to delete trpA and hdrB, respectively, from H. volcanii strain ΔRM. These plasmids were transformed into the ΔRM strain using the polyethylene glycol (PEG)-mediated transformation protocol from the Halohandbook (Dyall-

Smith, 2009), with resulting transformants being plated on Hv-Ca for 5-7 days. Screening for

pop-ins was performed via colony PCR with screening primers (Table 2) and gel electrophoresis

112

for visualization. Pop-outs were obtained by plating confirmed pop-ins on Hv-Ca plates with 5-

FOA, uracil, tryptophan, thymidine, and hypoxanthine. Successful pop-outs were identified via

replica plating onto Hv-Ca plates with uracil but without tryptophan, thymidine, or

hypoxanthine, as well as via the colony PCR screening method used to detected pop-ins.

Table 2. List of primers used in this study.

Primer Primer Sequence Primer Description Name dTrp5F 5’- GCTCTAGAACGCGCTCGGGCAGGTCTTACTGG -3’ Used to screen for deletion dTrp3R 5’- CCGGTGAGTCTCTAGACGTTTTCGTCCG -3’ of trpA (Primer designs from Allers et al., (2004)) d_hdrBF 5’- TCCCGCCGTGTCACTACA -3’ Used to screen for deletion d_hdrBR 5’- ACGTTCACGACGGTACAGGG -3’ of hdrB

Mating H53 and H98 Strains with RM ΔtrpA and RM ΔhdrB Strains

Experiments were set up following a mating protocol adapted from Naor et al. (2012).

Cultures of H53, H98, RM ΔtrpA, and RM ΔhdrB were grown in triplicate across three experiments (9 replicates total) to an OD600 of ~1-1.1, with 2 mL of two different cultures applied to 0.2µm filters. The cultures were mixed in the following combinations: H53 × H98,

H53 × RM ΔhdrB, RM ΔtrpA × H98, and RM ΔtrpA × RM ΔhdrB. The resulting filters were then placed on plates of Hv-Ca with uracil, tryptophan, thymidine, and hypoxanthine and incubated at 42 °C for 2 days. The filters were then transferred to 2-mL tubes containing 1 mL of liquid Hv-Ca medium and shaken at 200 rpm for ~1 hour. The filters were then diluted and plated onto Hv-Ca plates with uracil to determine the number of recombinants, and Hv-Ca plates with uracil, tryptophan, thymidine, and hypoxanthine to determine the total number of viable cells. The plates were incubated at 37 °C for 1 week. The number of recombinants was divided by the total number of viable cells to calculate the mating efficiency of each mated combination.

113

Mann-Whitney U tests were performed in R with the package ggpubr v0.2.1 (Kassambara,

2018). A boxplot was constructed in R using the package ggplot2 v3.2.0 (Wickham, 2016).

Results

Mating Efficiency is Lower When Mating RM-incompatible Strains

In order to determine whether RM systems in H. volcanii can act as a barrier to mating,

H. volcanii strains H53 and H98, which contained the full set of RM system genes (RM+), and

ΔRM derivative strains which were missing RM system genes (RM ΔtrpA, RM ΔhdrB) were

mated together (Figure 1). Each strain was mated with a partner strain with a matching set of RM

genes (RM-compatible; H53 × H98, RM ΔtrpA × RM ΔhdrB) and a partner strain without

matching RM gene sets (RM-incompatible; H53 × RM ΔhdrB, RM ΔtrpA × H98). In each

mating event, one strain is a tryptophan auxotroph (ΔtrpA) and the other strain is a thymidine

auxotroph (ΔhdrB). Therefore, by mating the strains together and plating them on selective plates

without tryptophan or thymidine, recombinants can be selected which contain both trpA and

hdrB from both parental strains, and mating efficiency can be determined by calculating the

number of colonies on the selective plates divided by the total number of viable cells for each

mating event.

114

Figure 1. Box plot for mating efficiencies of H. volcanii strains H53 and H98 with RM system

genes (RM+) and ΔRM derivative strains without RM systems (RM ΔtrpA, RM ΔhdrB). Mating crosses between RM-compatible strains (H53 × H98, RM ΔtrpA × RM ΔhdrB) and RM- incompatible strains (H53 × RM ΔhdrB, RM ΔtrpA × H98) were performed. Mating efficiency is

expressed as the average number of colonies on selective plates divided by the total number of

viable cells of each mating cross performed in triplicate for three experiments (9 replicates total).

Red dots represent outliers. The p-values are from Mann-Whitney U tests of the differences between each mating cross.

The results indicate that mating H53 with H98 (H53 × H98) had an average mating efficiency of 1.6 × 10-4. In comparison, mating H53 with RM ΔhdrB (H53 × RM ΔhdrB) resulted

in an average mating efficiency of 5.2 × 10-5, representing a ~66% decrease from H53 × H98.

The difference between these efficiencies was supported as significant by Mann-Whitney U (p =

0.004). When mating RM ΔtrpA with H98 (RM ΔtrpA × H98), the average mating efficiency was

115

3.7 × 10-5, representing a ~77% decrease from H53 × H98. This was a significant difference

supported by Mann-Whitney U (p = 0.001).

When mating RM ΔtrpA with RM ΔhdrB (RM ΔtrpA × RM ΔhdrB), the average mating

efficiency was observed to be 9.2 × 10-5. This was a ~43% decrease in mating efficiency from

H53 × H98, but this difference was not significant according to Mann-Whitney U (p = 0.14). The mating efficiency increased from H53 × RM ΔhdrB by ~70%. However, the difference was not strongly supported as significant by Mann-Whitney U (p = 0.09). The mating efficiency of RM

ΔtrpA × RM ΔhdrB increased by ~149% from RM ΔtrpA × H98, and the differences were supported as significant via Mann-Whitney U (p = 0.02). A difference was also observed between the mating efficiency of the RM-incompatible mating events, with RM ΔtrpA × H98 exhibiting a ~33% lower mating efficiency than H53 × RM ΔhdrB. However, this difference was not supported as significant by Mann-Whitney U (p = 0.11). Overall, the results indicate that mating efficiency between RM-compatible strains is higher than the mating efficiency between

RM-incompatible strains in H. volcanii.

Discussion

This study provides evidence that RM systems might act as post-mating barriers to recombination in H. volcanii. When RM-compatible strains were mated together, such as H53 with H98 and RM ΔtrpA with RM ΔhdrB, the recombination efficiencies were similar to each other. The average mating efficiencies when crossing the RM-compatible strains were both close to the 1 × 10-4 intraspecies mating efficiency for H. volcanii observed by Naor et al. (2012), with

H53 × H98 having an average mating efficiency of ~1.6 × 10-4 and RM ΔtrpA × RM ΔhdrB

having an average mating efficiency of ~9.2 × 10-5. However, when RM-incompatible strains

were mated together, such as H53 × RM ΔhdrB or RM ΔtrpA with H98, the mating efficiencies

116

were observed to be lower than mating between RM-compatible strains. This difference indicates that when cells do not have compatible sets of RM systems, they mate less frequently, likely due to the RM systems from RM+ cells cleaving DNA from ΔRM partner cells which is not properly

methylated at recognition sites of the RM systems, thus preventing recombination from

occurring.

RM systems have been observed to limit conjugation events in bacteria. In a study by

Roer et al. (2015), for example, conjugation was observed to be limited in E. coli by the RM

system EcoKI when using plasmids with unmethylated EcoKI target sites, although they are not

a major barrier to conjugal transfer. A ~85% reduction in transfer rate was observed when

unmethylated plasmids were transferred into a recipient strain with an active EcoKI system in

comparison to the rate when using a recipient strain with a deactivated EcoKI system (Roer et

al., 2015). This reduction in E. coli conjugation is slightly higher than the ~66-77% decrease in

mating efficiency observed when mating RM-incompatible strains together, suggesting that RM

systems have a slightly lower impact on cell-to-cell mating in H. volcanii than they do on

conjugation in E. coli.

A few studies have also suggested that RM systems can limit recombination as well.

Phylogenetic studies in Haemophilus influenzae and Neisseria meningitidis have suggested RM

systems could act as barriers to recombination and drive population diversification (Erwin et al.,

2008; Budroni et al., 2011). A Type III RM system in Staphylococcus aureus has also been

demonstrated to prevent natural transformation on DNA from other bacterial species such as E.

coli (Corvaglia et al., 2010). However, other studies have indicated that DNA cleaved by RM

systems are still able to recombine, and that RM systems only limit the size of recombined DNA

fragments (Vasu and Nagaraja, 2013). A study by Chang and Cohen (1977) indicated that the

117

REase of RM system EcoRI was able to mediate site-specific recombination in E. coli. A study

of natural transformation in Helicobacter pylori by Lin et al. (2009) indicated that RM systems

limited the size of fragments imported during transformation, but did not prevent recombination

of those fragments. The decreased mating between RM-incompatible strains observed in this study suggests that RM systems act to reduce recombination in H. volcanii rather than just limit fragment size of DNA during mating, and might be more effective if the methylation site were highly distributed around the chromosome or in the middle of our genetic markers.

The results also indicate that the RM ΔtrpA and RM ΔhdrB strains, when mated together, had a slightly decreased mating efficiency compared to when H53 and H98 were mated together.

This result suggests that strains derived from ΔRM, which are missing active RM systems, are less efficient at mating than RM+ strains H53 and H98, which would indicate the RM systems

have an important role in facilitating recombination. However, this difference was not strongly

supported by Mann-Whitney U, so it is possible that this difference was due to chance. Also, the observation of an increase in mating efficiency when compared to H53 × RM ΔhdrB and RM

ΔtrpA × H98 suggests that, even if the ΔRM strains themselves have a lower mating efficiency,

RM-incompatibility between strains also results in reduced mating efficiency.

In H. volcanii, there are two major methylated motifs throughout the genome: the m4C

m4 m6 m4 motif C TAG and the m6A motif GCA BN6VTGC (Ouellette et al., 2015). The C TAG

m6 motif is methylated by the orphan MTase HVO_0794 and the GCA BN6VTGC motif is methylated by the Type I RM system RmeRMS (Ouellette et al., 2018). Since HVO_0794 is an orphan MTase and is not associated with a RM system, it is unlikely to have an impact on mating efficiency and recombination. The most likely candidate for reducing mating between RM- incompatible strains in H. volcanii is the RmeRMS system. In the ΔRM strain and its derivatives,

118

the genome is missing the operon which encodes the RmeRMS system, and is unmethylated at

its target sites (Ouellette et al., 2018). Therefore, the target sites of RmeRMS would be exposed

to cleavage when ΔRM derivative strains are mated with strains H53 or H98, which have the

RmeRMS system. While H. volcanii also has the Type IV REase Mrr, which limits

transformation efficiency with GATC-methylated plasmids from E. coli (Holmes et al., 1991;

Allers et al., 2010), this REase has not been demonstrated to cleave methylated H. volcanii DNA

and is unlikely to affect mating and recombination. Therefore, the decrease in mating between

RM-incompatible strains is likely the result of RmeRMS in the RM+ strain cleaving

unmethylated sites from the ΔRM strain and, therefore, limiting recombination between the

strains. However, examination of the trpA and hdrB marker gene sequences, which are exchanged during mating, indicated that there are no RmeRMS sites located within those genes.

The closest RmeRMS site to trpA is located 897 bp upstream of the gene, and the closest site to hdrB is located 6441 bp upstream of the gene. This would suggest that RmeRMS could negatively impact recombination even when the restriction sites are distantly located from the genes of interest. One possible explanation is that RM-incompatibility between strains reduces the time that the cells can coexist after fusing together due to DNA cleavage, resulting the cells separating early from each other and, therefore, reducing chances for recombination between the cells. It is possible that, if the restriction sites were located closer to the genes of interest, or within the genes themselves, the mating efficiency between the RM-incompatible strains would decrease further. Since the RmeRMS site is located closer to trpA than hdrB, it is possible that recombination is more limited for trpA than hdrB. This may explain the difference in mating efficiency observed between the two RM-incompatible mating crosses. However, this difference was not supported as statistically significant. Future mating experiments using ΔRM derivative

119

strains complemented with the rmeRMS operon could confirm the role of this RM system in

limiting mating in H. volcanii.

RM-incompatibility may also be an explanation for the lower interspecies mating

efficiency observed between H. volcanii and H. mediterranei (Naor et al., 2012). According to the RM database REBASE (Roberts et al., 2015), H. mediterranei has only three RM-related genes (http://rebase.neb.com/cgi-bin/onumget?8920): a Type IV REase (Mrr), an orphan m4C

CTAG MTase (M.Hme33500I), and an orphan m4C MTase which modifies the motif

HGCm4WGCK (M.Hme33500II). Since it has no Type I RM system analogous to RmeRMS in

H. volcanii, its genome is unmethylated at the target sites of that RM system. Therefore, the

genome of H. mediterranei is exposed to cleavage by the RmeRMS system when mating with H. volcanii, which could limit recombination. Interestingly, the interspecies mating efficiency observed by Naor et al. (2012) was 4.2 × 10-5, which is between the mating efficiencies observed when crossing RM-incompatible strains (~5.2 × 10-5 for H53 × RM ΔhdrB and ~3.7 × 10-5 for

RM ΔtrpA × H98). Genome sequencing of 10 hybrids also indicated that they were all H.

mediterranei which had received genetic material from H. volcanii, indicating that genetic

transfer during mating always occurred in one direction (from H. volcanii to H.

mediterranei)(Naor et al., 2012). It is possible that the presence of RmeRMS in H. volcanii may

prevent the transfer of genetic material from H. mediterranei to H. volcanii due to exposed

RmeRMS sites. It is also possible that the lower interspecies mating efficiency is due to H.

mediterranei containing CRISPR spacers which target the genome of H. volcanii, since

interspecies mating efficiency is further reduced when a CRISPR is added to H. mediterranei

which is specifically targeted by H. volcanii (Turgeman-Grott et al., 2019). Future interspecies

mating experiments in which H. volcanii ΔRM derivative strains are crossed with H.

120

mediterranei, could elucidate the impact of RM systems on mating efficiencies between these two species.

Although the Halobacteria are highly recombinogenic, distinct phylogroups have been observed even within the same geographic location (Papke et al., 2007; Fullmer et al., 2014).

This suggests that there are barriers to recombination within the Halobacteria which limit interactions between phylogroups and allow for speciation to occur. Our results indicate that when RM systems are incompatible between mating strains of H. volcanii, there is a decrease in mating efficiency. This RM-driven decrease in recombination suggests that strains of

Halobacteria which have incompatible sets of RM systems might recombine less frequently in natural environments, resulting in the eventual divergence of incompatible strains. However, the results also indicate that RM systems are not a strong barrier to recombination, and are not very efficient in preventing mating between RM-incompatible strains. Therefore, the compatibility of

RM system sets between recombining strains appear to be a low barrier to recombination, similar to CRISPR-Cas, glycosylation, and archaeosortases.

References

Abdul Halim, M.F., Pfeiffer, F., Zou, J., Frisch, A., Haft, D., Wu, S., Tolić, N., Brewer, H., Payne, S.H., and Paša‐Tolić, L. (2013). Haloferax volcanii archaeosortase is required for motility, mating, and C‐terminal processing of the S‐layer glycoprotein. Molecular microbiology 88, 1164-1175.

Allers, T., Barak, S., Liddell, S., Wardell, K., and Mevarech, M. (2010). Improved strains and plasmid vectors for conditional overexpression of His-tagged proteins in Haloferax volcanii. Appl Environ Microbiol 76, 1759-1769.

Allers, T., Ngo, H.P., Mevarech, M., and Lloyd, R.G. (2004). Development of additional selectable markers for the halophilic archaeon Haloferax volcanii based on the leuB and trpA genes. Appl Environ Microbiol 70, 943-953.

Andam, C.P., and Gogarten, J.P. (2011). Biased gene transfer in microbial evolution. Nature Reviews Microbiology 9, 543.

121

Arutyunov, D., and Frost, L.S. (2013). F conjugation: back to the beginning. Plasmid 70, 18-32.

Banuelos-Vazquez, L.A., Torres Tejerizo, G., and Brom, S. (2017). Regulation of conjugative transfer of plasmids and integrative conjugative elements. Plasmid 91, 82-89.

Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., Romero, D.A., and Horvath, P. (2007). CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709-1712.

Bickle, T.A., and Kruger, D.H. (1993). Biology of DNA restriction. Microbiol Rev 57, 434-450.

Blakely, G.W. (2015). "Mechanisms of Horizontal Gene Transfer and DNA Recombination," in Molecular Medical Microbiology. Elsevier), 291-302.

Budroni, S., Siena, E., Dunning Hotopp, J.C., Seib, K.L., Serruto, D., Nofroni, C., Comanducci, M., Riley, D.R., Daugherty, S.C., Angiuoli, S.V., Covacci, A., Pizza, M., Rappuoli, R., Moxon, E.R., Tettelin, H., and Medini, D. (2011). Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. Proc Natl Acad Sci U S A 108, 4494-4499.

Chang, S., and Cohen, S.N. (1977). In vivo site-specific genetic recombination promoted by the EcoRI restriction endonuclease. Proc Natl Acad Sci U S A 74, 4811-4815.

Chen, I., and Dubnau, D. (2004). DNA uptake during bacterial transformation. Nat Rev Microbiol 2, 241-249.

Corvaglia, A.R., François, P., Hernandez, D., Perron, K., Linder, P., and Schrenzel, J. (2010). A type III-like restriction endonuclease functions as a major barrier to horizontal gene transfer in clinical Staphylococcus aureus strains. Proceedings of the National Academy of Sciences 107, 11954-11958.

Dyall-Smith, M. (2009). The Halohandbook: Protocols for Haloarchaeal Genetics Ver. 7.2 [Online]. Available: https://haloarchaea.com/halohandbook/ [Accessed September 6, 2013].

Ershova, A.S., Rusinov, I.S., Spirin, S.A., Karyagina, A.S., and Alexeevski, A.V. (2015). Role of restriction-modification systems in prokaryotic evolution and ecology. Biochemistry (Mosc) 80, 1373-1386.

Erwin, A.L., Sandstedt, S.A., Bonthuis, P.J., Geelhood, J.L., Nelson, K.L., Unrath, W.C., Diggle, M.A., Theodore, M.J., Pleatman, C.R., Mothershed, E.A., Sacchi, C.T., Mayer, L.W., Gilsdorf, J.R., and Smith, A.L. (2008). Analysis of genetic relatedness of Haemophilus influenzae isolates by multilocus sequence typing. J Bacteriol 190, 1473-1483.

122

Fullmer, M.S., Soucy, S.M., Swithers, K.S., Makkay, A.M., Wheeler, R., Ventosa, A., Gogarten, J.P., and Papke, R.T. (2014). Population and genomic analysis of the genus Halorubrum. Front Microbiol 5, 140.

Holmes, M.L., Nuttall, S.D., and Dyall-Smith, M.L. (1991). Construction and use of halobacterial shuttle vectors and further studies on Haloferax DNA gyrase. J Bacteriol 173, 3807-3813.

Hulter, N., Ilhan, J., Wein, T., Kadibalban, A.S., Hammerschmidt, K., and Dagan, T. (2017). An evolutionary perspective on plasmid lifestyle modes. Curr Opin Microbiol 38, 74-80.

Kassambara, A. (2018). ggpubr:‘ggplot2’Based Publication Ready Plots (2018). R package version 0.2.

Koonin, E.V., Makarova, K.S., and Zhang, F. (2017). Diversity, classification and evolution of CRISPR-Cas systems. Current opinion in microbiology 37, 67-78.

Li, M., Liu, H., Han, J., Liu, J., Wang, R., Zhao, D., Zhou, J., and Xiang, H. (2013). Characterization of CRISPR RNA biogenesis and Cas6 cleavage-mediated inhibition of a provirus in the haloarchaeon Haloferax mediterranei. Journal of bacteriology 195, 867- 875.

Lin, E.A., Zhang, X.S., Levine, S.M., Gill, S.R., Falush, D., and Blaser, M.J. (2009). Natural transformation of Helicobacter pylori involves the integration of short DNA fragments interrupted by gaps of variable size. PLoS Pathog 5, e1000337.

Maier, L.-K., Stachler, A.-E., Brendel, J., Stoll, B., Fischer, S., Haas, K.A., Schwarz, T.S., Alkhnbashi, O.S., Sharma, K., and Urlaub, H. (2019). The nuts and bolts of the Haloferax CRISPR-Cas system IB. RNA biology 16, 469-480.

Naor, A., Lapierre, P., Mevarech, M., Papke, R.T., and Gophna, U. (2012). Low species barriers in halophilic archaea and the formation of recombinant hybrids. Curr Biol 22, 1444-1448.

Ortenberg, R., Tchlet, R., and Mevarech, M. (1998). A model for the genetic exchange system of the extremely halophilic archaeon Haloferax volcanii. Microbiology and biogeochemistry of hypersaline environments 5, 331.

Ouellette, M., Gogarten, J.P., Lajoie, J., Makkay, A.M., and Papke, R.T. (2018). Characterizing the DNA methyltransferases of Haloferax volcanii via bioinformatics, gene deletion, and SMRT sequencing. Genes (Basel) 9.

Ouellette, M., Jackson, L., Chimileski, S., and Papke, R.T. (2015). Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006. Front Microbiol 6, 251.

123

Papke, R.T., Zhaxybayeva, O., Feil, E.J., Sommerfeld, K., Muise, D., and Doolittle, W.F. (2007). Searching for species in haloarchaea. Proc Natl Acad Sci U S A 104, 14092-14097.

Roberts, R.J., Vincze, T., Posfai, J., and Macelis, D. (2015). REBASE--a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res 43, D298- 299.

Rocha, E.P., Cornet, E., and Michel, B. (2005). Comparative and evolutionary analysis of the bacterial homologous recombination systems. PLoS genetics 1, e15.

Roer, L., Aarestrup, F.M., and Hasman, H. (2015). The EcoKI type I restriction-modification system in Escherichia coli affects but is not an absolute barrier for conjugation. J Bacteriol 197, 337-342.

Rosenshine, I., Tchelet, R., and Mevarech, M. (1989). The mechanism of DNA transfer in the mating system of an archaebacterium. Science 245, 1387-1389.

Shalev, Y., Turgeman-Grott, I., Tamir, A., Eichler, J., and Gophna, U. (2017). Cell surface glycosylation is required for efficient mating of Haloferax volcanii. Frontiers in microbiology 8, 1253.

Smith, H.O., Gwinn, M.L., and Salzberg, S.L. (1999). DNA uptake signal sequences in naturally transformable bacteria. Res Microbiol 150, 603-616.

Spencer-Smith, R., Roberts, S., Gurung, N., and Snyder, L.a.S. (2016). DNA uptake sequences in Neisseria gonorrhoeae as intrinsic transcriptional terminators and markers of horizontal gene transfer. Microb Genom 2, e000069.

Thomas, C.M., and Nielsen, K.M. (2005). Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol 3, 711-721.

Tock, M.R., and Dryden, D.T. (2005). The biology of restriction and anti-restriction. Curr Opin Microbiol 8, 466-472.

Touchon, M., Moura De Sousa, J.A., and Rocha, E.P. (2017). Embracing the enemy: the diversification of microbial gene repertoires by phage-mediated horizontal gene transfer. Curr Opin Microbiol 38, 66-73.

Turgeman-Grott, I., Joseph, S., Marton, S., Eizenshtein, K., Naor, A., Soucy, S.M., Stachler, A.- E., Shalev, Y., Zarkor, M., and Reshef, L. (2019). Pervasive acquisition of CRISPR memory driven by inter-species mating of archaea can limit gene transfer and influence speciation. Nature microbiology 4, 177.

Vasu, K., and Nagaraja, V. (2013). Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol Mol Biol Rev 77, 53-72.

124

Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer.

125

Chapter 6. The patchy distribution of restriction–modification system genes and the conservation of orphan methyltransferases in Halobacteria

Matthew S. Fullmer, Matthew Ouellette, Artemis S. Louyakis, R. Thane Papke, and J. Peter

Gogarten

In this chapter, the distribution of restriction-modification (RM) system and orphan methyltransferase (MTase) gene families is examined. This research was published in Genes in

2019. The purpose of this work was to better understand the evolutionary history of RM systems and orphan MTases in the Halobacteria and their impact on halobacterial evolution and diversification. Regarding my contributions to this chapter, I consulted with Matthew S. Fullmer as he designed the experiments and conducted the research with assistance from Artemis S.

Louyakis, R. Thane Papke, and J. Peter Gogarten. I helped analyze the data alongside Matthew

S. Fullmer, Artemis S. Louyakis, R. Thane Papke, and J. Peter Gogarten. I also wrote and the

Introduction and portions of the Results and Discussion sections of the manuscript, and assisted in writing and editing the rest of the manuscript alongside Matthew S. Fullmer, Artemis S.

Louyakis, R. Thane Papke, and J. Peter Gogarten.

Citation:

Fullmer, M. S., Ouellette, M., Louyakis, A., Papke, R. T., & Gogarten, J. P. (2019) The

patchy distribution of restriction-modification system genes and the conservation of

orphan methyltransferases in Halobacteria. Genes, 10(3). doi: 10.3390/genes10030233

https://www.mdpi.com/2073-4425/10/3/233

126

G C A T T A C G G C A T genes

Article The Patchy Distribution of Restriction–Modification System Genes and the Conservation of Orphan Methyltransferases in Halobacteria

Matthew S. Fullmer 1,2,†, Matthew Ouellette 1,†, Artemis S. Louyakis 1,† , R. Thane Papke 1,* and Johann Peter Gogarten 1,* 1 Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA; [email protected] (M.S.F.); [email protected] (M.O.); [email protected] (A.S.L.) 2 Bioinformatics Institute, School of Biological Sciences, The University of Auckland, Auckland 1010, New Zealand * Correspondence: [email protected] (R.T.P.); [email protected] (J.P.G.) † These authors contributed equally.

 Received: 15 February 2019; Accepted: 14 March 2019; Published: 19 March 2019 

Abstract: Restriction–modification (RM) systems in bacteria are implicated in multiple biological roles ranging from defense against parasitic genetic elements, to selfish addiction cassettes, and barriers to gene transfer and lineage homogenization. In bacteria, DNA-methylation without cognate restriction also plays important roles in DNA replication, mismatch repair, protein expression, and in biasing DNA uptake. Little is known about archaeal RM systems and DNA methylation. To elucidate further understanding for the role of RM systems and DNA methylation in Archaea, we undertook a survey of the presence of RM system genes and related genes, including orphan DNA methylases, in the halophilic archaeal class Halobacteria. Our results reveal that some orphan DNA methyltransferase genes were highly conserved among lineages indicating an important functional constraint, whereas RM systems demonstrated patchy patterns of presence and absence. This irregular distribution is due to frequent horizontal gene transfer and gene loss, a finding suggesting that the evolution and life cycle of RM systems may be best described as that of a selfish genetic element. A putative target motif (CTAG) of one of the orphan methylases was underrepresented in all of the analyzed genomes, whereas another motif (GATC) was overrepresented in most of the haloarchaeal genomes, particularly in those that encoded the cognate orphan methylase.

Keywords: HGT; restriction; methylation; gene transfer; selfish genes; archaea; haloarchaea; DNA methylase; epigenetics

1. Introduction DNA methyltransferases (MTases) are enzymes which catalyze the addition of a methyl group to a nucleotide base in a DNA molecule. These enzymes will methylate either adenine, producing N6-methyladenine (6mA), or cytosine, producing either N4-methylcytosine (4mC) or C5-methylcytosine (5mC), depending on the type of MTase enzyme [1]. DNA methyltransferases typically consist of three types of protein domains: an S-adenosyl-L-methionine (AdoMet) binding domain which obtains the methyl group from the cofactor AdoMet, a target recognition domain (TRD) that binds the enzyme to the DNA strand at a short nucleotide sequence known as the recognition sequence, and a catalytic domain that transfers the methyl group from AdoMet to a nucleotide at the recognition sequence [2]. The order in which these domains occur in a MTase varies and can be used to classify the enzymes into the subtypes of α, β, γ, δ, ε, and ζ MTases [3–5].

Genes 2019, 10, 233; doi:10.3390/genes10030233 www.mdpi.com/journal/genes 127 Genes 2019, 10, 233 2 of 19

In bacteria and archaea, MTases are often components of restriction–modification (RM) systems, in which an MTase works alongside a cognate restriction endonuclease (REase) that targets the same recognition site. The REase will cleave the recognition site when it is unmethylated, but the DNA will escape cutting when the site has been methylated by the MTase; this provides a self-recognition system to the host where it differentiates between its own methylated DNA and that of unmethylated, potentially harmful foreign DNA that is then digested by the host’s REase [6–8]. RM systems have also been described as addiction cassettes akin to toxin-antitoxin systems, in which postsegregational killing occurs when the RM system is lost since the MTase activity degrades more quickly than REase activity, resulting in digestion of the host genome at unmodified recognition sites [9,10]. RM systems have been hypothesized to act as barriers to genetic exchange and drive population diversification [11,12]. In Escherichia coli for example, conjugational uptake of plasmids is reduced by the RM system EcoKI when the plasmids contain EcoKI recognition sequences [13]. However, transferred DNA that is digested by a cell’s restriction endonuclease can still effectively recombine with the recipient’s chromosomal DNA [7,14,15]; the effect of DNA digestion serves to limit homologous recombinant DNA fragment size [16]. Restriction thus advantages its host by decreasing transfer of large mobile genetic elements and infection with phage originating in organisms without the cognate MTase [8], while also reducing linkage between beneficial and slightly deleterious mutations [17]. There are four major types of RM systems which have been classified in bacteria and archaea [18,19]: Type I RM systems consist of three types of subunits: REase (R) subunits, MTase (M) subunits, and site specificity (S) subunits which contain two tandem TRDs. These subunits form pentamer complexes of two R subunits, two M subunits, and one S subunit, and these complexes will either fully methylate recognition sites which are modified on only one DNA strand (hemimethylated) or cleave the DNA several bases upstream or downstream of recognition sites which are unmethylated on both strands [20,21]. The MTases and REases of Type II RM systems have their own TRDs and operate independently of each other, but each one targets the same recognition site [22]. There are many different subclasses of Type II RM system enzymes, such as Type IIG enzymes which contain both REase and MTase domains and are therefore capable of both methylation and endonuclease activity [23]. Type III RM systems consist of REase (Res) and MTase (Mod) subunits which work together as complexes, with the Mod subunit containing the TRD which recognizes asymmetric target sequences [24]. Type IV RM systems are made up of only REases, but unlike in other RM systems, these REases will target and cleave methylated recognition sites [20,25]. MTases can also exist in bacterial and archaeal hosts as orphan MTases, which occur independently of cognate restriction enzymes and typically have important physiological functions [26]. In E. coli, the orphan MTase, Dam, an adenine MTase which targets the recognition sequence GATC, is involved in regulating the timing of DNA replication by methylating the GATC sites present at the origin of replication (oriC)[27]. The protein SeqA binds to hemimethylated GATC sites at oriC, which prevents reinitiation of DNA replication at oriC after a new strand has been synthesized [28,29]. Dam methylation is also important in DNA repair in E. coli, where the methylation state of GATC sites is used by the methyl-directed mismatch repair (MMR) system to identify the original DNA strand in order to make repairs to the newly-synthesized strand [30–32]. In Cauldobacter crescentus, the methylation of target sites in genes such as ctrA by orphan adenine MTase CcrM helps regulate the cell cycle of the organism [33–35]. The importance of orphan MTases in cellular processes is likely the reason why they are more widespread and conserved in bacteria compared to MTases associated with RM systems [36,37]. MTases and RM systems have been well-studied in Bacteria, but less research has been performed in Archaea, with most studies focused on characterizing RM systems of thermophilic species [38–42]. Recent research into the halophilic archaeal species, Haloferax volcanii, has demonstrated a role for DNA methylation in DNA metabolism and probably uptake: cells could not grow on wild type E. coli DNA as a phosphorous source, whereas unmethylated E. coli was metabolized completely [43,44]. In an effort to better understand this phenomenon, we characterized the genomic methylation patterns

128 Genes 2019, 10, 233 3 of 19

(methylome) and MTases in the halophilic archaeal species Haloferax volcanii [45,46]. However, the distribution of RM systems and MTases among the Archaea has not been extensively studied, and thus their life histories and impact on host evolution are unclear. To that end we surveyed the breadth of available genomes from public databases representing the class, Halobacteria, also known as the Haloarchaea, for RM system and MTase candidate genes. We further sequenced additional genomes from the genus Halorubrum, which provided an opportunity to examine patterns among very closely related strains. Upon examining their patterns of occurrence, we discovered orphan methyltransferases widely distributed throughout the Haloarchaea. In contrast, RM system candidate genes had a sparse and spotty distribution indicating frequent gene transfer and loss. Even individuals from the same species isolated from the same environment and at the same time differed in the RM system complement.

2. Materials and Methods

2.1. Search Approach The starting data consists of 217 Halobacteria genomes from NCBI and 14 in-house sequenced genomes (Table S1). We note that some of these genomes were assembled from shotgun metagenome sequences and not from individual cultured strains. Genome completion was determined through identification of 371 Halobacteriaceae marker genes using CheckM v1.0.7 [47]. Queries for all available restriction-methylation-specificity genes were obtained from the Restriction Enzyme dataBASE (REBASE) website [48,49]. As methylation genes are classified by function rather than by homology [48], the protein sequences of each category were clustered into homologous groups (HGs) via the uclust function of the USEARCH v9.0.2132 package [50] at a 40 percent identity. The resulting ~36,000 HGs were aligned with MUSCLE v3.8.31 [51]. HMMs were then generated from the alignments using the hmmbuild function of HMMER3 v3.1b2 [52]. The ORFs of the 217 genomes were searched against the profiles via the hmmsearch function of HMMER3. Top hits were extracted and cross hits filtered with in-house Perl scripts available at the Gogarten-lab’s GitHub repository rms_analysis [53]. Steps were taken to collapse and filter HGs. First, the hits were searched against the arCOG database [54] using BLAST [55] to assign arCOG identifiers to the members of each group. Second, the R package igraph v1.2.2 [56] was used to create a list of connected components from the arCOG identifications. All members of a connected component were collapsed into a single collapsed HG (cHG). Because REBASE is a database of all methylation-restriction-related activities there are many members of the database outside our interest. At this point, we made a manual curation of our cHGs attempting to identify known functions that did not apply to our area of interest. Examples include protein methylation enzymes, exonucleases, cell division proteins, etc. The final tally of this clustering and filtering yielded 1696 hits across 48 total candidate cHGs. arCOG annotations indicate DNA methylase activity, restriction enzyme activity, or specificity module activity as part of an RM system for 26 cHGs. The remaining 22 cHGs had predominant arCOG annotations matching other functions that may reasonably be excluded from conservative RM system-specific analyses. For a graphical representation of the search strategy (Figure S1). The putative Type IV methyl-directed restriction enzyme gene mrr, which is known to be present in Haloferax volcanii, had not been assembled into a cHG. We assembled a cluster of mrr homologs and determined their presence and absence using Mrr from Haloferax volcanii DS2 (accession: ADE02322.1) as query in BLASTP searches against each genome (E-value cut-off 10−10).

2.2. Reference Phylogeny A reference tree was created using the full complement of ribosomal proteins. The ribosomal protein set for Halorubrum lacusprofundi ATCC 49239 was obtained from the BioCyc website [57]. Each protein open reading frame (ORF) was used as the query in a BLAST [55] search against each genome. Hits for each gene were aligned with MUSCLE v3.8.31 [51] and then concatenated with in-house

129 Genes 2019, 10, 233 4 of 19 scripting. The concatenated alignment was subjected to maximum likelihood phylogenetic inference in the IQ-TREE v1.6.1 suite with ultrafast bootstrapping and automated model selection [58,59]. The final model selection was LG + F + R9.

2.3. Presence–absence Phylogeny It is desirable to use maximum-likelihood methodology rather than simple distance measures. To realize this, the matrix was converted to an A/T alignment by replacing each present with an “A” and absent with a “T.” This allowed the use of an F81 model with empirical base frequencies. This confines the base parameters to only A and T while allowing all of the other advantages of an ML approach. IQ-TREE was employed to infer the tree with 100 bootstraps [59].

2.4. Horizontal Gene Transfer Detection Gene trees for each of the cHGs were inferred using RAxML v8.2.11 [60] under PROTCATLG models with 100 bootstraps. The gene trees were then improved by resolving their poorly supported in nodes to match the species tree using TreeFix-DTL [61]. Optimized gene tree rootings were inferred with the OptRoot function of Ranger-DTL. Reconciliation costs for each gene tree were computed against the reference tree using Ranger-DTL 2.0 [62] with default DTL costs. One-hundred reconciliations, each using a different random seed, were calculated for each cHG. After aggregating these with the AggregateRanger function of Ranger-DTL, the results were summarized and each prediction and any transfer inferred in 51% or greater of cases was counted as a transfer event.

2.5. Data Analysis and Presentation The presence–absence matrix of cHGs was plotted as a heatmap onto the reference phylogeny using the gheatmap function of the R Bioconductor package ggtree v1.14.4 [63,64]. The rarefaction curve was generated with the specaccum function of the vegan v2.5-3 package in R [65], and the number of genomes per homologous group was plotted with ggplot2 v3.1.0 [66]. Spearman correlations and significances between the presence–absence of cHGs was calculated with the rcorr function of the hmisc v4.1-1 package in R [67]. A significance cutoff of p < 0.05 was used with a Bonferroni correction. All comparisons failing this criterion were set to correlation = 0. These data were plotted into a correlogram via the corrplot function of the R package corrplot v0.84. To compare the Phylogeny calculated from presence–absence data to the ribosomal protein reference, the bootstrap support set of the presence–absence phylogeny was mapped onto the ribosomal protein reference tree using the plotBS function in phangorn v2.4.0 [68]. Support values equal to or greater than 10% are displayed. To compare phylogenies using Internode Certainty, scores were calculated using the IC/TC score calculation algorithm implemented in RAxML v8.2.11 [60,69]. Genomes were searched for location of cHGs. Proximity was used to determine synteny of groups of cHGs frequently identified on the same genomes. Jaccard distances between presence–absence of taxa were calculated using the distance function of the R package philentropy v0.2.0 [70]. The PCoA was generated using the wcmdscale function in vegan v2.5-3 [65]. The two best sampled genera—Halorubrum (orange) and Haloferax (red)—are colored distinctively. To determine the most likely recognition sites, each member of each cHG was searched against the REBASE Gold Standard set using BLASTp. The REBASE gold standard set was chosen over the individual gene sets on account of it having a much higher density of recognition site annotation. This simplifies the need to search for secondary hits to find predicted target sites. After applying an e-value cut-off of 10−20, the top hit was assigned to each ORF. CTAG and GATC motifs were counted with an inhouse perl script available at the Gogarten-lab’s GitHub [71]. Sets of Gene Ontology (GO) terms were identified for each cHG using Blast2GO [72]. Annotations were checked against the UniProt database [73] using arCOG identifiers.

130 Genes 2019, 10, 233 5 of 19

3. Results

3.1. RM-System Gene Distribution Analysis of 217 haloarchaeal genomes and metagenome-assembled genomes yielded 48 total candidate collapsed homologous groups (cHGs) of RM-system components. Out of these 48 cHGs, 26 had arCOG annotation suggesting DNA methylase activity, restriction enzyme activity, or specificity module activity as part of an RM system. We detected 22 weaker candidates with predominant arCOG annotations matching other functions (Table 1). Our analysis shows that nearly all of the cHGs are found more than once. (Figure 1A). Indeed, 16 families are found in 20 or more genomes each (>9%), and this frequency steadily increases culminating in five families being conserved in greater than 80 genomes each (>37%) with one cHG being in ~80% of all Haloarchaea surveyed. Though these genes appear frequently in taxa across the haloarchaeal class, the majority of each candidate RM system cHG is present in fewer than half the genomes—the second most abundantly recovered cHG is found in only ~47% of all taxa surveyed. We note that the cHGs with wide distribution are annotated as MTases without an identifiable coevolving restriction endonuclease: Groups U DNA_methylase-022; W dam_methylase-031; Y dcm_methylase-044; and AT Uncharacterized-032 (members of this cHG are also annotated as methylation subunit and N6-Adenine MTase). Rarefaction analysis indicates that ~50% of the genomes assayed contain seven dominant cHGs, and that all taxa on average are represented by half of the cHGs (Figure 1B). Together, the separate analyses indicate extensive gene gain and loss of RM-system genes. In contrast, orphan MTases in cHG U and W, and to a lesser extent Y (Figure 2) have a wider distribution in some genera.

Table 1. Collapsed homologous group descriptions $.

Alpha Code Numerical Code Annotated arCOG Function $$ arCOG Number A cHG_021 T_I_M arCOG02632 B cHG_024 T_I_M arCOG05282 C cHG_018 T_I_R arCOG00880 D cHG_034 T_I_R arCOG00879 E cHG_045 T_I_R arCOG00878 F cHG_006 T_I_S arCOG02626 G cHG_025 T_I_S arCOG02628 H cHG_036 probable_T_II_M arCOG00890 I cHG_001 T_II_M arCOG02635 J cHG_003 T_II_M arCOG02634 K cHG_011 T_II_M arCOG04814 L cHG_033 T_II_M arCOG03521 M cHG_007 T_II_R arCOG11279 N cHG_013 T_II_R arCOG11717 O cHG_023 T_II_R arCOG03779 P cHG_029 T_II_R arCOG08993 Q cHG_042 Adenine_DNA_methylase_probable_T_III_M arCOG00108 R cHG_008 T_III_R arCOG06887 S cHG_009 T_III_R_probable arCOG07494 T cHG_014 Adenine_DNA_methylase arCOG00889 U cHG_022 DNA_methylase arCOG00115 V cHG_027 DNA_methylase arCOG00129 W cHG_031 dam_methylase arCOG03416 X cHG_035 probable_RMS_M arCOG08990 Y cHG_044 dcm_methylase arCOG04157 Z cHG_048 Adenine_DNA_methylase arCOG02636 AA cHG_010 RNA_methylase arCOG00910 AB cHG_040 SAM-methylase arCOG01792

131 Genes 2019, 10, 233 6 of 19

Genes 2019, 10, 233 Table 1. Cont. 6 of 19

Alpha Code Numerical Code TableAnnotated 1. Cont. arCOG Function $$ arCOG Number

AFAC cHG_019cHG_012 EndonucleaseRestrictionEndonuclease arCOG02782arCOG05724 AGAD cHG_020cHG_038 PredictedRestrictionEndonucleaseEndonuclease arCOG02781arCOG06431 AE cHG_015 HNH_endonuclease arCOG07787 AH cHG_004 HNH_endonuclease arCOG09398 AF cHG_019 Endonuclease arCOG02782 AIAG cHG_037cHG_020 HNH_nucleaseEndonuclease arCOG05223arCOG02781 AHAJ cHG_039cHG_004 HNH_nucleaseHNH_endonuclease arCOG03898arCOG09398 AKAI cHG_041cHG_037 HNH_nucleaseHNH_nucl ease arCOG08099arCOG05223 ALAJ cHG_046cHG_039 MBF1HNH_nuclease arCOG01863arCOG03898 AMAK cHG_028cHG_041 CBS_domainHNH_nuclease arCOG00608arCOG08099 ANAL cHG_005cHG_046 MarR MBF1 arCOG03182arCOG01863 AOAM cHG_030cHG_028 ParB-like nucleaseCBS_domain arCOG01875arCOG00608 APAN cHG_016cHG_005 GVPC MarR arCOG06392arCOG03182 AQAO cHG_002cHG_030 ASCH domainParB-like RNA bindin nucleaseg arCOG01734arCOG01875 AP cHG_016 GVPC arCOG06392 AR cHG_017 Uncharacterized arCOG10082 AQ cHG_002 ASCH domain RNA binding arCOG01734 ASAR cHG_026cHG_017 UncharacterizedUncharacterized arCOG13171arCOG10082 ATAS cHG_032cHG_026 UncharacterizedUncharacterized arCOG08946arCOG13171 AUAT cHG_043cHG_032 UncharacterizedUncharacterized arCOG08856arCOG08946 AVAU cHG_047cHG_043 UncharacterizedUncharacterized arCOG04588arCOG08856 $: AAV listing of associatedcHG_047 Gene Ontology terms and geUncharacterizedne family descriptions is availablearCOG04588 in Table S2. $$$:: AT_I listing and ofT_II associated denote Gene type Ontology I and type terms II restrictio and genen family enzymes, descriptions respectively. is available M, inR, Tableand S2.S denote$$: T_I andthe T_II denote type I and type II restriction enzymes, respectively. M, R, and S denote the methylase, restriction methylase, restriction endonuclease, and specificity subunits, respectively. endonuclease, and specificity subunits, respectively.

AB180 250

150 200

120 150

90 No. of taxa No. No. of genomesNo. 100

60

50 30

0 0 I J F P S T L E Z B C R U Y K A O H Q X D G N V M W AI AJ AT AP AS AF AE AL AU AC AV AR AB AG AQ AO AA mrr AH AK AD AN AM 0 1020304050 cHG Gene families FigureFigure 1. 1. DistributionDistribution of of collapsed collapsed Homologous Homologous Group Group (cHG) (cHG) among among haloarchaeal haloarchaeal genomes. genomes. (A) The (A) numberThe number of genomes of genomes present present in each in each collapsed collapsed Homologous Homologous Group Group (cHG). (cHG). No No cHG cHG contains contains a a representativerepresentative from from every every genome genome used used in in this this st study.udy. With With the the exception exception of of one one cHG, all contain membersmembers from from fewer fewer than than half half of of the genomes. Th Thee cHGs cHGs are are ordered ordered by by number number of of genomes genomes they they contain.contain. (B (B) )Rarefaction Rarefaction plot plot of of the the number number of of geno genomesmes represented as cHGs accumulate. A A 95% 95% confidenceconfidence interval is shownshown inin shadedshaded blueblue area area and and the the yellow yellow box box whisker whisker plots plots give give the the number number of oftaxa taxa from from random random subsamples subsamples (permutations (permutations = 100)= 100) over over 48 48 gene gene families. families.

The phylogeny of the class Halobacteria inferred from concatenated ribosomal proteins (Figure 2) The phylogeny of the class Halobacteria inferred from concatenated ribosomal proteins (Figure was largely comparable to prior work [74], and with a taxonomy based on concatenations of 2) was largely comparable to prior work [74], and with a taxonomy based on concatenations of conserved proteins [75,76]. For instance, in our phylogeny, the Halorubracaea group with the conserved proteins [75,76]. For instance, in our phylogeny, the Halorubracaea group with the Haloferacaceae Halobacteriaceae Haloarculaceae Haloferacaceae recapitulatingrecapitulating the the order order Halo Haloferacales,feracales, and and the the families, families, Halobacteriaceae,, Haloarculaceae,, Halococcaceae andand Halococcaceae,, groupgroup within within the the order order Halobacteriales. Halobacteriales. Our Our genome genome survey survey in search in search of RM-system of RM- Halorubrum systemgenes encompassedgenes encompassed a broad a taxonomic broad taxonomic sampling, sa andmpling, it explores and it inexplores depth the in genusdepth the genus Halorubrumbecause it is because a highly it speciatedis a highly genus, speciated and genus, because and the because existence the ofexistence many genomes of many genomes from the samefrom thespecies same allows species within allows species within distribution species distribution assessment. assessment. Comparison of the phylogeny in Figure 2 to the heatmap giving the presence/absence of RM system cHG candidates demonstrates that the cHG distribution is highly variable (Figure 2). The one glaring exception is cHG U, a DNA methylase found in 174 of the 217 genomes analyzed. Since it is not coupled with a restriction enzyme of equal abundance,132 it is assumed to be an orphan MTase. The

Genes 2019, 10, x FOR PEER REVIEW 7 of 19

MTase from Hfx. volcanii (gene HVO_0794), which recognizes the CTAG motif [45], is a member of this cHG. Though U is widely distributed, within the genus Halorubrum it is only found in ~37.5% (21/56) of the genomes. While U’s phylogenetic profile is compatible with vertical inheritance over much of the phylogeny, the presence absence data also indicate a few gene transfer and loss events within Halorubrum. cHG U is present in Hrr. tebenquichense DSM14210, Hrr. hochstenium ATCC700873, HrrGenes. sp.2019 AJ767,, 10, 233 and in strains from related species Hrr. distributum, Hrr. arcis, Hrr. litoreum, and7 Hrr. of 19 terrestre, suggesting an acquisition in the ancestor of this group.

Figure 2. Cont. Figure 2. Cont.

133 Genes 2019, 10, 233 8 of 19 Genes 2019, 10, x FOR PEER REVIEW 8 of 19

Figure 2. Figure 2. Presence–absencePresence–absence matrix matrix of of the the 48 48 candidat candidatee RMS RMS cHGs plotted against the reference phylogeny. For most cHGs the pattern of presence–absence does not match the reference phylogeny phylogeny. For most cHGs the pattern of presence–absence does not match the reference phylogeny (compare Figures S2–S5). RMS-candidate cHGs are loosely ordered by system type and with the (compare Figures S2–S5). RMS-candidate cHGs are loosely ordered by system type and with the ambiguously assigned RM candidates at the end. Table 1 gives a key relating the column names to the ambiguously assigned RM candidates at the end. Table 1 gives a key relating the column names to majority functional annotation. the majority functional annotation. Comparison of the phylogeny in Figure 2 to the heatmap giving the presence/absence of RM Instead of U, another orphan MTase is abundantly present in Halorubrum spp., cHG W. It was system cHG candidates demonstrates that the cHG distribution is highly variable (Figure 2). The one found in ~95% of all Halorubrum strains, with three exceptions—an assembled genome from the glaring exception is cHG U, a DNA methylase found in 174 of the 217 genomes analyzed. Since it metagenome sequence data and two from incomplete draft genomes of the species Halorubrum is not coupled with a restriction enzyme of equal abundance, it is assumed to be an orphan MTase. ezzemoulense. Interestingly, when U is present in a Halorubrum sp. genome, so too is W (Figure 2). In The MTase from Hfx. volcanii (gene HVO_0794), which recognizes the CTAG motif [45], is a member a complementary fashion, analysis of W outside of the genus Halorubrum shows that it is found of this cHG. Though U is widely distributed, within the genus Halorubrum it is only found in ~37.5% patchily distributed throughout the rest of the class Halobacteria (~20% −32/158), and always as a (21/56) of the genomes. While U’s phylogenetic profile is compatible with vertical inheritance over second orphan MTase with cHG U. When the members of cHG W were used to search the uniprot much of the phylogeny, the presence absence data also indicate a few gene transfer and loss events database, the significant matches included the E. coli Dam MTase, a very well-characterized GATC within Halorubrum. cHG U is present in Hrr. tebenquichense DSM14210, Hrr. hochstenium ATCC700873, MTase, which provides strong evidence that this cHG is a GATC orphan MTase family. The presence Hrr. sp. AJ767, and in strains from related species Hrr. distributum, Hrr. arcis, Hrr. litoreum, and and absence of cHG U and W in completely sequenced genomes is given in Table S3, together with Hrr. terrestre, suggesting an acquisition in the ancestor of this group. the frequency of the CTAG and GATC motifs in the main chromosome. The rest of the RM cHGs are much more patchily distributed (Figure 2). For instance, the cHGs that make up columns A–G represent different gene families within the Type I RM system 134 Genes 2019, 10, 233 9 of 19

Instead of U, another orphan MTase is abundantly present in Halorubrum spp., cHG W. It was found in ~95% of all Halorubrum strains, with three exceptions—an assembled genome from the metagenome sequence data and two from incomplete draft genomes of the species Halorubrum ezzemoulense. Interestingly, when U is present in a Halorubrum sp. genome, so too is W (Figure 2). In a complementary fashion, analysis of W outside of the genus Halorubrum shows that it is found patchily distributed throughout the rest of the class Halobacteria (~20% −32/158), and always as a second orphan MTase with cHG U. When the members of cHG W were used to search the uniprot database, the significant matches included the E. coli Dam MTase, a very well-characterized GATC MTase, which provides strong evidence that this cHG is a GATC orphan MTase family. The presence and absence of cHG U and W in completely sequenced genomes is given in Table S3, together with the frequency of the CTAG and GATC motifs in the main chromosome. The rest of the RM cHGs are much more patchily distributed (Figure 2). For instance, the cHGs that make up columns A–G represent different gene families within the Type I RM system classification: two MTases (A,B), three REases (C,D,E), and two site specificity units (SSUs) (F,G). Throughout the Haloarchaea, cHGs from columns A, E, and F, representing an MTase, an REase, and an SSU, respectively, are found co-occurring 35 times. In a subset of genomes studied for synteny, A, E, and F are encoded next to one another in Natrinema gari, Halorhabdus utahensis, Halorubrum SD690R, Halorubrum ezzemoulense G37, and Haloorientalis IM1011 (Figure 3). These genes probably represent a single transcriptional unit of genes working together for restriction and modification purposes. Since the Type I RM system is a five-component system, the likely stoichiometry is 2:2:1. These three cHGs co-occur four times within the species Halorubrum ezzemoulense, and two of these cHGs (A and E) co-occur an additional three more times, suggesting either a loss of the SSU, or an incomplete genome sequence for those strains. If it is due to incomplete sequencing, then 7/16 (43%) of the Hrr. ezzemoulense genomes have this set of co-occurring genes, while half do not have an identified Type I system. This is particularly stunning since strains FB21, Ec15, G37, and Ga2p were all cultivated at the same time from the same sample, a hypersaline lake in Iran. Furthermore, one strain—Ga36—has a different identified Type I RM system composed of substituted cHGs A and E with B and D, respectively, while maintaining the same SSU. This suggests the same DNA motif may be recognized by the different cHGs and that these cHGs are therefore functionally interchangeable. Members of cHGs B, F, and D were found as likely cotranscribed units in Halococcus salifodinae, Natronolimnobius aegyptiacus, Halorubrum kocurii, and Haloarcula amylolytica (Figure 3). In Halorubrum DL and Halovivax ruber XH70 genomes that contained members from cHGs A, B, D, E, and F, these genes were not found in a single unit, suggesting that they do not form a single RM system. Together, these analyses suggest this Type I RM system has a wide but sporadic distribution, that this RM system is not required for individual survival, and that functional substitutions occur for cHGs. Type II RM systems contain an MTase and an REase that target the same motif but do not require an associated SSU because each enzyme has its own TRD. The Type II RM system cHGs are in columns H-L for the MTases, and M-P for the REases. Memberships to the Type II MTase cHGs are far more numerous in the Haloarchaea than their REase counterpart, as might be expected when witnessing decaying RM systems through the loss of the REase. The opposite result—more REases—is a more difficult scenario because an unmethylated host genome would be subject to restriction by the remaining cognate REase (e.g., addiction cassettes). There are 14 “orphan” Type II REases in Figure 2, but their cognate MTase’s absence could be explained by incomplete genome sequence data. Type III RM systems have been identified in cHGs Q (MTase) and R and S (REases). Type III MTases and REases (cHGs Q and R) co-occur almost exclusively in the species Halorubrum ezzemoulense, our most highly represented taxon. Furthermore, these Type III RM systems are highly restricted in their distribution to that species, with cHGs co-occurring only twice more throughout the Haloarchaea, and with a different REase cHG (S); once in Halorubrum arcis and another in Halobacterium D1. Orphan MTases occurred twice in cHG Q. Of particular interest is that closely related strains also cultivated from Lake Bidgol in Iran but which are in a different but closely related Halorubrum species (e.g., Ea8,

135 Genes 2019, 10, 233 9 of 19 classification: two MTases (A,B), three REases (C,D,E), and two site specificity units (SSUs) (F,G). Throughout the Haloarchaea, cHGs from columns A, E, and F, representing an MTase, an REase, and an SSU, respectively, are found co-occurring 35 times. In a subset of genomes studied for synteny, A, E, and F are encoded next to one another in Natrinema gari, Halorhabdus utahensis, Halorubrum SD690R, Halorubrum ezzemoulense G37, and Haloorientalis IM1011 (Figure 3). These genes probably represent a single transcriptional unit of genes working together for restriction and modification purposes. Since the Type I RM system is a five-component system, the likely stoichiometry is 2:2:1. These three cHGs co-occur four times within the species Halorubrum ezzemoulense, and two of these cHGs (A and E) co- occur an additional three more times, suggesting either a loss of the SSU, or an incomplete genome sequence for those strains. If it is due to incomplete sequencing, then 7/16 (43%) of the Hrr. ezzemoulense genomes have this set of co-occurring genes, while half do not have an identified Type I system. This is particularly stunning since strains FB21, Ec15, G37, and Ga2p were all cultivated at the same time from the same sample, a hypersaline lake in Iran. Furthermore, one strain—Ga36—has a different identified Type I RM system composed of substituted cHGs A and E with B and D, respectively, while maintaining the same SSU. This suggests the same DNA motif may be recognized by the different cHGs and that these cHGs are therefore functionally interchangeable. Members of cHGs B, F, and D were found as likely cotranscribed units in Halococcus salifodinae, Natronolimnobius Genesaegyptiacus2019, 10,, Halorubrum 233 kocurii, and Haloarcula amylolytica (Figure 3). In Halorubrum DL and Halovivax10 of 19 ruber XH70 genomes that contained members from cHGs A, B, D, E, and F, these genes were not found in a single unit, suggesting that they do not form a single RM system. Together, these analyses IB24,suggest Hd13, this Ea1,Type and I RM Eb13) system do not has have a wide a Type but III spor RMadic system, distribution, implying thoughthat this exposed RM system to the is same not halophilicrequired for viruses, individual they dosurvival, not rely and on that this functional system for substitutions avoiding virus occur infection. for cHGs.

Figure 3. GeneGene maps maps for for syntenic syntenic clusters clusters of of gene gene families families (A (A) EFA) EFA and and (B ()B BFD) BFD found found in ina subset a subset of oforganisms organisms identified identified to tothe the right right of ofeach each map. map. Genes Genes are are colored colored by by gene gene families families with with Type Type I methylases (families A and B) in grays, Type I restriction endonucleases (DE) in blues, and Type I site specificityspecificity unit (F) in green.

MrrType is II a RM Type systems IV REase contain that was an MTase suggested and to an cleave REase methylated that target GATCthe same sites motif [77, 78but]. do Mrr not homologs require arean associated identified inSSU most becauseHaloferax eachsp., enzyme they have has a its sporadic own TRD. distribution The Type among II RM other systemHaloferacaceae cHGs areand in incolumns the Halobacteriaceae H-L for the MTases,and are and absent M-P in for the theNatrialbaceae REases. Memberships(Figure 2). cHGsto the Z-AVType areII MTase not sufficiently cHGs are characterizedfar more numerous to pinpoint in the theirHaloarchaea role in DNA than RMtheir systems REase counterpart, or as MTase. as These might cHGs be expected likely include when homingwitnessing endonucleases decaying RM or systems enzymes through modifying the nucleotidesloss of the REase. in RNA The molecules; opposite however,result—more their REases— function asis a orphan more difficult MTases scenario or restriction because endonucleases an unmethylated can, at host present, genome not bewould excluded. be subject to restriction by the remaining cognate REase (e.g., addiction cassettes). There are 14 “orphan” Type II REases in 3.2. Horizontal Gene Transfer Explains Patchy Distribution Figure 2, but their cognate MTase’s absence could be explained by incomplete genome sequence data. TheType patchy III RM appearance systems have of RM been system identified candidates in cHGs was Q further (MTase) investigated and R and by S plotting (REases). the Type Jaccard III distanceMTases ofand the REases presence–absence (cHGs Q dataand againstR) co-occur the alignment almost exclusively distance of thein referencethe species tree Halorubrum (Figure S2). If the presence–absence data followed vertical descent one would expect the best-fit line to move from the origin with a strong positive slope. Instead, the best fit line is close to horizontal with an r-squared value of 0.0047, indicating negligible relationship between the overall genome phylogeny and RM system complement per genome. The presence–absence clustering patterns were visualized by plotting a principle coordinate analysis (Figure S3). The high degree of overlap between the ranges of the three groups illustrates that there are few RM system genes unique to a given group and a large amount of overlap in repertoires. To further evaluate the lack of long-term vertical descent for RM system genes, a phylogeny was inferred from the presence–absence pattern of cHGs. The resultant tree (Figure S4) is largely in disagreement with the reference phylogeny. The bootstrap support set from the presence–absence phylogeny was mapped onto the ribosomal topology (Figure S5). The resulting support values demonstrate an extremely small degree of agreement between the two methods. The few areas where there is even 10% support are near the tips of the ribosomal phylogeny and correspond to parts of established groups, such as Haloferax, Natronobacterium, and Halorubrum. Internode Certainty (IC) scores are another way to compare phylogenies. An average IC score of 1 represents complete agreement between the two phylogenies, and score of −1 complete disagreement. The average IC scores for the reference tree using the support set from the F81 tree was −0.509, illustrating that the presence absence data do not support the topology of the reference phylogeny.

136 Genes 2019, 10, 233 11 of 19

The patchy distribution of the RM system candidate genes and their lack of conformity to the reference phylogeny suggests frequent horizontal gene transfer combined with gene loss events as the most probable explanation for the observed data. To quantify the amount of transfer, the TreeFix-Ranger pipeline was employed. TreeFix-DTL resolves poorly supported areas of gene trees to better match the concatenated ribosomal protein gene tree used as reference. Ranger-DTL resolves optimal gene tree rooting against the species tree and then computes a reconciliation estimating the number of duplications, transfers, and losses that best explains the data (Table 2). For almost every cHG with four or more taxa, our analysis infers several HGT events. Only cHG R, a putative Type III restriction enzyme found only in a group of closely related Halorubrum ezzemoulense strains, has not been inferred to undergo at least one transfer event.

Table 2. Important traits of cHGs with four or more open reading frames (ORFs).

Alpha (Numeric) No. of Predicted Recognition No. of Taxa Function b Frequency e cHG Transfers a Sites c GAAGGC 31% I (001) 16 9 T_II_M GGRCA 31% CANCATC 53% J (003) 38 21 T_II_M TAGGAG 21% GGCGCC 89% AH (004) 12 4 HNH_endonuclease GATC 11% GGAYNNNNNNTGG 24% F (006) 61 44 T_I_S CAGNNNNNNTGCT 16% R (008) 14 0 T_III_R NA d 100% AA (010) 55 15 RNA_methylase ATTAAT 33% GCAAGG 49% K (011) 137 97 T_II_M GKAAYG 28% GCGAA 29% AC (012) 8 5 Restriction Endonuclease CAACNNNNNTC 29% CTGGAG 29% GCAGG 45% T (014) 130 93 Adenine_DNA_methylase AAGCTT 32% GGCGCC 70% AE (015) 21 13 HNH_endonuclease YSCNS 15% AP (016) 12 6 GVPC CANCATC 83% AACNNNNNNGTGC 73% C (018) 7 4 T_I_R CTANNNNNNRTTC 27% AF (019) 4 3 Endonuclease NAd 100% GGAYNNNNNNTGG 37% A (021) 88 58 T_I_M GTCANNNNNNRTCA 12% CTCGAG 9% DNA_methylase CTAG 59% U (022) 290 120 CATTC 14% CCCGGG 7% O (023) 37 28 T_II_R NAd 100% GAGNNNNNNVTGAC 75% B (024) 16 8 T_I_M GACNNNNNNRTAC 19% GAGNNNNRTAA 75% G (025) 4 2 T_I_S GAGNNNNNTAC 25% V (027) 5 1 DNA_methylase CATTC 100% GATC 75% AO (030) 4 2 ParB-like_nuclease CTAG 25% GATC 70% W (031) 153 70 dam_methylase AB/SAAM 22% GCAAGG 43% AT (032) 116 60 Uncharacterized GKAAYG 26% GGTTAG 14% CAARCA 40% L (033) 66 38 T_II_M-033 CTGAAG 36% GCANNNNNRTTA 69% D (034) 16 11 T_I_R-034 GGCANNNNNNTTC 19%

137 Genes 2019, 10, 233 12 of 19

Table 2. Cont.

Alpha (Numeric) No. of b Predicted Recognition e No. of Taxa a Function c Frequency GenescHG 2019, 10, 233 Transfers Sites 12 of 19 X (035) 19 9 probable_RMS_M GGGAC 83% Table 2. Cont. CCWGG 42% H (036) 38 24 probable_T_II_M CCSGG 18% AK (041) 6 4 HNH_nuclease GTAC NA d 100%16% AI (037) 6 4 HNH_nuclease NA d 100% AJ (039)Q (042) 21 5 8 Adenine_DNA_methylase4 HNH_nuclease probable_T_III_M GGCGCC RGTAAT 71%100% d AK (041) 6 4 HNH_nuclease NANAd 19%100% Y (044) 179 110 dcm_methylaseAdenine_DNA_methylase RGTAAT CGGCCG 24%71% Q (042) 21 8 probable_T_III_M NAGTCGACd 13%19% CGGCCGACGT 11%24% Y (044) 179 110 dcm_methylase GTCGAC 13% E (045) 58 42 T_I_R CCCNNNNNRTTGYACGT 63%11% CCCNNNNNRTTGYGCANNNNNRTTA 28% 63% E (045) 58 42 T_I_R Z (048) 54 35 Adenine_DNA_methylase GCANNNNNRTTACCRGAG 36%28% CCRGAG 36% Z (048) 54 35 Adenine_DNA_methylase GTMKAC 30% GTMKAC 30% a Number of estimated horizontal gene transfer events, b T_I and T_II denote type I and type II a Number of estimated horizontal gene transfer events, b T_I and T_II denote type I and type II restriction enzyme, respectively.restriction M, R,enzyme, and S denote respectively. the methylase, M, R, and restriction S denote endonuclease, the methylase, and specificityrestriction subunits,endonuclease, respectively. and c Top predictedspecificity recognition subunits, sites respectively.d No predicted c Top recognitionpredicted recognition site e Frequency sites ofd No predictions predicted within recognition the cHG. site e Frequency of predictions within the cHG.

RM systemsRM systems usually usually function function as as cooperative cooperative units units [[9,19,48].9,19,48]. It It stands stands to to reason reason that that some some of the of the RM systemRM system candidates candidates may may be be transferred transferred as as units, units, maintainingmaintaining their their cognate cognate functionality. functionality. This This possibilitypossibility was was examined examined by by a correlation a correlation analysis. analysis. AA spearmanspearman correlation correlation was was made made between between all all pairspairs of cHGs. of cHGs. Those Those with with a significanta significan resultt result at at aa Bonferroni-correctedBonferroni-corrected p p< <0.05 0.05 were were plotted plotted in a in a correlogramcorrelogram (Figure (Figure4). As4). As illustrated illustrated in in Figure Figure3 3,, cHGs cHGs with significan significantt similar similar phylogenetic phylogenetic profiles profiles oftenoften are near are near to one to one another another in thein the genomes. genomes.

Figure 4. Heatmap of co-occurrence between the 48 RMS-candidate cHGs. Positive correlation indicates Figure 4. Heatmap of co-occurrence between the 48 RMS-candidate cHGs. Positive correlation the cHGs co-occur while negative indicates that the presence of one means the other will not be present. indicates the cHGs co-occur while negative indicates that the presence of one means the other will not Significance level is p < 0.05 with a Bonferroni correction applied for multiple tests. Blue indicates be present. Significance level is p < 0.05 with a Bonferroni correction applied for multiple tests. Blue significantindicates positive significant correlation; positive redcorrelation; indicates red a significantindicates a significant negative correlation.negative correlation.

138 Genes 2019, 10, 233 13 of 19

4. Discussion A striking result of our study is the irregular distribution of the RM system gene candidates throughout not just the haloarchaeal class, but also within its orders, genera, species, and even communities and populations. The patchy distribution is almost certainly the result of frequent HGT and gene loss. RM system genes are well known for their susceptibility to HGT and loss, and their presence almost never define a clade or an environmental source [36,79]. Frequent acquisition of RM system genes through HGT is illustrated by their sporadic distribution. For example, Halorubrum genomes encode many candidate RM system cHGs that are absent from the remainder of the Halobacteria (e.g., cHG M, R, S, AC, AG, and AM). Only one of these (cHG R) is found in more than three genomes, a Type III restriction protein found in 14 of 57 Halorubrum genomes. Mrr homologs have a sporadic distribution among Haloferacaceae and Halobacteriaceae and are absent in Natrialbaceae (Figure 2). Gene loss undoubtedly contributed to the sparse cHGs distribution; however, without invoking frequent gene transfer, many independent and parallel gene losses need to be postulated. We also observed that a number haloarchaeal species possess multiple Type I subunit genes, allowing for functional substitution of the different subunits in the RM system. The existence of multiple Type I subunits has also been observed in Helicobacter pylori, in which 4 different SSU loci are used by the organism’s Type I system to target different recognition sequences; these SSUs can even exchange TRDs, resulting in variation in the methylome of H. pylori [80–82]. In our results, however, we observed multiple MTase and REase subunits alongside a single SSU, suggesting the functional substitution of the subunits in these haloarchaeal organisms does not result in variation in detected recognition sequences. Mrr is a Type IV REase that cleaves methylated target sites. Studies have demonstrated that this gene reduces transformation efficiency of GATC-methylated plasmids in H. volcanii, and that deletion of the mrr gene increases transformation efficiency on GATC-methylated plasmids, suggesting that this Type IV REase can target GATC-methylated sites for cleavage [77,78]. However, we find no anticorrelation between the presence of Mrr homologs and members of cHG W, which is homologous to the E. coli Dam MTase, a very well-characterized GATC MTase (Figure 2; Figure 4). This suggests that some members of cHG W or the Mrr homologs either are dysfunctional of have a site specificity different from the GATC motif. It seems counterintuitive that RM systems are not more conserved as cellular countermeasures against commonly occurring viruses. It may be that cells do not require extensive protection via RM systems, because they use multiple defensive systems some of which might be more effective. For example, another well-known defense against viruses is the CRISPR-Cas system [83]. CRISPR recognizes short (~40 bp) regions of invading DNA that the host has been exposed to previously and degrades it. While it can be very useful against virus infection, our prior work indicated that CRISPR-Cas was also sporadically distributed within communities of closely related haloarchaeal species [84], indicating they are not required for surviving virus infection. Both the RM and CRISPR-Cas systems are only important countermeasures after external fortifications have failed to prevent a virus from infiltrating and, therefore, their limited distributions also indicate that the cell’s primary defense would be in preventing virus infection altogether, which is accomplished by different mechanisms. By altering surfaces via glycosylation, cells can avoid virus predation prior to infection. In Haloferax species, there are two pathways which control glycosylation of external features. One is relatively conserved and could have functions other than virus avoidance, while the other is highly variable and shows hallmarks of having genes mobilized by horizontal transfer [85]. At least one halovirus has been found to require glycosylation by its host in order to infect properly [86]. Comparison of genomes and metagenomes from hypersaline environments showed widespread evidence for distinct “genomic” islands in closely related halophiles [87] that contain a unique mixture of genes that contribute to altering the cell’s surface structure and virus docking opportunities. Thus, selective pressure on postinfection, cytosolic, and nucleic acid-based virus defenses is eased, allowing them to be lost randomly in populations.

139 Genes 2019, 10, 233 14 of 19

A major consideration in understanding RM system diversity is that viruses, or other infiltrating selfish genetic elements, might gain access to the host’s methylation after a successful infection that was not stopped by the restriction system. Indeed, haloviruses are known to encode DNA methyltransferases in their genomes [88]. In this case, RM systems having a limited within population distribution would then be an effective defense for that part of the population possessing a different RM system. Under this scenario, a large and diverse pool of mobilized RM system genes could offer a stronger defense for the population as a whole. A single successful infection would no longer endanger the entire group of potential hosts. Group selection may be invoked to explain the within population diversity of RM systems; a sparse distribution of RM systems may provide a potential benefit to the population as a whole, because a virus cannot easily infect all members of the population. However, often gene-level selection is a more appropriate alternative to group selection [89,90]. Under a gene centered explanation, RM systems are considered as selfish addiction cassettes that may be of little benefit to its carrier. While RM systems may be difficult to delete as a whole, stepwise deletion that begins with inactivation of the REase activity can lead to their loss from a lineage. Their long-term survival thus may be a balance of gain through gene transfer, persistence through addiction, and gene loss. This gene centered explanation is supported by a study from Seshasayee et al. [36], which examined the distribution of MTase genes in ~1000 bacterial genomes. They observed, similar to our results in the Halobacteria, that MTases associated with RM systems are poorly conserved, whereas orphan MTases share conservation patterns similar to average genes. They also demonstrated that many RM-associated and orphan MTases are horizontally acquired, and that a number of orphan MTases in bacterial genomes neighbor degraded REase genes, suggesting that they are the product of degraded RM systems that have lost functional REases [36]. Similarly, Kong et al. [79] studying genome content variation in Neisseria meningitidis, found an irregular distribution of RM systems, suggesting that these systems do not form an effective barrier to homologous recombination within the species. They also observed that the RM systems themselves had been frequently transferred within the species [79]. We conclude that RM genes in bacteria as well as archaea appear to undergo significant horizontal transfer and are not well-conserved. Only when these genes pick up additional functions do parts of these systems persist for longer periods of time, as exemplified in the distribution of orphan MTases. However, the transition from RM system MTase to orphan MTase is an infrequent event. A study of 43 pangenomes by Oliveira et al. [91] suggests that orphan MTases occur more frequently from transfer via large mobile genetic elements (MGEs) such as plasmids and phages rather than arise de novo from RM degradation. The distribution of orphan methylase cHG U and W, and their likely target motifs, CTAG and GATC, respectively, suggests different biological functions for these two methylases. The widespread conservation of the CTAG MTase family cHG U supports the findings of Blow et al. [37] who identified a well-conserved CTAG orphan MTase family in the Halobacteria. Similar to other bacterial and archaeal genomes [92], the CTAG motif—the likely target for methylases in cHG U—is underrepresented in all haloarchaeal genomes (Table S3). The low frequency of occurrence, only about once per 4000 nucleotides, suggests that this motif and the cognate orphan methylase are not significantly involved in facilitating mismatch repair. The underrepresented CTAG motif was found to be less underrepresented near rRNA genes [92] and on plasmids; the CTAG motif is also a known target sequence for some Insertion Sequence (IS) elements [93] and it may be involved in repressor binding, where the CTAG motif was found to be associated with kinks in the DNA when bound to the repressor [94,95]. Interestingly, CTAG and GATC motifs are either absent or underrepresented in several haloarchaeal viruses [88,96,97]. Both the presence of the cHG U methylase and the underrepresentation of the CTAG motif appear to be maintained by selection; however, at present, the reasons for the underrepresentation of the motif in chromosomal DNA, and the role that the methylation of this motif may play remain open questions.

140 Genes 2019, 10, 233 15 of 19

5. Conclusions RM systems have a sporadic distribution in Halobacteria, even within species and populations. In contrast, orphan methylases are more persistent in lineages, and the targeted motifs are under selection for lower (in case of CTAG) or higher (in case of GATC) than expected frequency. In the case of the GATC motif, the cognate orphan MTase was found only in genomes where this motif occurs with high frequency.

Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/3/233/s1, Figure S1: Workflow of RMS-candidate gene search strategy, Figure S2: Plot of alignment distance as a function of presence–absence distance, Figure S3: PCoA plot of the distances between the RMS presence–absence profiles of the 217 analyzed Halobacterial genomes, Figure S4: Maximum-likelihood phylogeny of cHG presence–absence matrix, Figure S5: Bootstrap support values of the presence–absence phylogeny mapped onto the ribosomal protein reference tree, Table S1: Basic statistics for Halobacteriaceae complete and draft genomes, Table S2: Gene Ontology (GO) terms for each collapsed homologous group, Table S3: Distribution of orphan methylases cHGs U and W and frequency of their putative recognition motifs in completely sequenced halobacterial chromosomes. Author Contributions: Conceptualization, T.R.P. and J.P.G.; Data Curation, M.S.F., M.O., and A.S.L.; Formal Analysis, M.S.F., M.O., and A.S.L.; Funding Acquisition, T.R.P. and J.P.G.; Investigation, M.S.F. and M.O.; Methodology, M.S.F., A.S.L., and J.P.G.; Project Administration, T.R.P. and J.P.G.; Software, M.S.F. and A.S.L.; Supervision, T.R.P. and J.P.G.; Validation, M.S.F.; Visualization, M.S.F. and A.S.L.; Writing—Original Draft, M.S.F. and Johann Peter Gogarten; Writing—Review & Editing, M.O., AS.L., T.R.P., and J.P. Funding: This work was supported through grants from the Binational Science Foundation (BSF 2013061), the National Science Foundation (NSF/MCB 1716046) within the BSF-NSF joint research program, and NASA exobiology (NNX15AM09G, and 80NSSC18K1533). Acknowledgments: The Computational Biology Core, the Institute for Systems Genomics, and the University of Connecticut provided computational resources. The authors thank the anonymous reviewer for pointing out the presence of Mrr in Haloferax volcanii. Conflicts of Interest: The authors declare no conflicts of interest.

References

1. Korlach, J.; Turner, S.W. Going beyond five bases in DNA sequencing. Curr. Opin. Struct. Biol. 2012, 22, 251–261. [CrossRef] 2. Bheemanaik, S.; Reddy, Y.V.R.; Rao, D.N. Structure, function and mechanism of exocyclic DNA methyltransferases. Biochem. J. 2006, 399, 177–190. [CrossRef] 3. Malone, T.; Blumenthal, R.M.; Cheng, X. Structure-guided analysis reveals nine sequence motifs conserved among DNA mmino-methyl-transferases, and suggests a catalytic mechanism for these enzymes. J. Mol. Biol. 1995, 253, 618–632. [CrossRef] 4. Bujnicki, J.M.; Radlinska, M. Molecular evolution of DNA-(cytosine-N4) methyltransferases: evidence for their polyphyletic origin. Nucleic Acids Res. 1999, 27, 4501–4509. [CrossRef] 5. Bujnicki, J.M. Sequence permutations in the molecular evolution of DNA methyltransferases. BMC Evol. Biol. 2002, 2, 3. [CrossRef] 6. Tock, M.R.; Dryden, D.T. The biology of restriction and anti-restriction. Curr. Opin. Microbiol. 2005, 8, 466–472. [CrossRef] 7. Vasu, K.; Nagaraja, V. Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol. Mol. Biol. Rev. 2013, 77, 53–72. [CrossRef][PubMed] 8. Pleška, M.; Qian, L.; Okura, R.; Bergmiller, T.; Wakamoto, Y.; Kussell, E.; Guet, C.C. Bacterial autoimmunity due to a restriction-modification system. Curr. Biol. CB 2016, 26, 404–409. [CrossRef] 9. Ohno, S.; Handa, N.; Watanabe-Matsui, M.; Takahashi, N.; Kobayashi, I. Maintenance forced by a restriction-modification system can be modulated by a region in its modification enzyme not essential for methyltransferase activity. J. Bacteriol. 2008, 190, 2039–2049. [CrossRef][PubMed] 10. Kobayashi, I. Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res. 2001, 29, 3742–3756. [CrossRef]

141 Genes 2019, 10, 233 16 of 19

11. Budroni, S.; Siena, E.; Hotopp, J.C.D.; Seib, K.L.; Serruto, D.; Nofroni, C.; Comanducci, M.; Riley, D.R.; Daugherty, S.C.; Angiuoli, S.V.; Covacci, A.; Pizza, M.; Rappuoli, R.; Moxon, E.R.; Tettelin, H.; Medini, D. Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. Proc. Natl. Acad. Sci. USA 2011, 108, 4494–4499. [CrossRef] 12. Erwin, A.L.; Sandstedt, S.A.; Bonthuis, P.J.; Geelhood, J.L.; Nelson, K.L.; Unrath, W.C.T.; Diggle, M.A.; Theodore, M.J.; Pleatman, C.R.; Mothershed, E.A.; Sacchi, C.T.; Mayer, L.W.; Gilsdorf, J.R.; Smith, A.L. Analysis of genetic relatedness of Haemophilus influenzae isolates by multilocus sequence typing. J. Bacteriol. 2008, 190, 1473–1483. [CrossRef] 13. Roer, L.; Aarestrup, F.M.; Hasman, H. The EcoKI type I restriction-modification system in Escherichia coli affects but Is not an absolute barrier for conjugation. J. Bacteriol. 2015, 197, 337–342. [CrossRef] 14. McKane, M.; Milkman, R. Transduction, restriction and recombination patterns in Escherichia coli. Genetics 1995, 139, 35–43. [PubMed] 15. Chang, S.; Cohen, S.N. In vivo site-specific genetic recombination promoted by the EcoRI restriction endonuclease. Proc. Natl. Acad. Sci. USA 1977, 74, 4811–4815. [CrossRef][PubMed] 16. Lin, E.A.; Zhang, X.-S.; Levine, S.M.; Gill, S.R.; Falush, D.; Blaser, M.J. Natural transformation of Helicobacter pylori involves the Iintegration of short DNA fragments interrupted by gaps of variable size. PLoS Pathog. 2009, 5, e1000337. [CrossRef][PubMed] 17. Muller, H.J. The relation of recombination to mutational advance. Mutat. Res. 1964, 1, 2–9. [CrossRef] 18. Ershova, A.S.; Rusinov, I.S.; Spirin, S.A.; Karyagina, A.S.; Alexeevski, A.V. Role of restriction-modification systems in prokaryotic evolution and ecology. Biochemistry 2015, 80, 1373–1386. [CrossRef] 19. Roberts, R.J.; Belfort, M.; Bestor, T.; Bhagwat, A.S.; Bickle, T.A.; Bitinaite, J.; Blumenthal, R.M.; Degtyarev, S.K.; Dryden, D.T.F.; Dybvig, K.; et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 2003, 31, 1805–1812. [CrossRef] 20. Loenen, W.A.M.; Dryden, D.T.F.; Raleigh, E.A.; Wilson, G.G. Type I restriction enzymes and their relatives. Nucleic Acids Res. 2014, 42, 20–44. [CrossRef] 21. Liu, Y.-P.; Tang, Q.; Zhang, J.-Z.; Tian, L.-F.; Gao, P.; Yan, X.-X. Structural basis underlying complex assembly and conformational transition of the type I R-M system. Proc. Natl. Acad. Sci. USA 2017, 114, 11151–11156. [CrossRef][PubMed] 22. Pingoud, A.; Wilson, G.G.; Wende, W. Type II restriction endonucleases—A historical perspective and more. Nucleic Acids Res. 2014, 42, 7489–7527. [CrossRef] 23. Morgan, R.D.; Bhatia, T.K.; Lovasco, L.; Davis, T.B. MmeI: A minimal Type II restriction-modification system that only modifies one DNA strand for host protection. Nucleic Acids Res. 2008, 36, 6558–6570. [CrossRef] [PubMed] 24. Rao, D.N.; Dryden, D.T.F.; Bheemanaik, S. Type III restriction-modification enzymes: A historical perspective. Nucleic Acids Res. 2014, 42, 45–55. [CrossRef][PubMed] 25. Czapinska, H.; Kowalska, M.; Zagorskaite,˙ E.; Manakova, E.; Slyvka, A.; Xu, S.; Siksnys, V.; Sasnauskas, G.; Bochtler, M. Activity and structure of EcoKMcrA. Nucleic Acids Res. 2018, 46, 9829–9841. [CrossRef] 26. Adhikari, S.; Curtis, P.D. DNA methyltransferases and epigenetic regulation in bacteria. FEMS Microbiol. Rev. 2016, 40, 575–591. [CrossRef][PubMed] 27. Messer, W.; Bellekes, U.; Lother, H. Effect of dam methylation on the activity of the E. coli replication origin, oriC. EMBO J. 1985, 4, 1327–1332. [CrossRef][PubMed] 28. Kang, S.; Lee, H.; Han, J.S.; Hwang, D.S. Interaction of SeqA and Dam methylase on the hemimethylated origin of Escherichia coli chromosomal DNA replication. J. Biol. Chem. 1999, 274, 11463–11468. [CrossRef] 29. Sanchez-Romero, M.A.; Busby, S.J.W.; Dyer, N.P.; Ott, S.; Millard, A.D.; Grainger, D.C. Dynamic distribution of SeqA protein across the chromosome of Escherichia coli K-12. MBio 2010, 1.[CrossRef] 30. Welsh, K.M.; Lu, A.L.; Clark, S.; Modrich, P. Isolation and characterization of the Escherichia coli mutH gene product. J. Biol. Chem. 1987, 262, 15624–15629. [PubMed] 31. Au, K.G.; Welsh, K.; Modrich, P. Initiation of methyl-directed mismatch repair. J. Biol. Chem. 1992, 267, 12142–12148. 32. Putnam, C.D. Evolution of the methyl directed mismatch repair system in Escherichia coli. DNA Repair (Amst). 2016, 38, 32–41. [CrossRef] 33. Zweiger, G.; Marczynski, G.; Shapiro, L. A Caulobacter DNA methyltransferase that functions only in the predivisional cell. J. Mol. Biol. 1994, 235, 472–485. [CrossRef]

142 Genes 2019, 10, 233 17 of 19

34. Domian, I.J.; Reisenauer, A.; Shapiro, L. Feedback control of a master bacterial cell-cycle regulator. Proc. Natl. Acad. Sci. USA 1999, 96, 6648–6653. [CrossRef][PubMed] 35. Gonzalez, D.; Kozdon, J.B.; McAdams, H.H.; Shapiro, L.; Collier, J. The functions of DNA methylation by CcrM in Caulobacter crescentus: A global approach. Nucleic Acids Res. 2014, 42, 3720–3735. [CrossRef] 36. Seshasayee, A.S.N.; Singh, P.; Krishna, S. Context-dependent conservation of DNA methyltransferases in bacteria. Nucleic Acids Res. 2012, 40, 7066–7073. [CrossRef][PubMed] 37. Blow, M.J.; Clark, T.A.; Daum, C.G.; Deutschbauer, A.M.; Fomenkov, A.; Fries, R.; Froula, J.; Kang, D.D.; Malmstrom, R.R.; Morgan, R.D.; et al. The epigenomic landscape of prokaryotes. PLOS Genet. 2016, 12, e1005854. [CrossRef][PubMed] 38. Nölling, J.; de Vos, W.M. Identification of the CTAG-recognizing restriction-modification systems MthZI and MthFI from Methanobacterium thermoformicicum and characterization of the plasmid-encoded mthZIM gene. Nucleic Acids Res. 1992, 20, 5047–5052. [CrossRef][PubMed] 39. Grogan, D.W. Cytosine methylation by the SuaI restriction-modification system: Implications for genetic fidelity in a hyperthermophilic archaeon. J. Bacterio. 2003, 185, 4657–4661. [CrossRef] 40. Ishikawa, K.; Watanabe, M.; Kuroita, T.; Uchiyama, I.; Bujnicki, J.M.; Kawakami, B.; Tanokura, M.; Kobayashi, I. Discovery of a novel restriction endonuclease by genome comparison and application of a wheat-germ-based cell-free translation assay: PabI (5’-GTA/C) from the hyperthermophilic archaeon Pyrococcus abyssi. Nucleic Acids Res. 2005, 33, e112. [CrossRef] 41. Watanabe, M.; Yuzawa, H.; Handa, N.; Kobayashi, I. Hyperthermophilic DNA methyltransferase M.PabI from the archaeon Pyrococcus abyssi. Appl. Environ. Microbiol. 2006, 72, 5367–5375. [CrossRef][PubMed] 42. Couturier, M.; Lindås, A.-C. The DNA methylome of the hyperthermoacidophilic crenarchaeon Sulfolobus acidocaldarius. Front. Microbiol. 2018, 9, 137. [CrossRef][PubMed] 43. Chimileski, S.; Dolas, K.; Naor, A.; Gophna, U.; Papke, R.T. Extracellular DNA metabolism in Haloferax volcanii. Front. Microbiol. 2014, 5, 57. [CrossRef][PubMed] 44. Zerulla, K.; Chimileski, S.; Näther, D.; Gophna, U.; Papke, R.T.; Soppa, J. DNA as a phosphate storage polymer and the alternative advantages of polyploidy for growth or survival. PLoS ONE 2014, 9, e94819. [CrossRef] 45. Ouellette, M.; Gogarten, J.; Lajoie, J.; Makkay, A.; Papke, R. Characterizing the DNA methyltransferases of Haloferax volcanii via bioinformatics, gene deletion, and SMRT sequencing. Genes 2018, 9, 129. [CrossRef] 46. Ouellette, M.; Jackson, L.; Chimileski, S.; Papke, R.T. Genome-wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006. Front. Microbiol. 2015, 6, 251. [CrossRef] 47. Parks, D.H.; Imelfort, M.; Skennerton, C.T.; Hugenholtz, P.; Tyson, G.W. CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015, gr-186072. [CrossRef][PubMed] 48. Roberts, R.J.; Macelis, D. REBASE—Restriction enzymes and methylases. Nucleic Acids Res. 2001, 29, 268–269. [CrossRef] 49. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE—A database for DNA restriction and modification: Enzymes, genes and genomes. Nucleic Acids Res. 2015, 43, D298–D299. [CrossRef][PubMed] 50. Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26, 2460–2461. [CrossRef][PubMed] 51. Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res. 2004, 32, 1792–1797. [CrossRef][PubMed] 52. HMMER. Available online: http://hmmer.org/ (accessed on 18 March 2019). 53. Analyses for RMS paper. Available online: https://github.com/Gogarten-Lab/rms_analysis. 54. Makarova, K.S.; Wolf, Y.I.; Koonin, E.V. Archaeal clusters of orthologous genes (arCOGs): An update and application for analysis of shared features between Thermococcales, Methanococcales, and Methanobacteriales. Life (Basel, Switzerland) 2015, 5, 818–840. [CrossRef][PubMed] 55. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. [CrossRef][PubMed] 56. Csardi, G.; Nepusz, T. The igraph software package for complex network research. InterJournal Complex Syst. 2006, 1695.

143 Genes 2019, 10, 233 18 of 19

57. Caspi, R.; Altman, T.; Dale, J.M.; Dreher, K.; Fulcher, C.A.; Gilham, F.; Kaipa, P.; Karthikeyan, A.S.; Kothari, A.; Krummenacker, M.; Latendresse, M.; Mueller, L.A.; Paley, S.; Popescu, L.; Pujar, A.; Shearer, A.G.; Zhang, P.; Karp, P.D. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2010, 38, D473–D479. [CrossRef][PubMed] 58. Hoang, D.T.; Chernomor, O.; von Haeseler, A.; Minh, B.Q.; Le, S.V. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol..[CrossRef] 59. Nguyen, L.-T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015, 32, 268–274. [CrossRef][PubMed] 60. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30, 1312–1313. [CrossRef][PubMed] 61. Bansal, M.S.; Wu, Y.-C.; Alm, E.J.; Kellis, M. Improved gene tree error correction in the presence of horizontal gene transfer. Bioinformatics 2015, 31, 1211–1218. [CrossRef] 62. Bansal, M.S.; Kellis, M.; Kordi, M.; Kundu, S. RANGER-DTL 2.0: Rigorous reconstruction of gene-family evolution by duplication, transfer and loss. Bioinformatics 2018, 34, 3214–3216. [CrossRef] 63. Yu, G.; Smith, D.K.; Zhu, H.; Guan, Y.; Lam, T.T.-Y. GGTREE: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 2017, 8, 28–36. [CrossRef] 64. Yu, G.; Lam, T.T.-Y.; Zhu, H.; Guan, Y. Two methods formapping and visualizing associated data on phylogeny using ggtree. Mol. Biol. Evol. 2018, 35, 3041–3043. [CrossRef] 65. Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 2003, 14, 927–930. [CrossRef] 66. Wickham, H. Ggplot2: Elegant Graphics for Data Analysis; Springer: Berlin/Heidelberg, 2016; ISBN 3319242776. 67. Hmisc - Main - Vanderbilt Biostatistics Wiki Available online:. Available online: http://biostat.mc.vanderbilt. edu/wiki/Main/Hmisc (accessed on 18 March 2019). 68. Schliep, K.P. phangorn: phylogenetic analysis in R. Bioinformatics 2011, 27, 592–593. [CrossRef] 69. Salichos, L.; Rokas, A. Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 2013, 497, 327–331. [CrossRef] 70. Drost, H. Philentropy: Information Theory and Distance Quantification with R. J. Open Source Softw. 2018, 3. [CrossRef] 71. Gogarten, J.P. Perl script to measure frequency and distribution of CTAG and GATC motifs in DNA Available online:. Available online: https://github.com/Gogarten-Lab/CTAG-GATC-frequencies (accessed on 12 January 2019). 72. Conesa, A.; Götz, S.; García-Gómez, J.M.; Terol, J.; Talón, M.; Robles, M. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005, 21, 3674–3676. [CrossRef][PubMed] 73. Consortium, U. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 2016, 45, D158–D169. 74. Soucy, S.M.; Fullmer, M.S.; Papke, R.T.; Gogarten, J.P. Inteins as indicators of gene flow in the halobacteria. Front. Microbiol. 2014, 5, 1–14. [CrossRef] 75. Gupta, R.S.; Naushad, S.; Fabros, R.; Adeolu, M. A phylogenomic reappraisal of family-level divisions within the class Halobacteria: Proposal to divide the order Halobacteriales into the families Halobacteriaceae, Haloarculaceae fam. nov., and Halococcaceae fam. nov., and the order Haloferacales into th. Antonie Van Leeuwenhoek 2016, 109, 565–587. [CrossRef] 76. Gupta, R.S.; Naushad, S.; Baker, S. Phylogenomic analyses and molecular signatures for the class Halobacteria and its two major clades: A proposal for division of the class Halobacteria into an emended order Halobacteriales and two new orders, Haloferacales ord. nov. and Natrialbales ord. n. Int. J. Syst. Evol. Microbiol. 2015, 65, 1050–1069. [CrossRef][PubMed] 77. Allers, T.; Barak, S.; Liddell, S.; Wardell, K.; Mevarech, M. Improved strains and plasmid vectors for conditional overexpression of His-tagged proteins in Haloferax volcanii. Appl. Environ. Microbiol. 2010, 76, 1759–1769. [CrossRef] 78. Holmes, M.L.; Nuttall, S.D.; Dyall-Smith, M.L. Construction and use of halobacterial shuttle vectors and further studies on Haloferax DNA gyrase. J. Bacteriol. 1991, 173, 3807–3813. [CrossRef][PubMed] 79. Kong, Y.; Ma, J.H.; Warren, K.; Tsang, R.S.W.; Low, D.E.; Jamieson, F.B.; Alexander, D.C.; Hao, W. Homologous recombination drives both sequence diversity and gene content variation in Neisseria meningitidis. Genome Biol. Evol. 2013, 5, 1611–1627. [CrossRef][PubMed]

144 Genes 2019, 10, 233 19 of 19

80. Furuta, Y.; Namba-Fukuyo, H.; Shibata, T.F.; Nishiyama, T.; Shigenobu, S.; Suzuki, Y.; Sugano, S.; Hasebe, M.; Kobayashi, I. Methylome diversification through changes in DNA methyltransferase sequence specificity. PLoS Genet. 2014, 10, e1004272. [CrossRef] 81. Furuta, Y.; Kobayashi, I. Mobility of DNA sequence recognition domains in DNA methyltransferases suggests epigenetics-driven adaptive evolution. Mob. Genet. Elements 2012, 2, 292–296. [CrossRef] 82. Furuta, Y.; Kawai, M.; Uchiyama, I.; Kobayashi, I. Domain movement within a gene: A novel evolutionary mechanism for protein diversification. PLoS ONE 2011, 6, e18819. [CrossRef] 83. Gophna, U.; Brodt, A. CRISPR/Cas systems in archaea. Mob. Genet. Elements 2012, 2, 63–64. [CrossRef] 84. Fullmer, M.S.; Soucy, S.M.; Swithers, K.S.; Makkay, A.M.; Wheeler, R.; Ventosa, A.; Gogarten, J.P.; Papke, R.T. Population and genomic analysis of the genus Halorubrum. Front. Microbiol. 2014, 5, 140. [CrossRef] [PubMed] 85. Shalev, Y.; Soucy, S.; Papke, R.; Gogarten, J.; Eichler, J.; Gophna, U. Comparative analysis of surface layer glycoproteins and genes involved in protein glycosylation in the genus Haloferax. Genes 2018, 9, 172. [CrossRef] 86. Kandiba, L.; Aitio, O.; Helin, J.; Guan, Z.; Permi, P.; Bamford, D.H.; Eichler, J.; Roine, E. Diversity in prokaryotic glycosylation: an archaeal-derived N-linked glycan contains legionaminic acid. Mol. Microbiol. 2012, 84, 578–593. [CrossRef][PubMed] 87. Rodriguez-Valera, F.; Martin-Cuadrado, A.-B.; Rodriguez-Brito, B.; Paši´c,L.; Thingstad, T.F.; Rohwer, F.; Mira, A. Explaining microbial population genomics through phage predation. Nat. Rev. Microbiol. 2009, 7, 828–836. [CrossRef][PubMed] 88. Tang, S.-L.; Nuttall, S.; Ngui, K.; Fisher, C.; Lopez, P.; Dyall-Smith, M. HF2: A double-stranded DNA tailed haloarchaeal virus with a mosaic genome. Mol. Microbiol. 2002, 44, 283–296. [CrossRef][PubMed] 89. Olendzenski, L.; Gogarten, J.P. Evolution of genes and organisms: The tree/web of life in light of horizontal gene transfer. Ann. N. Y. Acad. Sci. 2009, 1178, 137–145. [CrossRef] 90. Naor, A.; Altman-Price, N.; Soucy, S.M.; Green, A.G.; Mitiagin, Y.; Turgeman-Grott, I.; Davidovich, N.; Gogarten, J.P.; Gophna, U. Impact of a homing intein on recombination frequency and organismal fitness. Proc. Natl. Acad. Sci. USA 2016, 113, E4654-61. [CrossRef] 91. Oliveira, P.H.; Touchon, M.; Rocha, E.P.C. The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res. 2014, 42, 10618–10631. [CrossRef][PubMed] 92. Karlin, S.; Mrázek, J.; Campbell, A.M. Compositional biases of bacterial genomes and evolutionary implications. J. Bacteriol. 1997, 179, 3899–3913. [CrossRef][PubMed] 93. Fournier, P.; Paulus, F.; Otten, L. IS870 requires a 5’-CTAG-3’ target sequence to generate the stop codon for its large ORF1. J. Bacteriol. 1993, 175, 3151–3160. [CrossRef] 94. Otwinowski, Z.; Schevitz, R.W.; Zhang, R.-G.; Lawson, C.L.; Joachimiak, A.; Marmorstein, R.Q.; Luisi, B.F.; Sigler, P.B. Crystal structure of trp represser/operator complex at atomic resolution. Nature 1988, 335, 321–329. [CrossRef][PubMed] 95. Burge, C.; Campbell, A.M.; Karlin, S. Over- and under-representation of short oligonucleotides in DNA sequences. Proc. Natl. Acad. Sci. USA 1992, 89, 1358–1362. [CrossRef] 96. Bath, C.; Cukalac, T.; Porter, K.; Dyall-Smith, M.L. His1 and His2 are distantly related, spindle-shaped haloviruses belonging to the novel virus group, Salterprovirus. Virology 2006, 350, 228–239. [CrossRef] [PubMed] 97. Porter, K.; Tang, S.-L.; Chen, C.-P.; Chiang, P.-W.; Hong, M.-J.; Dyall-Smith, M. PH1: An archaeovirus of Haloarcula hispanica related to SH1 and HHIV-2. Archaea 2013, 2013, 456318. [CrossRef][PubMed]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

145

Chapter 7. Conclusion

This dissertation has provided new insight into DNA methylation and restriction-

modification (RM) systems in the Halobacteria. In Chapter 3, the methylome of the model

halobacterial species Haloferax volcanii was characterized, and it was observed that there are

m6 m4 two motifs methylated throughout the genome: GCA BN6VTGC and C TAG. In Chapter 4,

the DNA methyltransferases (MTases) responsible for methylating these motifs were identified:

m6 Type I RM system RmeRMS methylates GCA BN6VTGC, and orphan MTase HVO_0794

methylates Cm4TAG. However, the products of three other genes in H. volcanii (HVO_C0040,

HVO_A0079, and HVO_A0237) were observed to share homology with MTases, even though

they do not contribute to the methylome. The reason for the lack of methylation activity from

these putative MTase genes is currently unknown. In the case of HVO_A0237, it is likely that it

became inactive due to a genome rearrangement event that split a former Type IIG gene into

HVO_A0237 with the MTase domain and HVO_A0006 with the restriction endonuclease (REase)

domain. It is possible that combining the two split genes back together could restore functional

activity, including MTase activity, to the enzyme. Regarding HVO_C0040 and HVO_A0079, it is possible that these genes are no longer being expressed properly, either because of a lack of mRNA expression or because no functional protein product is being produced. In a preliminary analysis, the genes HVO_A0079, HVO_A0006, and HVO_A0237 were also expressed on a plasmid with their natural promoters in the H. volcanii ΔRM mutant and analyzed using SMRT

sequencing. However, no effect was observed on DNA methylation, indicating that these genes

are not being properly expressed. Future studies examining the expression of these genes,

including transcription of mRNA and translation of protein products, would help elucidate the

reason these genes lack function. Nevertheless, the presence of nonfunctional RM genes in H.

146

volcanii is not an unusual observation, as nonfunctional RM genes have been observed in other

organisms, such as those belonging to the MmeI gene family (Morgan et al., 2009). The presence of these nonfunctional RM genes in H. volcanii, and the fact that they are flanked by transposases, provides evidence of MTase and RM system genes acting as mobile genetic elements, similar to other RM genes (Furuta et al., 2010), which may have been transferred into

H. volcanii and are currently under degradation.

Chapter 5 provided evidence that RM systems limit mating in H. volcanii, thus acting as

potential barriers to recombination in the Halobacteria. In Chapter 6, a survey of MTase and RM

genes indicated that genes associated with RM systems are patchily distributed in the

Halobacteria, whereas orphan MTase gene families are more well-conserved. Using these

observations, as well as those from research into RM systems in bacterial organisms, a model

can be constructed which outlines the putative life cycle of RM systems in the Halobacteria as

akin to selfish genetic elements (Figure 1). In this model, an RM system gene cassette is acquired

by a host organism either via recombination or as part of a mobile genetic element such as a

plasmid (Kobayashi, 2001). Recombination is one possible explanation for how H. volcanii

acquired RM genes such as HVO_A0006, HVO_A0079, and HVO_A0237, which are flanked by

recombination-related genes such as transposases and integrases. Once the cassette is obtained, it

will persist in the host through gene “addiction,” with any loss of the RM system cassette

resulting in cell death via post-segregational killing (Kobayashi, 2001; Mochizuki et al., 2006).

While the host possesses the RM system, it will be limited in recombination and gene exchange

with other cells which do not have compatible RM systems, since the RM system will cleave

DNA that is not properly methylated (Budroni et al., 2011). This low barrier to recombination could contribute to speciation and diversification within a halobacterial population alongside

147

other weak barriers to recombination (Papke et al., 2007; Fullmer et al., 2014). However, the

RM systems may also facilitate recombination of certain DNA fragments, and may limit the size

of recombinant fragments (Chang and Cohen, 1977; Lin et al., 2009). Over time, the REase or

RM gene could become lost or non-functional due to genome rearrangement which disrupts the

gene or the promoter, either resulting in loss of the entire RM gene or leaving only the MTase.

The remaining MTase would then either slowly acquire important cellular functions in the host

over time and be maintained as an orphan MTase, or selection would favor its degradation over

time, resulting in the loss of the entire RM system.

Figure 1. Model of RM system life cycle in the Halobacteria. Restriction endonuclease (REase) genes are represented in red, and DNA methyltransferase (MTase) genes are represented in blue.

Illustration made in BioRender (https://app.biorender.com/).

The gradual decay and loss of a gene over time has been observed to occur in other

selfish genetic elements such as inteins, where the homing endonuclease region will decay over

time when there is no selective pressure to maintain it (Goddard and Burt, 1999; Gogarten et al., 148

2002; Soucy et al., 2014; Naor et al., 2016). However, the evidence in H. volcanii suggests that

genome rearrangement events and disruptions in the promotor are more likely to contribute to

RM decay in the Halobacteria. The continual acquisition and gradual loss of RM systems via

genome rearrangement proposed by this model would explain the patchy distribution of RM

systems observed in the Halobacteria, in which the MTase was frequently observed to be present

without a cognate REase. The degradation of RM systems would also provide an explanation for

the lack of MTase activity from HVO_C0040, HVO_A0079, and HVO_A0237, as examples of

RM systems and MTases in the process of decay. The observation of a split between

HVO_A0006 and HVO_A0237 provides a specific example of how genome rearrangement can

disrupt an RM gene cassette and result in its degradation. The evidence that many annotated

MTase genes in H. volcanii do not contribute to the methylome also suggests that many other

annotated MTase genes in other Halobacteria are also non-functional and may have been

disrupted in a similar manner.

This model would predict that the CTAG and GATC orphan MTase families identified in

Chapter 6 have some role in regulating physiological functions in the Halobacteria. However, the roles of these orphan MTase families remains unclear. Orphan MTases have been observed to be involved in regulating the timing of DNA replication and methyl-directed mismatch repair, as well as regulating the expression of various genes (Adhikari and Curtis, 2016). The GATC and

CTAG families could have similar functions, although the low frequency of the CTAG motif in halobacterial genomes suggests it might not have a role in mismatch repair. Future studies into the distribution of GATC and CTAG sites in halobacteiral genomes could help determine which genes might be regulated by these orphan MTases, and thus elucidate which roles these orphan

MTases might have in the Halobacteria.

149

This dissertation also discussed the construction of a deletion mutant of H. volcanii with all putative RM genes deleted (ΔRM), along with tryptophan and thymidine auxotrophic variants of the strain (RM ΔtrpA and RM ΔhdrB). These strains have the potential to be useful in future work studying the importance of DNA methylation and RM systems in the Halobacteria. Chapter

5 of this dissertation, for example, demonstrated that these strains could be used to study the impact of RM systems on cell-to-cell mating and recombination. Future studies could use these strains to determine the role of DNA methylation on mismatch repair by comparing mutation rates between the parental strain and ΔRM. Competition assays between the ΔRM strains and parental strains could also be performed to determine any possible fitness cost of maintaining or losing RM genes. The ΔRM strain could also be used for characterizing MTase genes in other halobacterial species, in which the MTase gene is expressed in H. volcanii and its methylome sequenced to determine which motifs the MTase of interest methylates. Since the MTases are likely adjusted to hypersaline environments, expressing them in H. volcanii would likely be more effective than attempting to use a standard system like Escherichia coli, which is not adapted to hypersaline environments. Overall, these RM deletion mutants will likely be useful tools for future research of RM systems and DNA methylation in the Halobacteria.

References

Adhikari, S., and Curtis, P.D. (2016). DNA methyltransferases and epigenetic regulation in bacteria. FEMS Microbiol Rev 40, 575-591.

Budroni, S., Siena, E., Dunning Hotopp, J.C., Seib, K.L., Serruto, D., Nofroni, C., Comanducci, M., Riley, D.R., Daugherty, S.C., Angiuoli, S.V., Covacci, A., Pizza, M., Rappuoli, R., Moxon, E.R., Tettelin, H., and Medini, D. (2011). Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. Proc Natl Acad Sci U S A 108, 4494-4499.

Chang, S., and Cohen, S.N. (1977). In vivo site-specific genetic recombination promoted by the EcoRI restriction endonuclease. Proc Natl Acad Sci U S A 74, 4811-4815.

150

Fullmer, M.S., Soucy, S.M., Swithers, K.S., Makkay, A.M., Wheeler, R., Ventosa, A., Gogarten, J.P., and Papke, R.T. (2014). Population and genomic analysis of the genus Halorubrum. Front Microbiol 5, 140.

Furuta, Y., Abe, K., and Kobayashi, I. (2010). Genome comparison and context analysis reveals putative mobile forms of restriction-modification systems and related rearrangements. Nucleic Acids Res 38, 2428-2443.

Goddard, M.R., and Burt, A. (1999). Recurrent invasion and extinction of a selfish gene. Proc Natl Acad Sci U S A 96, 13880-13885.

Gogarten, J.P., Senejani, A.G., Zhaxybayeva, O., Olendzenski, L., and Hilario, E. (2002). Inteins: structure, function, and evolution. Annu Rev Microbiol 56, 263-287.

Kobayashi, I. (2001). Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res 29, 3742-3756.

Lin, E.A., Zhang, X.S., Levine, S.M., Gill, S.R., Falush, D., and Blaser, M.J. (2009). Natural transformation of Helicobacter pylori involves the integration of short DNA fragments interrupted by gaps of variable size. PLoS Pathog 5, e1000337.

Mochizuki, A., Yahara, K., Kobayashi, I., and Iwasa, Y. (2006). Genetic addiction: selfish gene's strategy for symbiosis in the genome. Genetics 172, 1309-1323.

Morgan, R.D., Dwinell, E.A., Bhatia, T.K., Lang, E.M., and Luyten, Y.A. (2009). The MmeI family: type II restriction-modification enzymes that employ single-strand modification for host protection. Nucleic Acids Res 37, 5208-5221.

Naor, A., Altman-Price, N., Soucy, S.M., Green, A.G., Mitiagin, Y., Turgeman-Grott, I., Davidovich, N., Gogarten, J.P., and Gophna, U. (2016). Impact of a homing intein on recombination frequency and organismal fitness. Proc Natl Acad Sci U S A 113, E4654- 4661.

Papke, R.T., Zhaxybayeva, O., Feil, E.J., Sommerfeld, K., Muise, D., and Doolittle, W.F. (2007). Searching for species in haloarchaea. Proc Natl Acad Sci U S A 104, 14092-14097.

Soucy, S.M., Fullmer, M.S., Papke, R.T., and Gogarten, J.P. (2014). Inteins as indicators of gene flow in the Halobacteria. Front Microbiol 5, 299.

151

Matthew Ouellette

Request to Re-Publish

Frontiers Editorial Office Fri, Apr 12, 2019 at 10:52 AM To: Matthew Ouellette

Dear Matthew,

Thank you for your email.

Under the Frontiers Terms and Conditions, authors retain the copyright to their work. Furthermore, all Frontiers articles are Open Access and distributed under the terms of the Creative Commons Attribution License (CC-BY 3,0), which permits the use, distribution and reproduction of material from published articles, provided the original authors and source are credited, and subject to any copyright notices concerning any third-party content. More information about CC-BY can be found here: http://creativecommons.org/licenses/by/4.0/

You can therefore freely reuse your articles published as part of your thesis.

Best regards,

Gearóid

---

Senior Manager, Research Integrity: Gearóid Ó Faoleán, PhD

Frontiers

www.frontiersin.org Avenue du Tribunal-Fédéral 34 1005 Lausanne, Switzerland

Office T +44 203 514 26 98

For technical issues, please visit our Frontiers Help Center frontiers.zendesk.com

On Fri, Apr 12, 2019 at 3:05 PM Matthew Ouellette wrote: To Whom It May Concern,

I am currently working on my Ph.D. dissertation at the University of Connecticut, and I was wondering if I would be allowed to re-publish the following articles as chapters in my dissertation:

https://www.frontiersin.org/articles/10.3389/fmicb.2013.00376/full https://www.frontiersin.org/articles/10.3389/fmicb.2015.00251/full

Thank you for your time.

Sincerely, Matt Ouellette

152 Matthew Ouellette

Request to Re-Publish

Genes Editorial Office Mon, Apr 15, 2019 at 2:24 AM To: Matthew Ouellette

Dear Matt,

The papers published in Genes are open access and you retain the copyright as author. You can reproduce it, always if you correctly cite the published paper.

Good luck with your dissertation.

Kind regards,

Cristina -- Cristina Arimany-Nardi, PhD Co-Managing Editor -- Genes Editorial Office E-mail: [email protected] https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mdpi.com%2Fjournal% 2Fgenes%2F&data=02%7C01%7Cmatthew.ouellette%40uconn. edu%7Cdc3206eb2e7744aba32308d6c16b147a%7C17f1a87e2a254eaab9d f9d439034b080%7C0%7C0%7C636909062983088211&sdata=doqIqe5 5eGw253lzywLrlkIqHXlkUOKKYybkdTmyBKs%3D&reserved=0 -- MDPI Barcelona Office Avenida Madrid, 95, 1º - 3, 08028 Barcelona, Spain Tel. +34 93 639 7662 https://nam01.safelinks.protection.outlook.com/?url=www.mdpi.com&data=02%7C01%7Cma tthew.ouellette%40uconn.edu%7Cdc3206eb2e7744aba32308d6c16b14 7a%7C17f1a87e2a254eaab9df9d439034b080%7C0%7C0%7C636909062983 088211&sdata=JzR2WnvLWFawDfEucgFuCdlT%2FfD9CBvvYnjJahRzTvI%3D&reserved=0

Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone.

El 12/04/2019 a las 16:08, Matthew Ouellette escribió: To Whom It May Concern,

I am currently working on my Ph.D. dissertation at the University of Connecticut, and I was wondering if I would be allowed to re-publish the following articles as chapters in my dissertation:

https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mdpi.com%2F2073- 4425%2F9%2F3%2F129&data=02%7C01%7Cmatthew.ouellette% 40uconn.edu%7Cdc3206eb2e7744aba32308d6c16b147a%7C17f1a87e2a2 54eaab9df9d439034b080%7C0%7C0%7C636909062983098219&sdata

153 =pgYcJRpJBShSgN6OOrc6jrcAozLa%2Bt6thhyqESudIzU%3D&reserved=0 https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mdpi.com%2F2073- 4425%2F10%2F3%2F233&data=02%7C01%7Cmatthew.ouellette% 40uconn.edu%7Cdc3206eb2e7744aba32308d6c16b147a%7C17f1a87e2a2 54eaab9df9d439034b080%7C0%7C0%7C636909062983098219&sdata =g91p4HTmwKOgSdzqmv85sjFTCT4IhFMhbiFm%2FZ17ULU%3D&reserved=0

Thank you for your time.

Sincerely, Matt Ouellette

ReplyFo

154