<<

KØBENHAVNS UNIVERSIT ET

PhD Thesis

Yang Guo

Functional characterization and Gene regulation of the archaeal SIRV2

Supervisor: Xu Peng

Submitted: 05/11/14

2

2

Preface The thesis entitled `` Functional characterization and gene regulation of the archaeal virus SIRV2 ´´ was submitted to the Faculty of Science, University of Copenhagen to obtain the degree. I have been financed by a scholarship from China Scholarship Council and a stipend from the European Union.

Almost all the experimental work presented in this thesis was performed at Danish Centre (DAC), Department of the Biology, University of Copenhagen, Copenhagen, Denmark, under the supervision of Associate Professor Dr. Xu Peng. Protein Circular dichroism (CD) spectroscopy was performed at the SBIN lab, Department of the Biology, University of Copenhagen and the high-throughput sequencing was carried out at Department of System Biology, Technical University of Denmark, Copenhagen, Denmark.

The thesis starts with a briefly summary of archaea and its . Some typical viruses with unexpected morphotypes and genome structures were described and SIRV2 infection cycle was also presented in detail. Then the second parts is the summary of the results mainly described ssDNA binding, annealing and nuclease activities of a conserved gene cluster of SIRV2, and a regulation map of two transcription regulators of Sulfolobus solfataricus P2 upon SIRV2 infection. At last, it ends with the conclusions and further perspectives for future work. Two manuscripts are enclosed behind.

Author: Yang Guo

Date: 05/11/14

Place: Copenhagen, Denmark

II

Acknowledgments

First and most, I would like to greatly thank my academic supervisor Xu Peng for endless support and always positive attitude, for her patient guidance and fruitful discussions. I also like to thank Xu for bringing me along the inspiring and exciting research trips from the very beginning of my biology study.

Further, I would like to thank Qunxin She and Roger A. Garrett for the valuable suggestions and scientific discussions.

Moreover, I want to thank the lab. technicians at Denmark Archaea Center, Hein Phan and Mariana Awayez for the great technical assistance in the laboratory.

Many great thanks to all the members in Denmark Archaea Center, Ling Deng, Fei He, Laura Alvarez, Daniel Jensen, Soley Gudbergsdottir, Wenyuan han, Wenfang Peng, Guannan Liu ,Carlos Sobrino, Kristine Uldahl and Marzieh Mousaei. You are inspring and talented people on various projects, and bring me many fun times in the laboratory.

Last but not least, I would like to thank my beloved family and friends outside the scientific environment for unreserved love and support throughout the duration of my Ph.D.

III

Table of content

Preface II Acknowledgments III Summary (English) V Resumé (Danish) VII List of the Publications IX Abbreviations X Thesis Objective XIII Introduction 1 1. Archaea 2 1.1 Classification of Archaea 3 1.2 Sulfolobus 5 2. Archaeal Viruses 8 2.1 Crenarchaeal viron morphotypes and their genome 9 2.2 Sulfolobus islandics rod-shaped virus 2 13 2.3 SIRV2 life cycle 15 2.3.1 Attachment and Entry 15 2.3.2 SIRV2 Gene Transcription and Regulation 17 2.3.3 SIRV2 Replication 18 2.3.4 SIRV2 Release Mechanism 20 Summary of Results 23 Future Perspectives 28 Reference 30 Manuscript I 40 Manuscript II 85

IV

Summary (English)

Viruses infecting hyperthermophilic archaea have gained wide attention during recent years owing to its remarkable diversity on morphology and genome structures. Although a substantial work was made to decipher the functions of the unique proteins encoded by archaeal viruses and to characterize the relationship of the viruses and host cells, the knowledge on the biology of the archaeal viruses is still limited. The crenarchaeal virus Sulfolobus islandics rod-shaped virus 2 (SIRV2), was emerging as a promising model for genetic and biochemical studies as well as for the characterization of different stages in viral infection cycle. However, similar to other archaeal viruses, the majority of the SIRV2 genome sequence showed little similarity to the public databases, which hindered the virus functional researches and raised challenges in protein comparison and prediction.

This thesis comprises two parts of results. Firstly, the functional characterization of a highly conserved operon of SIRV2 was described, revealing their unique protein structures, biochemical activities as well as possible biological process they may participate in. In the second part, the genome wide regulations of two Sulfolobus sofataricus P2 transcription regulators upon SIRV2 infection were firstly constructed.

A SIRV2 gene operon (gp17, gp18 and gp19) was found to be the only and highly conserved gene clusters in rudiviruses and filamentous viruses, suggesting an important function in both viral families. The experimental results showed that ORF131b (gp17) was a novel ssDNA binding protein, without a canonical ssDNA binding domain. A few positively charged residues forming a U-shaped binding channel on the gp17 dimer are crucial for its ssDNA binding activity. The intrinsically disordered C-terminus of gp17 was demonstrated to be involved in the interaction with gp18, which was predicted previously as a helicase but showed a ssDNA annealing activity in this study. gp19 was shown to possess a 5´ to 3´ ssDNA nuclease activity, in addition to the previously demonstrated endonuclease activity, and a weak interaction between gp18 and gp19 was also detected. The functional characterization of the entire operon and the strand-displacement replication mode proposed previously for SIRV2 strongly point to a role of the operon in genome maturation and/or DNA recombination in viral gene DNA replication and repair.

Two transcription regulators sso2474 and sso10340 from Sulfolobus solfataricus P2 were differently expressed upon SIRV2 infection. A method similar to, but simpler than,

V

Chromatin immunoprecipitation combined with subsequent high-throughput sequencing (Chip-seq) was applied in this study to get insight into the gene composition of the two protein regulons in vivo. After mapping the sequence data with the genomes of Sulfolobus solfataricus P2 and SIRV2, protein sso2474 was detected to have a high binding affinity to virus genome by an unknown mechanism, whereas sso10340 or its interacted protein preferred to bind and regulate the host genes on several binding sites. A total of 27 enriched DNA fragments extracted from sso10340 complex were selected as candidate binding targets from the host genome for the further analysis using EMSA (Electrophoretic mobility shift assay) and foot printing assay. A palindromic sequence motif was defined based on the enriched sequences, and most of these target genes were involved in energy metabolism, transport and amino acid metabolism. The genome-wide binding profile presented here reflected two different kinds of regulon conditions and contribute to the knowledge expansion of the transcription regulation upon virus infection in Sulfolobus.

VI

Resumé (Danish)

Vira der inficerende hypertermofile archebakterier har fået stor opmærksomhed i de seneste år på grund af deres bemærkelsesværdige mangfoldighed indenfor morfologi og genom strukturer. Selv om et stort arbejde bliver gjort for at identificer funktionen af de unikke proteiner arke virus kodet for og at beskrive forholdet mellem vira og værtsceller, er viden om arke viras biologi stadig begrænset. Den crenarchaeal virus Sulfolobus islandics stavformet virus 2 (SIRV2), er et lovende model for genetiske og biokemiske undersøgelser samt til karakterisering af forskellige stadier af virusinfektionscyklus. Men i lighed med andre arke vira, har størstedelen af SIRV2 genom sekvens ringe lighed med sekvenser i de offentlige databaser, dette hindrede funktionel virus forskning og giver store udfordringer ved sammenligning af og funktionelle forudsigelse af proteiner.

Denne afhandling består af to dele. Første beskrives hvordan en særdeles konserveret operon fra SIRV2 bliver funktionelle karakteriseret, her afsløres operons proteiners unikke strukturer, biokemiske aktiviteter samt mulig biologisk processer, de kan deltage i. I den anden del bliver to Sulfolobus sofataricus P2 transskription regulators mål identificeret i hele host genomet for første gang.

En SIRV2 gen operon (gp17, gp18 og gp19) blev anset for at være de eneste og højkonserverede genklynger i rudiviruses og trådformede vira, hvilket tyder på en vigtig funktion i begge vira familier. De eksperimentelle resultater viste, at ORF131b (gp17) var en hidtil ukendt ssDNA bindende protein uden et kanonisk ssDNA bindende domæne. Et par positivt ladede aminosyre, danner en U-formet substrat kanal på gp17 dimer. Dette er afgørende for gp17s ssDNA bindende aktivitet. Den naturlige uordnet C-terminalen del af gp17 blev påvist at være involveret i interaktionen med gp18. Som tidligere forudsigelser har klassificeret som en helicase, men i denne undersøgelse viste gp18 ssDNA bindende aktivitet. Det blev påvist at gp19 har 5' til 3' ssDNA nuklease aktivitet, udover den tidligere påvist endonukleaseaktivitet. Ydermere blev en svag interaktion mellem gp18 og gp19 blev også påvist. Den funktionelle karakterisering af hele operonet og streng-fortrængning replikation metode som tidligere er foreslået for SIRV2 peger kraftigt på operonens rolle i genomet modning og / eller DNA-rekombination af viral-DNA under replikation og reparation.

VII

To transskription regulatorer sso2474 og sso10340 fra Sulfolobus solfataricus P2 blev forskelligt udtrykt ved SIRV2 infektion. En metode, der ligner, men er enklere end, Chromatin immunopræcipitation kombineret med efterfølgende høj-throughput sekventering (Chip-seq) blev anvendt i denne undersøgelse for at få indsigt i den genetiske opbygning af de to proteiners regulon in vivo. Efter kortlægning af sekvens data mod genomerne fra Sulfolobus solfataricus P2 og SIRV2 blev det påvist protein sso2474 have en høj affinitet til virus genom via en ukendt mekanisme, hvorimod sso10340 eller dets interaktion partner foretrak at binde og regulere værtsgener på flere steder på genomet. I alt 27 berigede DNA-fragmenter blev udvundet fra sso10340 kompleks blev udvalgt som mulige bindings mål i værtsgenomet og yderligere analyse ved hjælp af EMSA (Electrophoretic mobility shift assay) og fodaftryk analyse. Et palindromt mønster blev defineret på basis af de berigede sekvenser. De fleste af de genre relateret til dette mønster var involveret i stofskiftet, aminosyretransport og metabolismen. Profilen for de to proteiners binding til DNA, der dækker hele genomet, afspejler to forskellige typer af regulons og er med til at udvide viden om regulation af transskription i relation til virus infektion i Sulfolobus.

VIII

List of the Publications:

 Article I

Guo,Y., Kragelund,B., White, M., and Peng, X. Single-strand DNA binding, annealing and nuclease activities encoded by a conserved archaeal viral gene cluster.

Submitted to Nucleic acid research.

 Article II

Guo, Y., and Peng,X. Genome-wide binding profile of two transcription regulators from Sulfolobus solfataricus.

In prep.

 Article III

Deng,L., He,F., Bhoobalan-Chitty,Y., Martinez-Alvarez,L., Guo,Y., and Peng,X. (2014) Unveiling cell surface and type IV secretion proteins responsible for archaeal rudivirus entry. J Virol 88: 10264-10268.

IX

Abbreviations

ABV, bottle-shaped virus

ACV, Aeropyrum coil-shaped virus AFV1, Acidianus filamentous virus 1

APBV1, Aeropyrum pernix bacilliform virus 1 ARV1, Acidianus rod-shaped virus 1 ASV1, Acidianus spindle-shaped virus 1 ATP, Adenosine Triphosphate ATV, Acidianus two-tailed virus Ala, Alanine Amp, Ampicillin bp, Base pair BSA, Bovin serum albumin Cam, Chloramphenicol CRISPR, Clustered Regularly Interspaced Short Palindromic Repeats dsDNA, Double-Strand DNA DTT, Dithiothreitol EB, Elution Buffer E.coli, Escherichia coli EDTA, Ethylenediaminetetraacetic acid EMSA, Electrophoric mobility shift assay GST, Glutathione-S-trasferase Hjr, Holliday junction resolvases ICTV, International Committee on Taxonomy of Viruses IPTG, Isopropyl-beta-D-thiogalactopyranoside ITR, Inverted terminal repeats Kan, Kanamycin KDa, Kilo-Dalton

X

LB medium, Lysogeny Broth medium Ni-NTA, Ni-nitrilotriacetic acid MALDI-TOF, Matrix-assisted laser desorption/ionization-time of flight MCP, Major capsid protein M.O.I, Multiplicity of infection mM, Mili Molar OD, Optical density ORF, Open Reading Frame

PAV, Pyrococcus abyssi virus PAGE, Poly acrylamide gel electrophoresis PBS, Phosphate Buffered Saline PCNA, Proliferating cell nuclear antigen PCR, Polymerase Chain Reaction PDB, Protein Data Bank PMSF, Phenylmethylsulfonyl fluoride PSV1, Pyrobaculum spherical virus 1 Pfu, Pyrococcus furiosus RCR, Rolling-circle replication rRNA, Ribosomal RNA

SIFV, Sulfolobus islandicus filamentous virus SIRV1/2, Sulfolobus islandicus rod-shaped virus ½

SNDV, Sulfolobus neozealandicus droplet-shaped virus SMRV1, Sulfolobales Mexican rudivirus 1 SRV, Stygiolobus rod-shaped virus SSU, Small-subunit SSV, Sulfolobus spindle-shaped virus SSVK1, Sulfolobus spindle-shaped virus K1, Kamchatka SSVrh, Sulfolobus spindle-shaped virus RH, Yellowstone

STIV1/2, Sulfolobus turreted icosahedral virus 1/2

XI

STSV1, Sulfolobus tengchongensis spindle-shaped virus 1 TEMED, N, N, N', N'-tetramethylethylenediamine TTSV1, Thermoproteustenax spherical virus 1 TTV1, Thermoproteus tenax virus 1 VAP, Virus-associated pyramid WT, Wild-type

XII

Thesis Objective

The objective of this PhD is mainly focused on the functional characterization of a conserved archaeal viral gene cluster-ORF131b (gp17), ORF436 (gp18) and ORF207 (gp19) of SIRV2 to investigate their possible roles in the whole virus life cycle. Besides, the genome wide regulation of two Sulfolobus solfataricus transcription regulators upon SIRV2 infection were also studied to get a better understanding of the regulation network between virus and host cells.

XIII

Introduction

2

1 Archaea

Evolution is a process through which the composition of genes in a population changes over generations, and it seems to progress in a quantized way, from one lever or domain of organization rising ultimately to a more complex one. In the early to middle 20th century, microbiologists tried to classify microorganisms based on the structures of their cell walls, their shapes, and the substances they consume. Until five decades ago, Zuckerkandl and Pauling claimed that it is at the level of molecules (particularly molecular sequences) that one really becomes privy to the workings of the evolutionary process. The comparative analysis of the molecular sequences started to become a powerful approach for determining evolutionary relationship (Zuckerkandl and Pauling, 1965).

The Ribosomal RNA was chosen to be a candidate molecule to detect relatedness among distant species due to broad distribution, slowly changed sequence and a component of self-replicating systems (Zablen et al., 1975). In 1977, Woese and Fox digested the 16S (18S) ribosomal RNA of the organisms with T1 RNase and subjected the products to two-dimensional electrophoretic separation, producing oligonucleotide fingerprint to identify the relationships of the living system. They found out that many of the prokaryotes once classified as belong to their own domain, which was later classified as a third domain – Archaea, meaning ancient and primitive in ancient Greek language (Woese and Fox, 1977).

The discovery of the new microbial kingdom eventually led to the classification of all known life into three major Domains: Eucarya (all ), Archaea, and Bacteria, which was a significant breakthrough in the history of biology (Forterre et al., 2002). Actually in the early 1980s, people already realized that Thermoplasma and Halobacterium had close evolutionary affinity with Methanogens, all of which were the representatives of known archaea (Woese et al., 1990). For a long time, archaea were seen as extremophiles that only exist in extreme habitats such as hot springs and salt lakes with high salt concentration (Oren, 2002), low pH (Johnson et al., 2008) or high temperature (Stetter, 2006). At the end of the last century, more organisms were discovered along with new habitats were studied, archaea have been found in a wide variety of non-extreme environments, including marine waters (DeLong, 1992), freshwater sediments (Schleper et al., 1997) as well as all kinds of soil environments (Bintrim et al., 1997;Oline et al., 2006).

2

They are globally distributed in nature and have become common microbes in environment. Since archaea can survive in such harsh conditions, they can provide a source of enzymes that resist to heat and/or to acidity, which is a valuable treasure for industry (Breithaupt, 2001). The most familiar application of an archaeal enzyme is the thermostable Pfu DNA polymerase from Pyrococcus furiosus, allowing the accurate polymerase chain reaction (PCR) to be widely used in biology science researches. There are many hidden treasures in archaea still waiting to be deciphered by old and new Archaea lovers.

1.1 Classification of Archaea

Based on the pioneering work of Carl Woese, the small subunit ribosomal RNA (ss rRNA) is widely used in molecular phylogenetic studies to investigate the relationship between organisms, rather like some classification systems that trying to group archaea based on the shared structural features and common ancestors (Gevers et al., 2006). At the early stage, archaea was further classified into two distinct groups, that the methanogens as well as their relatives were named as Euryachaeotes and the thermoacidophiles, sulfurdependent ones were categorized as Crenarchaeota (Woese et al., 1990). Most of the cultivable and well-studied archaeal species exhibit in these two main phyla. In 2002, the peculiar species Nanoarchaeum equitant was found. It harbors the smallest archaeal genome with a spherical cell shape and had been given its own phylum – Nanoarcheota (Hohn et al., 2002). Another small new group of thermophilic archaeal species, exhibiting an apparent affinity to the Crenarchaeota, but also sharing features with Euryarchaeota, were identified as Korarchaeota (Elkins et al., 2008;Anderson et al., 2008). A fifth group has also been created as Thaumarchaeota in recent years (Guy and Ettema, 2011).

Euryarchaeota, as one of the major phyla in Archaea, encompasses the most diversified phenotypes. The cultivated Euryarchaeota is subdivided into eight groups ( Thermococci, Methanopyri, Methanococci, Methanobacteria, Thermoplasmata, Archaeoglobi, Halobacteria and Methanomicrobia ), while Methanogenesis was the main invention that occurred in the euryarchaeal phylum along with halophiles, some thermoacidophiles as well as some hyperthermophiles (Gribaldo and Brochier-Armanet, 2006). In contrast, most of the cultivable Crenarchaeota strains belong to the thermophilic or hyperthermophilic species, showing a very limited phenotypic diversity (Forterre et al., 2002). However, since the marine archaeal group was discovered and identified as characteristic Crenarchaeota by

3

environmental rRNA, it is thought that they may be the extremely abundant archaea in the marine environment and could be a significant component of deep-sea metabolism (Fuhrman et al., 1992).

The orders -Thermoproteales, Caldisphaerales, Desulfurococcales and Sulfolobales represent the four lineages of the Crenarchaeaotal branch of Archaea. Thermoproteales are rod-shaped extreme thermophiles or hyperthermophiles. They are the only organisms known to lack the canonical SSB proteins, instead possessing the protein ThermoDBP specifically bound to ssDNA (Paytubi et al., 2012). The Sulfolobus species are relatively easy to cultivate due to the aerobic lifestyle and relatively short doubling times, and as the only genetic manipulatable representatives in Crenarchaeota, have developed into model organisms to study their DNA repair, replication, transcription, chromosome integration, RNA processing, cell division, virus-host interaction systems as well as many other cellular aspects (Bernander, 2007).

4

Figure 1. Small subunit ribosomal RNA-based phylogenetic tree. The thick lineages represent Hyperthermophiles. (modified from Stetter, 2006)

1.2 Sulfolobus

Since the first description by T. Brock in 1972 about Sulfolobus acidocaldarius, isolated from a hot spring in Yellowstone National Park, this new group of sulfur-oxidizing

5

organisms has been of interest both evolutionarily and geochemically (Brock et al., 1972). Sulfolobus species has been isolated from a wide variety of acid thermal areas (in the USA, Italy, Iceland, Russia and elsewhere), with optimal growth occurring at pH 2-3 and temperatures of 75-80 °C, making them acidophiles and thermophiles respectively. So far, most strains isolated are able to grow heterotrophically as well as autotrophically. Since the Sulfolobus species have a wide geographic distribution, they are normally named after the location where they were first isolated, e.g. Sulfolobus islandics strains were isolated in Iceland (`island` is German for `Iceland`) (Zillig et al., 1994), Sulfolobus tengchongensis from Teng Chong, China (Xiang et al., 2003) and Sulfolobus solfataricus from volcanic hot springs at Pisciarelli Solfatara (Zillig et al., 1980). Among these species, S.solfataricus is one of the best-characterized and most commonly used strains in laboratories.

Figure 2. Electron micrographs of Sulfolobus solfataricus strain, DSM 1617, thin section (from Zillig et al., 1980).

The Sulfolobus strains DSM 1616 and DSM 1617 (Fig 2.) were firstly named as Sulfolobus solfataricus by Zillig for having a similar GC content but significantly different RNA polymerase molecular weights with respect to S. acidocaIdarius (Zillig et al., 1980). Moreover, they were newly renamed as S. solfataricus P1, P2 and have developed as the main model species that researchers work on, especially when the genome sequence of S. solfataricus strain P2 was published by She, the transcriptome map was drawn by Wurtzel, providing rich detailed information for the further work on DNA replication mechanism, cell cycle, transcription and large numbers of unknown genes (She et al., 2001;Wurtzel et al., 2010). Wealthy data, standardized methods, maturing genetic system, the easy

6

lab-cultivating advantages as well as the host species for studying virus-host interactions contributed to the construction of Sulfolobus solfataricus as a model organism (Albers et al., 2009;Worthington et al., 2003;Deng et al., 2009).

The widely used strain S. solfataricus P2 (DSM1617) showed a low fraction of susceptible cells to both Sulfolobus islandicus rod-shaped virus 2 (SIRV2) and Sulfolobus turreted icosahedral virus (STIV) infection (Okutan et al., 2013;Ortmann et al., 2008). In this work, the highly susceptible mutant strain S. solfataricus P2 5E.6 was selected as the host strain for SIRV2 infection study. The strain carries a deletion of CRISPR (Clustered regularly interspaced short palindromic repeats) clusters A-D, but shows similar phenotype on chromosome degradation and virus life cycle as the natural host strain S. islandicus LAL14/1 upon SIRV2 infection (Okutan et al., 2013;Bize et al., 2009).

7

2 Archaeal Viruses

Along with archaeal communities, viruses thrive in extreme conditions and play an important role in ecosystem dynamics. Like members of the other domains of life, archaea are infected with viruses. At the initial stage, the thermophilic viruses isolated from archaea domain resembled with head-tail in morphotype (Martin et al., 1984), and the subsequently discovered Euryarchael viruses were also similar with bacterial viruses. In contrast, as more habitats were studied, viruses were found infecting members of the kingdom Crenarchaeota (Sulfolobus, Acidianus, Pyrobaculum and Thermoproteus) and exhibiting highly diverse morphotypes and genomic properties. The extraordinary shape of some viruses have never been observed before (Prangishvili, 2003). Most crenarchaeal viruses have been isolated from hot terrestrial habitats, and they show adaption to their extreme environments like their host.

Due to the abundance and unique biology, some challenges arise with archaeal viruses. The low lever of sequence similarity to public databases, novel biochemical mechanisms as well as difficulties in virus and host cultivation need to be addressed (Prangishvili and Garrett, 2005). After a relativly intensive study on archaeal viruses in the last decade, about 100 viral species have been sequenced and their genomic properties as well as relationships with the host cell have also been described. Among these viruses, only two single-strand DNA virus species have been discovered (Pietila et al., 2010;Mochizuki et al., 2012) , the others all possess double-strand DNA genomes. According to the International Committee on Taxonomy of Viruses (ICTV), bacterial viruses comprise nine morphotypes, which belong to ten families. While archaeal viruses, exhibit 16 different morphotypes, and are classified into 15 families (Ackermann and Prangishvili, 2012) (Fig 3.). Although limited in number, compared with the viruses infecting bacteria, the diverse and unique morphotypes of archaea viruses revealed new insights into the viral world.

8

Figure 3. Virion morphotypes of prokaryotic viruses. Names of viral genera or families based on International Committee on Taxonomy of Viruses (ICTV) are indicated below the schematic virus particles. If an archaeal virus has not been assigned to any genus or family, individual virus names are given. The virions are not drawn to scale (from Pietila et al., 2014) .

2.1 Crenarchaeal viron morphotypes and their genomes

Thermophilic viruses infecting the crenarchaea have been classified into ten families based on their morphology, eight families were approved by ICTV and the remaining two are waiting for approval (Pietila et al., 2014). The ten crenarchaeal virus families are: One tail spindle –shaped (SSV1-7, SSVK1, SSVrh, ASV1); two tail spindle–shaped (ATV); Bottle-shaped (ABV); Droplet-shaped (SNDV); Linear filamentous (AFV1-9,SIFV,TTV1); Linear rod-shaped rudiviridae (SIRV1-2,SRV,ARV1); Spherical Globulaviridae (PSV1,TTSV1); Bacilliform (APBV1), Tailless icosahedral `` (STIV,STIV2) and Coil-shaped Spiravridae (ACV), the last two families are waiting for the approval. Some other viruses

9

like Sulfolobus tengchongensis spindle-shaped virus (STSV1) and Pyrococcus abyssi virus (PAV1) are awaiting assignment to a viral family. Some well-studied and intriguing viruses will be described in more detail as below:

Sulfolobus spindle-shaped virus 1 (SSV1). The Sulfolobus spindle-shaped viruses (SSVs) of the family Fuselloviridae were the first discovered family of archaeal viruses. Most of the SSVs (except for SSV6 and ASV1) are spindle-shaped, 100 x 60 nm in size and carry tail structures at one pole (Fig 4.).

Figure 4. Electron micrographs of virus particles. (A) Cell apparently extruding virus. (B) Free virus and virus particles attached to cellular material. Two large particles are arrowed. (C) Purified free virus particles exhibiting tail structures. Three bullet-shaped particles are seen on the right. (D) Thin sections of cells sampled 6 h after u.v. irritation showing three cell-cell contacts. The bars represent 0.2 um (from Martin et al., 1984).

The virus SSV1 was isolated from UV-induced growing cultures of Sulfolobus shibatae (strain B12). It contains a 15.5-kb positively supercoiled circular double-stranded DNA, with a GC-content of 39.7 %, resembling that of the host DNA (Palm et al., 1991). During infection SSV1 is stably carried by its lysogenic host S. shibatae and is found intracellularlly either in a covalently closed circular () form or site-specifically integrated within an arginine tRNA gene in the host chromosome (Yeats et al., 1982). The transcription pattern of the SSV1 genome is relatively simple, some genes are significantly upregulated by UV irradiation and the genes can be clearly divided into early, late and UV-inducible categories (Reiter et al., 1987;Frols et al., 2007).

Acidianus two-tailed virus (ATV). This archaeal virus was discovered in an acidic hot spring (85–93 °C; pH 1.5) at Pozzuoli, Italy. As the sole member of the viron family Bicaudaviridae, ATV contains a lemon-shaped central structure, but when it exits the host cell, it then develops elongated tails protruding from both pointed ends, specifically at

10

temperatures above 75°C, close to the temperature of the natural habitat of the host (Fig. 5). The circular, dsDNA genome contains 62730 bp, encodes 72 predicted proteins, 11 of which are structural proteins with molecular masses in the range of 12 to 90 kDa. The unique host-independent as well as extracellular functional activity might be associated with an 88.7-kDa ATV viron protein P800, which is rich in coiled-coil motifs and can generate structures that resemble intermediate filaments (Haring et al., 2005c;Prangishvili et al., 2006c). ATV was the first known virus from hot, acidic habitats that causes lysis of its host cell, whereas most archaeal viruses maintain a stable relationship with their host.

Figure 5. Electron micrographs of Acidianus convivator and different forms of the Acidianus two-tailed virus. a, Virions in an enriched sample taken from acidic hot springs in Pozzuoli, Italy (pH 1.5, 85–93 °C). b, Extrusion of lemon-shaped virions from an ATV-infected A. convivator cell. c, Virions in a growing culture of ATV-infected A. convivator, 2 days after infection. d, Cultured virions after purification and incubation at 75 °C for 0, 2, 5, 6 and 7 days ( panels from right to left, respectively) (from Haring et al., 2005c).

Acidianus bottle-shaped virus (ABV). The enveloped virion of ABV, has a complex form resembling a bottle (230-nm long, 4–75-nm wide, Fig 6.C), the morphology is so unique that it has been assigned to a new family Ampullaviridae. The narrow end of `the bottle` is likely to be involved in cellular adsorption and in channeling of viral DNA into the host cell (Fig 6. A) and the broad end exhibits 20 thin filaments, which are inserted into a disk and interconnected at the base, the function of these filaments remains unclear but very intriguing (Fig 6.B) (Haring et al., 2005a).

ABV was isolated from the same hot spring in Pozzuoli, Italy, where ATV was isolated. It infects strains of the hyperthermophilic archaeal genus Acidianus, and contains a linear double-stranded DNA. The viron genome has a length of 23,814 bp, with a G +C content of

11

35%, and a 590-bp inverted terminal repeat. It encodes 57 predicted ORFs, of which a putative RNA molecule was predicted to have a notable secondary structural similarity to the RNA molecule, which has been implicated in DNA packaging. Moreover, in contrast to other crearchaeal viruses, ABV encodes a Family B DNA polymerase (Peng et al., 2007).

C

Fig. 6. Electron micrographs of particles of ABV after negative staining with 3% uranyl acetate. (A) ABV particles adsorbed with their pointed end toward a membrane vesicle of the host “A. convivator.” (B) ABV particles attached to each other with their thin filaments at the broad end. Bars, 100 nm. (C) A scheme of the structure of an ABV virion (from Haring et al., 2005a).

Acidianus filamentous virus 1 (AFV1). AFV is a Lipothrixvirus that infects the Acidiannus genus of the Crenarchaeota in a stable carrier state and was observed in an enrichment culture from a hot spring at 80 °C in Crater Hills region of Yellowstone National Park (Rachel et al., 2002). AFV1 is composed of a protein core covered with a lipid envelope, containing at least five different proteins with molecular masses in the range of 23-130 kDa. The 20.8-kb-long linear genome contains 40 ORFs and particles of AFV1 are measured with size of 900 × 24 nm.

AFV1 exhibits claw-like terminal structures, connected to the virion body by appendages at the both ends (Fig 7.A). Apparently, the unusual termini of the virions have a special function in the process of adsorption, which was detected to have an attachment with the host pili and the contact seems rather strong (Fig 7.B) (Bettstetter et al., 2003). Crystal structures of two major coat proteins AFV1-132 and AFV1-140 have been resolved, both carry a novel four-helix-bundle fold and AFV1-140 also carries an extra C terminal domain possibly interacting with the virion envelope (Goulet et al., 2009b). Recently, a new replication model was proposed by analyzing the replicative intermediates on

12

two-dimensional (2D) agarose gel, revealing that the genome replication started from a D-loop formation, proceeded via strand displacement, and terminated by recombination. This process in some degree resembled the T4 DNA replication, but further studies are still needed to support the proposed model (Pina et al., 2014).

Fig 7. (A) Electron micrographs of particles of AFV1 with tail structures A B in their native conformation. (B) Electron micrographs of particles of AFV1 adsorbed to pili of host A. hospitalis CH10/1, stained with 3% uranyl acetate. Black arrows indicate pili; white arrows show knots which are putative viral terminal structures separated from the virus body. Bars, 100 nm. (from Bettstetter et al., 2003)

2.2 Sulfolobus islandics rod-shaped virus 2 (SIRV2).

Rudiviridae and Lipothrixviridae belong to the linear viruses, and they are ubiquitous in high temperature (>75°C) and low pH (pH <3) terrestrial geothermal environments. Comparative-genomic analysis suggests a common evolutionary ancestry of the rudiviruses and lipothrixviruses, based on the conservation of orthologous core genes and the similarity of the major viron coat proteins (Prangishvili et al., 2006a). Together with Sulfolobus islandics rod-shaped virus 1 (SIRV1), Stygiolobus rod-shaped virus (SRV) (Vestergaard et al., 2008b), Acidianus rod-shaped virus 1 (ARV1) (Vestergaard et al., 2005) and Sulfolobales Mexican rudivirus 1 (SMRV1) (Servin-Garciduenas et al., 2013), SIRV2 was grouped as rudiviruses by their rod-shaped morphology, gene architecture and sequence.

SIRV2 is one of the most extensively studied archaeal viruses and has developed into the archaeal model virus thanks to the structural, genomic and transcription studies. SIRV2 was first isolated from the colony-cloned S.islandicus stains HVE 10/2 isolated from solfataric fields in Iceland-Hveragerdi. This non enveloped virus is a stiff rod of 23 nm in width, 900 nm in length. As shown in the electron micrographs, a central cavity ends plugging by

13

approximately 50 nm stoppers were clearly visible in Figure 8, and both ends decorated with three short tail fibers. Sensitivity of SIRV2 genome to BAL31 but not λ exonuclease indicated the existence of covalently closed hairpin ends (Prangishvili et al., 1999).

The linear 35,502 bp double stranded DNA SIRV2 genome carries inverted terminal repeats (ITRs) of 1628 bp at each end, and with a low G+C content of 25%. The virion body is a superhelix formed by genomic DNA and multiple copies of the highly glycosylated 20-kDa capsid protein. SIRV2 genome encodes 54 ORFs, sharing 44 homologs with SIRV1, and approximately half of the encoded proteins have been characterized by sequence, structural and biochemical analysis, which is the highest proportion on recognizing gene functions among crenarchaeal viruses. Four SIRV2 viron proteins were identified as virion structure proteins: the major capsid protein (MCP), P134 (gp26), shares a common fold with MCPs of lipothrixvirus AFV (Acidianus filamentous virus) (Goulet et al., 2009a). The largest viral protein P1070 (gp38), has a MW of 105 kDa, possesses a coiled-coil domain. It is a component of the three fibers (Steinmetz et al., 2008). Besides, the two structural proteins ORF488 (gp33) and ORF564 (gp39) are also found in the SIRV2 virons, although in a low amount (Vestergaard et al., 2008b). Crystal structure resolution of SIRV1 P119 as well as the detection of nicking and joining activities by experiments suggest that this protein could be involved in initiation of SIRV1 genome replication (Oke et al., 2011). P121 has a high sequence similarity with archaeal Holliday junction resolvases (Hjrs) and the Hjr activity was experimentally examined (Birkenbihl et al., 2001). There are more hypothetical proteins predicted by sequence analysis and experimental evidence that involved in transcription, replication and nucleic acid metabolism, more detailed information about the whole life cycle of SIRV2 will be discussed below.

14

Figure 8. Structure of SIRV2 particles. (A). TEM of negatively stained SIRV2 particles. Inset shows a high resolution image of the end structure of the capsid. The scale bar is 500 nm. (B). Schematic depiction of SIRV2 particles ( from Steinmetz et al., 2008).

2.3 SIRV2 life cycle

2.3.1 Attachment and Entry

The archaeal viruses display an unusual and diverse morphotypes, genome sequences, as well as the structure of proteins (Krupovic et al., 2012). Recent researches revealed that the interaction between archeal viruses and their hosts seem also to be unique (Bize et al., 2009). However, compared with wealth of data available on bacterial and eukaryotic systems, the studies on archaeal viruses mainly consistent of biochemical and genetic characterization of their virions, while the attachment and entry process are still elusive.

Virus infection is initiated by entry into the host cell, and the first step of the entry process is to recognize the receptors present on the host cell surface by specific interaction. Then they must have ways to transporting their genetic information to the cell compartment where their genome is replicated (Poranen et al., 2002). The vast majority of known viruses have a tail structure decorated to one or two ends of nucleocapsid, which facilitate the attachment of virions to the host membrane. In the Lipothrixviridae family, each of the virion is tapered and carries different specific terminal structures. These structures can represent claws (AFV1), T-bars (AFV9), mop-like structures (SIFV), three (AFV3) or six (SFV) short

15

filaments or tips resembling bottle brushes (AFV2), which are implicated in cellular adsorption (Bettstetter et al., 2003;Bize et al., 2008;Haring et al., 2005b). Both termini of the SIRV2 virion are connected with three tail fibers composed of the minor structural protein P1070. These termini were detected to bind the tips of the pilus-like filaments, which are abundant on the surface of host cells, by transmission electron microscopy and whole-cell electron cryotomography (cryo-ET). Figure 9 demonstrated the interaction between SIRV2 termini fibers and purified host cellular filaments. The virus adsorption was very fast and irreversible, but the infected cells were no longer able to adsorb more virus efficiently (Quemin et al., 2013). Many bacterial viruses like Ff inoviruses, utilize the filamentous cellular appendages as primary receptors. Then retraction of the host pilus bring the viron close to the host cell surface, where it could bind to the secondary receptor (Rakonjac et al., 2011). Although no retracting pili have been identified in archaea, there should be secondary receptors on the host cell surface to adsorb the virus particles. Indeed, Sulfolobus mutant strain lacking cluster sso3138-sso3141 and cluster sso2386-sso2387 was resistant to SIRV2. No growth retardation was observed when this mutant strain was diluted and infected with SIRV2 at the same M.O.I, compared with wide type strain. The first clusters sso3138 to sso3141 were predicted to possess transmembrane helices and to be located extracellularly, probably acting as a receptor for SIRV2. The proteins encoded by the other gene cluster may be involved in the secretion of the receptor components. Besides, the genetic complementation experiments confirmed the involvement of the mutation in virus resistance and further support that these proteins are responsible for SIRV2 entry (Deng et al., 2014).

Figure 9. Transmission electron micrographs of SIRV2 interaction with purified cellular filaments. The filaments were removed from S. islandicus LAL14/1 cells (from Quemin et al., 2013) .

16

However, how the virus overcomes 12.5-m-long filament to reach the cell body, and the mechanism of removing their coat protein as well as the association of the two identified gene clusters with the structure of pili is still poorly understood and need further studies.

2.3.2 SIRV2 Gene Transcription and Regulation

SIRV2 enters into the cell, removes the coat protein and is likely to recruit the host RNA polymerase complex for transcribing the SIRV2 genes, as no viral gene was shown to encode a RNA polymerase. Generally, the virus transcription is time regulated and could be classified as early, middle and late transcribed genes that encode the proteins involved in regulation, translation, replication and structure proteins for assembly in a chronological way.

In bacterial and eukaryal virus-host systems, modification of cellular transcription as a result of virus infection is well studied, such as T7 bacteriophage. Besides transcribed by host E.coli RNA polymerase, T7 bacteriophage encodes its own RNA polymerase, which is a single subunit enzyme of 99 kDa. The DNA genome of T7 is transcribed entirely from left to right, firstly in the early region by E.coli RNA polymerase, and then from a portion of the early region to the entire late region by newly-made T7 RNA polymerase (Dunn and Studier, 1983;Steitz, 2004). Whereas in archaeal domain, the mechanisms and controls of viral gene expression as well as host gene regulation upon virus infection are still not elucidated. To date, several archaeal viruses have developed as suitable models to study molecular details of the archaeal viron life cycle and host responses, e.g. the temperate Sulfolobus spindle-shaped viruses (SSV) (Frols et al., 2007) and the lytic Sulfolobus turreted icosahedral virus (STIV) (Ortmann et al., 2008;Maaty et al., 2012).

Reminiscent to the life cycles of bacteriophages and eukaryotic viruses, SSV1 exhibits a tight temporal regulation of its own transcription after UV treatment, initiating from a small UV-specific gene and then continues as three distinct sets of genes representing immediate-early, early and late transcripts. But very few of host genes was regulated upon virus infection (Frols et al., 2007). However, the microarray study about transcription of the lytic virus STIV was completely an opposite story. STIV transcription did not show a typical temporal regulation. In the virus life cycle, transcription signals of nine early viron

17

genes were detected at 8h, then all the remaining genes were transcribed subsequently. Surprisingly, a total of 177 host genes were determined to be differentially expressed during the infection, of which two thirds were up-regulated and one third were down-regulated (Ortmann et al., 2008). SIRV2 infects Sulfolobus spp. but does not integrate into the host chromosome. It was thought to be a lysogenic virus existing in a stable carrier state in the host, corresponding to the uniform transcription pattern revealed in an earlier study (Kessler et al., 2004). During the characterization of the special release mechanism of SIRV2, it was then demonstrated to be a lytic virus (Bize et al., 2009). Independent microarray study performed in infected S. solfataricus 5E6 cells and transcriptomic analysis of infected S. islandicus LAL14/1 cells exhibited that SIRV2 transcription starts from the terminal genes located at both ends of the linear genome (Quax et al., 2013;Okutan et al., 2013). Although SIRV2 transcription is not tightly regulated chronologically like SSV1, the gene expression showed a temporal pattern. Some early genes like two identical ORF83a, ORF83b as well as ORF119C, the viral replication initiation protein (Oke et al., 2011), were detected to be transcribed in 15-30 min and highly expressed at 1h. Whereas the structure proteins like the major capsid protein and virus-associated pyramids protein were most abundant at the late stage of the infection cycle. Despite lacking strong temporal regulation of transcription on its own virus, the host response to SIRV2 infection was significant. More than one third of the host genes were differentially regulated, with a similar number of down and upregulated genes. Most of the host genes that are strongly activated upon infection are assumed to function in defense against viruses, as well as cellular collapse, energy metabolism and membrane transport, which may suggest that the virus control the replication phases less dependent on its own differential gene expression, but co-opted host genes.

2.3.3 SIRV2 Replication

As we know that SIRV2 genome is a linear duplex with covalently closed hairpin termini and long ITRs at both ends (Blum et al., 2001). This termini structure is normally involved in replication initiation that parallels to that of and other large cytoplasmic eukaryotic viruses. Both DNA sequence and structure within the termini are important for template recognition, which is nicked by Rep initiation protein and exposed a 3'-OH group as a primer for DNA synthesis (Du and Traktman, 1996).

18

By sequence and structure analysis, ORF119 of SIRV2 was found to be a member of the replication initiator (Rep) family, having a conserved key active-site motif with rolling-circle replication (RCR) rep proteins (Vega-Rocha et al., 2007). It forms a dimer and sequence-specifically nicks one strand of the SIRV2 terminal hairpin only when the substrate is in the single-strand form. The joining activity of ligating the fragments by a strand transfer (flip-flop) mechanism was also confirmed (Oke et al., 2011). According to all these features, along with the detection of head-to-head concatemers of the replicative intermediates (Peng et al., 2001), a related but unrestricted mechanisms to rolling-circle replication (RCR) was proposed. The Rep protein recognizes and nicks one ori site of the genome, then one subunit of rep protein covalent connected with the new generated ori site and the other subunit of the protein ligated the old two fragments, forming a contiguous DNA circle. Displacement replication is then used to replicate the rest of the genome and generated a double strand DNA circle, but with a nicked hairpin termini adducted rep protein. The next steps of the replication are similar to RCR of the poxviruses. At the junctions between genome monomers, opposing inverted terminal repeats can be extruded to form hairpin fourway junctions. Therefore, a Holiday junction resolving enzyme (Hjr) was supposed to introduced to resolve the concatamers, producing monomer copies with linear hairpin ends (Culyba et al., 2006;Oke et al., 2011). Holliday junction resolving enzymes are ubiquitously found in all the domains of life, such as RuvC in Bacteria, Human GEN1 in Eukarya (Declais and Lilley, 2008), and two different holiday junction resolving enzymes (Hjr and Hje) from Sulfolobus solfataricus of crenarchaeon (Kvaratskhelia and White, 2000). As expected, SIRV2 encodes a 14 kDa Holliday junction resolving enzyme (SIRV2 Hjr), which is conserved among rudiviruses. Unlike the bacteriophage resolving enzymes, which cleave a variety of branched DNA structures formed during replication, the SIRV2 Hjr showed a very narrow substrate range, only cleaves the four-way junctions DNA structures, and the cleavage pattern is also unique by nicking only exchange strand pairs (Gardner et al., 2011a). This protein was presumed to be important for the processing of replicative DNA intermediates late during the infection cycle before packaging into newly synthesized heads commences. Unlike some bacterial viruses encoding DNA replication related proteins, most of the archaeal viruses lack its own DNA polymerase genes, indicating that their replication probably rely on the host replication machinery. It was proved by recently published work that two of the heterotrimeric S. solfataricus sliding clamp (SsoPCNA1 to 3) (proliferating

19

cell nuclear antigen) interacted with some SIRV2 viral proteins. PCNA is a key protein functioning as a cofactor of DNA polymerases recruiting different crucial DNA metabolism proteins in DNA replication and repair (Moldovan et al., 2007). Most of the interacting viral proteins have not been assigned function, except SIRV2 Hjr, which agreed to previous research released that SsoPCNA could stimulate the Hjr enzyme activity in S.solfataricus (Dorazi et al., 2006). It is intriguing that the early transcribed genes ORF83a/b were also shown to interact with PCNA, suggesting its important roles during the replication cycle of SIRV2 (Gardner et al., 2014).

Although the functions of some replicative viral proteins were confirmed, and a preliminary model was proposed, a lot of further studies are still needed to discover and explain the virus replication mechanism.

2.3.4 SIRV2 Release Mechanism

The final step for completion of the viral replication cycle is the release of virus particles. In bacterial domain, most lytic viruses cross the cell envelope and spread to the environment with the assistant of phage-encoded small integral membrane proteins--holins (Krupovic and Bamford, 2008). How the archaeal viruses overcome the challenging task of rupturing the cell membrane and escape from the host cells have attracted a lot of attention in recent years, especially after the discovery of a unique release mechanism (Bize et al., 2009).

Among archaeal viruses, SIRV2 and STIV are the best studied viruses with respect to host cell interactions. Both of them are lytic viruses, and shared the same extraordinary virion egress mechanism. SIRV2 induces the degradation of the host chromosome and assemble virus particles in cytoplasm (Fig. 10A). In the late stages of the virus infection cycle, numerous prominent virus-associated pyramids (VAPs) were formed on the host cell surface (Fig. 10B and C), and these special structures open outward at the end of infection cycle, allowing the escape of the mature viruses through the created apertures (Bize et al., 2009;Brumfield et al., 2009). Apparently, this release mechanism is not universal for hyperthermophilic viruses. Although sharing the release mechanism, the two viruses are dramatically different in their morphological properties. Therefore, it is possible that the morphogenetic and egress systems evolved independently.

20

A B C

D E F

Figure 10. (A) Schematic representation of the major stages of SIRV2 infection cycle in the Sulfolobus host cell. Times after infection are indicated in hours. The gradual opening out of VAPs(at time points between10and14 h) is illustrated inmoredetails with fragments from the TEM of thin sections. (B) Cells 10 h after infection. (C) Thin sections in a plane perpendicular to the cell envelope. Arrows indicate VAPs (Bize et al., 2009). D and E , Negative contrast electron micrographs of isolated VAPs. (D) The side view of intact VAPs. (E) Top view of a VAP in the open conformation (Scale bars: 100 nm.). (F) Thin sections through S. acidocaldarius expressing SIRV2-P98. Arrows indicate VAPs. (Scale bars: 200 nm.) (from Quax et al., 2010)

In order to investigate the special structural protein components of SIRV2-infected cells, three different virus-infected cell fractions were collected, compared and analyzed: the total cell lysate, the membrane and the cytosol fractions. It was found that the 10 kDa P98 of SIRV2 is the only protein appearing specifically in the membrane fraction of infected cells and is exposed on the surface that rupturing the S-layer, no other viral protein is involved in the assembly of pyramids. After overexpression of SIRV2-ORF98 in E. coli and S. acidocaldarius, the VAPs were also formed with the same size and shape as those formed in S. islandicus infected with SIRV2 (Fig. 10F) (Quax et al., 2010;Quax et al., 2011). The sequence alignment data revealed that no other archaeal virus carried the homologue of the SIRV2-ORF98, except for STIV and the Rudiviridae (SIRV1/2, SRV). It is also intriguing that the VAPs was a separate structural unit which can be isolated and purified, and the solo protein SIRV2-ORF98 is capable of self-assembling into ordered sevenfold isosceles

21

triangle–shaped pyramid (Fig. 10D and E), which seems to be a baseless and hollow structure (Quax et al., 2011).

Although the similar VAPs were observed in heterologous expressed E.coli and S. acidocaldarius, they only existed in the surface of the inner membrane and all of the VAPs were in closed state. There must be at least one special factor induced the VAPs opening, which is absent in S. acidocaldarius and E. coli, but is present in its native host cells.

22

Summary of Results

23

Sulfolobus islandics rod-shaped virus 2 (SIRV2), as a member of the family Rudiviridae, is a promising candidate to become a general model for detailed studies of archaeal virus biology due to its relatively easy laboratory cultivation and sufficient yields. To date, several important stages of its biological life cycle have been characterized such as viral entry, transcription pattern, genome replication as well as its unique egress mechanism, which provide us a much better understanding of the unknown archaeal viral world.

Even so, similar to the vast majority of other archaeal viruses showing little sequence similarity to public databases, the functions of many SIRV2 proteins remain to be identified. Only one fifth of the 54 ORFs encoded by SIRV2 genome were experimentally confirmed a function, and the knowledge of its basic molecular processes like DNA repair, recombination, genome maturation as well as the interaction with its host are still limited.

Although possessing limited sequence similarities with the public gene bank, a CRISPR-associated Cas4-like protein ORF207 (gp19), previously identified as a ssDNA endonuclease, has drawn a lot of interests (Gardner et al., 2011b). It was detected to be transcribed from a single promoter with two other proteins (gp17 and gp18), located at its upstream, and generated a polycistronic transcript (Kessler et al., 2004). This organization of proteins suggests related functions. Moreover, the bioinformatic analysis revealed that this operon constitutes the most conserved gene cluster in archaeal linear viruses including rudiviruses and filamentous viruses. Then it has raised questions regarding the functions of this entire operon and the related virus infection stages they may be involved in.

The resolved crystal structure of gp17 homolog encoded by SIRV1 indicated a DNA binding activity, although no obvious structural similarity was matched in Protein Data Bank. Different structural DNA substrates were tested for its binding activities, and the results demonstrated that either ssDNA or dsDNA with a single or double flaps can be shifted with the protein gp17, no blunt-end dsDNA could form the protein-DNA complex, which indicate that gp17 is a ssDNA binding protein. However, none of the documented classical ssDNA binding domains were found in the structure of gp17, therefore this protein constitutes a novel non-canonical ssDNA binding protein. Sequence alignment of gp17 homologs revealed 3 highly conserved and 5 relatively conserved residues. Mutagenesis of a

24

few conserved basic residues distributed in two adjacent loops within each monomer suggested a U-shaped binding path for ssDNA. As gp18 couldn’t be cloned into Sulfolobus cells due to its toxicity and as recombinant expression in E.coli resulted in the formation of inclusion bodies, a denaturation and refolding strategy was employed to purify the His-tagged gp18 from E.coli. Both the circular dichroism spectroscopy and gel-filtration chromatography assay showed that the refolded protein gp18 is functional stable to be used for the further study. BlastP search of gp18 sequence suggested a weak similarity to bacterial ATPase domains of Lon protease, and a tertiary structure prediction suggested a function as hexameric helicase. However, neither protease or helicase activity of gp18 could be detected under all possible conditions. Instead, gp18 was detected to be able to increase the dsDNA yield from two complementary oligonucleotides. The failure of detecting the helicase activity could be due to the lack of proper experimental conditions or possible mask of helicase activity by the stronger annealing activity. It also could be that gp18 carries no helicase activity, but only annealing activirty, possessing the similar features as the annealing helicases (HARP, AH2). (Yusufzai and Kadonaga, 2008;Yusufzai and Kadonaga, 2010).

To better understand the function of the entire gene operon, the protein product of the third gene, gp19, was further characterized in this study, which was detected to have a 5’-3’ ssDNA exonuclease activity, in addition to the previously demonstrated ssDNA endonuclease activity.

There are 38 aa residues missing at the C-terminus of gp17 in the crystal structure, which was predicted as disordered region by two different program IUpred and PONDR. The disordered C-terminus of bacterial SSB proteins are normally involved in protein-protein interactions. Since the entire operon all work in the same type of substrate, ssDNA, the interactions among the three proteins were performed by GST affinity chromatography. The experimental results demonstrated that gp17 interacts with gp18 and the C-terminal disordered domain of gp17 is essential for the interaction. No interaction was detected between gp17 and gp19, but a weak interaction was shown between gp18 and gp19. In order to confirm whether gp17 could recruit some ssDNA-processing proteins as bacterial SSBs, this gene was inserted into plasmid and transfermed into host Sulfolobus solfataricus P2 cells for the pull-down assay in vivo. Two host proteins sso2277 and reverse gyrase

25

(sso0422) were detected to interact with gp17. Protein structure prediction of sso2277 revealed a high similarity with RecF and RecN proteins, which involved in DNA replication / recombination.

The operonic or clustered organization of the three genes in rudi- and filamentous viruses and the observed interactions between their protein products strongly suggest their close cooperation in a same process(es) involving ssDNA. This process could be the SIRV2 genome maturation, replication or recombination, and new evidences are needed to support the hypothesis.

Besides available information concerns unusual viral morphological and genomic properties, the SIRV2 transcription pattern as well as the regulation of host genes during virus infection was studied, either by microarray analysis or by deep transcriptome sequencing (Okutan et al., 2013;Quax et al., 2013). Although lacking of strong temporal regulation of transcription on its own virus, the host response to SIRV2 infection was significant. More than one third of the host genes were differentially regulated, with a similar number of downregulated and upregulated genes. Among these regulated genes, there are two transcription regulators sso2474 and sso10340 from Sulfolobus sulfataricus P2 were responded differently. Then we are curious to find out if any host genes or virus genes were regulated by these two regulators, and whether they are a local or global acting transcription factors. In this work, we investigate the binding targets of the two proteins in an in vivo context by performing a method similar to chromatin immunoprecipitation combined with DNA sequencing.

In order to detect the regulation on both host genes and viral genes, after the proteins were overexpressed for 15h, the cells were infected with SIRV2 at about M.I.O of 10 for 2.5 h. His-tagged protein purification was carried out from virus infected cells, the protein sso2474 was detected to bind hundreds of fold more DNA than the control group, and sso10340 exhibited a range of oligomeric states resembling the feature of Lrp/AsnC family proteins. The bound DNA was separated from DNA-protein complex from the two purified proteins, respectively, and sent for the high-throughput sequencing. The alignment between sequenced data and virus genome or host genome revealed that most of the DNA bound by sso2474 is viral DNA and sso10340 is mainly associated with the host regulation.

26

However, the specific binding targets and the binding mechanisms of sso2474 are still not identified, and the experiments showed that this protein purified from E.coli preferred to bind dsDNA than ssDNA. A total 27 binding target regions by protein sso10340 or its interacted proteins in S.solfataricus P2 were identified, and half of them located in the upstream or partial upstream of the corresponding genes, while the other half fell within the gene coding regions. A 11bp palindromic binding motif was defined by analysis of the enriched oligonucleotide sequences, which was present in 96% of the binding targets. The functions of these related genes were categorized and most of which were involved in energy metabolism, transport and amino acid metabolism.

27

Future Perspectives

28

The PhD work is the first study providing the functional characterization of an entire gene operon conserved in archaeal rudiviruses and filamentous viruses as well as the general regulation profile of two host regulators. There are still some work to be done in the future to enlarge the knowledge of the archaeal viral biology and virus-host interaction.

The toxicity of gp18 to Sulfolobus cells as well as its insolubilities in E.coli hindered the progress of characterization of the whole operon. Although insertion of both gp17 and gp18 genes into Sulfolobus solfataricus cells could decrease the strong toxicity of gp18, almost no expressed gp18 can be purified (data not shown). One method we would like further to try is co-expressing the two or three proteins in E.coli in a suitable system to test if gp17 or gp19 could help the folding of gp18, since both of them were demonstrated to interact with gp18. If so, the next plan is crystalizing the complex with a synthesized oligonucleotide to further investigate the interaction in detail between the complex and ssDNA.

The ssDNA binding, annealing and nuclease activities in vitro were all characterized in this study, and the possible function in viral infection cycle are discussed, whereas more in vivo evidence is still needed to complete the scenario we constructed. Due to the limited viral genetic technologies and relatively large size of this virus, silencing these genes in virus is not possible until now. We already tried to overexpressing the c-terminal truncated gp17 in the host cells for competition with the wide-type one in virus to detect its influence either in virus replication or genome maturation. However, the result is not conclusive due to different reasons. It would be interesting and exciting to use some good ideas and methods to detect the viral function of this operon in vivo.

The very surprising thing about sso2474 is its special high affinity to viral DNA in a non-sequence binding way. The DNA binding mechanism of sso2474 was not clear, and the phonotype changes and no growth retardation to virus are expected to be observed in the sso2474 mutated organism, if it can be knocked out. At last, the global regulation of sso10340 need to be further validated. Is there any other DNA binding protein or regulators interacting with sso10340 and whether this regulator activates or represses the transcription of the corresponding genes are still need to be confirmed.

29

Reference

Ackermann,H.W., and Prangishvili,D. (2012) Prokaryote viruses studied by electron microscopy. Arch Virol 157: 1843-1849.

Albers,S.V., Birkeland,N.K., Driessen,A.J., Gertig,S., Haferkamp,P., Klenk,H.P. et al. (2009) SulfoSYS (Sulfolobus Systems Biology): towards a silicon cell model for the central carbohydrate metabolism of the archaeon Sulfolobus solfataricus under temperature variation. Biochem Soc Trans 37: 58-64.

Anderson,I., Rodriguez,J., Susanti,D., Porat,I., Reich,C., Ulrich,L.E. et al. (2008) Genome sequence of Thermofilum pendens reveals an exceptional loss of biosynthetic pathways without genome reduction. J Bacteriol 190: 2957-2965.

Bernander,R. (2007) The cell cycle of Sulfolobus. Mol Microbiol 66: 557-562.

Bettstetter,M., Peng,X., Garrett,R.A., and Prangishvili,D. (2003) AFV1, a novel virus infecting hyperthermophilic archaea of the genus acidianus. Virology 315: 68-79.

Bintrim,S.B., Donohue,T.J., Handelsman,J., Roberts,G.P., and Goodman,R.M. (1997) Molecular phylogeny of Archaea from soil. Proc Natl Acad Sci U S A 94: 277-282.

Birkenbihl,R.P., Neef,K., Prangishvili,D., and Kemper,B. (2001) Holliday junction resolving enzymes of archaeal viruses SIRV1 and SIRV2. J Mol Biol 309: 1067-1076.

Bize,A., Karlsson,E.A., Ekefjard,K., Quax,T.E., Pina,M., Prevost,M.C. et al. (2009) A unique virus release mechanism in the Archaea. Proc Natl Acad Sci U S A 106: 11306-11311.

Bize,A., Peng,X., Prokofeva,M., Maclellan,K., Lucas,S., Forterre,P. et al. (2008) Viruses in acidic geothermal environments of the Kamchatka Peninsula. Res Microbiol 159: 358-366.

Blum,H., Zillig,W., Mallok,S., Domdey,H., and Prangishvili,D. (2001) The genome of the archaeal virus SIRV1 has features in common with genomes of eukaryal viruses. Virology 281: 6-9.

Bochkarev,A., and Bochkareva,E. (2004) From RPA to BRCA2: lessons from single-stranded DNA binding by the OB-fold. Curr Opin Struct Biol 14: 36-42.

Breithaupt,H. (2001) The hunt for living gold. The search for organisms in extreme environments yields useful enzymes for industry. EMBO Rep 2: 968-971.

Brock,T.D., Brock,K.M., Belly,R.T., and Weiss,R.L. (1972) Sulfolobus: a new genus of sulfur-oxidizing bacteria living at low pH and high temperature. Arch Mikrobiol 84: 54-68.

30

Brugger,K., Redder,P., and Skovgaard,M. (2003) MUTAGEN: multi-user tool for annotating genomes. Bioinformatics 19: 2480-2481.

Brumfield,S.K., Ortmann,A.C., Ruigrok,V., Suci,P., Douglas,T., and Young,M.J. (2009) Particle assembly and ultrastructural features associated with replication of the lytic archaeal virus sulfolobus turreted icosahedral virus. J Virol 83: 5964-5970.

Chong,J.P., Hayashi,M.K., Simon,M.N., Xu,R.M., and Stillman,B. (2000) A double-hexamer archaeal minichromosome maintenance protein is an ATP-dependent DNA helicase. Proc Natl Acad Sci U S A 97: 1530-1535.

Culyba,M.J., Harrison,J.E., Hwang,Y., and Bushman,F.D. (2006) DNA cleavage by the A22R resolvase of vaccinia virus. Virology 352: 466-476.

Declais,A.C., and Lilley,D.M. (2008) New insight into the recognition of branched DNA structure by junction-resolving enzymes. Curr Opin Struct Biol 18: 86-95.

DeLong,E.F. (1992) Archaea in coastal marine environments. Proc Natl Acad Sci U S A 89: 5685-5689.

Deng,L., He,F., Bhoobalan-Chitty,Y., Martinez-Alvarez,L., Guo,Y., and Peng,X. (2014) Unveiling cell surface and type IV secretion proteins responsible for archaeal rudivirus entry. J Virol 88: 10264-10268.

Deng,L., Zhu,H., Chen,Z., Liang,Y.X., and She,Q. (2009) Unmarked gene deletion and host-vector system for the hyperthermophilic crenarchaeon Sulfolobus islandicus. Extremophiles 13: 735-746.

Dickey,T.H., Altschuler,S.E., and Wuttke,D.S. (2013) Single-stranded DNA-binding proteins: multiple domains for multiple functions. Structure 21: 1074-1084.

Dorazi,R., Parker,J.L., and White,M.F. (2006) PCNA activates the Holliday junction endonuclease Hjc. J Mol Biol 364: 243-247.

Dosztanyi,Z., Csizmok,V., Tompa,P., and Simon,I. (2005) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21: 3433-3434.

Du,S., and Traktman,P. (1996) Vaccinia virus DNA replication: two hundred base pairs of telomeric sequence confer optimal replication efficiency on minichromosome templates. Proc Natl Acad Sci U S A 93: 9693-9698.

Dunn,J.J., and Studier,F.W. (1983) Complete nucleotide sequence of bacteriophage T7 DNA and the locations of T7 genetic elements. J Mol Biol 166: 477-535.

31

Elkins,J.G., Podar,M., Graham,D.E., Makarova,K.S., Wolf,Y., Randau,L. et al. (2008) A korarchaeal genome reveals insights into the evolution of the Archaea. Proc Natl Acad Sci U S A 105: 8102-8107.

Forterre,P., Brochier,C., and Philippe,H. (2002) Evolution of the Archaea. Theor Popul Biol 61: 409-422.

Frols,S., Gordon,P.M., Panlilio,M.A., Schleper,C., and Sensen,C.W. (2007) Elucidating the transcription cycle of the UV-inducible hyperthermophilic archaeal virus SSV1 by DNA microarrays. Virology 365: 48-59.

Fuhrman,J.A., McCallum,K., and Davis,A.A. (1992) Novel major archaebacterial group from marine plankton. Nature 356: 148-149.

Gardner,A.F., Bell,S.D., White,M.F., Prangishvili,D., and Krupovic,M. (2014) Protein-protein interactions leading to recruitment of the host DNA sliding clamp by the hyperthermophilic Sulfolobus islandicus rod-shaped virus 2. J Virol 88: 7105-7108.

Gardner,A.F., Guan,C., and Jack,W.E. (2011a) Biochemical characterization of a structure-specific resolving enzyme from Sulfolobus islandicus rod-shaped virus 2. PLoS One 6: e23668.

Gardner,A.F., Prangishvili,D., and Jack,W.E. (2011b) Characterization of Sulfolobus islandicus rod-shaped virus 2 gp19, a single-strand specific endonuclease. Extremophiles 15: 619-624.

Gevers,D., Dawyndt,P., Vandamme,P., Willems,A., Vancanneyt,M., Swings,J., and De,V.P. (2006) Stepping stones towards a new prokaryotic taxonomy. Philos Trans R Soc Lond B Biol Sci 361: 1911-1916.

Goulet,A., Blangy,S., Redder,P., Prangishvili,D., Felisberto-Rodrigues,C., Forterre,P. et al. (2009a) Acidianus filamentous virus 1 coat proteins display a helical fold spanning the filamentous archaeal viruses lineage. Proc Natl Acad Sci U S A 106: 21155-21160.

Goulet,A., Spinelli,S., Blangy,S., van,T.H., Leulliot,N., Basta,T. et al. (2009b) The thermo- and acido-stable ORF-99 from the archaeal virus AFV1. Protein Sci 18: 1316-1320.

Gribaldo,S., and Brochier-Armanet,C. (2006) The origin and evolution of Archaea: a state of the art. Philos Trans R Soc Lond B Biol Sci 361: 1007-1022.

Gudbergsdottir,S., Deng,L., Chen,Z., Jensen,J.V., Jensen,L.R., She,Q., and Garrett,R.A. (2011) Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector-borne viral and plasmid genes and protospacers. Mol Microbiol 79: 35-49.

32

Guilliere,F., Peixeiro,N., Kessler,A., Raynal,B., Desnoues,N., Keller,J. et al. (2009) Structure, function, and targets of the transcriptional regulator SvtR from the hyperthermophilic archaeal virus SIRV1. J Biol Chem 284: 22222-22237.

Guy,L., and Ettema,T.J. (2011) The archaeal 'TACK' superphylum and the origin of eukaryotes. Trends Microbiol 19: 580-587.

Haring,M., Rachel,R., Peng,X., Garrett,R.A., and Prangishvili,D. (2005a) Viral diversity in hot springs of Pozzuoli, Italy, and characterization of a unique archaeal virus, Acidianus bottle-shaped virus, from a new family, the Ampullaviridae. J Virol 79: 9904-9911.

Haring,M., Vestergaard,G., Brugger,K., Rachel,R., Garrett,R.A., and Prangishvili,D. (2005b) Structure and genome organization of AFV2, a novel archaeal lipothrixvirus with unusual terminal and core structures. J Bacteriol 187: 3855-3858.

Haring,M., Vestergaard,G., Rachel,R., Chen,L., Garrett,R.A., and Prangishvili,D. (2005c) Virology: independent virus development outside a host. Nature 436: 1101-1102.

Hohn,M.J., Hedlund,B.P., and Huber,H. (2002) Detection of 16S rDNA sequences representing the novel phylum "Nanoarchaeota": indication for a wide distribution in high temperature biotopes. Syst Appl Microbiol 25: 551-554.

Howard,J.A., Delmas,S., Ivancic-Bace,I., and Bolt,E.L. (2011) Helicase dissociation and annealing of RNA-DNA hybrids by Escherichia coli Cas3 protein. Biochem J 439: 85-95.

Johnson,D.B., Joulian,C., d'Hugues,P., and Hallberg,K.B. (2008) Sulfobacillus benefaciens sp. nov., an acidophilic facultative anaerobic Firmicute isolated from mineral bioleaching operations. Extremophiles 12: 789-798.

Kawano,S., Iyaguchi,D., Okada,C., Sasaki,Y., and Toyota,E. (2013) Expression, purification, and refolding of active recombinant human E-selectin lectin and EGF domains in Escherichia coli. Protein J 32: 386-391.

Kelley,L.A., and Sternberg,M.J. (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc 4: 363-371.

Kessler,A., Brinkman,A.B., van der Oost,J., and Prangishvili,D. (2004) Transcription of the rod-shaped viruses SIRV1 and SIRV2 of the hyperthermophilic archaeon sulfolobus. J Bacteriol 186: 7745-7753.

Kowalczykowski,S.C. (2000) Initiation of genetic recombination and recombination-dependent replication. Trends Biochem Sci 25: 156-165.

Krupovic,M., and Bamford,D.H. (2008) Holin of bacteriophage lambda: structural insights into a membrane lesion. Mol Microbiol 69: 781-783.

33

Krupovic,M., White,M.F., Forterre,P., and Prangishvili,D. (2012) Postcards from the edge: structural genomics of archaeal viruses. Adv Virus Res 82: 33-62.

Kvaratskhelia,M., and White,M.F. (2000) Two Holliday junction resolving enzymes in Sulfolobus solfataricus. J Mol Biol 297: 923-932.

Lemak,S., Beloglazova,N., Nocek,B., Skarina,T., Flick,R., Brown,G. et al. (2013) Toroidal structure and DNA cleavage by the CRISPR-associated [4Fe-4S] cluster containing Cas4 nuclease SSO0001 from Sulfolobus solfataricus. J Am Chem Soc 135: 17476-17487.

Maaty,W.S., Steffens,J.D., Heinemann,J., Ortmann,A.C., Reeves,B.D., Biswas,S.K. et al. (2012) Global analysis of viral infection in an archaeal model system. Front Microbiol 3: 411.

Martin,A., Yeats,S., Janekovic,D., Reiter,W.D., Aicher,W., and Zillig,W. (1984) SAV 1, a temperate u.v.-inducible DNA virus-like particle from the archaebacterium Sulfolobus acidocaldarius isolate B12. EMBO J 3: 2165-2168.

Mochizuki,T., Krupovic,M., Pehau-Arnaudet,G., Sako,Y., Forterre,P., and Prangishvili,D. (2012) Archaeal virus with exceptional virion architecture and the largest single-stranded DNA genome. Proc Natl Acad Sci U S A 109: 13386-13391.

Moldovan,G.L., Pfander,B., and Jentsch,S. (2007) PCNA, the maestro of the replication fork. Cell 129: 665-679.

Mosig,G. (1998) Recombination and recombination-dependent DNA replication in bacteriophage T4. Annu Rev Genet 32: 379-413.

Munoz,V., and Serrano,L. (1994) Elucidating the folding problem of helical peptides using empirical parameters. Nat Struct Biol 1: 399-409.

Oke,M., Carter,L.G., Johnson,K.A., Liu,H., McMahon,S.A., Yan,X. et al. (2010) The Scottish Structural Proteomics Facility: targets, methods and outputs. J Struct Funct Genomics 11: 167-180.

Oke,M., Kerou,M., Liu,H., Peng,X., Garrett,R.A., Prangishvili,D. et al. (2011) A dimeric Rep protein initiates replication of a linear archaeal virus genome: implications for the Rep mechanism and viral replication. J Virol 85: 925-931.

Okutan,E., Deng,L., Mirlashari,S., Uldahl,K., Halim,M., Liu,C. et al. (2013) Novel insights into gene regulation of the rudivirus SIRV2 infecting Sulfolobus cells. RNA Biol 10: 875-885.

Oline,D.K., Schmidt,S.K., and Grant,M.C. (2006) Biogeography and landscape-scale diversity of the dominant Crenarchaeota of soil. Microb Ecol 52: 480-490.

34

Oren,A. (2002) Molecular ecology of extremely halophilic Archaea and Bacteria. FEMS Microbiol Ecol 39: 1-7.

Ortmann,A.C., Brumfield,S.K., Walther,J., McInnerney,K., Brouns,S.J., van de Werken,H.J. et al. (2008) Transcriptome analysis of infection of the archaeon Sulfolobus solfataricus with Sulfolobus turreted icosahedral virus. J Virol 82: 4874-4883.

Palm,P., Schleper,C., Grampp,B., Yeats,S., McWilliam,P., Reiter,W.D., and Zillig,W. (1991) Complete nucleotide sequence of the virus SSV1 of the archaebacterium Sulfolobus shibatae. Virology 185: 242-250.

Paytubi,S., McMahon,S.A., Graham,S., Liu,H., Botting,C.H., Makarova,K.S. et al. (2012) Displacement of the canonical single-stranded DNA-binding protein in the Thermoproteales. Proc Natl Acad Sci U S A 109: E398-E405.

Peng,X., Basta,T., Haring,M., Garrett,R.A., and Prangishvili,D. (2007) Genome of the Acidianus bottle-shaped virus and insights into the replication and packaging mechanisms. Virology 364: 237-243.

Peng,X., Blum,H., She,Q., Mallok,S., Brugger,K., Garrett,R.A. et al. (2001) Sequences and replication of genomes of the archaeal rudiviruses SIRV1 and SIRV2: relationships to the archaeal lipothrixvirus SIFV and some eukaryal viruses. Virology 291: 226-234.

Peng,X., Kessler,A., Phan,H., Garrett,R.A., and Prangishvili,D. (2004) Multiple variants of the archaeal DNA rudivirus SIRV1 in a single host and a novel mechanism of genomic variation. Mol Microbiol 54: 366-375.

Pietila,M.K., Demina,T.A., Atanasova,N.S., Oksanen,H.M., and Bamford,D.H. (2014) Archaeal viruses and bacteriophages: comparisons and contrasts. Trends Microbiol 22: 334-344.

Pietila,M.K., Laurinavicius,S., Sund,J., Roine,E., and Bamford,D.H. (2010) The single-stranded DNA genome of novel archaeal virus halorubrum pleomorphic virus 1 is enclosed in the envelope decorated with glycoprotein spikes. J Virol 84: 788-798.

Pietila,M.K., Roine,E., Paulin,L., Kalkkinen,N., and Bamford,D.H. (2009) An ssDNA virus infecting archaea: a new lineage of viruses with a membrane envelope. Mol Microbiol 72: 307-319.

Pina,M., Basta,T., Quax,T.E., Joubert,A., Baconnais,S., Cortez,D. et al. (2014) Unique genome replication mechanism of the archaeal virus AFV1. Mol Microbiol 92: 1313-1325.

Poranen,M.M., Daugelavicius,R., and Bamford,D.H. (2002) Common principles in viral entry. Annu Rev Microbiol 56: 521-538.

35

Prangishvili,D. (2003) Evolutionary insights from studies on viruses of hyperthermophilic archaea. Res Microbiol 154: 289-294.

Prangishvili,D., Arnold,H.P., Gotz,D., Ziese,U., Holz,I., Kristjansson,J.K., and Zillig,W. (1999) A novel virus family, the Rudiviridae: Structure, virus-host interactions and genome variability of the sulfolobus viruses SIRV1 and SIRV2. Genetics 152: 1387-1396.

Prangishvili,D., Forterre,P., and Garrett,R.A. (2006a) Viruses of the Archaea: a unifying view. Nat Rev Microbiol 4: 837-848.

Prangishvili,D., and Garrett,R.A. (2005) Viruses of hyperthermophilic Crenarchaea. Trends Microbiol 13: 535-542.

Prangishvili,D., Garrett,R.A., and Koonin,E.V. (2006b) Evolutionary genomics of archaeal viruses: unique viral genomes in the third domain of life. Virus Res 117: 52-67.

Prangishvili,D., Koonin,E.V., and Krupovic,M. (2013) Genomics and biology of Rudiviruses, a model for the study of virus-host interactions in Archaea. Biochem Soc Trans 41: 443-450.

Prangishvili,D., Vestergaard,G., Haring,M., Aramayo,R., Basta,T., Rachel,R., and Garrett,R.A. (2006c) Structural and genomic properties of the hyperthermophilic archaeal virus ATV with an extracellular stage of the reproductive cycle. J Mol Biol 359: 1203-1216.

Quax,T.E., Krupovic,M., Lucas,S., Forterre,P., and Prangishvili,D. (2010) The Sulfolobus rod-shaped virus 2 encodes a prominent structural component of the unique virion release system in Archaea. Virology 404: 1-4.

Quax,T.E., Lucas,S., Reimann,J., Pehau-Arnaudet,G., Prevost,M.C., Forterre,P. et al. (2011) Simple and elegant design of a virion egress structure in Archaea. Proc Natl Acad Sci U S A 108: 3354-3359.

Quax,T.E., Voet,M., Sismeiro,O., Dillies,M.A., Jagla,B., Coppee,J.Y. et al. (2013) Massive activation of archaeal defense genes during viral infection. J Virol 87: 8419-8428.

Quemin,E.R., Lucas,S., Daum,B., Quax,T.E., Kuhlbrandt,W., Forterre,P. et al. (2013) First insights into the entry process of hyperthermophilic archaeal viruses. J Virol 87: 13379-13385.

Rachel,R., Bettstetter,M., Hedlund,B.P., Haring,M., Kessler,A., Stetter,K.O., and Prangishvili,D. (2002) Remarkable morphological diversity of viruses and virus-like particles in hot terrestrial environments. Arch Virol 147: 2419-2429.

Rakonjac,J., Bennett,N.J., Spagnuolo,J., Gagic,D., and Russel,M. (2011) Filamentous bacteriophage: biology, phage display and nanotechnology applications. Curr Issues Mol Biol 13: 51-76.

36

Reiter,W.D., Palm,P., Yeats,S., and Zillig,W. (1987) Gene expression in archaebacteria: physical mapping of constitutive and UV-inducible transcripts from the Sulfolobus virus-like particle SSV1. Mol Gen Genet 209: 270-275.

Schleper,C., Holben,W., and Klenk,H.P. (1997) Recovery of crenarchaeotal ribosomal DNA sequences from freshwater-lake sediments. Appl Environ Microbiol 63: 321-323.

Servin-Garciduenas,L.E., Peng,X., Garrett,R.A., and Martinez-Romero,E. (2013) Genome sequence of a novel archaeal rudivirus recovered from a mexican hot spring. Genome Announc 1.

She,Q., Singh,R.K., Confalonieri,F., Zivanovic,Y., Allard,G., Awayez,M.J. et al. (2001) The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad Sci U S A 98: 7835-7840.

Shereda,R.D., Kozlov,A.G., Lohman,T.M., Cox,M.M., and Keck,J.L. (2008) SSB as an organizer/mobilizer of genome maintenance complexes. Crit Rev Biochem Mol Biol 43: 289-318.

Steinmetz,N.F., Bize,A., Findlay,K.C., Lomonossoff,G.P., Manchester,M., Evans,D.J., et al. (2008) Site-specific and spatially controlled addressability of a new viral nanobuilding block: Sulfolobus islandics Rod-shaped Virus 2. Advanced functional materials 18: 3478-3486.

Steitz,T.A. (2004) The structural basis of the transition from initiation to elongation phases of transcription, as well as translocation and strand separation, by T7 RNA polymerase. Curr Opin Struct Biol 14: 4-9.

Stetter,K.O. (2006) Hyperthermophiles in the history of life. Philos Trans R Soc Lond B Biol Sci 361: 1837-1842.

Suck,D. (1997) Common fold, common function, common origin? Nat Struct Biol 4: 161-165.

Theobald,D.L., Mitton-Fry,R.M., and Wuttke,D.S. (2003) Nucleic acid recognition by OB-fold proteins. Annu Rev Biophys Biomol Struct 32: 115-133.

Vega-Rocha,S., Gronenborn,B., Gronenborn,A.M., and Campos-Olivas,R. (2007) Solution structure of the endonuclease domain from the master replication initiator protein of the faba bean necrotic yellows virus and comparison with the corresponding geminivirus and structures. Biochemistry 46: 6201-6212.

Vestergaard,G., Aramayo,R., Basta,T., Haring,M., Peng,X., Brugger,K. et al. (2008a) Structure of the acidianus filamentous virus 3 and comparative genomics of related archaeal lipothrixviruses. J Virol 82: 371-381.

37

Vestergaard,G., Haring,M., Peng,X., Rachel,R., Garrett,R.A., and Prangishvili,D. (2005) A novel rudivirus, ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology 336: 83-92.

Vestergaard,G., Shah,S.A., Bize,A., Reitberger,W., Reuter,M., Phan,H. et al. (2008b) Stygiolobus rod-shaped virus and the interplay of crenarchaeal rudiviruses with the CRISPR antiviral system. J Bacteriol 190: 6837-6845.

Woese,C.R., and Fox,G.E. (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A 74: 5088-5090.

Woese,C.R., Kandler,O., and Wheelis,M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A 87: 4576-4579.

Worthington,P., Hoang,V., Perez-Pomares,F., and Blum,P. (2003) Targeted disruption of the alpha-amylase gene in the hyperthermophilic archaeon Sulfolobus solfataricus. J Bacteriol 185: 482-488.

Wu,Y. (2012) Unwinding and rewinding: double faces of helicase? J Nucleic Acids 2012: 140601.

Wurtzel,O., Sapra,R., Chen,F., Zhu,Y., Simmons,B.A., and Sorek,R. (2010) A single-base resolution map of an archaeal transcriptome. Genome Res 20: 133-141.

Xiang,X., Dong,X., and Huang,L. (2003) Sulfolobus tengchongensis sp. nov., a novel thermoacidophilic archaeon isolated from a hot spring in Tengchong, China. Extremophiles 7: 493-498.

Xue,B., Dunbrack,R.L., Williams,R.W., Dunker,A.K., and Uversky,V.N. (2010) PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim Biophys Acta 1804: 996-1010.

Yeats,S., McWilliam,P., and Zillig,W. (1982) A plasmid in the archaebacterium Sulfolobus acidocaldarius. The EMBO Journal 1 : 1035-1038.

Yusufzai,T., and Kadonaga,J.T. (2008) HARP is an ATP-driven annealing helicase. Science 322: 748-750.

Yusufzai,T., and Kadonaga,J.T. (2010) Annealing helicase 2 (AH2), a DNA-rewinding motor with an HNH motif. Proc Natl Acad Sci U S A 107: 20970-20973.

Zablen,L.B., Kissil,M.S., Woese,C.R., and Buetow,D.E. (1975) Phylogenetic origin of the chloroplast and prokaryotic nature of its ribosomal RNA. Proc Natl Acad Sci U S A 72: 2418-2422.

38

Zhang,J., Kasciukovic,T., and White,M.F. (2012) The CRISPR associated protein Cas4 Is a 5' to 3' DNA exonuclease with an iron-sulfur cluster. PLoS One 7: e47232.

Zillig,W., Stetter,O.K., Wunderl,S., Schulz,W., Priess,H., Scholz,I.(1980) The Sulfolobus-``Caldariella`` Group: Taxonomy on the Basis of the Structure of DNA-Dependent RNA Polymerases. Archives of Microbiology 125:259-269.

Zillig,W., Kletzin,A., Schleper,C., Holz,I., Janekovic,D., Hain,J. et al. (1994) Screening for Sulfolobales, Their and Their Viruses in Icelandic Solfataras. Systematic and Applied Microbiology 16: 609-628.

Zuckerkandl,E., and Pauling,L. (1965) Molecules as documents of evolutionary history. J Theor Biol 8: 357-366.

39

Manuscript I

Single-stranded DNA binding, annealing and nuclease activities encoded by a conserved archaeal viral gene cluster

Yang Guo, Birthe B. Kragelund, Malcolm F. White and Xu Peng

Submitted to Nucleic Acid Research

40

Single-stranded DNA binding, annealing and nuclease activities

encoded by a conserved archaeal viral gene cluster

Yang Guo1, Birthe B. Kragelund1, Malcolm F. White2 and Xu Peng1*

1 Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 CPH N. Denmark

2 Biomedical Sciences Research Complex, University of St. Andrews, North Haugh, St. Andrews, Fife, UK

* To whom correspondence should be addressed. Tel: +45 35322018; Fax: +45 35322128; Email: [email protected]

41

ABSTRACT

Single-stranded DNA (ssDNA) occurs in various cellular and viral processes of DNA metabolism, including DNA replication, homologous recombination and repair pathways. Here, we describe a novel type of ssDNA binding protein, a novel ssDNA annealing protein and a ssDNA nuclease encoded by an operon comprised of ORF131b (gp17), ORF436 (gp18) and ORF207 (gp19), respectively, of Sulfolubus islandicus rod-shaped virus 2 (SIRV2). Rather than comprising one of the canonical ssDNA binding domains, SIRV2 gp17 forms a dimer with each monomer containing two α -helices and three β-strands. Mutagenesis of a few conserved basic residues distributed in two adjacent loops within each monomer suggested a U-shaped binding path for ssDNA. Although predicted previously as a helicase, the recombinant gp18 showed a ssDNA annealing activity often associated with helicases and recombinases. Moreover, gp19 was shown to possess a 5´ to 3´ ssDNA exonuclease activity, in addition to the previously demonstrated ssDNA endonuclease activity. Further, in vitro pull-down assay demonstrated interactions between gp17 and gp18 and between gp18 and gp19 with the former being mediated by the intrinsically disordered C-terminus of gp17. The strand-displacement replication mode proposed previously for rudiviruses and the close interaction between the ssDNA binding, annealing and nuclease activities strongly point to a role of the gene operon in genome maturation and/or DNA recombination which may function in viral DNA replication/repair.

INTRODUCTION

Viruses that infect extreme hyperthermophilic archaea, the third domain of life, are unusual in their morphology, genome structure and proteins. In the last decade, a major effort has been undertaken to study the archaeal viruses, which have attracted intense interest as model systems to understand the biochemistry and molecular biology required for life at high temperatures. Based on their morphological and genomic characteristics, 15 viral families have been classified and about 100 viral isolates described, all with either linear or circular double-stranded (ds) DNA, except two species possessing a single-stranded (ss) DNA genome (1-3).

The Sulfolobus islandicus rod-shaped virus 2 (SIRV2) (4), together with SIRV1 (5), Stygiolobus rod-shaped virus, SRV (6), Acidianus rod-shaped virus 1, ARV1 (7) and Sulfolobales Mexican rudivirus 1 (SMRV1) (8), belong to the family Rudiviridae. The

42

rudiviruses have linear dsDNA genomes (24.6 to 35.8 kbp) with inverted terminal repeats and the two strands at the genomic termini are covalently linked. Recently SIRV2 has been the focus of genomic, structural, genetic and transcriptional studies, which have provided important insights into its entry, gene regulation and unique release mechanisms (5;9-13). Even so, similar to the vast majority of other archaeal viruses showing little sequence similarity to public databases (14), the functions of many SIRV2 proteins remain to be identified.

Among the 54 ORFs encoded in the genome of SIRV2, only one fifth had been experimentally assigned a function (15). The virus is coated with one major capsid protein gp26 and three minor structural proteins gp33, gp38 and gp39 (16). Together with the genes encoding the viral structural proteins, gp49, encoding the component of the pyramidal egress structure (11), is repressed by the transcription regulator gp15 (SvtR) during the early virus infection cycle (17;18). gp16, belonging to the replication initiator (Rep) family and nicking one strand of the viral genomic termini, was proposed to be involved in the initiation of the DNA replication (19). The Holiday junction resolving enzyme (Hjr) gp35 was suggested to resolve the concatemers of the replicative intermediates, producing monomeric copies with linear hairpin ends (20). Taken together, the functions of many SIRV2 genes remain unknown and the knowledge of its biology and basic molecular processes such as DNA replication, recombination and maturation is still limited.

In this work we studied a SIRV2 operon containing three genes, gp17, gp18 and gp19 that are highly conserved in rudiviruses and filamentous viruses. gp19 was previously shown to be an endonuclease specifically cutting ssDNA (21). We demonstrate here that gp17 is a ssDNA binding protein and interacts with gp18 while the latter stimulates annealing of complementary oligonucleotides. In addition to the previously identified ssDNA endonuclease activity, we detected the 5’ to 3’ ssDNA exonuclease activity of gp19, which also interacts with gp18. Based on the data, the possible functions of the gene operon are discussed.

MATERIALS AND METHODS

Cloning, expression and purification of C-terminally His-tagged recombinant proteins

The coding sequences of gp17, gp18 and gp19 were amplified by PCR from SIRV2 genome using primers listed in Table S1, digested with NdeI and XhoI and subsequently inserted

43

into a similarly digested pET-30(a) (Novagen) expression vector. To introduce single or multiple amino acid (aa) mutations into the recombinant gp17, fusion PCR using 4 primers (Table S1) was performed for each mutant.

E.coli BL21 CodonPlus cells were transformed with individual plasmid construct and a single clone transformant was inoculated in LB medium containing 30 g/ml kanamycin and 25 g/ml chloramphenicol. At an optical density (OD600) of 0.4, IPTG (0.5 mM) was added to the culture and the cells were further cultured at 25°C for 12 hours. Harvested cell pellet was resuspended in lysis buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl, 1 mM EDTA, 1% Triton X-100 and 1 mM PMSF) and lysed by sonication. The lysate was cleared by centrifugation at 10000 x g for 20 minutes and the supernatant was then incubated with Ni-NTA-agarose beads (Qiagen, Germany) for 1 h at room temperature. The beads were washed three times with washing buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl, 40 mM Imidazol ) and the protein eluted with elution buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl, 250 mM Imidazol). The purity of the proteins was evaluated by SDS-PAGE and the gel (12.5%) was stained with PAGE blue (Sigma Aldrich, UK). In the case of gp18, a denaturing and refolding method was applied (see below).

Cloning, expression and purification of N-terminally GST-tagged proteins

The wild-type gp17, gp17-I, its truncated mutants gp17-II (1-121aa) and gp17-III (1-111aa) and gp19 were amplified by PCR using primers listed in Table S1 and the PCR products were digested with BamHI and XhoI and ligated to a BamHI and XhoI digested pGEX-6p-2 (GE Healthcare Life Science, Sweden ) expression vector. The constructs were introduced individually into BL21 CodonPlus cells. Transformed cells were grown in LB medium containing 100 g/ml ampicillin and 25 g/ml chloramphenicol and induced with 0.5 mM

IPTG at an OD600 of 0.4. After 12 hours incubation at 25°C, the cells were pelleted, resuspended in lysis buffer (PBS buffer pH 8.0, 1 mM EDTA, 1% Triton X-100 and 1 mM PMSF) and lysed by sonication. The supernatant was incubated with Glutathione Sepharose 4 Fast Flow beads (GE Healthcare Life Science, Sweden) for 1 h at room temperature. The beads were washed three times with PBS buffer and the proteins remain bound on the beads for subsequent pull-down assays with His-tagged proteins.

Refolding of His-tagged gp18 from inclusion bodies

44

The cell lysate was centrifuged at 10000 × g for 20 min, and the inclusion bodies in the pellet were solubilized in lysis buffer containing 8 M urea at room temperature for 1h. The purification of the denatured protein using Ni-NTA-agarose beads followed the same procedure as described above for other His-tagged proteins, except that 8 M urea was included in both washing and elution buffers. The eluted 2 ml protein from 1L E.coli cells was dialysed first in 200 ml 0.5 M L-Arginine buffer for 2 h, and then in 2 L PBS buffer for 3 h and the latter dialysis was repeated for 3 times.

Preparation of substrates for DNA mobility shift and nuclease activity assays

To prepare DNA substrates for DNA mobility shift and exonuclease assays, oligonucleotide 1 (Table S2) was annealed to a series of fully or partially complementary ssDNA oligonucleotides (Table S2) to generate either a 23-nucleotide (23-nt) 5´-ssDNA tailed duplex (substrate B), a 23-nt 3´-ssDNA tailed duplex (substrate C), or a blunt-ended duplex (substrate A) (Table 1). Oligonucleotide 4 (Table S2) was annealed to its partially complementary ssDNA oligonucleotides 5 (Table S2) to generate a Y-shaped dsDNA (substrate D). The annealing mixture was heated at 95°C for 2 min and then slowly cooled to room temperature (25°C) over a period of 1 h. M13mp18 DNA (New England Biolabs, America) was chosen as circular single-stranded DNA substrate for the endonuclease assay.

Gel mobility shift analysis

50 nM of the ssDNA (oligo 4) or dsDNA (substrate A, B and D in Table 1) were incubated for 20 min at 50oC with increasing concentrations of gp17 or its mutant variants (0-2000 nM) in 20 l DNA-binding buffer (10 mM Tris-Cl, pH 8.0, 100 mM KCl, 2 mM DTT, 10% [vol/vol] glycerol). The samples were loaded onto 12% acrylamide gel and electrophoresed in 0.5 × TBE buffer for 1 h 50 min. Following electrophoresis, the gels were stained with SYBR® Gold (Life Technologies) and scanned by Typhoon FLA 7000 (GE Healthcare Life Science). The bands were quantified using ImageQuant TL (GE Healthcare Life Science).

Gel-filtration chromatography

Gel-filtration chromatography was carried out using an ÄKTA–FPLC system. Briefly, purified proteins in PBS buffer were applied individually to a Superdex 200 HR 10/300 GL column (GE Healthcare Bio- Sciences, America) equilibrated with the same buffer. The column was

45

operated at a flow rate of 0.5 ml/min, and 0.5 ml fractions were collected. The proteins were detected by measuring the absorbance at 280 nm, 254 nm and 215 nm. The column was calibrated with proteins of known molecular weight: Thyroglobulin, Bovine (669 kDa), Apoferritin, Horse Spleen (443 kDa), β-Amylase, Sweet Potato (200 kDa), Alchol Dehydrogenase, Yeast (150 kDa).

Circular dichroism (CD) spectroscopy A far-UV CD spectrum was recorded on a Jasco 810 spectropolarimeter at a wavelength range from 260 to 190 nm, a scan rate of 20 nm/min, 15 accumulations and 2 s response time, at room temperature. Samples were recorded in a quartz cuvette with a 1mm path length. A corresponding spectrum of the buffer was recorded and subtracted and the resulting spectrum smoothed (Jasco software). The spectrum was recorded of 3.85 µM protein in PBS, pH 8.0 and the ellipticity given as mean residue ellipticity [ϴ]MRW in deg*cm2*dmol-1. A temperature denaturation profile was recorded at 220 nm by heating the

app sample from 25°C to 95°C with a rate of 1°C/min, and apparent melting temperatures Tm ’s derived from fitting of the data to the following equation:

( ) ( ) ( ) ( )

( )

Where ΔH(Tm) is the enthalpy change at Tm, and ΔS(Tm) the entropy change at Tm. ssDNA annealing activity

For ssDNA annealing assay, the [32P] end-labelled 57-nt oligo 4 (1 nM) (Table S2) was incubated in annealing buffer (30 mM Tris-HCl, pH7.5, 5 mM MgCl2, 75 mM NaCl, 50 mM KCl and 1 mM DTT) with increasing amounts of gp18 at 25˚C for 5 min. The reaction was initiated by adding the unlabelled complementary oligonucleotide 5 (1.2 nM) (Table S2) and incubated at 50˚C for 15 min. The reaction was then stopped by the addition of 20 nM cold oligo 4, 0.5% [wt/vol] SDS and 1 mg/ml proteinase K. The deproteination was carried out at 25˚C for 10 min and the samples were loaded on a 10% native polyacrylamide gel and run at 100V for 1 h 20 min in 0.5 × TBE buffer. Following electrophoresis, gels were dried and exposed to X-ray film for documentation. DNA was quantified using ImageQuant TL (GE Healthcare Life Science).

46

Nuclease activity assays

The nuclease activity assays (20 l) were performed by mixing 0.08 M DNA duplex (substrate A, B or C) or 0.05 M M13mp18 DNA in reaction buffer containing 20 mM

Tris-HCl, pH 7.5, 50 mM NaCl, 5 mM MgCl2, 1 mM DTT and 10% glycerol. Reactions were initiated by the addition of 0.5 M SIRV2 gp19, and the mixtures were incubated at 50°C for the indicated length of time. Time course analyses were carried out by scaling up the reaction volume to 150 l and withdrawing 20 l aliquots at the indicated times. Reactions were terminated by the addition of 6 l stop solution (0.1% [wt/vol] bromophenol blue, 0.1% [wt/vol] xylene cyanol, 8% [vol/vol] glycerol, 1% [wt/vol] SDS, 50 mM EDTA and 2 mg/ml protease K). As a negative control, the substrates were incubated in the reaction mix in the absence of protein gp19. Samples for the exonuclease assay were analyzed by 12% polyacrylamide gel electrophoresis in 0.5 x TBE buffer and stained with SYBR® Gold (Life Technologies). Samples for the endonuclease assay were resolved in 0.7% agarose gel.

Western blot analysis

Proteins were separated in a 12.5% SDS polyacrylamide gel and transferred with transfer buffer (39 mM Glycine, 50 mM Tris pH 8.7, 0.04% SDS, 20% Methanol) onto a nitrocellulose membrane (Whattman, Germany) at 70 mA for 1h 15 min. The membrane was blocked for 1 h with PBST buffer (PBS buffer containing 0.05% Tween® 20) containing 8% milk powder and then incubated with anti-His antibody (Qiagen, Germany, 1:2000 dilution in PBS buffer containing 3% BSA). The membrane was further incubated for 1 h with a peroxidase-coupled secondary antibody (anti-mouse 1:10 000 IgG, Sigma-Aldrich, America) followed by 3 times wash with PBS buffer. An alkaline phosphate (Sigma-Aldrich, America) substrate was used for detection according to the instructions provided.

In vitro detection of protein-protein interactions

The GST-tagged gp17-I, gp17-II (1-121aa), gp17-III (1-111aa) or gp19 remaining on the GSH beads (50 g) was incubated at 25°C for 1 h with His-tagged prey proteins (90 g) in PBS buffer (0.1 µg/ml BSA, 0.1% Triton X-100). The mixture was centrifuged for 3 min at 3000 × g and the supernatant was stored for later analysis. The beads were washed several times with PBS buffer to remove unbound components and then heated in SDS loading buffer at 98°C for 10 min before SDS-PAGE in a 12.5% gel. The gel was stained with PAGE

47

blue (Sigma Aldrich, UK) and the presence of His-tagged interaction partners was detected by Western blot using anti-His antibody, as described above.

Identification of Sulfolobus proteins interacting with gp17

The gp17 fragment amplified by PCR (Table S1) was inserted between NdeI and Notl restriction sites of the Sulfolobus/E.coli shuttle vector pEXA2 (22), allowing the expression of His-tagged gp17 under the control of arabinose promoter. The constructed plasmid, as well as the empty plasmid pEXA2, were then electroporated individually into the uracil deficient competent cells (23). Single colonies of the transformants were inoculated into test tubes containing 5 ml SCV (basal medium supplemented with 0.2% sucrose, 0.2% casamino acids and 1% vitamin solution) (23), and incubated in an Innova 3100 oil-bath shaker. Large-scale culturing was performed in ACV medium (0.2% D-arabinose was substituted for sucrose) with Erlenmeyer flasks of long necks.

Purification of the His-tagged gp17 from the transformed Sulfolobus cells was performed as described above, and the eluted proteins were evaluated by 12.5% SDS-PAGE and stained with PAGE blue (Sigma Aldrich, UK). Protein bands present exclusively in the gp17-containing transformant were sliced from the gel and subjected to MALDI-TOF analysis (Alphalyse A/S, Odense, Denmark).

RESULTS

Bioinformatic analysis revealed high conservation of SIRV2 gp17, gp18 and gp19 in archaeal rudiviruses and filamentous viruses gp17, gp18 and gp19 occur as a gene cluster in the genome of SIRV2 and were shown previously to be transcribed from a single promoter generating a polycistronic transcript (10). The operon organization suggests the three genes are functionally related. However, except gp19 which was experimentally determined to possess a ssDNA endonuclease activity (21), very little is known about the function of gp17 and gp18. The crystal structure of gp17 homolog encoded by SIRV1 has been resolved (24), but no functional insight is available. Although a weak similarity between a limited part of gp18 and bacterial ATPase domains of Lon proteases was described (15), a tertiary structure prediction using the threading program Phyre 2 (25) suggested a different function. About two thirds of gp18 sequence

48

(residues 140 - 430) matched with high confidence (98.8%) to MCM homolog 2 (c3f8tA) from Methanopyrus kandleri, suggesting that gp18 might be a hexameric helicase.

The three genes are conserved in all rudiviruses and in one of the filamentous viruses, AFV1 (Fig. 1). The amino acid (aa) sequence similarities range between 36 - 96% for gp17 homologs, 51 - 100% for gp18 homologs and 51 – 98% for gp19 homologs. Although no significant sequence similarity was detected between SIRV2 gp19 and the other filamentous viral genomes, a putative nuclease is clearly encoded by the latter (except AFV2) and it belongs to the Cas4 superfamily similar to SIRV2 gp19 (6;26). Interestingly, a highly conserved gene encoding a putative helicase is found upstream of the putative nuclease gene in all filamentous viral genomes. Upstream of the putative helicase gene is another highly conserved gene encoding a 79 aa hypothetical protein showing no sequence similarity to SIRV2 gp17 (Fig. 1). It is not clear whether this small gene is functionally related to SIRV2 gp17, while homologs or analogs of SIRV2 gp18 and gp19 are present in almost all rudiviral and filamentous viral genomes. Genome comparison of all the rudiviruses and filamentous viruses using the Mutagen program (27) revealed that SIRV2 gp17, gp18 and gp19 constitute the only conserved gene cluster in the archaeal linear viruses (Fig. S1 and Table S3). Thus, the three genes appear important for both viral families. gp17 is a single-stranded DNA binding protein

Structural features of SIRV2 gp17. To gain insights into the functions of the gene cluster, we first examined the structure of gp17, which forms a dimer in the crystals (Protein Data Bank identifier [ID] 2X5T)(24)(Fig. S2A). Although it doesn’t show obvious structural similarity to any known domains present in Protein Data Bank (PDB), the electrostatics and the shape of the molecule indicate a DNA binding activity. It is dominated by basic (blue) residues on the concave side (Fig. S2C) and by acidic (red) residues on the convex side (Fig. S2D). The concave side fits well as a DNA straddling pocket. Two arginine side chains point down in the middle of the arch. The dimer binds a sulphate group, which may reflect its affinity towards phosphates.

The total length of gp17 is 131 residues, and structural information is missing for 38 residues at the C-terminus. In line with this lack of structural information, analysis of the sequence with protein disorder predictors revealed a potential intrinsically disordered C-terminus of about 35 residues (28-30) (Fig. S3). Analysis of gp17 homologs encoded by

49

other rudiviruses and AFV1 revealed intrinsically disordered C-terminus of similar size (data not shown), indicating the importance of the disorder in this domain for the function of the protein.

ssDNA binding activity of gp17. gp17 was amplified from the SIRV2 genome and cloned into E. coli vector pET30a. The C-terminally His-tagged protein was purified from E.coli to homogeneity (Fig. S4) and tested for binding activity to different DNA substrates. As shown in Fig. 2A, gp17 binds to substrates that are either ssDNA or dsDNA containing a single or double flaps, whereas no binding to the blunt ended dsDNA was detected with the same range of protein concentrations. The same result was obtained when ssDNA and blunt ended dsDNA were mixed with equal molar concentration in the reaction, where almost all ssDNA, but no or very little dsDNA, were shifted in the presence of 130 nM gp17 (Fig. 2B). At higher concentrations of gp17, no free ssDNA is available, and the protein exhibited binding to the dsDNA, albeit with a much lower affinity. At a gp17 concentration of 3.9 µM, a significant amount of the dsDNA still remains unshifted, indicating that the affinity of gp17 towards ssDNA is at least 30 times higher than to dsDNA.

A few positively charged residues forming a U-shaped binding channel on the gp17 dimer are crucial for its ssDNA binding activity

To identify essential elements of the ssDNA-binding domain of gp17, we first aligned the sequences of gp17 homologs and identified 3 fully conserved positively charged residues, R60, K61 and K82 (Fig.S5). R60 and K61 are located in a loop at the central cleft of the concave side, while K82 is found on the surface of the convex side (Fig. 3A). The three residues were mutated individually into alanine and the binding affinity of the mutant proteins to ssDNA was compared to that of the wild-type gp17. Whereas the K82A variant exhibited a similar level of binding affinity as the WT gp17, a 2 and a 5 fold reduction in binding affinity was observed, respectively, for the R60A and the K61A variants. Interestingly, simultaneous mutation of R60 and K61 to alanine abolished almost completely its ssDNA binding activity (Fig. 3B).

Four other positively charged residues, R24, K27, K29 and R33, are relatively conserved in the rudiviruses (Fig S5), and the corresponding residue of SIRV2 R33 in AFV1 ORF135 (K32) is also positively charged. Therefore the four residues were mutated to alanine either individually or simultaneously. While the double mutant R24K27 and the triple

50

mutant R24K27K29 showed very little or only a mild reduction in the binding affinity (Fig. S6), the R33 mutant demonstrated the most profound effect within the tested single mutants, with an 8 fold drop of the binding activity observed (Fig. 3B and 3C).

The above experiments demonstrated that R33, R60 and K61 are important for the ssDNA binding whereas R24, K27, K29 and K82 are less or not important. By examining the location and the orientation of the residues, it is obvious that the side chains of the former residues point to the central cleft and those of the latter residues point to the outer surface (Fig. 3A). It appears that the residues important for DNA binding (R33, R60, K61) form a positively charged and U-shaped structure in the gp17 dimer, thus straddling on ssDNA and causing bending of the ssDNA (Fig. 3D).

Within the U-shaped path another relatively conserved residue, H54, has a positive charge (Fig. S5) and was thus mutated to alanine to test its possible contribution to ssDNA binding. In line with its charge, conservation and location in the protein, H54 appears also important for ssDNA binding, as the H54A mutant demonstrated a 4 fold drop of binding activity compared with the WT protein (Fig. 3B, 3C and 3D).

Purification, refolding and stability of gp18

To characterize gp18 biochemically, a certain amount of soluble protein was needed. As gp18 couldn’t be cloned into Sulfolobus (see below) due to its toxicity and as recombinant expression in E.coli resulted in the formation of inclusion bodies, a denaturation and refolding strategy was employed to purify the His-tagged gp18 from E.coli. Following cell lysis, the inclusion bodies were pelleted and dissolved in 8 M urea (Fig. 4A lane 3), and gp18-His was purified using Ni-NTA-agarose beads. The denatured gp18-His was then refolded in L-Arginine buffer (31).

The protein appeared refolded properly as it remained soluble in the solution after 20 min incubation at 70°C (Fig. 4A lane 4). To assess the fold integrity of gp18 after recombinant production and refolding, the final preparation was subjected to structure analyses by CD spectroscopy. The far-UV CD spectrum recorded at room temperature revealed distinct negative molar ellipticity with minima at 218 and 208 nm, strongly indicating that the protein is folded with content of both α-helices and β-strands (Fig. 4B). Additionally, a temperature denaturation monitored at 220 nm showed that the protein was

app stable with two cooperative transitions, one with an apparent melting temperature (Tm ) of

51

app ~60°C and a major, highly cooperative transition with a Tm of 91°C, ΔH(Tm) = -469 ± 15 kJ/mol and ΔS(Tm) = -1.29 ± 0.04 kJ/mol (Fig. 4C). Because of the extreme stability of the protein, the post transition was not perfectly revealed at the current experimental conditions, and hence the thermodynamic parameters are approximations. However, conclusively, recombinantly produced gp18 was highly stable, cooperatively folded and with content of both α-helices and β-strands.

The oligomerization status of the refolded gp18 was analysed by gel filtration chromatography, which resulted in the formation of a broad peak containing two “shoulders” with a total elution volume of 23.6 ml at a flow rate of 0.5 ml/min (Fig. 4D). The elute volume of the main peak (labelled 1 in Fig. 4D) was between 9.85 and 10.60 ml and those of the two “shoulders” were 8.05 ml and 12.75 ml, respectively. Assuming that gp18 has a shape and partial specific volume similar to those of standard proteins, the molecular mass of the main peak is estimated to be between 685 kDa and 458 kDa and that of the two “shoulders” to be Vo and 147 kDa, respectively, calculated from a standard linear regression equation,

Kav=-0.2976(logMW)+1.8388 (Fig.S7). Since the molecular weight of the monomeric gp18 is 50.36 kDa, the protein was refolded as a series of oligomers from trimmers, nonamers to dodecamers. The molecular mass of the main top (560.6 kDa) ranges from nonamers to dodecamers, which might be the functional folds of gp18. gp18 stimulates the annealing of complementary oligonucleotides

Since structural prediction suggested that gp18 could be a hexameric helicase such as MCM, an essential protein for the initiation and elongation phases of DNA replication (32), we performed helicase assays. However, no helicase activity was detected in spite of multiple trials with different nucleic acid substrates and varied experimental conditions with different metal ions, NTPs, and temperatures (data not shown). Surprisingly, during the helicase assays, we found that instead of unwinding dsDNA, gp18 seemed to be able to increase the dsDNA yield from two complementary oligonucleotides.

To determine whether gp18 catalyzes ssDNA annealing, a [32P]-labelled 57-nt oligonucleotide (oligo-4 in Table S2) was first mixed with gp18. The annealing reactions were initiated by the addition of the complementary strand (oligo-5) and stopped by a 20-fold excess of the unlabeled oligo-4. After deproteination, the reaction products were resolved on a native PAGE gel. gp18-mediated annealing was dependent upon protein

52

concentration, with the reaction being most efficient at 200 nM of gp18 under the experimental condition (Fig. 5A, lane 5), also supporting the oligomeric structure of the protein. Moreover, the efficiency of annealing was not changed when ATP was excluded from the reaction (Fig. 5A, lane 6), suggesting that ATP hydrolysis is not needed in the catalyzed process.

As shown in the left panel of Fig. 5B, spontaneous annealing between the two oligonucleotides occurred slowly in a time-dependent manner with only 40% of the ssDNA annealed after 4 minutes incubation (Fig. 5C). The annealing process was drastically accelerated by gp18 (right panel of Fig. 5B). The oligonucleotides were almost completely annealed to form the slow-migrating dsDNA after 4 minutes of incubation in the presence of gp18 (Fig. 5B and 5C). gp19 demonstrates both ssDNA endonuclease and 5´-3´ exonuclease activities

Although previously demonstrated as a ssDNA endonuclease (21), gp19 shares sequence similarity with different CRISPR-associated Cas4 proteins (26), which possess metal-dependent endonuclease and 5´→3´exonuclease activities against ssDNA (33). We therefore examined the possible exonuclease activity of gp19 using DNA substrates of different structures. As shown in Fig. 6A, the migration of the blunt–end duplex DNA did not change upon the addition of gp19, indicating that it is not a substrate of gp19. While the 3´-flap duplex DNA remained unchanged as the blunt-ended DNA, the 5´-flap duplex DNA was cleaved with the final product having the same size of the blunt-ended DNA. It appeared that gp19 initiated cleavage from the 5´ single-strand end and stopped at the single strand and double strand junction (Fig. 6B). This indicates that gp19 has the 5´-3´ ssDNA exonuclease activity.

To confirm the ssDNA endonuclease activity of gp19, the circular ssDNA M13mp18 was tested with the same reaction buffer. As shown in Fig. 6C, the incubation with gp19 led to slow degradation of the circular ssDNA. These results confirm that SIRV2 gp19 possesses both 5´→3´exonuclease activity and endonuclease activity against ssDNA.

Interactions between gp17 and gp18 and between gp18 and gp19

Given that gp17, gp18 and gp19 all work on the same type of substrate, ssDNA, it appeared possible that the three proteins interact with one another. Since the removal of the last 10

53

residues showed little effect on the DNA binding activity of gp17 (Fig. S6), the intrinsically disordered C-terminus is possibly involved in other functions such as protein-protein interactions as demonstrated for the disordered C-terminus of bacterial SSB proteins (34). Therefore, we first tested its possible interactions with gp18 and gp19.

gp17 and its C-terminally truncated variants were expressed as GST fusion proteins and purified separately (Fig.7A). Individual GST fusion proteins were incubated with His-tagged gp18, and immobilized on GSH beads. After centrifugation, the beads were washed and boiled in SDS buffer before loading on SDS gel for Western blotting. Western hybridization using His-tag antibody revealed the presence of gp18 on the GSH beads with immobilized wild type gp17 protein, indicative of the interaction between gp17 and gp18 (Fig. 7B). However, no interaction was detected between gp18 and the two gp17 variants with 10 and 20 C- terminal residues removed, respectively (Fig. 7C). These results demonstrate that gp17 interacts with gp18 and the C-terminal disordered domain of gp17 is essential for the interaction.

The same method was applied to test possible interactions between GST-tagged gp17 and His-tagged gp19, and between GST-tagged gp19 and His-tagged gp17, none of which showed positive results (data not shown). While no interaction was detected between gp17 and gp19, the GST-tagged gp19 retained a small amount of His-tagged gp18 on the GSH beads (Fig. 7B), demonstrating a weak interaction between gp18 and gp19. gp17 binds to two Sulfolobus host proteins ssDNA binding proteins are essential for protecting ssDNA and recruiting specific ssDNA-processing proteins. In bacteria, SSBs were found to interact with more than a dozen different proteins involved in DNA replication, recombination and repair (34). To identify possible interactions with other proteins, gp17 was cloned into the E.coli/Sulfolobus shuttle vector pEXA2 under the control of arabinose promoter (22) and expressed in Sulfolobus. By Ni-NTA-Agarose chromatography the His-tagged gp17 was co-purified with two large proteins, of about 60 and 150 kDa, respectively (Fig. 7D). The absence of the two bands in proteins purified from the control cells transformed with empty pEXA2 supported that they were pulled-down specifically by gp17. Western blot hybridization using His-tag antibody revealed a single band with the expected size of gp17-His, and the two large bands were thus not oligomers of gp17-His (Fig.7E).

54

The two bands were sliced from the gel and identified by MALDI-TOF analysis. Band 1 contained a hypothetical protein encoded by SSO2277 with a theoretical mass of 57 kDa, carrying an ATPase domain. Band 2 was identified to be reverse gyrase from S. solfataricus P2 (SSO0422) with a mass of 142 kDa (Table S4). The same procedure was repeated with SIRV2 infected transformants and revealed again the same results (data not shown). No viral proteins such as gp18 were co-purified with gp17, which could be due to low expression of gp18, as demonstrated previously by microarray analysis (12).

We attempted to clone gp18 and gp19 individually into Sulfolobus using pEXA2 as cloning vector. Whereas gp18 was shown to be highly toxic and couldn’t be transformed into Sulfolobus, overexpression of gp19 caused growth retardation of the transformant, and no host proteins were identified to interact with gp19 (data not shown).

DISCUSSION

Single stranded DNA binding proteins are ubiquitous across all three domains of life and are found in many viruses playing essential roles in genome maintenance, DNA replication, recombination, repair and transcription. They can coat, protect and remove secondary structures of the ssDNA intermediates. Besides, some specific ssDNA-processing proteins are recruited and coordinated by ssDNA binding proteins during DNA metabolism pathways (35-37). In spite of high sequence, structural and functional divergence, almost all classical ssDNA binding proteins contain one of the following four structural topologies: oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, K homology (KH) domains, RNA recognition motifs (RRMs), and whirly domains (38). Recently a group of hyperthermophilic archaeal organisms were found to lack a classical ssDNA binding protein and instead to harbour a distinct ssDNA binding protein termed ThermoDBP (39). The ssDNA binding protein encoded by SIRV2 gp17 differs in structure from the classical ssDNA binding proteins as well as from the ThermoDBPs, and thus constitutes a novel non-canonical ssDNA binding protein.

Single strand annealing activity has been detected in different proteins including some helicases and recombinases encoded by cellular life and by some viruses (reviewed by (40). In many of the helicases containing annealing activity, a separate protein domain distinct from the helicase domain is responsible for the annealing activity (40). Remarkably, a helicase domain-containing protein, HARP, was recently discovered to possess annealing,

55

but no unwinding, activity (41). HARP binds to the ssDNA binding protein RPA and anneals RPA-coated complementary ssDNA. Mutations in HARP are associated with Schimke Immuno-Osseous Dysplasis (SIOD) disease and the defects in the annealing activity of two HARP mutants correlate with the severity of the disease (41). Together with AH2, another protein with similar features (42), HARP was termed annealing helicase. In this study, the annealing activity was clearly demonstrated for the SIRV2 gp18 protein. The failure of detecting the helicase activity, which was predicted by structural modelling of the gp18 sequence, could be due to the lack of proper experimental conditions or possible mask of helicase activity by the stronger annealing activity. A third possibility is that gp18 carries no helicase activity, as demonstrated for the annealing helicases. While only the structural modelling revealed a connection between SIRV2 gp18 and a MCM helicase, a high sequence similarity to Cas3 and other helicases was clearly detected by BlastP searches of the gp18 analogues encoded in the genomes of most filamentous viruses (Fig. 1). Interestingly, the E. coli Cas3 was found to possess both helicase and annealing activities (43).

To better understand the function of the entire gene operon, the protein product of the third gene, gp19, was further characterized in this study which revealed a 5’-3’ ssDNA exonuclease activity, in addition to the previously demonstrated ssDNA endonuclease activity (Fig. 6 and Garder et al., 2011b). The operonic or clustered organization of the three genes in rudi- and filamentous viruses (Fig. 1) and the observed interactions between their protein products (Fig. 7) strongly suggest their close cooperation in a same process(es) involving ssDNA. The SIRV2 genome replication study by different approaches demonstrated that SIRV2 forms ssDNA intermediates larger than a single genome size, and large concatemers are abundant during the replication process (Martinez-Alvarez et al., in preparation). This requires, first of all, abundant ssDNA binding protein to protect the ssDNA intermediates and the highly expressed gp17 (12) may fulfil this requirement. To mature into dsDNA monomers, the long ssDNA concatemers must anneal between the two complementary strands, which could be facilitated by gp18. Subsequent nicking by a ssDNA endonuclease and ligation by an unknown ligase would produce a mature dsDNA genome. Through protein-protein interactions, gp17 can recruit gp18 to facilitate ssDNA annealing whereas gp19 can be recruited by gp18 to perform the final cleavage. In support of this scenario, gp17 was found to be still present at high amount at the late stage of SIRV2 life

56

cycle, together with the tail-fiber protein of SIRV2 virions (11). Thus, it is very likely that the gene operon is involved in genome maturation of SIRV2 replicative intermediates.

Another common and interesting feature shared by the rudiviruses and filamentous viruses is the presence of multiple 12 bp insertion/deletions (indels) in their genomes, revealed by sequence alignment between closely related viral genomes and between homologous genes (6;44). In the latter case where the nucleotide sequences diverged too much to be aligned, amino acid sequence alignment between homologs allowed the detection of a single or multiple 4 residue indels. Given the proposed function of Cas4 in CRISPR spacer acquisition and the fact that gp19 belongs to the Cas4 nuclease superfamily (26;33), it is possible that gp19 is involved in the generation of the 12 bp indels and the annealing activity of gp18 fits well with both insertion and deletion scenarios. Following strand-displacement replication as proposed for both AFV1 (45) and SIRV2 (19) unpublished data from Martinez-Alvarez et al.), ssDNA bubbles may arise frequently and spontaneously during genome maturation. Repair of such structures involving ssDNA binding protein (gp17), annealing protein (gp18) and ssDNA nuclease (gp19) could in principle produce either insertions or deletions.

A third possible function of the gene operon is recombination involved in general repair or replication initiation, which has been proved important for many viruses (e.g. T4 as in (46). After dsDNA unwinding by a helicase, which remains to be identified in this case, ssDNA nuclease, binding and annealing activities are all needed in the classical recombination processes (47) and the identified functions of the three proteins fit well with the scenario. In support of this, Phyre2 structural modelling of SSO2277, a Sulfolobus protein interacting with gp17 (Fig. 7D) and annotated as hypothetical, revealed a good match (99.9% confidence over half of the protein) with proteins of the family RecF, RecN, Rad50 etc (data not shown). The latter proteins are involved in recombination (48).

In conclusion, this is the first study providing the functional characterization of an entire gene operon conserved in archaeal rudiviruses and filamentous viruses. Due to low or no sequence homology with characterized proteins, the majority of archaeal viral genes remain hypothetical. This had hindered the progress of the archaeal virology field. The results from this study will therefore contribute to better understanding of the novel viruses infecting Archaea, the third domain of life. More importantly, the sequence and/or structural divergence of the three proteins from previously characterized ssDNA binding, annealing

57

and nuclease proteins not only add novelty to, but also provide important information for evolutionary studies of these proteins, which are nearly ubiquitous from bacteria, archaea to eukaryotes including humans.

FUNDING

This work was supported by the European Union Frame Work 7 program 265933. Y.G. received a stipend from China Scholarship Council.

REFERENCES

1. Pietila,M.K., Roine,E., Paulin,L., Kalkkinen,N. and Bamford,D.H. (2009) An ssDNA virus infecting archaea: a new lineage of viruses with a membrane envelope. Mol. Microbiol., 72, 307-319.

2. Mochizuki,T., Krupovic,M., Pehau-Arnaudet,G., Sako,Y., Forterre,P. and Prangishvili,D. (2012) Archaeal virus with exceptional virion architecture and the largest single-stranded DNA genome. Proc. Natl. Acad. Sci. U. S. A, 109, 13386-13391.

3. Pietila,M.K., Demina,T.A., Atanasova,N.S., Oksanen,H.M. and Bamford,D.H. (2014) Archaeal viruses and bacteriophages: comparisons and contrasts. Trends Microbiol., 22, 334-344.

4. Prangishvili,D., Arnold,H.P., Gotz,D., Ziese,U., Holz,I., Kristjansson,J.K. and Zillig,W. (1999) A novel virus family, the Rudiviridae: Structure, virus-host interactions and genome variability of the sulfolobus viruses SIRV1 and SIRV2. Genetics, 152, 1387-1396.

5. Blum,H., Zillig,W., Mallok,S., Domdey,H. and Prangishvili,D. (2001) The genome of the archaeal virus SIRV1 has features in common with genomes of eukaryal viruses. Virology, 281, 6-9.

6. Vestergaard,G., Aramayo,R., Basta,T., Haring,M., Peng,X., Brugger,K., Chen,L., Rachel,R., Boisset,N., Garrett,R.A. et al. (2008) Structure of the acidianus filamentous virus 3 and comparative genomics of related archaeal lipothrixviruses. J. Virol., 82, 371-381.

7. Vestergaard,G., Haring,M., Peng,X., Rachel,R., Garrett,R.A. and Prangishvili,D. (2005) A novel rudivirus, ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology, 336, 83-92.

8. Servin-Garciduenas,L.E., Peng,X., Garrett,R.A. and Martinez-Romero,E. (2013) Genome sequence of a novel archaeal rudivirus recovered from a mexican hot spring. Genome Announc., 1.

58

9. Peng,X., Blum,H., She,Q., Mallok,S., Brugger,K., Garrett,R.A., Zillig,W. and Prangishvili,D. (2001) Sequences and replication of genomes of the archaeal rudiviruses SIRV1 and SIRV2: relationships to the archaeal lipothrixvirus SIFV and some eukaryal viruses. Virology, 291, 226-234.

10. Kessler,A., Brinkman,A.B., van der Oost,J. and Prangishvili,D. (2004) Transcription of the rod-shaped viruses SIRV1 and SIRV2 of the hyperthermophilic archaeon sulfolobus. J. Bacteriol., 186, 7745-7753.

11. Quax,T.E., Krupovic,M., Lucas,S., Forterre,P. and Prangishvili,D. (2010) The Sulfolobus rod-shaped virus 2 encodes a prominent structural component of the unique virion release system in Archaea. Virology, 404, 1-4.

12. Okutan,E., Deng,L., Mirlashari,S., Uldahl,K., Halim,M., Liu,C., Garrett,R.A., She,Q. and Peng,X. (2013) Novel insights into gene regulation of the rudivirus SIRV2 infecting Sulfolobus cells. RNA. Biol., 10, 875-885.

13. Deng,L., He,F., Bhoobalan-Chitty,Y., Martinez-Alvarez,L., Guo,Y. and Peng,X. (2014) Unveiling cell surface and type IV secretion proteins responsible for archaeal rudivirus entry. J. Virol., 88, 10264-10268.

14. Prangishvili,D., Garrett,R.A. and Koonin,E.V. (2006) Evolutionary genomics of archaeal viruses: unique viral genomes in the third domain of life. Virus Res., 117, 52-67.

15. Prangishvili,D., Koonin,E.V. and Krupovic,M. (2013) Genomics and biology of Rudiviruses, a model for the study of virus-host interactions in Archaea. Biochem. Soc. Trans., 41, 443-450.

16. Vestergaard,G., Shah,S.A., Bize,A., Reitberger,W., Reuter,M., Phan,H., Briegel,A., Rachel,R., Garrett,R.A. and Prangishvili,D. (2008) Stygiolobus rod-shaped virus and the interplay of crenarchaeal rudiviruses with the CRISPR antiviral system. J. Bacteriol., 190, 6837-6845.

17. Guilliere,F., Peixeiro,N., Kessler,A., Raynal,B., Desnoues,N., Keller,J., Delepierre,M., Prangishvili,D., Sezonov,G. and Guijarro,J.I. (2009) Structure, function, and targets of the transcriptional regulator SvtR from the hyperthermophilic archaeal virus SIRV1. J. Biol. Chem., 284, 22222-22237.

18. Quax,T.E., Voet,M., Sismeiro,O., Dillies,M.A., Jagla,B., Coppee,J.Y., Sezonov,G., Forterre,P., van der Oost,J., Lavigne,R. et al. (2013) Massive activation of archaeal defense genes during viral infection. J. Virol., 87, 8419-8428.

19. Oke,M., Kerou,M., Liu,H., Peng,X., Garrett,R.A., Prangishvili,D., Naismith,J.H. and White,M.F. (2011) A dimeric Rep protein initiates replication of a linear archaeal virus

59

genome: implications for the Rep mechanism and viral replication. J. Virol., 85, 925-931.

20. Gardner,A.F., Guan,C. and Jack,W.E. (2011) Biochemical characterization of a structure-specific resolving enzyme from Sulfolobus islandicus rod-shaped virus 2. PLoS. One., 6, e23668.

21. Gardner,A.F., Prangishvili,D. and Jack,W.E. (2011) Characterization of Sulfolobus islandicus rod-shaped virus 2 gp19, a single-strand specific endonuclease. Extremophiles., 15, 619-624.

22. Gudbergsdottir,S., Deng,L., Chen,Z., Jensen,J.V., Jensen,L.R., She,Q. and Garrett,R.A. (2011) Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector-borne viral and plasmid genes and protospacers. Mol. Microbiol., 79, 35-49.

23. Deng,L., Zhu,H., Chen,Z., Liang,Y.X. and She,Q. (2009) Unmarked gene deletion and host-vector system for the hyperthermophilic crenarchaeon Sulfolobus islandicus. Extremophiles., 13, 735-746.

24. Oke,M., Carter,L.G., Johnson,K.A., Liu,H., McMahon,S.A., Yan,X., Kerou,M., Weikart,N.D., Kadi,N., Sheikh,M.A. et al. (2010) The Scottish Structural Proteomics Facility: targets, methods and outputs. J. Struct. Funct. Genomics, 11, 167-180.

25. Kelley,L.A. and Sternberg,M.J. (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc., 4, 363-371.

26. Zhang,J., Kasciukovic,T. and White,M.F. (2012) The CRISPR associated protein Cas4 Is a 5' to 3' DNA exonuclease with an iron-sulfur cluster. PLoS. One., 7, e47232.

27. Brugger,K., Redder,P. and Skovgaard,M. (2003) MUTAGEN: multi-user tool for annotating genomes. Bioinformatics., 19, 2480-2481.

28. Xue,B., Dunbrack,R.L., Williams,R.W., Dunker,A.K. and Uversky,V.N. (2010) PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta, 1804, 996-1010.

29. Dosztanyi,Z., Csizmok,V., Tompa,P. and Simon,I. (2005) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics., 21, 3433-3434.

30. Munoz,V. and Serrano,L. (1994) Elucidating the folding problem of helical peptides using empirical parameters. Nat. Struct. Biol., 1, 399-409.

60

31. Kawano,S., Iyaguchi,D., Okada,C., Sasaki,Y. and Toyota,E. (2013) Expression, purification, and refolding of active recombinant human E-selectin lectin and EGF domains in Escherichia coli. Protein J., 32, 386-391.

32. Chong,J.P., Hayashi,M.K., Simon,M.N., Xu,R.M. and Stillman,B. (2000) A double-hexamer archaeal minichromosome maintenance protein is an ATP-dependent DNA helicase. Proc. Natl. Acad. Sci. U. S. A, 97, 1530-1535.

33. Lemak,S., Beloglazova,N., Nocek,B., Skarina,T., Flick,R., Brown,G., Popovic,A., Joachimiak,A., Savchenko,A. and Yakunin,A.F. (2013) Toroidal structure and DNA cleavage by the CRISPR-associated [4Fe-4S] cluster containing Cas4 nuclease SSO0001 from Sulfolobus solfataricus. J. Am. Chem. Soc., 135, 17476-17487.

34. Shereda,R.D., Kozlov,A.G., Lohman,T.M., Cox,M.M. and Keck,J.L. (2008) SSB as an organizer/mobilizer of genome maintenance complexes. Crit Rev. Biochem. Mol. Biol., 43, 289-318.

35. Bochkarev,A. and Bochkareva,E. (2004) From RPA to BRCA2: lessons from single-stranded DNA binding by the OB-fold. Curr. Opin. Struct. Biol., 14, 36-42.

36. Suck,D. (1997) Common fold, common function, common origin? Nat. Struct. Biol., 4, 161-165.

37. Theobald,D.L., Mitton-Fry,R.M. and Wuttke,D.S. (2003) Nucleic acid recognition by OB-fold proteins. Annu. Rev. Biophys. Biomol. Struct., 32, 115-133.

38. Dickey,T.H., Altschuler,S.E. and Wuttke,D.S. (2013) Single-stranded DNA-binding proteins: multiple domains for multiple functions. Structure., 21, 1074-1084.

39. Paytubi,S., McMahon,S.A., Graham,S., Liu,H., Botting,C.H., Makarova,K.S., Koonin,E.V., Naismith,J.H. and White,M.F. (2012) Displacement of the canonical single-stranded DNA-binding protein in the Thermoproteales. Proc. Natl. Acad. Sci. U. S. A, 109, E398-E405.

40. Wu,Y. (2012) Unwinding and rewinding: double faces of helicase? J. Nucleic Acids, 2012, 140601.

41. Yusufzai,T. and Kadonaga,J.T. (2008) HARP is an ATP-driven annealing helicase. Science, 322, 748-750.

42. Yusufzai,T. and Kadonaga,J.T. (2010) Annealing helicase 2 (AH2), a DNA-rewinding motor with an HNH motif. Proc. Natl. Acad. Sci. U. S. A, 107, 20970-20973.

43. Howard,J.A., Delmas,S., Ivancic-Bace,I. and Bolt,E.L. (2011) Helicase dissociation and annealing of RNA-DNA hybrids by Escherichia coli Cas3 protein. Biochem. J., 439, 85-95.

61

44. Peng,X., Kessler,A., Phan,H., Garrett,R.A. and Prangishvili,D. (2004) Multiple variants of the archaeal DNA rudivirus SIRV1 in a single host and a novel mechanism of genomic variation. Mol. Microbiol., 54, 366-375.

45. Pina,M., Basta,T., Quax,T.E., Joubert,A., Baconnais,S., Cortez,D., Lambert,S., Le,C.E., Bell,S.D., Forterre,P. et al. (2014) Unique genome replication mechanism of the archaeal virus AFV1. Mol. Microbiol., 92, 1313-1325.

46. Mosig,G. (1998) Recombination and recombination-dependent DNA replication in bacteriophage T4. Annu. Rev. Genet., 32, 379-413.

47. Kowalczykowski,S.C. (2000) Initiation of genetic recombination and recombination-dependent replication. Trends Biochem. Sci., 25, 156-165.

48. Kowalczykowski,S.C., Dixon,D.A., Eggleston,A.K., Lauder,S.D. and Rehrauer,W.M. (1994) Biochemistry of homologous recombination in Escherichia coli. Microbiol. Rev., 58, 401-465.

TABLE AND FIGURE LEGENDS

Table 1. Structures of the DNA substrates used in this study.

Figure 1. Organization of SIRV2 gp17, gp18, gp19 and their homologs in the genomes of archaeal linear viruses. The pattern codes are as follows:

SIRV2 gp17 homolog; SIRV2 gp18 homolog or analog; SIRV2 gp19 homolog; conserved gene upstream of SIRV2 gp18 homolog in most filamentous viruses.

Figure 2. gp17 binds to ssDNA. (A) gp17 binds to DNA substrates with either a 23-nt 5´-ssDNA flap or Y-shaped double-flaps (23 nt), but not to blunt-ended dsDNA. F, free DNA; C, DNA-protein complex. (B) gp17 shows a high preference towards ssDNA than to dsDNA. The concentration (nM) of gp17 is indicated on the top of the gel.

Figure 3. Mutagenesis of gp17 revealed a U-shaped binding path for ssDNA. (A) The structure of a monomer of gp17 homolog with the conserved positive charged residues labeled in stick model. (B) Gel retardation assays using gp17 WT and mutant proteins and ssDNA. Protein concentrations are labeled on the top of each gel. DNA forms are indicated by a short line (free) or a line covered with a circle (DNA-protein complex). (C) Quantification of the ssDNA binding activity of different gp17 mutants based on the results shown in B. (D)

62

Binding path of ssDNA on gp17. The residues contributing to ssDNA binding are labeled in sticks.

Figure 4. Characterization of the refolded recombinant gp18. (A) Purification and refolding of gp18 from inclusion bodies. Lane 1, supernatant of E.coli cells expressing gp18; lane 2, pellet of the lysate; lane 3, pellet protein dissolved in 8 M urea; lane 4, supernatant of gp18 after purification, refolding, heating at 70°C for 20 min and centrifugation. (B) Far-UV CD spectrum of refolded gp18. (C) Temperature denaturation of gp18 followed at 220 nm. (D) Gel-filtration chromatographic analysis of the purified and refolded gp18 protein. The main peak and the two shoulders are labeled.

Figure 5. gp18 stimulates annealing of the complementary oligonucleotides. (A) Concentration-dependent enhancement of oligonucleotide annealing by gp18. The 32P-labeled 57-mer oligo-4 (1nM) and the complementary oligo-5 (1.2 nM) were incubated in the absence (lane 1) or presence (lanes 2 to 6) of gp18. gp18 concentrations were indicated on the top of the gel. The presence (lanes 2 to 5) or absence (lane 6) of ATP is also indicated. (B) Time course of gp18-enhanced single-strand annealing. Left panel, annealing without gp18; right panel, annealing in the presence of 200 nM gp18. (C) Quantification of annealed DNA in the absence or presence of gp18. The percentages were calculated based on the intensities of bands in B.

Figure 6. Nuclease activities of SIRV2 gp19. (A) Selective cleavage of DNA substrate with a 5´ ssDNA flap by gp19. (B) Gradual cleavage of ssDNA. (C) Endonuclease activity of gp19. The circular ssDNA of M13mp18 was incubated at 50 °C with or without the addition of 0.5 μM gp19, and the incubation time is given on top of the gel.

Figure 7. Interactions between gp17, gp18 and gp19 and between gp17 and Sulfolobus host proteins. (A) Schematic presentation of GST-tagged gp17 mutants. (B) Pull-down assays by GST affinity chromatography. The purified and refolded His-tagged gp18 (labeled as P for prey) was incubated with GST-tagged gp17 or GST-tagged gp19 on GSH column for 1 h. After washing with PBS buffer, the GSH beads were boiled in the SDS loading buffer and loaded for SDS-PAGE, and the interacting protein was detected by anti-His antibody. GST protein was used as negative control. Positive controls for Western blotting were carried out using the input His-tagged gp18. (C) Interaction between GST-tagged gp17 mutants and His-tagged gp18. Pull-down assays were performed as described in B. (D) Identification of Sulfolobus proteins interacting with His-tagged gp17 overexpressed in sulfolobus sofataricus P2. Three fractions of eluted proteins from the negative control cells containing the empty pEXA2 vector (lanes 1 to 3) and from the gp17 transformant (lanes 4 to 6) were tested. The

63

identified proteins are indicated at the right side. (E) Western blot hybridization of negative control (Lane1) and gp17 protein elution (Lane 2).

Table S1. Details of the primers used in this study.

Table S2. Sequences of the oligonucleotides used as substrates in this study.

Table S3. The location, gene length and functions of the conserved gene cluster in all linear viruses.

Table S4. Mass spectrometric peptide mapping and sequencing analysis of the two pulled down proteins.

Figure S1. Genome comparison of all the rudiviruses and filamentous viruses using the Mutagen program. Conserved gene clusters are labeled with red square. Homologs are color-coded whereas white rectangles represent ORFs without homologues.

Figure S2. Structure of SIRV1 ORF1312-96 (PDB identifier [ID] 2X5T)(24). (A) Dimer structure of SIRV1 ORF1312-96 coloured in deep teal and violet purple for the two monomers. (B) Secondary structure elements of the monomer are labeled in different colors. (C) and (D), A surface representation shown on the concave side and convex side of the dimer, indicating the electrostatic potential of the putative binding interface.

Figure S3. (A) Probability of disordered gp17 aa: two different programs (IUpred and PONDR) were used for the prediction, both revealed disorder at the C-terminus. (B) Percentage of helicity of gp17.

Figure S4. Purification of SIRV2 gp17. Protein gp17 expressed and purified to homogeneity from E.coli. Lane 1-4, four elution fractions from Ni-NTA-agarose beads.

Figure S5. Alignment of SIRV2 gp17 and its homologs. Identical residues are labeled as *, and conserved positive charged residues are shaded red. Red arrows indicate the position of the β sheets, blue bars indicate the position of α helices. Residues mutated to alanine in this study are marked with black dots and numbered accordingly.

Figure S6. Gel retardation assays showing the binding of gp17 WT and some of the mutant proteins to ssDNA.

Figure S7. Standard linear regression curve. The column Superdex 200 HR 10/30 was calibrated with proteins of known molecular masses: Thyroglobulin, Bovine (669 kDa);

64

Apoferritin, Horse Spleen (443 kDa); β-Amylase, Sweet Potato (200 kDa) and Alcohol Dehydrogenase, Yeast (150 kDa).

65

Table 1. Structures of the substrates used in this study

Name Structure Oligonucleotides*

Substrate A: Blunt end duplex 1+2

Substrate B: 5´-ssDNA flap duplex 1+3

Substrate C: 3´-ssDNA flap duplex 1+4

Substrate D: Y-shaped duplex 4+5

*: The sequences of the oligonucleotides are provided in Table S2

66 Figure 1

SIRV1,2 << >>

ARV << >>

SRV << >>

AFV1 << >>

SIFV << >>

AFV3-8 << >>

AFV9 << >>

67 Figure 2

A

gp17(nM) 0 33 66 130 260 0 33 66 130 260 0 33 66 130 260

C2 C C1

F

F F

B gp17(nM) 0 66 130 260 1300 2600 3250 3900

68 Figure 3

A C 100

R29 K82

K27

ratio % gp17 wt R33 R24 50 K82A

R60A Bound H54 H54A K61A K61 R33A R60 R60A K61A

0 0 200 400 600 800 1000 1200 B Protein (nM) gp17 wt (nM) gp17 K82A (nM) gp17 R60A (nM)

0 16 33 66 133 266 400 533 666 1200 0 16 33 66 133 266 400 533 666 1200 0 16 33 66 133 266 400 533 666 1200

gp17 K61A (nM) gp17 H54A (nM) gp17 R33A (nM)

0 33 66 133 266 400 533 800 1200 2000 0 33 66 133 266 400 533 800 1200 2000 0 133 266 400 533 733 933 1200 1466 2000

gp17 R60A K61A (nM) D 0 133 266 400 533 733 933 1200 1466 2000

R33

H54

K61 R60

69 Figure 4

A B C

M 1 2 3 4

55kDa gp18

35kDa

15kDa

10kDa

D 1

Vo 2

70 Figure 5

A C

ATP + + + + + - 90 Time (min) 15 15 15 15 15 15 control 80 gp18 (nM) -- 50 100 150 200 200 gp18

70 (% ) (%

60 50 40

Annealing 30 20 10 0 0 1 2 3 4 1 2 3 4 5 6 Time (min) B

gp18 (nM) 0 0 0 0 0 gp18 (nM) 0 200 200 200 200

Time (min) 0 0.5 1 2 4 Time (min) 0 0.5 1 2 4

71

Figure 6

A 5 3 3 5 5 3 3 5 3 5 3 5 Time (min) 10 3 10 10 3 10 10 3 10 gp19 (0.5 M) – + + – + + – + +

B

Time (min) 10 2 5 8 10 gp19 (0.5M) – + + + +

5 3 3 5

5 3 3 5

C Time (min)

60 15 30 45 60 gp19 (0.5M) M – + + + +

5 kbp

1 kbp

72 Figure 7

A B

50 100 (aa) GSTgp17-I (131aa)

GSTgp17-II (121aa)

GSTgp17-III(111aa)

C

55kDa 35kDa M 55kDa

D E pEXA2-control pEXA2-gp17

M 1 2 3 4 5 6 1 2 170kDa Reverse 130kDa gyrase

70kDa Sso2277 55kDa

25kDa

gp17 gp17

15kDa

73 Table S1. Details of the primers used in this study

Expression Tag Protein Name Oligo Nr. Sequence 5´-3´ Host

1 Fw: GGCGAAAACCATATGGCCTCATTAAAACAAATAATAG gp17 wt 2 Rv: GCTTCTCGAGAAACTCCTCCTCAACTGTTTTTT

gp17(1-121aa)a 3 Rv: CGTACTCGAGTTATTTTTCTCTCGTTTTCTCTTCTT

4 1Rv: GCATACGCCTCTAGAAATTCAG gp17 K82Ab 5 1Fw: GAATTTCTAGAGGCGTATGCAG 6 1Rv: CAACTATTCTCGCTATACCT TTTATC gp17 K32Ab 7 1Fw: GGTATAGCGAGAATAGTTGTACAG 8 1Rv: CCAATTTGTTTCGCGAAATTATTC gp17 R60Ab 9 1Fw: CGCGAAACAAATTGGAATAAC 10 1Rv: CCAATTTGCGCTCTGAAATTATT gp17 K61Ab 11 1Fw: CAGAGCGCAAATTGGAATAAC 12 1Rv: GAAATTATTCTGACTCGCTATCGTC gp17 H54Ab C-HIS tag 13 1Fw: CATGACGATAGCGAGTCAGAA 14 1Rv: GTACAACTATCGCTTTTATACCTTTTA gp17 R33Ab 15 1Fw: GCGATAGTTGTACAGTTAAATGC 16 1Rv: TTGCGCCGCGAAATTATTCTG gp17 R60A K61Ab E.coli 17 1Fw: TTCGCGGCGCAAATTGGAA

18 1Rv: CGCTAAAATCGCAGACGCTATTTTATTGTTCTCTTT gp17 R24A K27Ab 19 1Fw:GCGATTTTAGCGATAAAAGGTATAAAAAGAATAGTTGTAC

gp17 R24A K27A 20 1Rv: TCTTTTTATACCCGCTATCGCTAAAATCGC AGACGC K29Ab 21 1Fw: GCG ATA GCG GGT ATA AAA AGA ATA GTT GTA CAG

22 Fw: CATTTGTTCCATATGAGTGAAAACACACAACTATTTG gp18 23 Rv: CGTACTCGAGCCATCCTCCTAAATTGCTAAATC

24 Fw: CTACCATTCATATGGTAAATATGAATTATGAAGATC gp19 25 Rv: GCGCTCGAGAAAAAGTGATATAATGCATTTTTG

26 Fw: ATCGGGATCCGCCTCATTAAAACAAATAATAG GSTgp17-I 27 Rv: CGTACTCGAGTTAAAACTCCTCCTCAACTGTTTTTT

GSTgp17-IIc 28 Rv: CGTACTCGAGTTATTTTTCTCTCGTTTTCTCTTCTT N-GST tag GSTgp17-IIIc 29 Rv: CGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC Fw:CGCGGATCCGTAAATATGAATTATGAAGATCATATAAAAGAA 30 GSTgp19 AG 31 Rv:GCCGCTCGAGTTAAAAAAGTGATATAATGCATTTTTGTTTG

34 Fw: CATTTGTTCCATATGGCCTCATTAAAACAAATAATAG gp17 35 Rv: CTCAACTAGCGGCCGCAAACTCCTCCTCAACTGTTT

Sulfolobus 36 Fw: CATTTGTTCCATATGAGTGAAAACACACAACTATTTG C-HIS tag gp18 solfataricus 37 Rv:TATTAATAGCGGCCGCCCATCCTCCTAAATTGCTAAAT

38 Fw: CATTTGTTCCATATGGTAAATATGAATTATGAAGATC gp19 39 Rv: CTCTTCTAGCGGCCGCTTAAAAAAGTGATATAATGCATTT

a The forward primer used for this construction is oligo-1. b For this construction, the front part fragment was amplified using oligo1 and 1Rv primer, and the rest part fragment was amplified using 1Fw and oligo2. Then the two fragments were used as template to amplify the whole construction by oligo-1 and oligo-2. c The forward primer used for this construction is oligo-26. 74 Table S2. Sequences of the oligonucleotides used as substrates in this study

Name Sequence (5’ to 3’)

oligo-1……………………..CGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC oligo-2………………….....GAAGAGCTAGACATGGAACAATAACTCGAGTACG oligo-3……………………..GTTATTGCATGAAAGCCCGGCTGGAAGAGCTAGACATGGAACAATAACTCGAGTACG oligo-4……………………..GTCAGTCCAAAAGTACATTATTGCGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC oligo-5……………………..GAAGAGCTAGACATGGAACAATAACTCGAGTACGGTTATTGCATGAAAGCCCGGCTG

75 Table S3. The location, gene length and functions of the conserved gene cluster in all linear viruses

Length Length Length Strain Gene1 Function Gene2 Function Gene3 Function (aa) (aa) (aa)

Hypothetical AFV3 gp10 79 gp09 593 Putative helicase gp08 203 Putative nuclease protein

Hypothetical AFV6 gp10 79 gp09 593 Putative helicase gp08 203 Putative nuclease protein

Hypothetical AFV7 gp05 79 gp04 593 Putative helicase gp03 203 Putative nuclease protein

Hypothetical AFV8 gp07 79 gp06 593 Putative helicase gp05 203 Putative nuclease protein

Putative Holiday Hypothetical junction branch Putative AFV9 gp11 79 gp10 602 gp08 203 protein migration nuclease helicase

Hypothetical Hypothetical SIFV SIFV-08 79 SIFV-07 601 Putative helicase SIFV-06 232 protein protein

Hypothetical AFV2 - gp15 425 - protein

Hypothetical Hypothetical CRISPR-associated AFV1 gp14 135 gp15 426 gp17 223 protein protein Cas4-like protein

Hypothetical Hypothetical Hypothetical ARV1 gp12 134 gp16 443 gp17 207 protein protein protein

SRV- Hypothetical SRV- Hypothetical SRV- Hypothetical SRV 138 440 199 ORF138- protein ORF440 protein ORF199 protein

Hypothetical Hypothetical Single strand SIRV2 gp17 131 gp18 436 gp19 207 protein protein nuclease

76 Table S4. Mass spectrometric peptide mapping and sequencing analysis of the two pulled down proteins

77 Figure S1

78 Figure S2

A B

C D

79 Figure S3

80 Figure S4

M 1 2 3 4

25KDa

SIRV2 gp17 15KDa

10KDa

81 Figure S5

82 Figure S6

gp17 wt gp17(R24A K27A) gp17(R24A K27A K29A) (nM) 0 33 66 130 260 (nM) 0 33 66 130 260 (nM) 0 33 66 130 260

gp17(1-121aa)

(nM) 0 33 66 130 260

83 Figure

S7

Kav 660kDa log log MW 440kDa

84

200kDa 150kDa

Manuscript II

Genome-wide binding profile of two transcription regulators of Sulfolobus solfataricus

Yang Guo, Xu Peng In preparation

85

Genome-wide binding profile of two transcription regulators of Sulfolobus solfataricus

Yang Guo and Xu Peng *

Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 CPH N. Denmark

* To whom correspondence should be addressed. Tel: +45 35322018; Fax: +45 35322128; Email: [email protected]

86

Abstract

Two transcription regulators sso2474 and sso10340 from Sulfolobus solfataricus P2 were differently regulated upon SIRV2 infection. A method similar as Chromatin immunoprecipitation combined with subsequent high-throughput sequencing (Chip-seq) was applied in this study to get into the gene composition of the two protein regulons in vivo. Mapping of the sequencing data with Sulfolobus solfataricus P2 and SIRV2 genomes demonstrated that sso2474 binds with a high affinity to virus genome, whereas sso10340 mainly binds to the host DNA. A total of 27 enriched host DNA fragments extracted from sso10340-DNA complex appeared as potential binding targets, most of which are genes involved in energy metabolism, transport, translation and amino acid metabolism. The genome-wide binding profiles presented here reveal two different kinds of regulon conditions and contribute to the knowledge expansion of the transcription regulations upon virus infection.

87

Introduction

Viruses infecting the organisms of Archaea, the third domain of life, comprise the most diverse and previously unsuspected virion morphotypes (Pina et al., 2011). In the last few years, a substantial effort was made to explore the functions of the hypothetical proteins and the viral infection life cycles (Kessler et al., 2004;Oke et al., 2011;Gardner et al., 2011;Bize et al., 2009). To date, several virus-host systems have become promising models providing a great opportunity for studying virus-host interactions.

One of the best-studied viruses in hyperthermophilic archaea is SIRV2 (Sulfolobus islandicus rod-shaped virus 2), isolated from an acidic hot spring in Iceland, belongs to Rudiviridae and share a common ancestry with the family Lipothrixviridae. Studies by transmission electron microscopy showed that SIRV2 virions specificially recognized the pilus-like filaments on the host cell surface to get adsorption (Quemin et al., 2013). On the other hand , two gene clusters, cluster sso3138 to sso3141 and cluster sso2386 and sso2387 identified from the SIRV2 resistant Sulfolobus mutants were confirmed responsible for the virus entry (Deng et al., 2014), providing first insights into its entry process. Unlike most archaeal viruses, infecting host cells with a `carrier state`, this linear non-enveloped double- stranded DNA (dsDNA) SIRV2, together with TTV1 and STIV, are lytic viruses (Bize et al., 2009;Ortmann et al., 2008;Zillig et al., 1996). The virions released from the host cell through a unique mechanism, which involvs the formation of pyramid-like protrusions, transecting the cell envelope and S-layer. At the end of the infection stage, this seven isosceles triangular faces pyramid opens up, allowing mature virions to escape from the cell (Bize et al., 2009;Quax et al., 2011). To gain better insights into the biology of virus, life cycle and their effect on the host, microarray analysis to determine the transcriptional responses of the host and the virus during the infection process could be very efficient and had been successfully applied to three archaeal viruses, the fusellovirus SSV1, the icosahedral virus STIV and the Rudiviridae virus SIRV2 (Frols et al., 2007;Ortmann et al., 2008;Okutan et al., 2013).

What we focused in this work is to investigate the host genes regulation upon SIRV2 infection. As the previous study revealed that a total of 148 host genes differently responded, and among these genes, two transcription regulators sso2474 and sso10340 were up and

88

down regulated, respectively. It is raised an interesting question as to how these two proteins regulate the corresponding genes upon the virus infection stress. Are they global regulators or just regulate their own promoters? A method similar with chip-seq was applied to this study for detection of the DNA binding sites in vivo. Combined with the gene expression analysis, we can get a first insight into the transcription regulation network between virus and host cells.

Materials and methods

Sulfolobus cultivation and plasmid construction sso2474 and sso10340 fragments were amplified from Sulfolobus solfataricus P2 genome by PCR, digested with NdeI and NotI and inserted into the similarly digested sulfolobus/E.coli shuttle vector pEXA3 (He et al., 2014), allowing the expression of His- tagged gp17 under the control of arabinose promoter. The constructed plasmid, as well as the empty plasmid pEXA3, were then electroporated individually into the uracil deficient competent cells (Deng et a.,2009), Single colonies of the transformants were inoculated into test tubes containing 5 ml SCV (basal medium supplemented with 0.2% sucrose, 0.2% casamino acids and 1% vitamin solution) (Deng et a.,2009), and incubated in an Innova 3100 oil-bath shaker. Large-scale culturing was performed in ACV medium (0.2% D- arabinose was substituted for sucrose) with Erlenmeyer flasks of long necks. When the culture OD 600 reached to 0.8, it was infected by SIRV2 at about m.o.i of 10. The cells were collected after 2.5 h virus post infection.

E. coli cells cultivation and plasmid construction

The coding sequences of sso2474 was amplified by PCR from Sulfolobus solfataricus P2 genome, digested with NdeI and XhoI and subsequently inserted into a similarly digested pET-30(a) (Novagen) expression vector. E.coli BL21 CodonPlus cells were transformed with individual plasmid construct and a single clone transformant was inoculated in LB medium containing 30 g/ml kanamycin and 25 g/ml chloramphenicol. At an optical density (OD600) of 0.4, IPTG (0.5 mM) was added to the culture and the cells were further cultured at 25oC for 12 hours.

Protein purification

The purification of His-tagged sso2474 and sso10340 either from sulfolobus or E.coli was carried out as follows, the harvested cell pellets were lysed in lysis buffer (50 mM Tris-Hcl pH 8.0 , 300 mM NaCl , 1 mM EDTA, 1% Triton X100 and 1 mM PMSF) by sonication, and different sonication time (4,6,8,10 min) was detected to minimum the size of the DNA

89

fragments bound by the proteins. Then the lysate were cleared by centrifugation at 10000× g for 20 min. Supernatant was then incubated with Ni-NTA-agarose beads(Qiagen, Germany) for 1 h at room temperature. Beads was washed three times with washing buffer (50 mM Tris-Hcl pH 8.0 , 300 mM NaCl, 40 mM Imidazol ) and protein-DNA complex were eluted with elution buffer (50 mM Tris-Hcl pH 8.0 , 300 mM NaCl, 250 mM Imidazol). The purity of the protein was evaluated by 12.5 % SDS-PAGE and staining with PAGE blue (Sigma Aldrich, UK), and the amount of DNA in the samples were detected on 0.7% Agarose gel and stained with GelRed (Biotium).

DNA extraction and high-throughput sequencing

The eluted protein-DNA complex in solution were diluted with one volume of water and treated with RNase A at room temperature for 1h. The deproteination was carried out by incubation with 2 mg/ml Protease K at 50 °C for 2 h, and 65 °C for 8 h. Then the DNA was extracted with phenol/chloroform/isoamyl (25:24:1) mixture solution, and finally was precipitated and concentrated by ethanol precipitation. Sequencing libraries with an average fragment size of 350 bp were prepared according to protocol of the ion plus fragment library kit, and sequenced in the Ion PGM™ Sequencer (Life Technology).

Reads mapping and Peak detection

The quality filtered reads were treated and aligned to genome Sulfolobus solfataricus P2 as well as SIRV2 using Bowtie, the ultrafast memory-efficient short read aligner, to align sequenced sets of short DNA reads to large genomes (Satoh and Tabunoki, 2013), and then the enriched peaks were visualized using Artemis (Carver T, etal. 2012), allowing for up to two errors per reads (insertion, deletion and/or mismatch).

Real-time Quantitative PCR qPCR reactions were performed in 10 L mixtures containing 5 L iQ SYBR Green Supermix (Bio-Rad, Cat. No. 170-8880), 1 mM primers and around 1 ng total DNA. Separate reactions were prepared for detection of reference gene and virus specific and sulfolobus solfataricus-host specific amplicons. The mixtures were prepared in duplicates in 96-well microliter PCR plates (Bio-Rad Laboratories), sealed with an adhesive cover (Bio- Rad Laboratories) and worked on the CFX96 Real-Time Detection System (Bio-Rad Laboratories) following this uniform cycling parameters: Initialization (95°C for 3 min) was followed by the denaturation of the strands (95°C for 10 sec), annealing of the primers to the template (55°C for 10 sec), elongation of the primers by the DNA polymerase (72°C for 15 sec). The cycle from denaturation to elongation were repeated 40 times. Thereafter, the final elongation step was performed at 95°C for 10 sec. At last, a melting temperature gradient

90

with 0.5°C increasing increment from 65–95°C for 5 sec was used to confirm the specificity of the primer sets. Besides, the possibility of unspecific amplification products and contamination was checked by using a non-template control (NTC). Furthermore, a positive control was used with a known amount of template. qPCR data were analyzed with the Bio- Rad CFX manager software, which allows for the immediate determination of the cycle threshold (Ct), melting curves and quantification of samples.

Motif analysis

For de novo motif discovery within the significantly enriched DNA fragments, their genomic sequences were submitted to MEME (Bailey and Elkan, 1994). Parameters were set to search for zero or one palindromic motif of 16 bp width per sequence.

DNA band shift assays

50 nM of the ssDNA (oligo1-CGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC) and blunt-ended dsDNA (which was formed by annealing the oligo1 with its complementary oligonucleotide) were incubated for 20 min at 50oC with increasing concentrations of SSO2474 (0-2.0 M) in 20 l DNA-binding buffer (10 mM Tris-Cl, pH 8.0, 100 mM KCl, 2 mM DTT, 10% glycerol). The samples were loaded onto 12% (v/v) acrylamide gel and electrophoresed in 0.5 × TBE buffer for 1 h 50 min. Following electrophoresis, the gels were stained with SYBR® Gold (Life Technologies) and scanned by Typhoon FLA 7000 (GE Healthcare Life Science).

Results

Function prediction and high conservation of the two host transcription regulators

The transcription machinery in Archaea has drawn a lot of interests due to its bacterial-like regulators and eukayote-like basal factors (Bell et al., 2001). Although possessing unique genome structure, half of the transcription factors (TFs) identified in archaeal genomes share at least one homolog with bacterial genomes (Perez-Rueda and Janga, 2010). Therefore, structural or sequence similarities to the well-studied protiens could provide efficient and direct information for the study of the unknown archaeal proteins.

BlastP search of sso2474 revealed a putative HTH (Helix-turn-Helix) motif located between 20 and 70 amino acid residues, and a conserved domain matching MarR-2 (multiple antibiotic resistance) family proteins was also detected at the same location. The genome sequence similarities in the database suggest that sso2474 belong to the TrmB-like (transcriptional regulator of the maltose system) family, and it has a 36% identity and 65% sequence similarity to transcription regulator TrmB in Sulfolobus archaeon AZ1 and a 36%

91

identity and 61% similarity to transcription regulator TrmB in strain Acidianus hospitalis. Most of the TrmB proteins are able to bind DNA using a HTH motif as DBD (DNA– binding domain). The DBD is located at the N-terminal region and a mutational analysis revealed that it is essential for binding (Maruyama et al., 2011). However, the tertiary structure prediction of sso2474 using the threading program Phyre 2 (Kelley and Sternberg, 2009) suggested a high confidence (99.6%) to MarR-like family transcription regulator with a 93% alignment coverage. The crystal structures of many MarR family proteins from bacterial and archaeal species were solved, and they reveal a common architecture with a characteristic winged helix domain for DNA binding. Although sequence identities between these homologs is less than 20%, they all possess the same core fold (Nichols et al., 2009). The protein sso2474 was conserved in Sulfolobus, Acidianus and Metallosphaera specises of Sulfolobales as well as euryarchaeotal Halobacteria species (Fig. S1).

The down-regulated gene, sso10340, encoding a 10 kDa protein, has a high identity with truncated variant of Lrp/AsnC-family. Sequence alignment between sso10340 and E.coli Lrp (leucine-responsive regulatory) protein, one of the best characterized Lrp family proteins, revealed that the sso10340 protein aa sequence matched well with the C-terminal amino acid effector domain of the Lrp protein (Fig. S3). The structure prediction of sso10340 by Phyre2 also suggested a high confidence (99.9%) to the STS042 protein from Sulfolobus tokodaii 7. STS042 was identified as a stand-alone RAM (regulation of amino acid metabolism) module protein, which has homologies with the C-terminal domain of Lrp/AsnC-family proteins (Miyazono et al., 2008). Search results among the DNA database indicated that Lrp/AsnC-family proteins distribute among many bacterial and most archaea. Sso10340 has homologues in crenarchaeotal sulfolobales species as well as bacterial species (Fig. S2).

DNA extraction from protein-DNA complex and high-throughput sequencing

A method similar to Chip-seq was used to gain further insights into the two proteins regulon. sso2474 and sso10340 were cloned into the E.coli-Sulfolobus shuttle vector pEXA2 under the control of arabinose promoter, with a His-tag in the C-terminus, respectively (Gudbergsdottir et al., 2011). In order to detect their binding sites in both the host and virus genome, the cells were infected with SIRV2 at a m.o.i of 10 after the expression of the target protein was induced for 15 h. The cells were collected after 2.5 h post virus infection.

By Ni-NTA-Agarose chromatography the protein-DNA complex were purified. Proteins were detected in SDS-PAGE gel and DNA bound by these proteins was run on the agarose gel. As shown in Fig. 1A, sso2474 was purified to homogeneity, with a single band detected in SDS-gel. The DNA extracted from sso2474 exhibited hundreds of folds higher yields

92

than the control DNA, which was purified by Ni-NTA-Agarose beads from the cells transformed with an empty plasmid. It seems that sso2474 showed a really high affinity to DNA. The SDS-PAGE and western blot analysis showed that the Lrp-like protein, sso10340, exhibit a range of oligomeric states including dimers, octamers and decamers even after SDS treatment (Fig. 1B), resembling the Lrp/AsnC family protiens which form a range of multimeric species in solution (Brinkman et al., 2003;Leonard et al., 2001). There are also significantly more DNA from sso10340-DNA complex than from the control.

The purified DNA-protein complex was firstly treated with RNase A to remove the contaminated RNA, and the deproteination was carried out by incubation with protease K. The target DNA fragments of each sample were finally extracted using phenol/chloroform extraction and ethanol precipitation, with an average size of 300-500 bp. Then the prepared sample was sequenced using ion torrent next-generation sequencing. Of the sequenced 363- 393 thousand reads, 347-362 thousand reads was uniquely mapped with either sulfolobus sofataricus host genome DNA or SIRV2 virus DNA (Table 1). It is interesting that 91.7 % of the mapped reads from sso2474 are aligned with virus genome, whereas 92.04 % reads from sso10340 belong to the host genome, indicating that sso2474 has a high affinity to virus genome and sso10340 specially regulate the host genes.

It is surprising that almost all the DNA extracted from sso2474 was aligned to virus genome. In order to validate whether it is due to the high amount of virus genome present in the cell, we checked the copy number ratio between host and virus genome by Real-Time PCR (qPCR). The infected cells (the same one for sequence) were collected and washed 3 times to remove the virus on the cell surface, and the total DNA from the infected Sulfolobus cells was extracted. One set of primers belong to the Sulfolobus solfataricus TFB-II were designed to detect the host genome copy numbers and the primers amplifying the SIRV2 coat protein were designed to check the virus copy numbers in the same DNA sample. Whereas, the data in Table 2 showed that there was average 0.6 virus entered in one host cell after 2.5 h post infection, excluding the possibility that the high coverage of viral sequence reads was due to a high copy number of the virus present in the infected cells. Thus, we conclude the sso2474 preferentially binds the viral DNA.

Detection of the enriched DNA fragment and genome-wide binding profile of the two proteins

To identify the DNA-enriched regions, we use Bowtie, the ultrafast memory-efficient short read aligner, to align sequenced sets of short DNA reads to large genome Sulfolobus solfataricus P2 as well as SIRV2 (Satoh and Tabunoki, 2013). And the genomic locations of

93

the peaks was identified and visualized by Artemis, an integrated platform to analyze high- throughput sequence-based experimental data (Carver et al., 2012).

Protein Sso2474 binds to virus DNA with low specificity

Although only 3.76 % DNA extracted from protein sso2474 can be aligned to the host genome, there is still a specific binding peak showing up in the map (Fig. 2A), located upstream and inside of sso2474, indicating that the protein was regulated by itself (peak 10 in Fig 2A). In contrast, an average of 5400 reads were aligned to SIRV2 genome, which was more than 500 folds than that to the host genome. They were demonstrated as mountain shape with wide peaks covering the whole virus genome (Fig 2B). Even so, some potential specific binding sites were marked with numbers, the binding regions were amplified by PCR, and the gel mobility shift assays were carried out for validation.

Protein sso2474 was firstly expressed and purified from Sulfolobus solfataricus P2 (Fig 1A). However, DNA specifically bound by the protein cannot be removed by either DNaseI or PEI (Phenylethyleneimine) and formed an extremely high background. Then we set out to express the sso2474 protein in E.coli inserted into the pET-30a vector with a C-terminal His tag and purified by Ni2+-affinity chromatography. It was expressed soluble in high amount and SDS-PAGE analysis of the purified protein revealed a pure major band with a molecular weight of approximately 15kDa (Fig. 3A).

Firstly the 11 enriched fragments and a negative control fragment were amplified by PCR from SIRV2 genome, and the electrophoretic mobility shift assay (EMSA) of the recombinant sso2474 from E.coli with the target fragment were carried out. The results showed that this protein bound all the DNA fragments with no specificity (data not shown). Since the binding region of sso2474 cannot be detected, another possibility is that this protein prefers to bind ssDNA rather than dsDNA, some single-strand DNA binding proteins bind DNA in a non-sequence specific way (Dickey et al., 2013). To verify whether sso2474 is a single-strand binding protein, an EMSA experiment with equal molar ratio of ssDNA and dsDNA substrate mixture were performed.

The concentration of the protein was increased from 0.1 M to 2.0 M. The samples were deposited in a 12% acrylamide gel and was run in 0.5 x TBE buffer. It is demonstrated that sso2474 preferred to bind dsDNA than ssDNA. As it is shown in Fig .3B lane 3, the band representing dsDNA began to shift while the amount of ssDNA kept the same. When the protein binds all dsDNA in the sample, and there is still more protein left, it begins to bind the ssDNA. When the protein concentration increased to 1.6 M (Fig 3B. Lane 7), almost all the substrates formed complex and no free DNA left. The result demonstrated a clear image that sso2474 showed more fold affinity to dsDNA than to ssDNA.

94

Protein sso10340 bound the host genome at several regions

Compared with sso2474, the genome-wide binding profile of sso10340 with host genome was well mapped showing a dozen of binding sites (Fig 4 A). However, the reads corresponding to viral sequences were randomly aligned with virus genome, similar to the control sample (Fig 4 B). Only regions exhibiting more than 2-fold enrichment in CHIP DNA versus input DNA were considered to be bound significantly to sso10340. A total of 27 genomic regions, scattered in the genome, was identified. The various functions of the genes that these binding peaks overlapped or closest to were summarized, most of which participate in amino acid metabolism, energy metabolism, biosynthesis and transport ( Table 3).

Additionally, the fragments were grouped into four categories according to their location with respect to open reading frames (Fig 4 C). As we observed that 41% (upstream and intragenic but upstream) of the 27 regions localize to the upstream regulatory region of the corresponding gene, and 46% of them fell within the coding region. The small left peaks (13%) were found in the regions locating both the downstream of the neighbored genes.

As half of the binding region fell into the upstream area of the corresponding gene, a binding profile of 14 genomic regions near promoter area were zoomed in and analyzed in detail (Fig. S4). The binding genomic fragment was amplified by PCR with an averagesize of 150bp. The protein sso10340 was purified from Sulfolobus, and an EMSA screen of these regions was performed to verify whether these targets regions also interact with purified protein in vitro. However, no binding was observed by protein sso10340 in vitro.

Motif analysis for sso10340 binding site

The protein sso10340 binding motif was defined by enriched oligonucleotide sequences within bound regions. The sequences of these DNA fragments were submitted to the motif- based sequence analysis tool MEME-ChIP (Machanick & Bailey, 2011. http://meme.nbcr.net.) to detect conserved DNA motif. The most suggested motif was shown in Fig. 5 B.

This 11 bp, an imperfect palindromic sequence was present in 96% of all binding regions and have a similarity with the known motif of PPARG (Peroxisome proliferator activated receptor gamma) (MA0066.1) (Fig 5 A). PPARgamma binds as heterodimer composing of members of the retinoid X receptor family (RXR) and PPRE (PPAR response elements), which had a direct repeat of two half sites of 5´-AGGTCA-3´ separated by one nucleotide (Fig 5 A).

Discussion

95

The archaeal transcription regulation possessed the eukaryotic-like basal transcription machinery and bacterial-like regulators that distinguished them from the other two domains (Koonin and Galperin, 1997;Grabowski and Kelman, 2003). Bindings of the TBP (TATA- box binding protein) and TFB (transcription factor B) to TATA box and BRE (TFB response element) in the promoter region are critical to the transcription initiation of archaeal genes (Bell et al., 1999). How the bacterial-type transcriptional regulators regulate the eukaryotic-like transcription machinery in archaea, especially on the virus infection, are still need to be elucidated.

The protein sso2474 showed an amino acid sequence similarity with TrmB family proteins, which were found in all three domains of life, containing all three kinds of possible TF combinations- repressors, activators or both. In archaea, most of the TrmB family proteins were spread in the kingdom Euryarchaeota, only some exist in Crenarchaeota (Maruyama et al., 2011). No matter the best studied TrmB proteins in Thermococcales P. furiosus of Euryarchaeota or the research on TrmB family protein MalR of S. acidocaldarius in Crenarchaeon, most of documented TrmBs seem to function as controlling diverse sugar transporters or different genes of sugar metabolism, such as maltose and glucose processing, as well as genes involved in other metabolisms (Kanai et al., 2007;Lee et al., 2008;Reichlen et al., 2012;Wagner et al., 2014). However, in this study, the binding map of the protein sso2474 indicated that this protein did not show a significant response on regulating the genes related with sugar metabolism (except sso2474 itself) to activate or repress the corresponding gene for its healthy maintenance.

On the other hand, both the conserved domain and the structure prediction revealed that sso2474 belongs to the MarR (multiple antibiotic resistance) family transcription regulators. MarR family proteins constitute a diverse group of transcription regulators that modulate the expression of genes encoding proteins involved in the metabolic pathways, stress responses, virulence and degradation or export of harmful chemicals such as antibotics, organic solvents (White et al., 1997), oxidative stress agents (Ariza et al., 1994), and house disinfections (McMurry et al., 1998). It seems that this mar locus is involved in the mechanism that the stains used to resist the lethal effects of a wide range of toxic agents.

E.coli MarR was the firstly described MarR family regulator and its homologs are widely distributed in both bacterial and archaea. MarR, as a component of the marRAB locus in E.coli, is a repressor to its own operon and MarA is a transcription activator that can active the operon and regulates the expression of proteins important to the multiple antibiotic resistance (Alekshun and Levy, 1997). In many strains, constitutive expression of MarA makes a contribution to maintenance of the resistance to antibiotics and other environmental hazards, and the marR deletion mutant or the inactivation of MarR will result the increased

96

expression of MarA (Alekshun and Levy, 1999;Barbosa and Levy, 2000). To date, no research revealed that TrmB family or MarR family proteins bind specifically to virus genome upon virus infection. If sso2474 is more similar to a MarR family protein, the strain would activate the expression of corresponding proteins to resist the exposure to the virus infection, and the sso2474 could be in a way like binding to virus genome to hinder the process of transcription. Based on this hypothesis, the growth retardation to SIRV2 was compared between sso2474 overexpressed stain and the wild-type strain, and no difference was observed from the growth curve (data not show). The above experiment indicated that this protein probably is not involved in inhibiting the growth of virus. Or perhaps there is a difference between viral and host DNA, e.g. modification, so sso2474 could specifically binds to viral DNA. However, the binding mechanism of this protein and its possible interacted partners involved are needed to be further identified. It will be intriguing to detect the phenotype changes of sso2474 mutant strain upon SIRV2 infection, comparing to wide type strain.

The downregulated transcription regulator sso10340 showed a similarity to the C-terminal domain of Lrp/AsnC family proteins ( leucine-responsive regulatory protein ). Most of the experimentally characterized archaeal transcriptional regulators belong to this family, and it is a family that globally and specifically regulates genes. These family members can be found in both bacteria and archaea but not in eukarya (Brinkman et al., 2003). The Lrp family proteins typically have a 15 kDa molecular weight for the monomer with an N- terminal wHTH domain and a C-terminal Amino Acid Metabolism (RAM) domain. The RAM possesses a αβ sandwich fold and possibly involved in effector recognition and oligomerization of the protein subunits (Thaw et al., 2006). Actually, proteins that only possess the RAM domain are frequently observed in the genomes of many organisms. They are defined as a novel ligand-binding domain or stand-alone RAM-domain (SARD) proteins involved in regulation of amino acid metabolism (Ettema et al., 2002). Although many of them were crystalized and structurally determined, the functions of these proteins remain not clear and still need to be elucidated (Miyazono et al., 2008;Nakano et al., 2006). The failure of detecting any DNA binding by sso10340 is possibly due to lack of binding conditions or lack of DNA binding activity. It is possible that sso10340 recognizes an effector and interacts with a DNA binding protein or a transcription regulator to achieve its regulatory role in vivo. Indeed, sequences close or in the coding region of a DNA binding protein (sso2626) and a transcription regulator (sso2827) were detected in this study (Table 3). Weather sso10304 interacted with the two proteins still need to further confirmed.

97

Reference

Alekshun,M.N., and Levy,S.B. (1997) Regulation of chromosomally mediated multiple antibiotic resistance: the mar regulon. Antimicrob Agents Chemother 41: 2067-2075.

Alekshun,M.N., and Levy,S.B. (1999) Alteration of the repressor activity of MarR, the negative regulator of the Escherichia coli marRAB locus, by multiple chemicals in vitro. J Bacteriol 181: 4669- 4672.

Ariza,R.R., Cohen,S.P., Bachhawat,N., Levy,S.B., and Demple,B. (1994) Repressor mutations in the marRAB operon that activate oxidative stress genes and multiple antibiotic resistance in Escherichia coli. J Bacteriol 176: 143-148.

Bailey,T.L., and Elkan,C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28-36.

Barbosa,T.M., and Levy,S.B. (2000) Differential expression of over 60 chromosomal genes in Escherichia coli by constitutive expression of MarA. J Bacteriol 182: 3467-3474.

Bell,S.D., Kosa,P.L., Sigler,P.B., and Jackson,S.P. (1999) Orientation of the transcription preinitiation complex in archaea. Proc Natl Acad Sci U S A 96: 13662-13667.

Bell,S.D., Magill,C.P., and Jackson,S.P. (2001) Basal and regulated transcription in Archaea. Biochem Soc Trans 29: 392-395.

Bize,A., Karlsson,E.A., Ekefjard,K., Quax,T.E., Pina,M., Prevost,M.C. et al. (2009) A unique virus release mechanism in the Archaea. Proc Natl Acad Sci U S A 106: 11306-11311.

Brinkman,A.B., Ettema,T.J., de Vos,W.M., and van der Oost,J. (2003) The Lrp family of transcriptional regulators. Mol Microbiol 48: 287-294.

Carver,T., Harris,S.R., Berriman,M., Parkhill,J., and McQuillan,J.A. (2012) Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics 28: 464-469.

Deng,L., He,F., Bhoobalan-Chitty,Y., Martinez-Alvarez,L., Guo,Y., and Peng,X. (2014) Unveiling cell surface and type IV secretion proteins responsible for archaeal rudivirus entry. J Virol 88: 10264- 10268.

Dickey,T.H., Altschuler,S.E., and Wuttke,D.S. (2013) Single-stranded DNA-binding proteins: multiple domains for multiple functions. Structure 21: 1074-1084.

Ettema,T.J., Brinkman,A.B., Tani,T.H., Rafferty,J.B., and Van Der Oost,J. (2002) A novel ligand- binding domain involved in regulation of amino acid metabolism in prokaryotes. J Biol Chem 277: 37464-37468.

Frols,S., Gordon,P.M., Panlilio,M.A., Schleper,C., and Sensen,C.W. (2007) Elucidating the transcription cycle of the UV-inducible hyperthermophilic archaeal virus SSV1 by DNA microarrays. Virology 365: 48-59.

Gardner,A.F., Guan,C., and Jack,W.E. (2011) Biochemical characterization of a structure-specific resolving enzyme from Sulfolobus islandicus rod-shaped virus 2. PLoS One 6: e23668.

98

Grabowski,B., and Kelman,Z. (2003) Archeal DNA replication: eukaryal proteins in a bacterial context. Annu Rev Microbiol 57: 487-516.

Gudbergsdottir,S., Deng,L., Chen,Z., Jensen,J.V., Jensen,L.R., She,Q., and Garrett,R.A. (2011) Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector-borne viral and plasmid genes and protospacers. Mol Microbiol 79: 35-49.

He,F., Chen,L., and Peng,X. (2014) First Experimental Evidence for the Presence of a CRISPR Toxin in Sulfolobus. J Mol Biol.

Kanai,T., Akerboom,J., Takedomi,S., van de Werken,H.J., Blombach,F., van der Oost,J. et al. (2007) A global transcriptional regulator in Thermococcus kodakaraensis controls the expression levels of both glycolytic and gluconeogenic enzyme-encoding genes. J Biol Chem 282: 33659-33670.

Kelley,L.A., and Sternberg,M.J. (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc 4: 363-371.

Kessler,A., Brinkman,A.B., van der Oost,J., and Prangishvili,D. (2004) Transcription of the rod- shaped viruses SIRV1 and SIRV2 of the hyperthermophilic archaeon sulfolobus. J Bacteriol 186: 7745-7753.

Koonin,E.V., and Galperin,M.Y. (1997) Prokaryotic genomes: the emerging paradigm of genome- based microbiology. Curr Opin Genet Dev 7: 757-763.

Lee,S.J., Surma,M., Hausner,W., Thomm,M., and Boos,W. (2008) The role of TrmB and TrmB-like transcriptional regulators for sugar transport and metabolism in the hyperthermophilic archaeon Pyrococcus furiosus. Arch Microbiol 190: 247-256.

Leonard,P.M., Smits,S.H., Sedelnikova,S.E., Brinkman,A.B., de Vos,W.M., van der Oost,J. et al. (2001) Crystal structure of the Lrp-like transcriptional regulator from the archaeon Pyrococcus furiosus. EMBO J 20: 990-997.

Maruyama,H., Shin,M., Oda,T., Matsumi,R., Ohniwa,R.L., Itoh,T. et al. (2011) Histone and TK0471/TrmBL2 form a novel heterogeneous genome architecture in the hyperthermophilic archaeon Thermococcus kodakarensis. Mol Biol Cell 22: 386-398.

McMurry,L.M., Oethinger,M., and Levy,S.B. (1998) Overexpression of marA, soxS, or acrAB produces resistance to triclosan in laboratory and clinical strains of Escherichia coli. FEMS Microbiol Lett 166: 305-309.

Miyazono,K., Tsujimura,M., Kawarabayasi,Y., and Tanokura,M. (2008) Crystal structure of STS042, a stand-alone RAM module protein, from hyperthermophilic archaeon Sulfolobus tokodaii strain 7. Proteins 71: 1557-1562.

Nakano,N., Okazaki,N., Satoh,S., Takio,K., Kuramitsu,S., Shinkai,A., and Yokoyama,S. (2006) Structure of the stand-alone RAM-domain protein from Thermus thermophilus HB8. Acta Crystallogr Sect F Struct Biol Cryst Commun 62: 855-860.

Nichols,C.E., Sainsbury,S., Ren,J., Walter,T.S., Verma,A., Stammers,D.K. et al. (2009) The structure of NMB1585, a MarR-family regulator from Neisseria meningitidis. Acta Crystallogr Sect F Struct Biol Cryst Commun 65: 204-209.

99

Oke,M., Kerou,M., Liu,H., Peng,X., Garrett,R.A., Prangishvili,D. et al. (2011) A dimeric Rep protein initiates replication of a linear archaeal virus genome: implications for the Rep mechanism and viral replication. J Virol 85: 925-931.

Okutan,E., Deng,L., Mirlashari,S., Uldahl,K., Halim,M., Liu,C. et al. (2013) Novel insights into gene regulation of the rudivirus SIRV2 infecting Sulfolobus cells. RNA Biol 10: 875-885.

Ortmann,A.C., Brumfield,S.K., Walther,J., McInnerney,K., Brouns,S.J., van de Werken,H.J. et al. (2008) Transcriptome analysis of infection of the archaeon Sulfolobus solfataricus with Sulfolobus turreted icosahedral virus. J Virol 82: 4874-4883.

Perez-Rueda,E., and Janga,S.C. (2010) Identification and genomic analysis of transcription factors in archaeal genomes exemplifies their functional architecture and evolutionary origin. Mol Biol Evol 27: 1449-1459.

Pina,M., Bize,A., Forterre,P., and Prangishvili,D. (2011) The archeoviruses. FEMS Microbiol Rev 35: 1035-1054.

Quax,T.E., Lucas,S., Reimann,J., Pehau-Arnaudet,G., Prevost,M.C., Forterre,P. et al. (2011) Simple and elegant design of a virion egress structure in Archaea. Proc Natl Acad Sci U S A 108: 3354-3359.

Quemin,E.R., Lucas,S., Daum,B., Quax,T.E., Kuhlbrandt,W., Forterre,P. et al. (2013) First insights into the entry process of hyperthermophilic archaeal viruses. J Virol 87: 13379-13385.

Reichlen,M.J., Vepachedu,V.R., Murakami,K.S., and Ferry,J.G. (2012) MreA functions in the global regulation of methanogenic pathways in Methanosarcina acetivorans. MBio 3: e00189-12.

Satoh,J., and Tabunoki,H. (2013) Molecular network of chromatin immunoprecipitation followed by deep sequencing-based vitamin D receptor target genes. Mult Scler 19: 1035-1045.

Thaw,P., Sedelnikova,S.E., Muranova,T., Wiese,S., Ayora,S., Alonso,J.C. et al. (2006) Structural insight into gene transcriptional regulation and effector binding by the Lrp/AsnC family. Nucleic Acids Res 34: 1439-1449.

Wagner,M., Wagner,A., Ma,X., Kort,J.C., Ghosh,A., Rauch,B. et al. (2014) Investigation of the malE promoter and MalR, a positive regulator of the maltose regulon, for an improved expression system in Sulfolobus acidocaldarius. Appl Environ Microbiol 80: 1072-1081.

White,D.G., Goldman,J.D., Demple,B., and Levy,S.B. (1997) Role of the acrAB locus in organic solvent tolerance mediated by expression of marA, soxS, or robA in Escherichia coli. J Bacteriol 179: 6122-6126.

Zillig,W., Prangishvilli,D., Schleper,C., Elferink,M., Holz,I., Albers,S. et al. (1996) Viruses, plasmids and other genetic elements of thermophilic and hyperthermophilic Archaea. FEMS Microbiol Rev 18: 225-236.

100

Table and Figure Legends

Table1. Sequencing and mapping data with Sulfolobus solfataricus P2 and SIRV2 genomes

Table 2. The average virus copy number in each host cell for 2.5 h post infection of SIRV2

Table 3. The products as well as their COG Functional Category of target genes binding by sso10340

Figure 1. Purification of protein sso2474, sso10340 as well as negative control from Sulfolobus sofataricus P2. (A) Purified protein elution samples of sso2474 and negative control was subjected to 12.5% SDS-PAGE gel to detect the purity of the protein (a), and 0.7% Agarose gel to detect the DNA amount bound by protein(b).(B) Purified protein elution samples from sso10340 and negative control was subjected to SDS-PAGE gel to detect the the purity of the protein (a), and Agarose gel to detect the DNA amount(b).Western blot analysis of the purified sso10340 (c).

Figure 2. Genome-wide distribution of sso2474 binding regions on Sulfolobus solfataricus P2 genome (A) and SIRV2 viron genome (B). These experiments were performed using DNA extracted from purified sso2474 and negative control. Data were analyzed and visualized as `Material and Methods` section. The genome coordinates (in bp) are given on the x-axis, and y-axis represents the sequenced reads aligned on the genome. And the sharp peak marked in arrow in (A) locates in the region of protein sso2474.

Figure 3. (A) The purified sso2474 fractions (L1-L6) from E.coli were analyzed on a SDS- PAGE gel. The single major band around 15kDa represented the protein sso2474. (B) EMSA assay with dsDNA and ssDNA mixture as substrate. The concentration of the protein was increased from 0.1 M to 2.0 M. The length of oligos was 35 bp. The dsDNA and ssDNA were mixed with equal molar ratio (25 nM: 25 nM).

Figure 4. Genome-wide distribution of sso10340 binding regions on Sulfolobus solfataricus P2 genome (A) and SIRV2 viron genome (B) as well as classification of sso10340 binding regions with respect to genomic organization(C).(A) and (B),These experiments were performed using DNA extracted from purified sso10340(red bar) and control(empty plasmids) (green bar)-input chip. The genome coordinates (in bp) are given on the x-axis, and y-axis represents the sequenced reads aligned on the genome. Taget gene detected in vivo are indicated.(C) ChIP-enriched regions are indicated by black horizontal bars, whereas ORFs are depicted by horizontal arrows. Binding regions ranging from – 500 bp to +100 bp relative to translation start site was considered to be upstream of a transcription unit or intragenic but upstream. ChIP-enriched regions that are exclusively located in gene coding regions or partial in downsteam of the gene were identified as intragenic part. For binding in the regions belong to both downstream of neighbored gene was classified to intergenic region.

Figure 5. Weblogo of the motif detected with MEME-chip. For MEME motifs, the discovered motif Logo (B) from submitted bind-region sequences is shown aligned with the most similar JASPAR motif Logo (A) and the similarity is significant (E≤0.05).

101

Table 1. Sequencing and mapping with Sulfolobus solfataricus P2 and SIRV2

Mapped reads Alighnment Sequenced Mapped reads Alignment rate Sample Mapped reads with Sirv2 rate with Sirv2 reads with p2 genome with p2 genome genome genome

Input control 393537 362185 359559 91.37% 2626 0.67%

sso2474 363814 347347 13677 3.76% 333670 91.70%

sso10340 388533 359145 357598 92.04% 1547 0.4%

Table 2. The average virus copy number in each host cell for 2 h post infection of SIRV2

Virus copy Time (h) Host SQ Mean Host Std.Dev Virus SQ Mean Virus Std.Dev number/host chromosome

2.5 5.287E+07 1.80E+06 3.158E+07 8.22E+06 0.6

102

Table 3. The products as well as their COG Functional Category of target genes binding by sso10340

Genomic Genomic Log2 CHIP COG Target Motif binding coordinate coordinate DNA/input Gene function Functional name location peak start peak stop DNA Category doxC 40520 41150 Inside the gene 1.82 Terminal oxidase sso1214 1053350 1054300 Inside the gene 3.41 Carbonic anhydrases 80bp Inside the Anaerobic dehydrogenases; Energy sso1580 1427200 1428050 gene from start 2.21 molybdopterin production and codon oxidoreductase conversion molybdopterin-binding sso2826 2587400 2588210 Inside the gene 2.34 protein Inorganic ion pacS 2411320 2412430 200bp upstream 3.39 Cation transporting ATPase transport and metabolism 75bp inside the Amino acid sso3189 2935150 2935710 gene from start 1.93 Amino acid permease transport and codon metabolism adh-11 2469950 2470580 70bp upstream 1.96 Alcohol dehydrogenase Archaeal putative sso2153 1977900 1978650 Inside the gene 2.66 transposase ISC1217; Transposase pfam04693 sso12210 2965830 2966580 175bp upstream 2.07 Transposase Conserved,truncated variant of Lrp/AsnC-family;RNA sso10340 2189085 2189390 Inside the gene 2.31 polymerase and Transcription transcription factors sso2626 2391100 2392050 Start codon 2.07 DNA-binding protein sso2827 2588200 2589300 Inside the gene 3.06 Transcription regulator rps26E 502000 502750 Inside the gene 2.49 30S ribosomal protein S26e 24bp Inside the Translation rp144e 907520 908380 gene from start 1.88 50S ribosomal protein L44e coden ATPases involved in Cell division mrp 395515 396010 Inside the gene 1.78 chromosome partitioning and Possibly ATPase of the chromosome sso2730 2485375 2485920 Inside the gene 1.87 AAA superfamily partitioning metS 491920 492385 Inside the gene 2.48 Methionone-tRNA ligase RNA processing RNA methyltransferase, and sso1044 903240 904070 Inside the gene 2.96 DNA methyltransferase, modification tRNA short last area hits to a zoo of molecules associated sso1288 1114815 1115458 Inside the gene 2.16 Cell envelop with cell wall/cell biogenesis, membrane outermembrane sso3150 2902540 2903225 Inside the gene 3.19 Putative membrane protein sso3178 2923630 2924485 280bp upstream 2.10 Cell well ferritin sso2749 2502810 2503615 Inside the gene 2.37 Oxidoreductase,oxidative- Defense stress response mechanisms multicopy.similar to sso2037 1850715 1851580 Inside the gene 3.06 sso1886;proteases multicopy similar to sso1758 1592730 1593280 202bp upstream 2.20 carboxy-end of sso1752,1469 and 1606 Unclassified Fe-S cluster assembly sso0927 787505 788400 Inside the gene 2.09 protein SufB sso0438 380250 381045 43bp upstream 2.45 - 22bp inside the Function sso1984 1796200 1796975 gene from the 2.98 - unknown start codon

103

Figure 1

A

B

SSO10340 SSO10340

PEXA3

- -

1 2

M

70kDa octamer 55kDa

27kDa dimmer 15kDa 10kDa sso10340

(c)

104

Figure 2

A

B

105

Figure 3

A B

(Sso2474 uM) 0 0.1 0.2 0.4 0.8 1.2 1.6 2.0 K M 1 2 3 4 5 6 K

15kDa sso2474 dsDNA

K 10kDa K ssDNA

K 1 2 3 4 5 6 7 8

106

Figure 4

A

107

B

C

Distribution of the genomic locations of 10340 Intragenic binding sites Intragenic Intragenic but upstream Upstream Intergenic Intragenic but upstream

13% Upstream

15% 46%

Intergenic

26%

108

Figure 5

109

SyntTax Report http://archaea.u-psud.fr/SyntTax page 1/3 Genomic contexts

Sulfolobus solfataricus 0272 0274 0277 98 2 uid167998 0271 0273 0275 0276 0278 0279 0281 0280 0282 0284 0283 0285 0286 0287 Score: 79.92

Sulfolobus islandicus 0264 0263 0268 HVE10 4 uid162067 0262 0265 0266 0267 0269 0270 0271 0272 0273 0274 0275 0276 0277 0278 0279 0280 Score: 70.47

Sulfolobus islandicus 0287 0291 0293 0286 leuD 0289 0290 0292 0294 0295 0296 0297 2966 0298 0299 0300 0301 0302 L S 2 15 uid58871 3011 Score: 70.08

Sulfolobus islandicus 0267 0269 0261 0262 0264 0265 0266 0268 0270 0271 0272 0273 0274 0275 0276 0277 0278 L D 8 5 uid43679 C1 0263 Score: 70.08

Sulfolobus solfataricus 2473 10449 P2 uid57721 2468 2469 leuD leuC phrB 2474 2475 2476 2478 2479 2481 Score: 70.08

Sulfolobus islandicus 0256 0261 0255 leuD 0259 0260 0262 0263 0264 0265 0266 0267 0268 0269 0270 0271 thrS M 14 25 uid58849 0257 Score: 70.08

Sulfolobus islandicus 2869 2863 2864 leuD 2867 2868 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 Y N 15 51 uid58825 C1 2865 Score: 70.08

Sulfolobus islandicus 0279 leuD 0277 0278 0281 0283 0284 0286 0287 0288 0273 0274 0280 0282 0285 0289 thrS M 16 4 uid58841 0275 Score: 70.08

Sulfolobus islandicus 0261 leuD 0259 0260 0263 0264 0266 0267 0268 0269 0270 0271 thrS 0255 0256 0262 0265 M 16 27 uid58851 0257 Score: 70.08

Sulfolobus islandicus 0260 0258 0259 leuD 0262 0263 0264 0266 0268 0269 0271 0272 0273 0274 0275 Y G 57 14 uid58923 0265 0267 0270 2988 Score: 70.08

Sulfolobus islandicus 0257 0259 0260 0262 0264 0265 0267 0268 0269 0270 0271 0272 REY15A uid162071 0258 0261 0263 0266 0273 0274 Score: 69.69

Sulfolobus islandicus 0249 0247 0250 0251 0252 0253 0255 0256 0258 0260 0261 0262 0263 LAL14 1 uid197216 0248 0254 0257 0259 0264 Score: 69.69

Acidianus hospitalis 0940 0942 0941 0939 0938 0936 0935 0934 0932 0931 0929 0927 0926 W1 uid66875 0937 0933 0930 0928 Score: 27.01

Sulfolobus tokodaii 0846 0845 0842 0841 0837 0836 0835 0834 0833 0831 096 0844 0843 0840 0839 097 0838 7 uid57807 0830 Score: 25.79

Metallosphaera cuprina 0829 0830 0833 0834 0835 0836 0837 0839 0840 0841 0843 0844 0845 0848 0849 0850 0838 0842 0846 0847 Ar 4 uid66329 0831 0832 Score: 25.16

Metallosphaera sedula 1405 1404 1401 1400 1397 1396 1395 1394 1393 1392 1390 1389 1388 1406 1402 1399 1398 1391 1387 DSM 5348 uid58717 1403 Score: 23.03

Sulfolobus acidocaldarius 00455 00445 00450 00460 00475 00480 00485 00500 00505 00520 00525 00535 00540 00465 00470 00490 00495 SUSAZ uid232254 00510 00515 00530 00545 Score: 20.31

Sulfolobus acidocaldarius 00450 00505 00445 00460 00465 00470 00490 00495 00500 00510 00515 00520 00530 00535 00440 11569 00455 00475 00480 00485 N8 uid189027 00525 Score: 20.16

Sulfolobus acidocaldarius 00450 00505 00460 00465 00470 00490 00495 00500 00510 00515 00520 00530 00535 00445 00440 11814 00455 00475 00480 00485 Ron12 I uid189028 00525 Score: 20.16 SyntTax Report http://archaea.u-psud.fr/SyntTax page 2/3

Sulfolobus acidocaldarius pgmB 0093 0095 0096 DSM 639 uid58379 gds cbaA 0098 0099 0100 0101 lrs14 0103 0104 0105 moeA 0107 0108 0109 0110 0111 0112 0113 Score: 20.16

Fervidicoccus fontis 0361 0360 0359 0358 0354 R0010 0347 Kam940 uid162201 0357 0356 0355 0353 0352 0351 0350 0349 0348 0346 0345 0344 0343 Score: 17.13

Pyrolobus fumarii 1480 1482 1A uid73415 1481 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 Score: 15.31

Methanocaldococcus fervens AG86 uid59347 C1 1035 1034 1033 1032 1031 1030 1029 1028 1027 1026 1025 1024 tfb 1022 1021 1020 Score: 11.54

Pyrobaculum oguniense 1848 TE7 uid84411 C1 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 Score: 11.06

Pyrobaculum arsenaticum 0499 0500 DSM 13514 uid58409 0498 0497 0496 0495 0494 0493 0492 0491 0490 0489 Score: 11.06

Methanobacterium SWAN 0270 0271 1 uid67359 0269 0272 0273 0274 0275 0276 0277 0278 0279 0280 0281 0282 Score: 10.75

Halopiger xanaduensis 3788 3789 3790 3792 3795 3797 3798 3799 SH 6 uid68105 C1 3791 3793 3794 3796 Score: 10.75

Methanosarcina mazei 0111 0105 fxsA 0108 0109 0110 truD 0115 0116 pyrG Go1 uid57893 0107 0112 0113 0117 Score: 10.47

Methanosarcina mazei 0110 0111 0112 0113 0114 0115 0120 0121 0122 0124 0125 0126 Tuc01 uid190185 0116 0117 0118 0119 0123 Score: 10.47

Methanosarcina barkeri A3200 fxsA A3199 A3202 A3203 A3204 A3207 A3208 A3209 A3210 A3211 Fusaro uid57715 C1 A3201 A3205 truD Score: 10.47

Natrinema pellirubrum 1273 1275 1274 1272 1271 1270 1267 1263 1262 1261 DSM 15624 uid74437 C1 1269 1268 1266 1265 1264 Score: 10.47

Pyrococcus horikoshii 1087 1097 1093 1090 1089 1088 1086 1085 1083 1082 1081 1080 OT3 uid57753 1098 1095 1091 Score: 10.31

Methanosarcina acetivorans 3266 3267 3269 truD 3273 3274 3275 3277 3278 rimK 3270 C2A uid57879 3272 3276 Score: 10.31

Pyrobaculum calidifontis 1461 1460 1459 1457 1455 1453 1451 1448 1446 1445 1465 1464 1463 1462 1458 1456 JCM 11548 uid58787 1454 1452 1450 1449 1447 Score: 10.00

Thermofilum pendens 0055 0053 0051 R0001 0050 0049 0048 0046 0045 0042 0041 0039 0038 0037 0036 0035 0054 0047 0044 0043 0040 Hrk 5 uid58563 C1 0052 Score: 10.00 SyntTax Report http://archaea.u-psud.fr/SyntTax page 3/3 Query protein sequence: mqgkseismp dgrvadvfnv vkflyglsdr dieilkllik sqssltmeei sselnitksv vnks ilnlek knivikekve sskkgrrayt yrvdvnyltr klvtdldqli kdlkvkiadv igiqiekt as v Genomes without synteny: Acidilobus_saccharovorans_345_15_uid51395 Caldisphaera_lagunensis_DSM_15908_uid183486 Aeropyrum_camini_SY1___JCM_12091_uid222311 Aeropyrum_pernix_K1_uid57757 Desulfurococcus_fermentans_DSM_16532_uid75119 Desulfurococcus_kamchatkensis_1221n_uid59133 Desulfurococcus_mucosus_DSM_2162_uid62227 Ignicoccus_hospitalis_KIN4_I_uid58365 Ignisphaera_aggregans_DSM_17230_uid51875 Staphylothermus_hellenicus_DSM_12710_uid45893 Staphylothermus_marinus_F1_uid58719 Thermogladius_1633_uid167488 Thermosphaera_aggregans_DSM_11486_uid48993 Hyperthermus_butylicus_DSM_5456_uid57755 Thermofilum_1910b_uid215374 Caldivirga_maquilingensis_IC_167_uid58711 Pyrobaculum_1860_uid82379 Pyrobaculum_aerophilum_IM2_uid57727 Pyrobaculum_islandicum_DSM_4184_uid58635 Pyrobaculum_neutrophilum_V24Sta_uid58421 Thermoproteus_tenax_Kra_1_uid74443 Thermoproteus_uzoniensis_768_20_uid65089 Vulcanisaeta_distributa_DSM_14429_uid52827 Vulcanisaeta_moutnovskia_768_28_uid63631 Archaeoglobus_fulgidus_DSM_4304_uid57717 Archaeoglobus_profundus_DSM_5631_uid43493_C1 Archaeoglobus_sulfaticallidus_PM70_1_uid201033 Archaeoglobus_veneficus_SNP6_uid65269 Ferroglobus_placidus_DSM_10642_uid40863 Halalkalicoccus_jeotgali_B3_uid50305_C1 Haloarcula_hispanica_ATCC_33960_uid72475_C1 Haloarcula_hispanica_N601_uid230920_C1 Haloarcula_marismortui_ATCC_43049_uid57719_C1 Halobacterium_NRC_1_uid57769_C1 Halobacterium_salinarum_R1_uid61571_C1 Haloferax_mediterranei_ATCC_33500_uid167315_C1 Haloferax_volcanii_DS2_uid46845_C1 Halogeometricum_borinquense_DSM_11551_uid54919_C1 Halomicrobium_mukohataei_DSM_12286_uid59107_C1 Haloquadratum_walsbyi_C23_uid162019_C1 Haloquadratum_walsbyi_DSM_16790_uid58673_C1 Halorhabdus_tiamatea_SARL4B_uid214082_C1 Halorhabdus_utahensis_DSM_12940_uid59189 Halorubrum_lacusprofundi_ATCC_49239_uid58807_C1 Haloterrigena_turkmenica_DSM_5511_uid43501_C1 Halovivax_ruber_XH_70_uid184819 Natrialba_magadii_ATCC_43099_uid46245_C1 Natrinema_J7_uid171337_C1 Natronobacterium_gregoryi_SP2_uid74439 Natronococcus_occultus_SP4_uid184863_C1 Natronomonas_moolapensis_8_8_11_uid190182 Natronomonas_pharaonis_DSM_2160_uid58435_C1 Salinarchaeum_laminariae_Harcht_Bsk1_uid207001 Methanobacterium_AL_21_uid63623 Methanobacterium_MB1_uid231690 Methanobrevibacter_AbM4_uid206516 Methanobrevibacter_ruminantium_M1_uid45857 Methanobrevibacter_smithii_ATCC_35061_uid58827 Methanosphaera_stadtmanae_DSM_3091_uid58407 Methanothermobacter_marburgensis_Marburg_uid51637_C1 Methanothermobacter_thermautotrophicus_Delta_H_uid57877 Methanothermus_fervidus_DSM_2088_uid60167 Methanocaldococcus_FS406_22_uid42499_C1 Methanocaldococcus_infernus_ME_uid48803 Methanocaldococcus_jannaschii_DSM_2661_uid57713_C1 Methanocaldococcus_vulcanius_M7_uid41131_C1 Methanotorris_igneus_Kol_5_uid67321 Methanococcus_aeolicus_Nankai_3_uid58823 Methanococcus_maripaludis_C5_uid58741_C1 Methanococcus_maripaludis_C6_uid58947 Methanococcus_maripaludis_C7_uid58847 Methanococcus_maripaludis_S2_uid58035 Methanococcus_maripaludis_X1_uid70729 Methanococcus_vannielii_SB_uid58767 Methanococcus_voltae_A3_uid49529 Methanothermococcus_okinawensis_IH1_uid51535_C1 Methanocella_arvoryzae_MRE50_uid61623 Methanocella_conradii_HZ254_uid157911 Methanocella_paludicola_SANAE_uid42887 Methanocorpusculum_labreanum_Z_uid58785 Methanoculleus_bourgensis_MS2_uid171377 Methanoculleus_marisnigri_JR1_uid58561 Methanoplanus_petrolearius_DSM_11571_uid52695 Methanoregula_boonei_6A8_uid58815 Methanoregula_formicicum_SMSP_uid184406 Methanosphaerula_palustris_E1_9c_uid59193 Methanospirillum_hungatei_JF_1_uid58181 Methanosaeta_concilii_GP6_uid66207_C1 Methanosaeta_harundinacea_6Ac_uid81199_C1 Methanosaeta_thermophila_PT_uid58469 Methanococcoides_burtonii_DSM_6242_uid58023 Methanohalobium_evestigatum_Z_7303_uid49857_C1 Methanohalophilus_mahii_DSM_5219_uid47313 Methanolobus_psychrophilus_R15_uid177925 Methanomethylovorans_hollandica_DSM_15978_uid184864_C1 Methanosalsum_zhilinae_DSM_4017_uid68249 Methanopyrus_kandleri_AV19_uid57883 Pyrococcus_abyssi_GE5_uid62903_C1 Pyrococcus_furiosus_COM1_uid169620 Pyrococcus_furiosus_DSM_3638_uid57873 Pyrococcus_NA2_uid66551 Pyrococcus_ST04_uid167261 Pyrococcus_yayanosii_CH1_uid68281 Thermococcus_4557_uid70841 Thermococcus_AM4_uid54735 Thermococcus_barophilus_MP_uid54733 Thermococcus_CL1_uid168259 Thermococcus_gammatolerans_EJ3_uid59389 Thermococcus_kodakarensis_KOD1_uid58225 Thermococcus_litoralis_DSM_5473_uid82997 Thermococcus_onnurineus_NA1_uid59043 Thermococcus_sibiricus_MM_739_uid59399 Ferroplasma_acidarmanus_fer1_uid54095 Picrophilus_torridus_DSM_9790_uid58041 Thermoplasma_acidophilum_DSM_1728_uid61573 Thermoplasma_volcanium_GSS1_uid57751 SyntTax Report http://archaea.u-psud.fr/SyntTax page 1/8 Genomic contexts

Sulfolobus solfataricus 0203 0211 98 2 uid167998 0202 0204 0205 0206 0207 0208 0209 0210 0212 0213 0214 0215 0216 0217 0218 0219 0220 0221 0222 0223 Score: 81.88

Sulfolobus islandicus 2799 Y N 15 51 uid58825 C1 2793 2794 2795 2796 panB 2798 3285 2800 2801 2802 2803 2804 2805 2806 aksA 2808 2809 2810 2811 2812 2813 2814 Score: 81.88

Sulfolobus solfataricus 2398 P2 uid57721 2395 2397 2399 panB 2401 2402 2404 10340 2405 10342 2406 3 2408 10348 2409 2410 2411 2412 2413 2415 3 Score: 81.88

Sulfolobus islandicus L S 2 15 uid58871 0218 0219 0220 0221 panB 0223 0224 0225 0226 0227 0228 0229 0230 0231 0232 0233 0234 0235 0236 0237 0238 0239 Score: 81.88

Sulfolobus islandicus 0192 0195 L D 8 5 uid43679 C1 0191 0193 0194 0196 0197 0198 0199 0200 0201 0202 0203 0204 0205 0206 0207 0208 0209 0210 0211 0212 Score: 81.88

Sulfolobus islandicus LAL14 1 uid197216 0180 0181 0182 0183 0184 0185 0186 0187 0188 0189 0190 0191 0192 0193 0194 0195 0196 0197 0198 0199 0200 Score: 81.21

Sulfolobus islandicus 0188 REY15A uid162071 0187 0189 0190 0191 0193 0192 0194 0195 0009 0010 0196 0197 0198 0199 0011 0200 0201 0202 0203 0204 0205 0206 0207 Score: 81.21

Sulfolobus islandicus 0203 0193 0194 0195 0196 0197 0198 0205 0206 0207 0209 0210 0211 0212 0213 0214 HVE10 4 uid162067 0199 0200 0201 0009 0202 0204 0208 Score: 81.21

Sulfolobus islandicus 0189 panB 0192 0193 0198 0201 0202 0204 0205 0206 0207 M 14 25 uid58849 0187 0188 0190 0194 0195 0196 0197 0199 aksA 0203 0208 Score: 81.21

Sulfolobus islandicus 0189 0190 panB 0192 0193 0196 0198 0201 0202 0204 0205 0206 0207 M 16 27 uid58851 0187 0188 0194 0195 0197 0199 aksA 0203 0208 Score: 81.21

Sulfolobus islandicus 0208 0209 panB 0211 0212 0215 0216 0218 aksA 0221 0222 0223 0224 0225 0226 0227 M 16 4 uid58841 0206 0207 0213 0214 0217 0220 Score: 81.21

Sulfolobus islandicus 0191 0192 0193 0194 panB 0197 0200 0201 0202 0203 aksA 0205 0206 0207 0208 0209 0210 0211 0212 Y G 57 14 uid58923 0196 0198 0199 Score: 80.54

Sulfolobus acidocaldarius 04630 04625 04615 04610 04605 panB 04590 04585 04580 04565 aksA 04555 04550 04545 04540 04535 04530 04520 04515 04595 04575 04570 04525 N8 uid189027 04620 Score: 44.97

Sulfolobus acidocaldarius 0954 0953 0951 0950 0949 0946 0945 0944 0941 aksA 0939 0938 0937 0936 0935 0934 0933 0932 trxR panB 0947 0943 0942 DSM 639 uid58379 0952 Score: 44.97

Sulfolobus acidocaldarius 04615 04605 04600 04595 panB 04580 04575 04570 04555 aksA 04545 04540 04535 04530 04525 04520 04510 04505 04620 04585 04565 04560 04515 Ron12 I uid189028 04610 Score: 44.97

Sulfolobus acidocaldarius 04375 04370 04365 04360 04355 04340 04335 04330 04320 04315 04305 04300 04295 04290 04285 04280 04275 04270 04265 04380 04350 04345 04325 SUSAZ uid232254 04310 Score: 44.97

Sulfolobus tokodaii 0528 0529 0530 0532 panB 0535 0537 aksA 0539 0541 0542 0543 0544 0545 0534 0536 071 072 7 uid57807 073 0540 0546 Score: 44.97

Metallosphaera cuprina 1479 1489 1488 1487 1485 1480 1478 1477 1475 1474 1473 1472 1491 1490 1486 1484 1483 1482 Ar 4 uid66329 1481 1476 Score: 43.69

Acidianus hospitalis 2202 2199 2195 0021 2203 2205 2207 2208 2210 2189 2190 2192 2193 2194 2196 2197 0020 2198 2200 2201 2204 2206 2209 W1 uid66875 2191 Score: 39.53 SyntTax Report http://archaea.u-psud.fr/SyntTax page 2/8

Metallosphaera sedula 0614 panB 0622 DSM 5348 uid58717 0611 0612 0613 0616 0617 0618 0619 0620 0621 aksA 0624 0625 0626 0627 0628 0629 0630 0631 0632 Score: 39.53

Thermofilum 1910b 03375 03385 03405 03430 03365 03370 03390 03395 03400 03410 03415 03420 03425 03435 03440 03445 03450 03455 03460 03465 03470 03475 03480 03485 03490 uid215374 03380 Score: 27.38

Thermococcus CL1 1452 1446 1443 uid168259 1451 1450 1449 1448 1447 1445 1444 1442 1441 1440 1439 1438 1437 1436 1435 Score: 26.38

Pyrococcus furiosus 1885 1899 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 s043 t043 18961 1897 1898 1900 1901 1902 DSM 3638 uid57873 1886 Score: 25.84

Thermococcus 4557 09280 09260 09255 uid70841 09275 09270 09265 09250 09245 09240 09235 09230 09225 09220 09215 09210 09205 09200 09195 Score: 25.84

Pyrococcus furiosus 09160 09155 09130 COM1 uid169620 09150 09145 09140 09135 09125 09120 09115 09110 09105 10690 09100 09095 09090 09085 09080 09075 Score: 25.84

Pyrococcus abyssi 2069 2 2066 2065 2064 2063 2062 20621 2061 0282 like 30 06 2060 04 07 0284 2059 0285 2057 aksA GE5 uid62903 C1 2067 Score: 24.56

Pyrococcus NA2 0446 0448 0450 0451 0444 0445 0447 0449 0453 0454 2002 0457 0459 0460 0461 0462 uid66551 0452 0455 0456 0458 Score: 24.56

Pyrococcus yayanosii 08200 08160 08170 08180 08190 08210 08220 08260 t90 08270 08290 08300 08310 08320 08330 CH1 uid68281 08150 08230 08240 08250 08280 Score: 24.30

Thermofilum pendens 0572 0573 0575 R0026 0578 0579 0582 0583 0584 R0028 R0029 0585 0576 0577 R0027 t25 0580 0581 Hrk 5 uid58563 C1 0574 Score: 24.30

Thermococcus onnurineus 1352 1351 1350 1349 1347 1346 1345 1342 1339 1338 1337 1336 1335 1334 1332 1331 1344 1343 1341 1340 1333 1330 1329 NA1 uid59043 1348 Score: 24.30

Thermococcus kodakarensis 1335 1334 1333 1332 1328 1327 1326 1325 1323 rplX 1321 1338 1337 1331 1330 1329 1324 KOD1 uid58225 1336 Score: 24.03

Candidatus Nitrososphaera 13600 13610 13620 13630 13650 13680 13690 sdr2 trnA1 13720 mscL rpiA2 13760 gargensis Ga9 2 uid176707 13640 13660 13670 13750 Score: 24.03

Pyrococcus horikoshii 1850 1848 1849 1852 1853 1856 054 1858 43 1859 1861 1862 1863 1866 1865 1854 1855 18561 1857 OT3 uid57753 1860 1864 Score: 23.76

Thermococcus gammatolerans 0883 0870 0871 dppF dppD oppC oppD 0879 0880 0881 0882 0884 0885 0886 0888 0889 0890 0876 0877 srp19 EJ3 uid59389 0887 0891 Score: 23.76

Pyrococcus ST04 1775 1778 1779 1782 1784 1786 t0046 1788 1789 1776 1780 1781 1783 1785 1787 uid167261 1777 Score: 23.49

Thermococcus sibiricus 0920 0921 0923 0928 0929 0930 0931 0933 0935 0936 0922 0924 0925 0926 0927 0932 0934 MM 739 uid593990919 Score: 23.49

Methanoregula boonei 1987 1985 1988 1990 1992 1995 1996 1997 1998 1984 1986 1989 1991 R0038 1993 6A8 uid58815 1994 Score: 23.49

Thermococcus AM4 572 558 487 662 880 755 2320 2321 2322 700 463 859 712 2318 442 uid54735 609 562 Score: 23.49 SyntTax Report http://archaea.u-psud.fr/SyntTax page 3/8

Candidatus Caldiarchaeum C0117 C0118 C0114 C0112 C0110 subterraneum uid227223C0119 C0116 C0115 C0113 C0111 C0109 C0108 C0107 C0106 C0105 C0104 C0103 Score: 23.29

Thermoplasma volcanium 0301 0296 0302 09 0299 0298 0297 0295 0294 0293 0292 0291 0290 0289 0288 0287 0286 0285 GSS1 uid57751 0300 Score: 23.29

Thermococcus litoralis 14495 14500 14505 DSM 5473 uid82997 10980 14490 10995 11000 07054 07059 07064 07069 07074 07079 07084 07089 07094 07099 07104 07109 Score: 23.29

Thermoplasma acidophilum 1354 Tat39 1357 1352 1353 Tat38 1356 1358 1359 1359 1361 fur 1363 1364 1365 gcvH purA 1368 1370 DSM 1728 uid61573 1355 Score: 23.29

Aeropyrum pernix ligT K1 uid57757 cca topA 17961 1797 fbpA 17991 1799a pelA 1804 valS 1807 32 1808 1810 Score: 23.29

Aeropyrum camini cca fbpA topA 1125 1126 1128 1129 pelA valS 1132 32 1133 1134 1135 1136 SY1 JCM 12091 uid222311 ligT Score: 23.02

Pyrolobus fumarii 1A uid73415 1986 1985 1984 R0052 1983 1982 1981 1980 1979 1978 1977 R0051 1976 Score: 22.75

Thermococcus barophilus 01516 01512 01515 01519 01520 01521 01522 01523 01524 01525 01526 01527 01528 01529 01511 01513 01517 01518 MP uid54733 01514 Score: 22.75

Hyperthermus butylicus 0984 0979 0982 0983 0986 0989 0991 DSM 5456 uid57755 0980 0981 0985 0987 0988 0990 0992 Score: 21.95

Methanosphaerula palustris 1928 1930 1926 1931 1934 1937 1938 1939 1940 1941 1944 1945 E1 9c uid59193 1927 1929 1932 1933 1935 1936 1942 1943 Score: 20.94

Aciduliprofundum boonei 0496 0491 0490 0489 0488 0487 0486 0485 0484 0483 0482 0481 0480 0479 0478 0477 T469 uid43333 0497 0495 0494 0493 0492 Score: 20.94

Ignisphaera aggregans 1135 1134 1137 1133 1131 1130 1129 1128 1125 1124 1123 1122 1121 1119 1118 1117 1116 1115 1114 DSM 17230 uid51875 1136 1132 1127 1126 1120 Score: 20.94

Candidatus Nitrosopumilus 08350 08345 08395 08390 08385 08365 08360 08355 08340 08335 08330 AR2 uid176130 08405 08400 08380 08375 08370 Score: 19.87

Nitrosopumilus maritimus 1349 1348 1345 ectC 1342 1341 1337 1336 1335 1334 1333 1331 1330 1347 1346 1343 1340 1339 1338 SCM1 uid58903 1332 1329 Score: 19.87

Aciduliprofundum MAR08 0559 0543 0545 0546 0547 0548 0549 0551 0554 0556 0557 0560 0561 0562 0563 0564 0544 0552 0553 0555 339 uid184407 0550 0558 0565 Score: 19.87

Candidatus Nitrosopumilus 08315 08310 08305 08295 08290 08285 08275 08270 08265 08260 08255 08300 koreensis AR1 uid176129 08280 08250 Score: 19.66

Ignicoccus hospitalis 0301 0299 0302 0303 0304 0305 0306 0307 0309 0310 0312 0313 0314 0300 0308 KIN4 I uid58365 0311 Score: 19.13

Haloarcula marismortui galE 0451 yjlD1 manB2 0443 psmA3 0441 0440 0439 0437 traB arg1 udp2 cdd purM gyrB ATCC 43049 uid57719 C1 gyrA Score: 17.85

Thermosphaera aggregans 1226 1227 R0037 1229 1230 1231 1233 R0038 1234 1235 1236 1237 1238 1223 1222 1224 1225 1228 DSM 11486 uid48993 1232 Score: 17.85 SyntTax Report http://archaea.u-psud.fr/SyntTax page 4/8

Thermoproteus uzoniensis 0197 768 20 uid65089 0193 0194 0195 0196 R02 0198 0199 0200 0202 0203 0204 0205 0206 0207 0208 0209 Score: 17.32

Methanomassiliicoccus Mx1 Issoire uid207287 08235 08240 08245 08250 08255 08260 08265 08270 08275 08280 08285 08290 08295 08300 Score: 17.05

Desulfurococcus 1232 1234 1221n uid591331229 1230 1231 1233 1235 1236 R0035 1237 1238 1239 1240 1241 R0036 1242 1243 R0037 1244 Score: 16.78

Methanobacterium AL 2456 21 uid63623 2455 2454 2453 2452 2451 2450 2449 2448 2447 2446 2445 2444 Score: 16.78

Archaeoglobus profundus 0009 DSM 5631 uid43493 C1 0010 0008 0007 0006 0005 0004 0003 0002 0001 Score: 16.31

Desulfurococcus fermentans 1332 1335 R0040 R0041 DSM 16532 uid75119 1333 1334 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 R0042 1346 Score: 16.31

Vulcanisaeta moutnovskia 0546 0547 768 28 uid636310548 0545 0544 0543 0542 0541 0540 0539 0538 0537 0536 0535 0534 Score: 16.31

Pyrobaculum oguniense 0379 0378 0377 0375 0374 0373 0372 0368 0366 0364 0363 0362 0371 0370 0369 0367 0365 0361 TE7 uid84411 C1 0376 Score: 16.04

Pyrobaculum arsenaticum 1761 1754 1755 28 1758 1763 R0031 1764 1765 1766 1767 DSM 13514 uid58409 1753 argS 7 1760 1762 R0030 1768 Score: 16.04

Methanosaeta concilii 3405 3397 3406 3408 3413 3414 3415 GP6 uid66207 C1 3395 3396 3398 3407 3409 3411 3412 Score: 16.04

Halobacterium salinarum crcB2 cofG 3721 3714 trkA6 trh5 3706 crcB1 3699 3695 Leu 3696 ppiA trpD2 nirDL 3688 36871 36861 3686 Ile ftsZ3 nolA tmk 2 3711 R1 uid61571 C1 3719 Score: 15.77

Halobacterium NRC 1921 cofG 1935 ftsZ3 1927 pdhA1 trkA6 trh5 1920 1919 1918 1917 trn32 1916 trpD2 nirD 1910 1907 trn31 nolA tmk 1925 ppiA 1 uid57769 C1 1934 Score: 15.77

Halorubrum lacusprofundi 1858 1860 1861 1862 1863 1864 1866 1867 1868 ATCC 49239 uid58807 C1 1859 1865 1869 Score: 15.77

Haloquadratum walsbyi tauB tauC aldH 1970 1969 1968 pncB tauC tauA sfsA 1971 DSM 16790 uid58673 C1 1978 Score: 15.77

Haloquadratum walsbyi 36 27 1281 tif2a nop10 1286 AF B phzF 1290 A scpA smc 1280 1287 prt2 mtaP C23 uid162019 C1 AB Score: 15.77

Halovivax ruber 0717 0718 0722 0723 0725 0727 0728 0714 0716 0719 0720 0721 0724 0726 0729 0730 XH 70 uid184819 0715 Score: 15.50

Methanoculleus marisnigri 1896 1897 1898 pyrG 1901 1902 1904 1906 1907 1908 1909 1910 1900 1903 JR1 uid58561 1905 Score: 15.50

Methanopyrus kandleri 0853 0854 0856 0857 GAR1 0860 PhoU 0864 0867 0869 0870 CyaB 0851 0852 HisB 2 0861 AV19 uid57883 2 argJ 0866 0868 Score: 15.50

Methanomethylovorans 0724 0720 0719 0717 0716 0714 0713 0729 0727 0726 0725 0723 0722 0721 0718 0715 DSM 15978 uid184864 C1 0728 Score: 15.50 SyntTax Report http://archaea.u-psud.fr/SyntTax page 5/8

Natronomonas pharaonis 0268 0266 ubiA DSM 2160 uid58435serA C1 prf1 atpD 0260 tpa02 0256 0254 0252 0250 0248 15 Score: 15.50

Halorhabdus utahensis 1379 1378 1380 1381 1384 DSM 12940 uid59189 1382 1383 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 Score: 15.23

Methanocaldococcus infernus 1142 1145 1143 ME uid48803 1144 1141 1140 1139 1138 1137 1136 1135 1134 1133 1132 1131 1130 1129 1128 1127 1126 1125 Score: 15.23

Methanocaldococcus FS406 1060 1062 1058 22 uid42499 C1 1061 1059 1057 1056 1055 1054 1053 1052 1051 1050 1049 1048 1047 1046 1045 Score: 15.23

Methanosarcina barkeri A3014 Fusaro uid57715 C1 A3013 A3015 A3016 A3017 A3018 A3019 A3020 A3021 A3022 Score: 15.23

Thermoplasmatales archaeon 00758 00759 BRNA1 uid195930 00755 00756 00757 00760 00761 00762 00763 00764 00765 00766 00767 00768 00769 00770 Score: 15.23

Natronomonas moolapensis 2149 8 8 11 uid1901822147 pgi 2150 rps15 recJ2 2153 rps1e tmk 2156 trkA2 2158 cspA3 tfbA1 aldH1 2163 carA sopI htr1S 2167 Score: 15.23

Pyrobaculum 1860 1017 1025 1024 1023 1022 1018 1015 1014 1013 uid82379 1021 1020 1019 1016 Score: 15.23

Methanococcus aeolicus 1229 1231 1226 1227 1228 1232 1233 1234 1235 1236 1238 1239 1240 1242 1243 1244 1245 1246 1247 1248 Nankai 3 uid58823 1230 1237 1241 Score: 15.23

Methanococcus maripaludis 1366 1359 1360 1362 1364 1365 1367 1370 1371 1372 1373 1374 1375 1378 1379 1363 1368 1369 1376 1377 C5 uid58741 C1 1361 Score: 15.23

Methanococcus maripaludis 0639 0642 0643 0648 0649 0650 0651 0653 0655 C6 uid58947 0636 0637 0638 0640 0641 0644 0645 0646 0647 0652 0654 Score: 15.23

Methanococcus maripaludis 1315 1301 1317 1316 1314 1313 1312 1311 1310 1305 1304 1303 1299 1298 1297 C7 uid58847 1309 1308 1307 1306 1302 1300 Score: 15.23

Methanococcus maripaludis 01620 01615 01605 01600 01595 01590 01585 01580 01560 01550 01545 01540 01535 01530 01525 01520 01515 01510 01575 01570 01565 01555 X1 uid70729 01610 Score: 15.23

Haloferax volcanii 2662 2663 2664 hpcH hemG 2671 2672 2673 mntA hisD DS22661 uid46845 C1 thiD thiM thiE 2670 Score: 15.23

Methanothermococcus 0220 0219 0221 0222 0223 0224 0228 0230 0231 0232 0233 0225 0226 0227 IH1 uid51535 C1 0229 Score: 15.23

Methanococcus maripaludis 0313 0314 0312 0311 0310 0309 0308 0304 0302 hypA 0300 rpl15 beta 0296 thrB iorB1 0307 0306 0305 S2 uid58035 0303 0299 Score: 15.23

Natrialba magadii 1461 1455 1463 1462 1460 1459 1457 1456 1454 1453 1452 1450 1451 1449 1448 1447 ATCC 43099 uid46245 C1 1458 Score: 14.97

Methanocaldococcus 0916 0915 0911 0909 0908 0906 0905 0904 pnk 0914 0913 0912 0910 DSM 2661 uid57713 C1 0907 Score: 14.97

Methanoplanus petrolearius 0795 0797 0798 R0018 0801 0802 0803 0804 0805 0793 0794 0796 DSM 11571 uid52695 0799 0800 Score: 14.97 SyntTax Report http://archaea.u-psud.fr/SyntTax page 6/8

Ferroglobus placidus 1491 1497 1495 1494 1493 R0030 1492 1490 1489 1488 1487 1486 1485 1484 1483 1482 DSM 10642 uid40863 1496 Score: 14.97

Methanocaldococcus fervens 1190 1193 1191 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 AG86 uid59347 C1 1192 Score: 14.97

Methanocaldococcus vulcanius 0119 0118 0116 0113 0112 0109 M7 uid41131 C1 0117 0115 0114 0111 0110 0108 0107 0106 0105 0104 Score: 14.97

Cenarchaeum symbiosum 0270 0275 0274 0272 0268 0267 0265 A uid61411 0273 0271 0269 0266 0264 0263 0262 0261 0260 0259 0258 0257 0256 0255 0254 0253 0252 0251 Score: 14.70

Acidilobus saccharovorans 0168 0169 345 15 uid51395 R0005 0170 0171 0172 0173 0174 0175 0176 R0006 0177 0178 0179 0180 0181 Score: 14.70

Caldivirga maquilingensis 0512 19 0522 0513 0514 0517 0518 0519 ubiA 0521 0523 R0011 0524 dcd 0526 0527 0528 IC 167 uid58711 0516 Score: 14.70

Archaeoglobus veneficus 0781 SNP6 uid65269 0782 0783 0784 0785 0786 0787 0788 0789 0790 0791 0792 0793 0794 0795 Score: 14.70

Pyrobaculum neutrophilum 0857 0852 15 0856 0854 0853 0851 0849 0847 0846 0845 0844 0841 0840 0839 V24Sta uid58421 0858 0855 0850 0848 0843 0842 Score: 14.70

Methanohalobium evestigatum 0886 0885 0884 0883 0882 0881 0878 0875 0874 0873 0872 Z 7303 uid49857 C1 0880 R0023 0879 0877 0876 Score: 14.70

Methanosarcina mazei 0468 0469 0470 0471 0473 argJ 0475 0477 0478 0479 Go1 uid57893 0467 0472 argC Score: 14.70

Halorhabdus tiamatea 1058 1057 1056 1055 1053 1052 1051 1050 SARL4B uid214082 C1 1054 1049 Score: 14.50

Pyrobaculum aerophilum 1946 1943 1939 1935 1931 1929 1927 aroB 1922 1941 1936 1934 1932 1924 1921 IM2 uid57727 1944 Score: 14.50

Halalkalicoccus jeotgali pyrH 04290 04305 04310 04320 04325 04350 04355 04360 15276 04365 04370 04375 B3 uid5030504285 C1 04295 04315 04330 04335 04340 04345 Score: 14.50

Salinarchaeum laminariae gatA 14710 14700 14680 14670 14665 14660 14655 14720 14705 14695 Harcht Bsk1 uid207001 14715 Score: 14.50

Staphylothermus marinus 1398 1399 1400 1402 1403 1404 1405 1406 1407 1408 1409 F1 uid58719 1401 Score: 14.50

Methanothermobacter 293 294 295 296 297 298 301 302 303 304 305 307 308 309 299 300 306 Delta H uid57877 292 Score: 14.50

Methanococcus voltae 0854 0855 0856 0857 0858 0860 0863 0864 0865 0866 0867 A3 uid49529 0859 0861 0862 Score: 14.50

Methanothermobacter 07680 07580 07610 07620 07660 07670 07700 07710 07560 07600 07570 07590 07630 07640 07650 Marburg uid51637 C1 07690 Score: 14.50

Methanosaeta thermophila 0864 0861 0860 0859 0857 0856 0868 0866 0865 0863 pyrG 0858 PT uid58469 0867 Score: 14.50 SyntTax Report http://archaea.u-psud.fr/SyntTax page 7/8

Methanosaeta harundinacea 1712 1713 6Ac uid81199 C1 1714 1715 1716 1717 1718 1719 1720 Score: 14.50

Nanoarchaeum equitans 041 t04 034 t03 Kin4 M uid58009 039 037 37 036 035 033 032 031 030 029 027 028 026 025 024 023 021 Score: 14.23

Methanohalophilus mahii 1809 1812 DSM 5219 uid473131808 1810 1811 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 Score: 14.23

Candidatus Korarchaeum 0094 0093 0096 0099 0095 0097 0098 0100 0101 0102 0103 0104 0105 cryptofilum OPF8 uid586010092 Score: 14.23

Vulcanisaeta distributa 0799 0802 0803 DSM 14429 uid52827 0800 0801 0804 0805 0806 0807 0808 0809 0810 0811 0812 0813 0814 0815 0816 0817 Score: 14.23

Caldisphaera lagunensis 0077 DSM 15908 uid183486 0080 0079 0078 0076 0075 0074 0073 0072 0071 0070 0069 0068 Score: 13.96

Desulfurococcus mucosus 1365 1361 DSM 2162 uid62227 1366 1364 1363 1362 1360 1359 1358 R0049 1357 1356 1355 1354 1353 1352 1351 1350 1349 1348 1347 1346 1345 Score: 13.96

Methanococcus vannielii 1092 1091 1090 1084 R0027 R0026 1083 1081 1080 1079 1078 1077 1076 rplX 1074 1073 SB uid58767 1089 1088 1087 1086 1085 1082 Score: 13.96

Methanobrevibacter smithii 0532 0527 0528 0535 0539 1803 1804 0526 0529 0530 0531 0533 0534 0536 0537 0538 0540 1805 ATCC 35061 uid588270525 Score: 13.96

Ferroplasma acidarmanus 000010021 000010020 000010018 000010017 000010016 000010015 000010014 fer1000010022 uid54095 000010019 000010013 Score: 13.96

Thermogladius 1633 1150 1142 1155 1153 1148 1145 1144 1143 1141 1140 1139 uid1674881156 1154 1152 1151 1149 1147 1146 1138 Score: 13.69 SyntTax Report http://archaea.u-psud.fr/SyntTax page 8/8 Query protein sequence: maevvrayil vsttvgkeme vadmakkvsg viradpvyge ydvvveveak ssddlkkviy eirr npniir tvtlivm Genomes without synteny: Staphylothermus_hellenicus_DSM_12710_uid45893 Fervidicoccus_fontis_Kam940_uid162201 Pyrobaculum_calidifontis_JCM_11548_uid58787 Pyrobaculum_islandicum_DSM_4184_uid58635 Thermoproteus_tenax_Kra_1_uid74443 Archaeoglobus_fulgidus_DSM_4304_uid57717 Archaeoglobus_sulfaticallidus_PM70_1_uid201033 Haloarcula_hispanica_ATCC_33960_uid72475_C1 Haloarcula_hispanica_N601_uid230920_C1 Haloferax_mediterranei_ATCC_33500_uid167315_C1 Halogeometricum_borinquense_DSM_11551_uid54919_C1 Halomicrobium_mukohataei_DSM_12286_uid59107_C1 Halopiger_xanaduensis_SH_6_uid68105_C1 Haloterrigena_turkmenica_DSM_5511_uid43501_C1 Natrinema_J7_uid171337_C1 Natrinema_pellirubrum_DSM_15624_uid74437_C1 Natronobacterium_gregoryi_SP2_uid74439 Natronococcus_occultus_SP4_uid184863_C1 Methanobacterium_MB1_uid231690 Methanobacterium_SWAN_1_uid67359 Methanobrevibacter_AbM4_uid206516 Methanobrevibacter_ruminantium_M1_uid45857 Methanosphaera_stadtmanae_DSM_3091_uid58407 Methanothermus_fervidus_DSM_2088_uid60167 Methanotorris_igneus_Kol_5_uid67321 Methanocella_arvoryzae_MRE50_uid61623 Methanocella_conradii_HZ254_uid157911 Methanocella_paludicola_SANAE_uid42887 Methanocorpusculum_labreanum_Z_uid58785 Methanoculleus_bourgensis_MS2_uid171377 Methanoregula_formicicum_SMSP_uid184406 Methanospirillum_hungatei_JF_1_uid58181 Methanococcoides_burtonii_DSM_6242_uid58023 Methanolobus_psychrophilus_R15_uid177925 Methanosalsum_zhilinae_DSM_4017_uid68249 Methanosarcina_acetivorans_C2A_uid57879 Methanosarcina_mazei_Tuc01_uid190185 Picrophilus_torridus_DSM_9790_uid58041 Figure S3

10 20 30 40 50 ....|....| ....|....| ....|....| ....|....| ....|....| Lrp 1 MVDSKKRPGK DLDRIDRNIL NELQKDGRIS NVELSKRVGL SPTPCLERVR sso10340 1 ------Clustal C

60 70 80 90 100 ....|....| ....|....| ....|....| ....|....| ....|....| Lrp 51 RLERQGFIQG YTALLNPHYL DASLLVFVEI TLNRGAPDVF EQFNTAVQKL sso10340 1 ------M AEVVRAYILV STTVGKEMEV ADM---AKKV Clustal C : : .:: : : . * . :: .:*:

110 120 130 140 150 ....|....| ....|....| ....|....| ....|....| ....|....| Lrp 101 EEIQECHLVS GDFDYLLKTR VPDMSAYRKL LGETLLRLPG VNDTRTYVVM sso10340 29 SGVIRADPVY GEYDVVVEVE AKSSDDLKKV IYE-IRRNPN IIRTVTLIVM Clustal C 12 . : ... * *::* :::.. . . . :*: : * : * *. : * * :**

160 ....|....| .... Lrp 151 EEVKQSNRLV IKTR sso10340 77 ------Clustal C

Figure S4

1 Sso 0438 peak area(380600-380750) nr 220

AAATAGTTACACTTATCTCCTAAAGGCATTGGTTTCTGCTCATCAGTAAAGGAGGGCTGTCTTTCAAAGCCTCT ATTTCC

GTTCCTCAAACTACTTATACCCCCAAACTTTTAGACAATCATATTAAACTTACATTTTGACCCATTTAAACATT ATGTATCTTATACCTTCACTTTTACGATCCACTAGAGCCAATACTAATCTCTTTTTAGTGCTATGTGAGACTCT ACCGAAACTCAATAGTTCATGAG

Primer EMSASso0438 S 5 CAAAGCCTCTATTTCCGTTC 3 TM 53.8 EMSAsso0438 A 5 CGGTAGAGTCTCACATAGCA 3 TM 55.7 168bp

2 Sso 0570 peak area (502350-502600) nr192 (with rps26E together transcription)

GTTGTGATCAATGTGGTGCTAGAGTACCAGAGGATAAGGCAGTATGTGTAACAAAAATGTATAGCCCCGTGGAT GCTTCT CTAGCATCTGAATTAGAAAAGAAGGGTGCAATAATTGCTAGATATCCTGTAACTAAGTGTTACTGTGTGAATTG TGCGGT

ATTTTTGGGTATTATTAAGATAAGAGCAGAAAATGAGAGAAAGCAAAAAGCTCGTTTAAGATAGGCTTTTAAAC CTTTAG

TCAGAATATGTGATGAAATGAGACTTTATGAATTATCTTTTGCACAAATTGAAGATTTTTTCTATAAACTAGCA GAAGTTAAAGATATTATAAAAGATCATGGTCTATTAG

Primer

EMSAsso0570 S 5 GTTGTGATCAATGTGGTGCTAG 3 TM 57.4 EMSAsso0570 A 5 CGCACAATTCACACAGTAACA 3 TM 57.5 158bp

3 Rp144e peak area (907700-908000) nr 195

TATATAATGATCACAGGGTTAATGAGGGAGATGTTTTGGTTTTACCTATGAGAGAAGCCTTGCCATTAATAATA GCAAGT

TATTTAACTCCCTATAAGATAGATATTGAAGAACAATTATGAAAGTCCCTAAGGTCATCAGCACATATTGTCCA AAGTGTAAGACTCATACAGATCACTCTGTATCACTATACAAGAGCGGTAAGAGAAGAAATCTCGCTGAAGGACA GAGAAGATATGAGAGAAAGAATATTGGATATGGAAGTAAAAGAAAACCAGAACAGAAGAGATTTGCAAAAGTT

Primer

EMSArp144e S 5 GCAAGTTATTTAACTCCCTATAAG 3 TM 53.2 EMSArp144e A 5 CTCTCATATCTTCT CTGTCCTTC 3 TM 55.2 168bp

4 Sso 1758 peak area (1592900-1593150) nr 213

AGACGACCTCAAATGACCTGCCTTTCTCTATGATACCAACAGTATCAACGATCGTACCGTGTTCAACCCTCTTA GGATCTACACCATGAAGTTCTAAAGCTTTCCTATAAAGTTTTAACATTGTCCTTTCAAACCCTTTACCGGCCCT AGAAGTAAAACT

ACCTAACTCAACTGAAAACTTCTTTTGACTCCTAACTAGTCTAAGAATGATCCTGGAATGTCTTTCAATTCCTT TTTGGGAGAGAGACTATTGCTTCTCCTTGCTTTTTGACTTCTTGCTGTAATGACGTAATAGCCTCACTGTGCCT CTTAACCTCTTCTTGCAAGGATCTTATTGTCTCTTGTAAAAGCTGAATGCTTTTAGTATTTTCTTCCAATCTCT TTATCACTATTTCATCC

Primer

EMSAsso1758 S 5 GAATGATCCTGGAATGTCTTTC 3 TM 54.5 EMSAsso1758 A 5 GGAAGAAAATACTAAAAGCATTCAG 3 TM 55.2 171bp

5 Sso 1879 peak area (1692550-1692750)

GGTTATCCACCTTATGGATACCTTGTAACGATAATATTAAATATTGAGGAGCTGTCAGACGCATTGCGAAGTTT AGCTGAGTACTTACGCTCCATAATGAATTAACAGAATAAGGTTATAGAAGCGGAATGATAAATGGCTAACTTTA TCACCTCAATAC

ATTAAATCTATATTGTGACATTTGACGACTTAAATAAGTTAATCAGAGAGAAACTTAGCGTAGAAACGTACCCT TATCAAAAGTACATCAG

Primer

EMSAsso1879 S 5 CAGACGCATTGCGAAGTTTA 3 TM 56.5 EMSAsso1879 A 5 CGCTAAGTTTCTCTCTGATTAAC 3 TM 54.9 165bp

6 Rpom-2 peak area (1731850-1732040) nr 105.8 (with sso1913 together transcription)

TGAGCTACATTCTCCTTTAATATTACTTTCTCTACATCATGATCAGAATACCCACACTTACTACAAACCATTTT GTTACCTTTTACCTTCAAGAAAGAACCACATTTAGGGCAAAAACGCATATCTAGAAATGAGTCTTTTAATCATA AAAGCTTGCGGC

TAACATACTAACTCCGGGCCAAGGTGTGTCA

Primer

EMSArpom-2 S 5 GATCAGAATACCCACACTTAC 3 TM 53.3 EMSArpom-2 A 5 GAGTTAGTATGTTAGCCGCA 3 TM 54.3 134bp

7 Sso1984 peak area (1796450-1796650) nr 363

TGTACCAAATGGTCAAGCAATAAATTATAATGGACATACAGATCCTGTGGTGATTTAATACTAAGTAATGGAAC TATGATACAAAATGTGGTATGGGATGGACAATATGCAGGTACAATAATTCAAAATCATTACCAAATAGTTCAAT TGAATGATGAATGGGTAGGAAGAACCGACCCAGTGAATAATCAACAATATGTA

Primer

EMSAsso1984 S 5 GGTCAAGCAATAAATTATAATGGAC 3 TM 55.0 EMSAsso1984 A 5 CCTACCCATTCATCATTCAATTG 3 TM 55.9 157bp

GCTTAAAGGGTTTGTTTGTTTTGGTGGTGTTATGGGTGCGTTGAAGTACGTGGCCATTGGCGTTGTGGTTTTCG CCACCACTGTGTTTTACTACTACCGACACCGGGTGCCTGTGCAGTACGTGGGTTCACCCAGCGGTTATGAGGCA TTTGTACCAAATGGTCAAGCAATAAATTATAATGGACATACAGATCCTGTGGTGATTTAATACTAAGTAATGGA ACTATGATACAAAATGTGGTATGGGATGGACAATATGCAGGTACAATAATTCAAAATCATTACCAAATAGTTCA ATTGAATGATGAATGGGTAGGAAGAACCGACCCAGTGAATAATCAACAATATGTAACTTTTCAAGATTTCTACG TGATCAAGGGTCAAGTACCAATTGAGAATGTAACAATAAATGGACAGACGTATTACGTGATAGATGCGGATAAA ATAAACCCAGCGGACATCGCGGGATTTTTCACATACTGGAGATGGGTAAACAACTTC

FPsso1984 S 5 GCTTAAAGGGTTTGTTTGTTTTG 3 TM 55.4

FPsso1984 A 5 GAAGTTGTTTACCCATCTCCAG 3 TM 56.5 501bp

8 Sso 2102 peak area (1923900-1924100) nr 143

GAATGCGAAATATGTAAAGCAATAGTTTCAGTATTATGTGGACTATTAGCTGAAGGAGTAGCTAAAAGTGTGGC ATGTGACGAAGCTTGTGGAACAGTTTGCTTAATATTTGTTGAGGATCCCATTATTTATGATATTTGTGTGGTAA TATGTATACCTTCTTGTGATGAACTACTTCAACTAATTATCTCAATAGGAGTAGCGACTGCATGTGGACTAGGT GGTGAGTATCTATGTCAA

AAGGCTGGTCTGTGTTGCTAATAGATTTTTTTTAGATATGTGGTTATAAATAGCTAAGGAGGGAGGGATAGAA ATGGAAAGAAAGGGGACAGAAATAGAAAGAAATAAGATATCTTTTTTAAAGTGGCTTGAGCTAACATTATTATT TGTGATTTTGCCTT

Primer

EMSAsso2102 S 5 GTGGACTATTAGCTGAAGGAG 3 TM 54.6 EMSAsso2102 A 5 CAGTCGCTACTCCTATTGAGATA 3 TM 56.5 171bp

9 Sso 2626 peak area (2391350-2391550) nr 167.4

ATGAGATAACTAAACAATTGAGAGATGAAGCTGACAAATTACAACAACCCCTTAAGAAGTATATTGGGCTTGTT CACAAT

GTAGGTGGCACAGGTCACTTTGCATATGTTATGATTCTAAGAAGGTGACCTTAGATGCAAATAGATGCAATACC GTTATCAATAAAATATAAGATCAAATACCCGGACGAATTTATTGAAGCAGTTAAGAGGGGAGAAATTGTAGCTA CCAAATGTAAAAATTGTGGTTCC

Primer

EMSAsso2626 S 5 CAACAACCCCTTAAGAAGTATATT 3 TM 54.5 EMSAsso2626 A 5 CTCCCCTCTTAACTGCTTCAAT 3 TM 57.5 175bp

Pacs

10 Pacs peak aera (2411750-2411950) nr 1080

CTTCTCAGTGGCCATGTGGCATTCCCTTAGGTCCATTCCTTAAATACTCTTCCGGATTTCTCTGAAACTCCCTT AGACAA

TGAGATGAGCAGAAATAGTAGATTTTTCCCTTATACATTGTCTTATATTGACTTTTCTCATCTACTTCCATTCC ACAAACCGGATCGATTATCATAATACTAATTATTGCTTTTATTATAAAGAACCTTTTCTTCTGTAAATTTGTAT CTATATATTGTTGATTACAGTCAAAATGACTCTAAAACTAATGTAAGTGCAAGCCATTGTTGCGTGCAAATTTT TCCTTTAATCCAGACAAGCAAGAATTGCAACAAGCATAATACGTTTTCC

Primer

EMSApacs S 5 GTCTTATATTGACTTTTCTCATCTAC 3 TM 54.0 EMSApacs A 5 GCTTGTCTGG ATTAAAGGAAAA 3 TM 54.4 201bp

CCGGATTTCTCTGAAACTCCCTTAGACAATGAGATGAGCAGAAATAGTAGATTTTTCCCTTATACATTGTCTTA TATTGACTTTTCTCATCTACTTCCATTCCACAAACCGGATCGATTATCATAATACTAATTATTGCTTTTATTATA AAGAACCTTTTCTTCTGTAAATTTGTATCTATATATTGTTGATTACAGTCAAAATGACTCTAAAACTAATGTAA GTGCAAGCCATTGTTGCGTGCAAATTTTTCCTTTAATCCAGACAAGCAAGAATTGCAACAAGCATAATACGTTT TCCTTCCTACCTTATAGGTCAAGGGATTATCATATATTTCCTTACCGCAGTAATCACATTTAACTACAAAACTGC TCTTGCCTATGAAAACTAAATCCTTTTCTTTTACATTTTTTAACTTATTTTCTAAATCTTCTAAAGTGTTAGCTC TTATCATGTTAATAAATCTACCATCCAACAGTTTGTAACACTCGTCACTTTGA

FPpacs S 5 CTCTGAAACTCCCTTAGACAATG 3 TM 56.4 FPpacs A 5 CAAACTGTTGGATGGTAGATTTA 3 TM 54.3 474bp

Adh-11

11 Adh-11 peak area (2470250-2470400) nr 120

ATAAAAAATACCTTCCACAATCTTAGCTCACCTTTATTTGCCTATAGTTAGTTATTATTACTGTTCCGATGAAG TATATA

AGACCTATCCCAATTAAGATTTCCATTCCTAAGAGATATGCTTTGGTGACTTCCACAGTAGCACCAAATACTAT TGGTAC TATTATACCCCATAGAGTTTCCCAAAAGCCAATATGACCACCAAATTGACCAGCTAATTCAGTAGAAACTAGAA AGGATGGTGCTGCCCATTGTATCCCCGACCATCTTAGGAAGAAGAAAGTTACCATTAGTATTTCAAC

Primer

EMSAadh-11 S 5 GAGATATGCTTTGGTGACTTCC 3 TM 56.0 EMSAadh-11 A 5 CCTAAGATGGTCGGGGATAC 3 TM 55.8 160bp

12 Sso 3178 peak area (2923900-2924050) nr 206

TTTCTTTAAAGCTCTATCACATGTAGTCCAGTTCTCATATAGTTCCTTATCCTCTTTTCTTAGCAGTCATATTG ACTTATGAATTATAATCAACGACGTTTTTGGGAGTAAAGGTTAAGTTTGCTCACATTTCTACTTTAAGAGAGAG TAAATTAAATTAACTTTCCTTTAAATTTTCTAATGCTTTGAGGTTCTATGCAAGAGCTGGTTTACAATGGTATA TAGTTAAAAGAGAGGGCTGAAAATGAAACTTGGAGTGGAAAAATTGTTGCCAGCTTATTAAAAACCAAGTGGAG CATTATCGTAGCTGAGAAAACATAGTCCAGTCCTTAATTCTAGCTTTAGTGCGACGTATTTTGTGAAAAAATCG CTTGATTTTTGCATGTTTCAAAACATTTTTA

Primer

EMSAsso3178 S 5 GCTCTATCACATGTAGTCCAG 3 TM 55.1 EMSAsso3178 A 5 GCATAGAACCTCAAAGCATTAG 3 TM 55.2 188bp

13 Sso3189 peak area (2935250-2935550)

AAGTAAATTATGAACGATTATCTAATCATACTTTTGCAATGGATAAGTAAATTTAAATAGGGGAATTTAGAAGA TAGCGT

GTGAGTAACAAAAATAGAATATTCGTAAGGGAGACTTCTGGTTTAATAAAGAACGTATCATTATGGGATGCAGT TGCACTCAATATAGGCAATATGTCAGCTGGAGTAGCATTATTTGAATCAATATCACCATATGTACAACAAGGAG GAGTATTGTGGCTGGCTTCATTAATAGGCTTCATCTTCGCTATACCACAACTGTTAATTTATGTATTTTTAAC

Primer

EMSAsso3189 S 5 GGGGAATTTAGAAGATAGCGT 3 TM 54.5 EMSAsso3189 A 5 CAGCCACAATACTCCTCCTT 3 TM 56.2 183bp

14 Sso 12210 peak area (2966200-2966400)

TTTGCACGCTAACTTCAACTGGGCTTTTAACGCCCTAAGGGTTTGTTCGTCAGTGTATGCACGGAAGCGAAACC CTAAGGTGGGTATTGATGTGTATTTTGTTATTTTCCTATTTTTAACTTTTCTACAAAGGGATTCATCTCAAAGA GGCGAAGTTTTCCGCCCCCTTTGAACCCCCGTCTGTTATAAACATAATACGCAATCATAGGTCAGATTGACTAC AGATGATAGCTTATATGGCTGAAATGTAGTATAAAAATACCTAAGACGTACTGGTGTTTACCCGTGGCGTAACT TCTGC

Primer

EMSAsso12210 S 5 CTTTTCTACAAAGGGATTCATCTC 3 TM 55.6 EMSAsso12210 A 5 GGTAAACACCAGTACGTCTT 3 TM 54.5 204 bp

Area 505950-506100 151bp

CTCCTAAAAATAGCGAACCGCCACTCTTTCTTGCAAATTCTATAATACCTTCTGCTTGTTCCCTTCTAACATAA ATTTTTCCCTCTATATCCTTTCTACTCTTCATCTCAATTAAAATAATAACGCCATTCTTTAAAGCGATAATATC CGGTATAGGGTC

Primer

EMSA no binding S 5 CTCCTAAAAATAGCGAACCG 3 TM 53.5 EMSA no binding A 5 GACCCTATACCGGATATTATCG 3 TM 54.5 160bp

Unveiling Cell Surface and Type IV Secretion Proteins Responsible for Archaeal Rudivirus Entry

Ling Deng, Fei He, Yuvaraj Bhoobalan-Chitty, Laura Martinez-Alvarez, Yang Guo and Xu Peng

J. Virol. 2014, 88(17):10264. DOI: 10.1128/JVI.01495-14. Downloaded from Published Ahead of Print 25 June 2014.

Updated information and services can be found at: http://jvi.asm.org/content/88/17/10264 http://jvi.asm.org/

These include: SUPPLEMENTAL MATERIAL Supplemental material

REFERENCES This article cites 32 articles, 11 of which can be accessed free on November 2, 2014 by Copenhagen University Library at: http://jvi.asm.org/content/88/17/10264#ref-list-1

CONTENT ALERTS Receive: RSS Feeds, eTOCs, free email alerts (when new articles cite this article), more»

Information about commercial reprint orders: http://journals.asm.org/site/misc/reprints.xhtml To subscribe to to another ASM Journal go to: http://journals.asm.org/site/subscriptions/ Unveiling Cell Surface and Type IV Secretion Proteins Responsible for Archaeal Rudivirus Entry

Ling Deng, Fei He, Yuvaraj Bhoobalan-Chitty, Laura Martinez-Alvarez, Yang Guo, Xu Peng Archaea Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark Downloaded from Sulfolobus mutants resistant to archaeal lytic virus Sulfolobus islandicus rod-shaped virus 2 (SIRV2) were isolated, and muta- tions were identified in two gene clusters, cluster sso3138 to sso3141 and cluster sso2386 and sso2387, encoding cell surface and type IV secretion proteins, respectively. The involvement of the mutations in the resistance was confirmed by genetic comple- mentation. Blocking of virus entry into the mutants was demonstrated by the lack of early gene transcription, strongly support- ing the idea of a role of the proteins in SIRV2 entry.

o date, relatively few archaeal viruses have been character- http://jvi.asm.org/ Tized, and most of those that have been characterized infect acidothermophilic members of the order Sulfolobales. Despite their limited number of around 50 species, they exhibit consider- ably greater morphological diversity than the more extensively characterized bacteriophages, about 95% of which show head-tail morphologies. Archaeal viruses, in contrast, exhibit fusiform shapes, often with one or two tails, bottle shapes, bearded-globu- lar forms, and a wide variety of rod-like and filamentous morpho- on November 2, 2014 by Copenhagen University Library types which often carry small terminal appendages (1–3). This morphological diversity suggests that the archaeal viruses may employ a variety of mechanisms to enter their hosts, but current insights into entry mechanisms are limited to an OppA trans- porter protein, Sso1273, possibly providing a receptor site for the Acidianus two-tailed virus (ATV) in Sulfolobus solfataricus P2 (4). And very recently, microscopic studies suggested that Sulfolobus islandicus rod-shaped virus 2 (SIRV2) enters the host cell by at- taching and moving through a pilus-like filament; however, the nature of the structure and the identity of the involved proteins remain elusive (5). Sulfolobus solfataricus P2 is an acidothermophilic crenar- chaeon that can host a wide range of archaeal viruses, many of which are propagated stably (1, 3, 6). Moreover, few of the viruses appear to induce cell lysis, possibly reflecting a need to minimize contact with the harsh hot acidic environment. However, recent studies have identified a few viruses that can enter a lytic phase, including the Sulfolobus turreted icosahedral virus (STIV), the FIG 1 (A) Growth retardation of S. solfataricus 5E6 upon SIRV2 infection. (B) two-tailed fusiform (ATV), and, more recently, the rudivirus Resistance of S. solfataricus 5E6R to SIRV2. OD600, optical density at 600 nm. SIRV2 (7–9). SIRV2 is classified in the family Rudiviridae together with other well-characterized viruses, including SIRV1 (10, 11), ARV1 (12) The surviving cells (named 5E6R) appeared to be resistant to and SRV1, (13), all of which are rod shaped and lack an envelope, SIRV2 because, in contrast to the sensitive 5E6 strain, no growth and their genomes consist of linear double-stranded DNA with covalently closed ends (10, 14, 15). In a recent microarray analysis of S. solfataricus infected with SIRV2, we demonstrated that the Received 24 May 2014 Accepted 16 June 2014 viral genes were activated at different times and that mainly stress- Published ahead of print 25 June 2014 response host genes and those implicated in vesicle formation Editor: A. Simon were downregulated (16). The results also illustrated that SIRV2 Address correspondence to Xu Peng, [email protected]. infection at a multiplicity of infection (MOI) of 30 resulted in Supplemental material for this article may be found at http://dx.doi.org/10.1128 growth inhibition of S. solfataricus 5E6 (16). In the present exper- /JVI.01495-14. iment, the culture was infected at a lower MOI (Ͻ1) which also led Copyright © 2014, American Society for Microbiology. All Rights Reserved. to a growth retardation, but the infected culture could enter the doi:10.1128/JVI.01495-14 exponential-growth phase at 80 h postinfection (p.i.) (Fig. 1A).

10264 jvi.asm.org Journal of Virology p. 10264–10268 September 2014 Volume 88 Number 17 SIRV2 Entry in Sulfolobus Downloaded from

FIG 2 Analysis of the pyrEF mutants derived from S. solfataricus 5E6. (A) Types and insertion sites of transposons inserted in the pyrEF gene region of different pyrEF mutants. (B) PCR amplification of the pyrEF region from Sens1 to Sens10 and from Res1 to Res10. wt, wild type. http://jvi.asm.org/ retardation was observed when 5E6R was diluted and infected terspaced short palindromic repeat (CRISPR) loci A, B, C, and D with SIRV2 at the same MOI (Fig. 1). were all lost from the 5E6 host strain (16), the residual CRISPR In order to manipulate the SIRV2-sensitive S. solfataricus 5E6 loci E and F, which lack the spacer acquisition cas genes, were strain genetically (17), 10 pyrEF mutants, labeled Sens1 to Sens10, unlikely to be responsible for the resistance (21, 22). Therefore, we were isolated from Gelrite plates containing 5-fluoroorotic acid inferred that resistance arose as a result of mutated host genes that (5=-FOA). Their mutation sites in the pyrEF gene region were are important for the SIRV2 life cycle. To identify such mutations,

identified by a combination of PCR amplification, restriction di- the genomes of strains Sens1 and Res1 were resequenced by the on November 2, 2014 by Copenhagen University Library gest analysis, and sequencing (17). All the mutations were shown use of a Hiseq 2000 sequencer, yielding about 200-fold coverage. to result from transposon insertions, either IS elements or minia- The sequencing reads of both strains were aligned with the ture inverted terminal repeat elements (MITEs), and the inser- genome sequence of S. solfataricus P2 (23) using the R2R program tions occurred in the coding sequences or within the single pro- (24) to identify mismatches as well as insertions and deletions. moter (Fig. 2A). These results are consistent with the previous Mutations to the P2 genome in the resequenced genomes of Sens1 reports demonstrating high transposition activity in S. solfataricus and Res1 were then compared manually. Only one mutation was and its contribution to chromosomal plasticity (18–20). Follow- detected and constituted a single insertion of ISC1078 into Res1 ing the procedure described above, SIRV2-resistant cultures were but not Sens1. The insertion was localized in sso3139, a gene en- generated for each of the pyrEF mutants. Single colonies were then coding a conserved hypothetical protein lying within an operon produced from the cultures by streaking onto Gelrite plates to (Fig. 3A). yield the purified resistant strains Res1 to Res10. The stability of Next we tested whether other resistant strains also carried mu- the transposon insertions in the pyrEF genes was tested for each of tations in sso3139 or in other genes of the same operon by employ- the 10 pyrEF mutants (Sens1 to Sens10) and their corresponding ing a primer pair (5=-GCTACGCTTCTAACAAACCTAATCTG SIRV2-resistant colonies (Res1 to Res10) by growing them in rich and 5=-CGAAACTTGCGAAACAACTACCT) designed to am- media containing uracil (17) for 3 days without transfer, prior to plify the whole operon region. After PCR amplification, restric- total DNA extraction and PCR amplification of the pyrEF regions. tion digestion, and sequencing, another 5 strains were shown to Each transposon insertion appeared to be stable, because no wild- contain mutations at different locations within sso3139 or the ad- type PCR bands were observed, except a weak wild-type band jacent sso3140 (Fig. 3A). Interestingly, all the 6 mutations were produced in Sens2, consistent with the undetectable reversion produced by ISC1078 insertion (Fig. 3A) and appeared to be stably rates for Sulfolobus transposons recorded earlier (19, 20). Since maintained (Fig. 3C). Res2 did not generate the wild-type band, the extra PCR product To identify possible mutations in the other 4 resistant strains, in Sens2 was probably due to a minor contamination of the colony genome resequencing followed by PCR analyses of relevant genes by wild-type cells (Fig. 2B). was performed (using primers 5=-GAGTCTGGGGAAAATCGGT Sens1, Sens3, Sens7, and Sens8 were selected for a transforma- AAAGTT and 5=-TGGCATTGTAACCCTAATTGCTTCT). tion test because they carried different transposons located at dif- These revealed IS element insertions in sso2387 of Res2 and Res10 ferent insertion sites (Fig. 2A). Shuttle vector pEXA was used for (Fig. 3B). Sso2387 in Sens2 and Sens10 contains 577 amino acids transformation (21), and water was used in the negative control. (aa) but only 283 aa in the sequenced S. solfataricus P2 genome While Sens7 and Sens8 appeared unstable after electroporation, (23). An analysis of the sequences around sso2387 in S. solfataricus Sens1 and Sens3 yielded transformants without colony formation P2 revealed that it is a partial gene that resulted from an ISC1225 in the negative control. Thus, we focused on Sens1 and its resistant insertion (Fig. 3B), which could explain the resistance of the wild- mutants for further studies of SIRV2 susceptibility. type P2 strain to SIRV2 (13 and this work). Interestingly, an in- The SIRV2-resistant cells were enriched directly from the version in sso2386 was detected in Res7 whereas no mutations SIRV2-sensitive culture; therefore, the only selective pressure ap- were identified in Res9 that could be linked to SIRV2 resistance peared to occur either upon SIRV2 infection or during virus-in- (Fig. 3B). duced cell lysis. Moreover, since the active clustered regularly in- The frequently observed mutations in cluster sso3139 and

September 2014 Volume 88 Number 17 jvi.asm.org 10265 Deng et al. Downloaded from http://jvi.asm.org/

FIG 3 Different mutations in the SIRV2-resistant strains and their stability. (A) Transposon insertions in sso3139 and sso3140. (B) Mutations in sso2386 and sso2387. (C) PCR amplification of the mutation region from different resistant strains. on November 2, 2014 by Copenhagen University Library sso3140 and cluster sso2386 and sso2387 in the resistant strains tain two transmembrane helices, one at the N terminus and the strongly suggest that the two gene clusters are important for the other at the C terminus, while the sequence between them was SIRV2 life cycle. To confirm the implication of the mutations presumed to be located intracellularly. Therefore, it appears that in the gained resistance, genetic complementation was per- the proteins encoded in the operon form a membrane-associated formed for the mutated genes. As described above, Sens1 ap- cell surface structure and may function as a receptor for SIRV2. peared stable during genetic manipulation, and we thus se- Moreover, it was demonstrated recently that Sso2386 carries mul- lected Res1 for complementation of sso3139 mutation. For tiple transmembrane helices and that Sso2387 constitutes an complementation of mutations in the other gene cluster, ATPase associated with a type IV secretion system, and they were Res1B, carrying an ISC1234 insertion in sso2387 (Fig. 3B), was designated AapF and AapE, respectively (28). Further, homologs isolated from SIRV2-infected Sens1. Res1 cells were trans- of both are essential for the formation of the adhesive type IV formed with vector pEXA2 containing sso3139, and Res1B cells pilus of S. acidocaldarius (28). The association with the cell mem- were transformed with vector pEXA2 containing sso2386 and brane of proteins encoded by both gene clusters strongly indicates sso2387. After SIRV2 was added into the cultures, growth re- their involvement in the entry process of SIRV2. tardation occurred in the complemented cells, while the non- The failure of viral entry into Res1 and Res1B cells was fur- complemented culture, transformed with the empty vector, ther confirmed by reverse transcription-PCR analysis of one of showed a growth rate similar to that of the uninfected culture the early genes, ORF131a (17). RNA extracted from cells taken (Fig. 4A and B). Further, Southern hybridization (17) using a at 15 min p.i. was DNase I treated and reverse transcribed probe derived from the SIRV2 inverted terminal repeats (ITR) (SuperScript II reverse transcriptase; Invitrogen). PCR per- detected signals only from the complemented cells (Fig. 4C and formed on the cDNAs detected ORF131a only from Sens1 cells, D) and the multiple hybridized bands were consistent with while the positive-control sso0446 (tfb-1) gene was detected in ongoing replication (Fig. 4E)(10, 25). The absence of SIRV2 all the 3 strains (Fig. 4F). This strongly supports the conclusion signal in the resistant strains indicates a defect in the virus life that the proteins encoded by the two gene clusters are involved cycle. in SIRV2 entry. A likely scenario is that gene cluster sso3138 to To gain insights into the functions of the two gene clusters, the sso3141 encodes a surface receptor for SIRV2 and that gene protein sequences of the genes were firstly analyzed by the use of cluster sso2386 and sso2387 is involved in the secretion of the program TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2 receptor components. .0/) for the possible presence of transmembrane helices. Sso3138, Except in Escherichia coli, very few virus receptors are known in Sso3139, and Sso3140 were predicted to be primarily located ex- the domains of Bacteria and Archaea (29). The primary receptors tracellularly (see Fig. S1 in the supplemental material), correlating for E. coli filamentous phages are pili which retract toward the cell with a previous prediction of the presence of class III signal pep- surface, bringing the phages to the secondary receptor located in tides at their N termini (26). Among these, Sso3140 was confirmed the periplasm (30). Linear archaeal viruses, including rudiviruses, to be a membrane-associated protein in a proteomics study (27). have been observed to attach to pili (5, 31, 32). Future work is In contrast to the other 3 proteins, Sso3141 was predicted to con- needed to determine the association of the two identified gene

10266 jvi.asm.org Journal of Virology SIRV2 Entry in Sulfolobus Downloaded from http://jvi.asm.org/ on November 2, 2014 by Copenhagen University Library

FIG 4 Cluster sso3138 to sso3141 and cluster sso2386 and sso2387 are involved in SIRV2 entry. (A) Growth retardation of sso3139-complemented Res1 upon SIRV2 infection. Res1 (pEXA), Res1 transformed with expression vector pEXA; Res1 (pEXA3139), Res1 transformed with expression vector pEXA containing sso3139. (B) Growth retardation of sso2386-and-sso2387-complemented Res1B upon SIRV2 infection. Res1B (pEXA), Res1B transformed with expression vector pEXA; Res1B (pEXA2386–2387), Res1B transformed with expression vector pEXA containing sso2386 and sso2387. (C and D) Visualization of SIRV2 DNA replication in Res1 (C) and Res1B (D) transformants infected with SIRV2. Plasmid constructs contained in the transformants are labeled on top of each lane, and the sampling time p.i. is indicated as hours. L and R designate the left and right terminal fragments, respectively, after a double digestion with BamHI and HindIII (see panel E). (E) Schematic presentation of SIRV2 genomic map and the formation of terminal duplex replicative intermediates (2L and 2R), as described previously (12). The locations of the probe (filled rectangle) in the termini are indicated. ITR, inverted terminal repeat. (F) RT-PCR amplification of sso0446 (tfb-1) (left panel) and SIRV2 ORF131a transcript fragments (right panel). “ϩ” and “Ϫ” indicate the presence and absence of reverse transcriptase (RT), respectively. clusters with the structure of pili. To our knowledge, this is the first 4. Erdmann S, Scheele U, Garrett RA. 2011. AAA ATPase p529 of Acidianus work providing genetic and biochemical evidence for a possible two-tailed virus ATV and host receptor recognition. Virology 421:61–66. receptor system in archaeal virus entry. http://dx.doi.org/10.1016/j.virol.2011.08.029. 5. Quemin ER, Lucas S, Daum B, Quax TE, Kuhlbrandt W, Forterre P, ACKNOWLEDGMENTS Albers SV, Prangishvili D, Krupovic M. 2013. First insights into the entry process of hyperthermophilic archaeal viruses. J. Virol. 87:13379–13385. We thank Roger A. Garrett for critically reading the manuscript. http://dx.doi.org/10.1128/JVI.02742-13. This work is supported by a European Union FP7 grant (265933). 6. Zillig W, Arnold HP, Holz I, Prangishvili D, Schweier A, Stedman K, She Q, Phan H, Garrett R, Kristjansson JK. 1998. Genetic elements in the REFERENCES extremely thermophilic archaeon Sulfolobus. Extremophiles 2:131–140. 1. Prangishvili D, Forterre P, Garrett RA. 2006. Viruses of the Archaea: a http://dx.doi.org/10.1007/s007920050052. unifying view. Nat. Rev. Microbiol. 4:837–848. http://dx.doi.org/10.1038 7. Fu CY, Johnson JE. 2012. Structure and cell biology of archaeal virus /nrmicro1527. STIV. Curr. Opin. Virol. 2:122–127. http://dx.doi.org/10.1016/j.coviro 2. Pina M, Bize A, Forterre P, Prangishvili D. 2011. The archeoviruses. .2012.01.007. FEMS Microbiol. Rev. 35:1035–1054. http://dx.doi.org/10.1111/j.1574 8. Häring M, Vestergaard G, Rachel R, Chen L, Garrett RA, Prangishvili -6976.2011.00280.x. D. 2005. Virology: independent virus development outside a host. Nature 3. Peng X, Garrett RA, She Q. 2012. Archaeal viruses–novel, diverse and 436:1101–1102. http://dx.doi.org/10.1038/4361101a. enigmatic. Sci. China Life Sci. 55:422–433. http://dx.doi.org/10.1007 9. Bize A, Karlsson EA, Ekefjard K, Quax TE, Pina M, Prevost MC, /s11427-012-4325-8. Forterre P, Tenaillon O, Bernander R, Prangishvili D. 2009. A unique

September 2014 Volume 88 Number 17 jvi.asm.org 10267 Deng et al.

virus release mechanism in the Archaea. Proc. Natl. Acad. Sci. U. S. A. CRISPR/Cmr systems when challenged with vector-borne viral and plas- 106:11306–11311. http://dx.doi.org/10.1073/pnas.0901238106. mid genes and protospacers. Mol. Microbiol. 79:35–49. http://dx.doi.org 10. Peng X, Blum H, She Q, Mallok S, Brugger K, Garrett RA, Zillig W, /10.1111/j.1365-2958.2010.07452.x. Prangishvili D. 2001. Sequences and replication of genomes of the ar- 22. Erdmann S, Garrett RA. 2012. Selective and hyperactive uptake of foreign chaeal rudiviruses SIRV1 and SIRV2: relationships to the archaeal lipo- DNA by adaptive immune systems of an archaeon via two distinct mech- thrixvirus SIFV and some eukaryal viruses. Virology 291:226–234. http: anisms. Mol. Microbiol. 85:1044–1056. http://dx.doi.org/10.1111/j.1365 //dx.doi.org/10.1006/viro.2001.1190. -2958.2012.08171.x. 11. Prangishvili D, Arnold HP, Gotz D, Ziese U, Holz I, Kristjansson JK, 23. She Q, Singh RK, Confalonieri F, Zivanovic Y, Allard G, Awayez MJ, Zillig W. 1999. A novel virus family, the Rudiviridae: structure, virus-host Chan-Weiher CC, Clausen IG, Curtis BA, De Moors A, Erauso G, interactions and genome variability of the sulfolobus viruses SIRV1 and Fletcher C, Gordon PM, Heikamp-de Jong I, Jeffries AC, Kozera CJ, SIRV2. Genetics 152:1387–1396.

Medina N, Peng X, Thi-Ngoc HP, Redder P, Schenk ME, Theriault C, Downloaded from 12. Vestergaard G, Haring M, Peng X, Rachel R, Garrett RA, Prangishvili Tolstrup N, Charlebois RL, Doolittle WF, Duguet M, Gaasterland T, D. 2005. A novel rudivirus, ARV1, of the hyperthermophilic archaeal Garrett RA, Ragan MA, Sensen CW, Van der Oost J. 2001. The complete genus Acidianus. Virology 336:83–92. http://dx.doi.org/10.1016/j.virol genome of the crenarchaeon Sulfolobus solfataricus P2. Proc. Natl. Acad. .2005.02.025. Sci. U. S. A. 98:7835–7840. http://dx.doi.org/10.1073/pnas.141222098. 13. Vestergaard G, Shah SA, Bize A, Reitberger W, Reuter M, Phan H, 24. Skovgaard O, Bak M, Lobner-Olesen A, Tommerup N. 2011. Genome- Briegel A, Rachel R, Garrett RA, Prangishvili D. 2008. Stygiolobus wide detection of chromosomal rearrangements, indels, and mutations in rod-shaped virus and the interplay of crenarchaeal rudiviruses with the circular chromosomes by short read sequencing. Genome Res. 21:1388– CRISPR antiviral system. J. Bacteriol. 190:6837–6845. http://dx.doi.org 1393. http://dx.doi.org/10.1101/gr.117416.110. /10.1128/JB.00795-08. 25. Oke M, Kerou M, Liu H, Peng X, Garrett RA, Prangishvili D, Naismith 14. Blum H, Zillig W, Mallok S, Domdey H, Prangishvili D. 2001. The JH, White MF. 2011. A dimeric Rep protein initiates replication of a linear http://jvi.asm.org/ genome of the archaeal virus SIRV1 has features in common with genomes archaeal virus genome: implications for the Rep mechanism and viral of eukaryal viruses. Virology 281:6–9. http://dx.doi.org/10.1006/viro replication. J. Virol. 85:925–931. http://dx.doi.org/10.1128/JVI.01467-10. .2000.0776. 26. Szabó Z, Stahl AO, Albers SV, Kissinger JC, Driessen AJ, Pohlschröder 15. Prangishvili D, Koonin EV, Krupovic M. 2013. Genomics and biology of M. 2007. Identification of diverse archaeal proteins with class III signal Rudiviruses, a model for the study of virus-host interactions in Archaea. peptides cleaved by distinct archaeal prepilin peptidases. J. Bacteriol. 189: Biochem. Soc. Trans. 41:443–450. http://dx.doi.org/10.1042/BST20120313. 16. Okutan E, Deng L, Mirlashari S, Uldahl K, Halim M, Liu C, Garrett RA, 772–778. http://dx.doi.org/10.1128/JB.01547-06. She Q, Peng X. 2013. Novel insights into gene regulation of the rudivirus 27. Pham TK, Sierocinski P, van der Oost J, Wright PC. 2010. Quantitative SIRV2 infecting Sulfolobus cells. RNA Biol. 10:875–885. http://dx.doi.org proteomic analysis of Sulfolobus solfataricus membrane proteins. J. Pro- /10.4161/rna.24537. teome Res. 9:1165–1172. http://dx.doi.org/10.1021/pr9007688. on November 2, 2014 by Copenhagen University Library 17. Deng L, Zhu H, Chen Z, Liang YX, She Q. 2009. Unmarked gene 28. Henche AL, Ghosh A, Yu X, Jeske T, Egelman E, Albers SV. 2012. deletion and host-vector system for the hyperthermophilic crenarchaeon Structure and function of the adhesive type IV pilus of Sulfolobus acido- Sulfolobus islandicus. Extremophiles 13:735–746. http://dx.doi.org/10 caldarius. Environ. Microbiol. 14:3188–3202. http://dx.doi.org/10.1111/j .1007/s00792-009-0254-2. .1462-2920.2012.02898.x. 18. Martusewitsch E, Sensen CW, Schleper C. 2000. High spontaneous 29. Labrie SJ, Samson JE, Moineau S. 2010. Bacteriophage resistance mechanisms. mutation rate in the hyperthermophilic archaeon Sulfolobus solfataricus Nat. Rev. Microbiol. 8:317–327. http://dx.doi.org/10.1038/nrmicro2315. is mediated by transposable elements. J. Bacteriol. 182:2574–2581. http: 30. Rakonjac J, Bennett NJ, Spagnuolo J, Gagic D, Russel M. 2011. Fila- //dx.doi.org/10.1128/JB.182.9.2574-2581.2000. mentous bacteriophage: biology, phage display and nanotechnology ap- 19. Redder P, Garrett RA. 2006. Mutations and rearrangements in the ge- plications. Curr. Issues Mol. Biol. 13:51–76. http://www.horizonpress nome of Sulfolobus solfataricus P2. J. Bacteriol. 188:4198–4206. http://dx .com/cimb/v/v13/51.pdf. .doi.org/10.1128/JB.00061-06. 31. Zillig W, Kletzin A, Schleper C, Holz I, Janekovic D, Hain J, Lanzendör- 20. Blount ZD, Grogan DW. 2005. New insertion sequences of Sulfolobus: fer M, Kristjansson JK. 1993. Screening for Sulfolobales, their plasmids functional properties and implications for genome evolution in hyper- and their viruses in Icelandic Solfataras. Syst. Appl. Microbiol. 16:609– thermophilic archaea. Mol. Microbiol. 55:312–325. http://dx.doi.org/10 628. http://dx.doi.org/10.1016/S0723-2020(11)80333-4. .1111/j.1365-2958.2004.04391.x. 32. Bettstetter M, Peng X, Garrett RA, Prangishvili D. 2003. AFV1, a novel 21. Gudbergsdottir S, Deng L, Chen Z, Jensen JV, Jensen LR, She Q, virus infecting hyperthermophilic archaea of the genus acidianus. Virol- Garrett RA. 2011. Dynamic properties of the Sulfolobus CRISPR/Cas and ogy 315:68–79. http://dx.doi.org/10.1016/S0042-6822(03)00481-1.

10268 jvi.asm.org Journal of Virology