bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 New isolates from Italian hydrothermal environments underscore the biogeographic 2 pattern in archaeal virus communities 3 4 5 6 Diana P. Baquero1,2, Patrizia Contursi3, Monica Piochi4, Simonetta Bartolucci3, Ying Liu1, 7 Virginija Cvirkaite-Krupovic1, David Prangishvili1,* and Mart Krupovic1,* 8 9 10 1 Archaeal Unit, Department of , Institut Pasteur, 75015 Paris, France 11 2 Sorbonne Universités, UPMC Univ, 75005 Paris, France 12 3 University of Naples Federico II, Department of Biology, Naples, Italy 13 4 Istituto Nazionale di Geofisica e Vulcanologia, Naples, Osservatorio Vesuviano, Italy 14 15 * Correspondence to: [email protected] and [email protected] 16 Institut Pasteur, Department of Microbiology, 17 75015 Paris, France 18 Tel: 33 (0)1 40 61 37 22 19 20 21 Running title 22 Biogeographic pattern in archaeal virus communities 23 24 25 Competing Interests 26 The authors declare that they have no competing interests. 27 28 29 30 31

1

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

32 ABSTRACT 33 of hyperthermophilic represent one of the least understood parts of the virosphere, 34 showing little genomic and morphological similarity to viruses of or . Here, we 35 investigated virus diversity in the active sulfurous fields of the Campi Flegrei in Pozzuoli, 36 Italy. Virus-like particles displaying eight different morphotypes, including lemon-shaped, 37 droplet-shaped and bottle-shaped virions, were observed and five new archaeal viruses proposed 38 to belong to families Rudiviridae, and Tristromaviridae were isolated and 39 characterized. Two of these viruses infect neutrophilic of the 40 , whereas the remaining three have rod-shaped virions typical of the family 41 Rudiviridae and infect acidophilic hyperthermophiles belonging to three different genera of the 42 order , namely, Saccharolobus, and . Notably, 43 Metallosphaera rod-shaped virus 1 is the first rudivirus isolated on Metallosphaera . 44 Phylogenomic analysis of the newly isolated and previously sequenced rudiviruses revealed a clear 45 biogeographic pattern, with all Italian rudiviruses forming a monophyletic clade, suggesting 46 geographical structuring of virus communities in extreme geothermal environments. Furthermore, 47 we propose a revised classification of the Rudiviridae family, with establishment of five new 48 genera. Collectively, our results further show that high-temperature continental hydrothermal 49 systems harbor a highly diverse virome and shed light on the evolution of archaeal viruses.

2

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

50 INTRODUCTION 51 One of the most remarkable features of hyperthermophilic archaea is the diversity and uniqueness 52 of their viruses. Most of these viruses infect members of the phylum and are 53 evolutionarily unrelated to viruses infecting bacteria, eukaryotes or even archaea thriving at 54 moderate temperate [1-4]. Thus far, unique to hyperthermophilic archaea are rod-shaped viruses 55 of the families Rudiviridae and ; filamentous enveloped viruses of the families 56 and Tristromaviridae; as well as spherical (Globuloviridae), ellipsoid 57 (Ovaliviridae), droplet-shaped (), coil-shaped () and bottle-shaped 58 () viruses [1,5]. Hyperthermophilic archaea are also infected by two types of 59 spindle-shaped viruses, belonging to the families and [6,7]. 60 Whereas bicaudaviruses appear to be restricted to hyperthermophiles, viruses distantly related to 61 fuselloviruses are also known to infect hyperhalophilic archaea [8], marine hyperthermophilic 62 archaea [9,10] and marine ammonia-oxidizing archaea from the phylum Thaumarchaeota [11]. 63 Finally, hyperthermophilic archaea are also infected by three groups of viruses with icosahedral 64 virions, [12], Portogloboviridae [13] and two closely related, unclassified viruses 65 infecting Metallosphaera species [14]. Portoglobovirus SPV1 is structurally and genomically 66 unrelated to other known viruses [13,15], whereas viruses structurally similar to turriviruses are 67 widespread in all three domains of [1,16,17]. Structural studies on filamentous and spindle- 68 shaped crenarchaeal viruses have illuminated the molecular details of virion organization and 69 further underscored the lack of relationship to viruses of bacteria and eukaryotes [18-24]. 70 71 The uniqueness of hyperthermophilic archaeal viruses extends to their , with ∼75% of the 72 lacking detectable homologues in sequence databases [25]. All characterized archaeal 73 viruses have DNA genomes, which can be single-stranded (ss) or double-stranded (ds), linear or 74 circular. Comparative genomic and bipartite network analyses have shown that viruses of

75 hyperthermophilic archaea share only few genes with the rest of the virosphere [26]. Furthermore, 76 most of the genes in viruses from different families are family-specific, albeit a handful of genes 77 encoding transcription regulators, glycosyltransferases and anti-CRISPR display broader 78 distribution [27]. These observations, in combination with the structural studies, led to the 79 suggestion that crenarchaeal viruses have originated on multiple independent occasions and 80 constitute a unique part of the virosphere [1].

3

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

81 82 Although the infection cycles of crenarchaeal viruses has been studied for just a handful of 83 representative viruses, the available data has already provided valuable insight into the virus-host 84 interaction strategies in archaea. For instance, members of the Fuselloviridae, Guttaviridae, 85 Turriviridae and Bicaudaviridae are temperate and their genomes can be site-specifically 86 integrated into the host chromosome by means of a virus-encoded integrase [28-33]. In the case of 87 SSV1, the most extensively characterized member of the Fuselloviridae family, the virus 88 integrates its into the host chromosome at an arginyl-tRNA upon infection. After 89 exposure to UV light, the excision and replication of the SSV1 viral genomes is induced, leading 90 to virion release [34,35]. Two different egress strategies have been elucidated. The enveloped 91 virions of fusellovirus SSV1 are assembled at the host cell membrane and are released from the 92 cell by a budding mechanism similar to that of some eukaryotic enveloped viruses [36]. By 93 contrast, lytic crenarchaeal viruses belonging to three unrelated families, Rudiviridae, Turriviridae 94 and Ovaliviridae, employ a unique release mechanism based on the formation of pyramidal 95 protrusions on the host cell surface, leading to perforation of the cell envelope and release of 96 intracellularly assembled mature virions [5,37,38]. 97 98 Single-cell sequencing combined with environmental metagenomics of hydrothermal microbial 99 community from Yellowstone National Park [39] led to the estimation that >60% of cells contain 100 at least one virus type and a majority of these cells contain two or more virus types [40]. However, 101 despite their diversity, distinctiveness and abundance, the number of isolated species of viruses 102 infecting hyperthermophilic archaea remains low compared to the known eukaryotic or bacterial 103 viruses [2]. Indeed, it has been estimated that only about 0.01-0.1% of viruses present in 104 geothermal acidic environments have been isolated [41]. Similarly, using a combination of viral 105 assemblage sequencing and network analysis, it has been estimated that out of 110 identified virus 106 groups, less than 10% represent known archaeal viruses, suggesting that the vast majority of virus 107 clusters represent unknown viruses, likely infecting archaeal hosts [42]. Furthermore, the evolution 108 and structuring of virus communities in terrestrial hydrothermal settings remains poorly 109 understood. Here, to improve understanding on these issues, we explored the diversity of archaeal 110 viruses at the active solfataric field of the Campi Flegrei volcano [43,44] in Pozzuoli, Italy, namely 111 the Pisciarelli hydrothermal area. We report on the isolation and characterization of five new

4

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

112 archaeal viruses belonging to three different families and infecting hosts from five different 113 crenarchaeal genera.

5

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

114 MATERIALS AND METHODS 115 Enrichment cultures 116 Nine samples were collected from hot springs, mud pools and hydrothermally altered terrains at 117 the solfataric field of the Campi Flegrei volcano in Pozzuoli, Italy — the study area known as 118 Pisciarelli — with temperatures ranging from 81 to 96°C and pH between 1 and 7 (Supplementary 119 Information, Table S1). Each sample was inoculated into medium favoring the growth of 120 /Saccharolobus (basal salt solution supplemented with 0.2% tryptone, 0.1% yeast 121 extract, 0.1% sucrose, pH adjusted to 3.5), Acidianus (basal salt solution supplemented with 0.2% 122 tryptone, 0.1% yeast extract, , pH adjusted to 3.5), and Pyrobaculum (0.1% yeast extract, 1%

123 tryptone, 0.1-0.3% Na2S2O3 x 5 H2O, pH adjusted to 7) species [45,46]. The enrichment cultures 124 were incubated for 10 days at 75°C, except for Pyrobaculum cultures, which were grown at 90°C 125 for 15 days without shaking. 126 127 VLP concentration and purification 128 VLPs were collected after removal of cells from the enrichment culture (7,000 rpm, 20 min, Sorvall 129 1500 rotor) and concentrated by ultracentrifugation (40,000 rpm, 2h, 10°C, Beckman SW41 rotor).

130 The particles were resuspended in buffer A: 20 mM KH2PO4, 250 mM NaCl, 2.14 mM MgCl2,

131 0.43 mM Ca(NO3)2, <0.001% trace elements of Sulfolobales medium, pH 6 [47] and visualized 132 under the transmission electron microscope (TEM) FEI Tecnai Biotwin as described below. VLPs 133 were further purified by centrifugation in a CsCl buoyant density gradient (0.45 g ml−1) with a 134 Beckman SW41 rotor at 39,000 rpm for 20 h at 10°C. All opalescent bands were collected with a 135 needle and a syringe, dialyzed against buffer A and analyzed by TEM for the presence of VLPs. 136 137 Transmission electron microscopy and VLP analysis 138 For negative-stain TEM, 5 µl of the samples were applied to ‐coated copper grids, 139 negatively stained with 2% uranyl acetate and imaged with the transmission electron microscope 140 FEI Tecnai Biotwin. The width and length of the negatively stained virus particles were determined 141 using ImageJ (n=80) [48]. 142 143 Isolation of virus-host pairs

6

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

144 Strains of cells were colony purified by plating dilutions of the Sulfolobus/Saccharolobus 145 enrichment cultures onto Phytagel (0.7% [wt/vol]) plates incubated for 7 days at 75 °C. Brownish 146 colonies were toothpicked, inoculated into 1 mL of growth medium and incubated at 75°C for 3 147 days. To isolate the sensitive hosts of the observed VLPs, 5-µl aliquots of concentrated VLP 148 preparations were spotted onto Phytagel (0.3% [wt/vol]) plates containing lawns of each isolate, 149 as described previously [46]. Inhibition zones were cut out from the phytagel and inoculated into 150 20 mL of exponentially growing cultures of the corresponding isolates. After incubation at 75°C 151 for 3 days, the VLPs were concentrated by ultracentrifugation (40,000 rpm, 2h, 10°C, Beckman 152 SW41 rotor), resuspended in buffer A and observed by TEM. Pure virus strains were obtained by 153 two rounds of single-plaque purification, as described previously [34]. 154 155 For Pyrobaculum enrichment cultures, a different approach was carried out since it was not 156 possible to obtain single strain isolates. Growing liquid cultures of P. arsenaticum 2GA [45], P. 157 arsenaticum PZ6 (DSM 13514) [49], P. calidifontis VA1 (DSM 21063) [50] and P. 158 oguniense TE7 (DSM 13380) [51] were mixed with concentrated VLPs and incubated for 15 days 159 at 90°C. Increase in the number of extracellular viral particles was verified by TEM. In order to 160 obtain cultures producing just one type of viral particles, eight ten-fold serial dilutions (10 ml) of 161 infected cells were established, incubated at 90°C for 22 days and observed by TEM. Because a 162 plaque assay could not be established for VLPs infecting Pyrobaculum, we relied on TEM 163 observations for assessment of virus-host interactions. 164 165 Host range 166 The following laboratory collection strains were used to test the ability to replicate the viruses 167 MRV1, ARV3 and SSRV1: A. convivator [31], A. hospitalis [52], Saccharolobus solfataricus P1 168 (GenBank accession no. NZ_LT549890) and S. solfataricus P2 [53], Sulfolobus islandicus strains 169 REN2H1 [46], HVE10/4 [54], LAL14/1 [55], REY15A [54], ΔC1C2 [56], and Sulfolobus 170 acidocaldarius strain DSM639 [57]. CsCl gradient-purified virus particles were added directly to 171 the growing cultures of Acidianus species and virus propagation was evaluated by TEM. For 172 Sulfolobus and Saccharolobus species, spot tests were performed and the replication of the virus 173 was evidenced by the presence of an inhibition zone on the lawn of the tested strains. 174

7

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

175 The host ranges of PFV2 and PSV2 were examined by adding purified virions to the growing 176 cultures of P. arsenaticum PZ6 (DSM 13514) [49], P. calidifontis VA1 (DSM 21063) and P. 177 oguniense TE7 (DSM 13380) [51]. The virus propagation was monitored by TEM since plaque 178 assays could not be established. 179 180 16S rDNA sequence determination 181 DNA was extracted from exponentially growing cell cultures using the Wizard Genomic DNA 182 Purification Kit (Promega). The 16S rRNA genes were amplified by PCR using 519F (5'- 183 CAGCMGCCGCGGTAA) and 1041R (5'-GGCCATGCACCWCCTCTC) primers [58]. The 184 identity of the isolated strains was determined by blastn search of the corresponding 16S rRNA 185 gene sequences against the non-redundant nucleotide sequence database at NCBI. 186 187 Viral DNA extraction, sequencing and analysis of viral genomes 188 Viral DNA was extracted from purified virus particles as described previously [45]. Sequencing 189 libraries were prepared from 100 ng of DNA with the TruSeq DNA PCR-Free library Prep Kit 190 from Illumina and sequenced on Illumina MiSeq platform with 150-bp paired-end read lengths 191 (Institut Pasteur, France). Raw sequence reads were processed with Trimmomatic v.0.3.6 [59] and 192 assembled using SPAdes 3.11.1 [60] with default parameters. Open reading fragments (ORF) were 193 predicted by GeneMark.hmm [61] and RAST v2.0 [62]. Each predicted ORF was manually 194 inspected for the presence of putative ribosome-binding sites upstream of the start codon. The in 195 silico-translated sequences were analyzed by BLASTP [63] against the non-redundant 196 protein database at the National Center for Biotechnology Information with an upper threshold E- 197 value of 1e-03. Searches for distant homologs were performed using HHpred [64] against PFAM 198 (Database of Protein Families), PDB (Protein Data Bank) and CDD (Conserved Domains 199 Database) databases. Transmembrane domains were predicted using TMHMM [65]. The 200 sequences of the viral genomes have been deposited in GenBank, with accession numbers listed 201 in Table 2. 202 203 Phylogenomic analysis 204 All pairwise comparisons of the nucleotide sequences of rudivirus genomes were conducted using 205 the Genome-BLAST Distance Phylogeny (GBDP) method implemented in VICTOR, under

8

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

206 settings recommended for prokaryotic viruses [66]. The resulting intergenomic distances derived 207 from pairwise matches (local alignments) were used to infer a balanced minimum evolution tree 208 with branch support via FASTME including SPR postprocessing for D6 formula, which 209 corresponds to the sum of all identities found in high-scoring segment pairs divided by total 210 genome length. Branch support was inferred from 100 pseudo-bootstrap replicates each. The tree 211 was rooted with members of the family Lipothrixviridae. 212 213 Genome sequences of 19 rudiviruses were further compared using the Gegenees, a tool for 214 fragmented alignment of multiple genomes [67]. The comparison was done in the BLASTN mode, 215 with the settings 200/100. The cutoff threshold for non-conserved material was 40%. 216

9

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

217 RESULTS 218 Diversity of virus-like particles in enrichments cultures 219 Nine environmental samples (I1-I9) were collected in hot springs, mud pools and hydrothermally 220 altered terrains of the solfataric fields of the Campi Flegrei volcano in Pozzuoli, Italy, with 221 temperatures ranging from 81 to 96°C and pH values between 1 and 7 (Supplementary 222 Information, Table S1). The enrichment cultures were obtained by inoculating the samples into 223 different media favoring the growth of hyperthermophilic members of the genera 224 Sulfolobus/Saccharolobus, Acidianus and Pyrobaculum [45,46]. Virus-like particles (VLPs) were 225 collected from cell-free culture supernatants and visualized under transmission electron 226 microscope (TEM) as described in Materials and Methods. 227 228 A variety of VLPs were detected in the Sulfolobus/Saccharolobus enrichment cultures of samples 229 I3 and I9 and the Pyrobaculum enrichment culture of sample I4 (Figure 1). Based on virion 230 morphologies, the VLPs detected in the two samples inoculated in Sulfolobus/Saccharolobus 231 medium could be assigned to five archaeal virus families: Fuselloviridae (Figure 1A), 232 Bicaudaviridae (Figure 1B), Ampullaviridae (Figure 1C), Rudiviridae (Figure 1D) 233 and Lipothrixviridae (Figure 1E). VLPs propagated in Pyrobaculum medium resembled members 234 of the families Globuloviridae (Figure 1F), Tristromaviridae (Figure 1G) and Guttaviridae (Figure 235 1H). We next set out to establish pure cultures of these different viruses and to isolate their 236 respective hosts. 237 238 Isolation of virus-host pairs 239 In order to isolate VLP-propagating strains, 215 single strain isolates were colony purified from 240 the two enrichment cultures established in the Sulfolobus medium. Concentrated VLPs were first 241 tested against the isolates by spot test. In case of cell growth inhibition, a liquid culture of the 242 isolate was established and infected with the halo zone observed in the spot test. The production 243 of the viral particles was subsequently verified by TEM. As a result, three strains replicating VLPs 244 were identified. Comparison of their 16S rRNA gene sequences showed that the three strains 245 belong to three different genera of the order Sulfolobales, namely, Saccharolobus (until recently 246 known as Sulfolobus), Acidianus and Metallosphaera. The 16S rRNA genes of these strains, 247 POZ9, POZ149 and POZ202, respectively, displayed 100%, 99% and 99% identity to the

10

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

248 corresponding genes of Acidianus brierleyi DSM 1651 (NZ_CP029289), Saccharolobus 249 solfataricus Ron 12/III (X90483) and SARC-M1 (CP012176). Rod-shaped 250 particles of different sizes were propagated successfully in the three isolated strains (Figure 2A- 251 C) and, following the nomenclature used for other rudiviruses, were named Metallosphaera rod- 252 shaped virus 1 (MRV1), Acidianus rod-shaped virus 3 (ARV3) and Saccharolobus solfataricus 253 rod-shaped virus 1 (SSRV1), respectively. 254 255 Negatively stained MRV1, ARV3 and SSRV1 virions are rod-shaped particles measuring 630 ± 256 20 × 25 ± 2 nm, 670 ± 40 × 23 ± 4 nm and 750 ± 30 × 24 ± 3 nm, respectively (Figure 2A-C). 257 Similar to members of the Rudiviridae family, the three viral particles have terminal fibers located 258 at each end of the virion, which have been shown to play a role in host recognition in the case of 259 Sulfolobus islandicus rod-shaped virus 2 (SIRV2) [68]. Interestingly, ARV3 virions appear to be 260 more flexible than MRV1 and SSRV1 particles, resembling Acidianus rod-shaped virus 1 (ARV1) 261 [69]. 262 263 A different approach was chosen to identify the hosts of the viral particles detected in the 264 Pyrobaculum medium. Exponentially growing liquid cultures of Pyrobaculum strains were mixed 265 with concentrated VLPs and incubated for 15 days at 90°C (see Materials and Methods). The 266 replication of the particles was monitored by TEM. P. arsenaticum 2GA propagated filamentous 267 and spherical particles, named Pyrobaculum filamentous virus 2 (PFV2; Figure 2D) and 268 Pyrobaculum spherical virus 2 (PSV2; Figure 2E), respectively. Because P. arsenaticum 2GA 269 could not be grown as a lawn on solid medium, dilutions of infected cells were made in order to 270 establish cultures infected with just one type of viral particles. 271 272 Negatively stained virions of PFV2 are filamentous and flexible particles of about 450 ± 20 × 34 273 ± 4 nm in size with terminal filaments of up to 130 nm in length attached to one or both ends of 274 the virions (Figure 2D), similar to what has been reported for PFV1 [45]. PSV2 virions are 275 spherical particles of around 90 ± 20 nm of diameter, with a variable number of bulging protrusions 276 on their surface (Figure 2E), resembling Pyrobaculum spherical virus (PSV) particles [70].

11

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

277 Unfortunately, the hosts for other VLPs shown in Figure 1 could not be isolated, either due to 278 unfavorable growth conditions in the laboratory or due to virus-mediated extinction of the 279 corresponding host cell populations. 280 281 Host Range 282 The host ranges of MRV1, ARV3 and SSRV1 were investigated using strains of the 283 genera Acidianus, Sulfolobus and Saccharolobus (Table 1). Only ARV3 could be propagated on 284 A. hospitalis, whereas none of the other tested strains served as a host for MRV1 and SSRV1, even 285 though S. solfataricus P1 and P2 were also isolated from the solfataric field in Pozzuoli [53]. The 286 host ranges of PFV2 and PSV2 were studied using strains of the genus Pyrobaculum (Table 1). 287 None of the tested strains supported propagation of PSV2. By contrast, apparent increase in the 288 number of PFV2 particles was observed by TEM in the presence of P. oguniense TE7, albeit not 289 to the same extent as with P. arsenaticum 2GA, indicating that P. oguniense TE7 is not optimal 290 for the virus propagation. 291 292 Genome organization 293 Genomes of the five viruses were isolated from the purified virions and treated with DNase I, type 294 II restriction endonucleases (REases) and RNase A. None of the viral genomes were sensitive to 295 RNase A, but could be digested by DNase I and REases, indicating that all viral genomes consist 296 of dsDNA molecules. The genomes were sequenced on Illumina MiSeq platform, with the 297 assembled contigs corresponding to complete or near-complete virus genomes. The general 298 properties of the virus genomes are summarized in Table 2. 299 300 New species in the Globuloviridae family 301 The linear dsDNA genome of PSV2 is 18,212 bp in length and has a GC content of 45%, which is 302 similar to that of other Pyrobaculum-infecting viruses (45-48%) [45,70]. The coding region of the 303 PSV2 genome is flanked by perfect 55 bp-long terminal inverted repeats (TIR), confirming the 304 linear topology and (near) completeness of the genome. It contains 32 predicted open reading 305 frames (ORFs), all located on the same strand. Thirteen PSV ORFs contain at least one predicted 306 membrane-spanning region. Notably, two of them (ORFs 3 and 9) have nine predicted 307 transmembrane domains.

12

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

308 309 Globuloviruses stand out as some of the most mysterious among archaeal viruses, with 98% of 310 their proteins showing no similarity to sequences in public databases [25]. In part, this is due to 311 the fact that the family is represented by only two members, PSV and TTSV1, infecting 312 Pyrobaculum [70] and tenax [71], respectively, which precludes generation of 313 sensitive sequence profiles for remote homology searches. Previous structural genomics initiatives 314 were also not very illuminating [72]. As a result, the vast majority of the globulovirus proteins lack 315 functional annotation. Homologs of PSV2 proteins were identified exclusively in members of the 316 Globuloviridae (Figure 3), corroborating the initial affiliation of PSV2 into the family 317 Globuloviridae based on the morphological features of the virion. Nineteen PSV2 ORFs, including 318 those encoding the three major structural proteins (VP1-VP3), have closest homologs in PSV [70] 319 with sequence identities ranging between 28% and 65% (Figure 3); 13 of these ORFs 320 are also shared with TTSV1. The remaining 13 PSV2 ORFs yielded no significant matches to 321 sequences in public databases. The three genomes display no appreciable similarity at the 322 nucleotide sequence level, indicating considerable sequence diversity within the Globuloviridae 323 family. Notably, among five PSV proteins for which high-resolution structures are available [72], 324 only one protein with a unique fold, PSV gp11 (ORF239), is conserved in PSV2. 325 326 Sensitive profile-profile comparisons using HHpred allowed functional annotation of only four 327 PSV2 proteins. The PSV2 ORF2 encodes a protein with an AAA+ ATPase , which is most 328 closely related to those found in ClpB-like chaperones and heat shock proteins (HHpred 329 probability of 99.6%; Supplementary Information, Table S2). ORF3 encodes one of the two 330 proteins with nine putative transmembrane domains and is predicted to function as a membrane 331 transporter, most closely matching bacterial and archaeal cation exchangers (HHpred probability 332 of 95.5%). Interestingly, the product of ORF4 is predicted to be a circadian clock protein KaiB, 333 albeit with a lower probability (HHpred probability of 93%). Finally, ORF32 shares homology 334 with a putative transcriptional regulator with the winged helix-turn-helix domain (wHTH) of 335 Saccharolobus solfataricus (HHpred probability of 94.4%). In addition, ORF11 encodes a 336 functionally uncharacterized DUF1286-family protein conserved in archaea and several 337 Saccharolobus-infecting viruses (Supplementary Information, Table S2). 338

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

339 New member in the Tristromaviridae family 340 The linear genome of the isolated PFV2 carries 17,602 bp and contains 39 ORFs, all except one 341 located on the same strand. The coding region is flanked by 59 bp-long TIRs. The GC content 342 (45.3%) of the genome is similar to that of PSV2 and other Pyrobaculum-infecting viruses [45,70], 343 but is considerably lower than in Pyrobaculum arsenaticum PZ6 (58.3%), and Pyrobaculum 344 ogunienseTE7 (55.1%). Eleven of the PFV2 ORFs were predicted to encode proteins with one or 345 more membrane-spanning domains (Supplementary Information, Table S2). 346 347 The PFV2 genome is 98.9% identical over 70% of its length to that of PFV1, the type species of 348 the Tristromaviridae family [45]. PFV1 and PFV2 were isolated ~3 years apart, from the same 349 solfataric field in Pozzuoli [45], suggesting that the population of tristromaviruses is relatively 350 stable over time. Accordingly, 36 of the 39 PFV2 ORFs are nearly identical (average amino acid 351 identity of 96.5%) to those of PFV1. Two events account for the differences between PFV1 and 352 PFV2: (i) a deletion spanning most of the PFV1 gene 27 (including codons 72-495) as well as the 353 downstream genes 28-30, and (ii) insertion of a four-gene block between PFV1 genes 36 and 37 354 in PFV2 (Figure 4). PFV1 gene 27 encodes a minor virion protein, whereas genes 29 and 30 encode 355 putative small, lectin-like carbohydrate-binding proteins (Rensen et al., 2016). The absence of the 356 corresponding genes in PFV2 genome suggests that they are dispensable for the PFV1/PFV2 357 infection cycle. Notably, a homolog of the PFV1 glycoside hydrolase gene 28 is reinserted into 358 PFV2 genome as part of the four-gene block (PFV2 ORF34). However, the two genes do not 359 appear to be orthologous in PFV1 and PFV2 genomes, because they share much lower sequence 360 similarity (50% versus average 96.5% identity) compared to other orthologous genes. By contrast, 361 PFV2 ORF35 has no counterpart in PFV1 but is homologous to the glycosyltransferase gene of 362 Thermoproteus tenax virus 1 (TTV1) [73], the only other known member of the Tristromaviridae 363 family [74](Figure 4). Thus, comparison of the closely related tristromavirus sequences revealed 364 active genome remodeling in this virus group, involving both deletions and horizontal acquisition 365 of new genes. 366 367 New rod-shaped viruses 368 The linear genomes of the isolated rudiviruses have a length ranging from 20,269 to 26,079 bp and 369 a GC content varying between 32.02% and 34.12%. The MRV1 genome contains 80 bp-long TIR,

14

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

370 suggesting that the genome is coding-complete (i.e., contains all protein-coding genes). Although 371 no TIRs could be identified for ARV3 and SSRV1, comparison with the genomes of other 372 rudiviruses (Figure 5) suggests that the two genomes are also nearly complete. 373 374 The genomes of MRV1, ARV3 and SSRV1 contain 27, 33 and 37 ORFs, respectively, and display 375 high degree of gene synteny (Figure 5). Comparison of the three genomes showed that they share 376 26 putative proteins, with amino acid sequence identities ranging between 35.1% and 93.9%. 377 Among the previously reported rudiviruses, the three newly isolated viruses share the highest 378 similarity with ARV2 (ANI of ~78%), which was metagenomically sequenced from samples 379 collected in the same Pozzuoli area [75]; this group of viruses shares 20 genes. 380 381 Rudiviruses represent one of the most extensively studied families of archaeal viruses with many 382 of the viral proteins being functionally and structurally characterized [76]. Most of the MRV1, 383 ARV3 and SSRV1 ORFs have orthologs in at least some members of the Rudiviridae. Genes 384 shared with other rudiviruses include those for the major and minor proteins, several 385 transcription factors with ribbon-helix-helix motifs, three glycosyltransferases, GCN5-family 386 acetyltransferase, Holliday junction resolvase, SAM-dependent methyltransferase and a gene 387 cassette encoding the ssDNA-binding protein, ssDNA annealing ATPase and Cas4-like ssDNA 388 exonuclease (Figure 5). The conservation of these core proteins in the expanding collection of 389 rudiviruses underscores their critical role in viral reproduction. Conspicuously missing from the 390 MRV1 and ARV3 are homologs of the SIRV2 P98 protein responsible for the formation of 391 pyramidal structures for virion egress and anti-CRISPR (Acr) proteins encoded by diverse 392 crenarchaeal viruses [77,78]. Notably, SSRV1 carries a gene for the recently characterized AcrIII- 393 1, which blocks antiviral response of type III CRISPR-Cas systems by cleaving the cyclic 394 oligoadenylate second messenger [79]. Given that CRISPR-Cas systems are highly prevalent in 395 hyperthermophilic archaea[80], the lack of identifiable anti-CRISPR genes in MRV1 and ARV3 396 is somewhat unexpected, suggesting that the two viruses encode novel Acr proteins. Similarly, 397 lack of the genes encoding recognizable P98-like pyramid proteins in MRV1 and ARV3 (as well 398 as in ARV1 and ARV2) suggests that these viruses have evolved a different solution for virion 399 release. 400

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

401 Besides the core genes, rudiviruses are known to carry a rich complement of variable genes, which 402 typically occupy the termini of linear genomes and are shared with viruses isolated from the same 403 geographical location [81]. For instance, MRV1, ARV3 and SSRV1 carry several genes, which 404 are exclusive to Italian rudiviruses. These include a divergent glycoside hydrolase, putative metal- 405 dependent deubiquitinase, alpha-helical DNA-binding protein, transcription initiation factor and 406 several short hypothetical proteins, which are likely candidates for Acrs (Supplementary 407 Information, Table S2). Notably, some of these hypothetical proteins are shared with other 408 crenarchaeal viruses (families Lipothrixviridae and Bicaudaviridae) isolated from the same 409 location. 410 411 A biogeographic pattern in the Rudiviridae family 412 To gain insight into the global architecture of the rudivirus populations and the factors that govern 413 it, we performed phylogenomic analysis of all available rudivirus genomes using the Genome- 414 BLAST Distance Phylogeny method implemented in VICTOR [66]. Our results unequivocally 415 show that the 19 sequenced rudiviruses fall into six clades corresponding to the geographical 416 origins of the virus isolation (Figure 6), suggesting local adaptation of the corresponding viruses. 417 Thus, on the global scale, horizontal spread of rudivirus virions between geographically remote 418 continental hydrothermal systems appears to be restricted. 419 420 Remarkably, despite forming a monophyletic group, all rudiviruses originating from Italy infect 421 relatively distant hosts, belonging to three different genera of the order Sulfolobales. Such pattern 422 was highly unexpected, because closely related hyperthermophilic archaeal viruses typically infect 423 hosts from the same genus. Thus, we hypothesized that such pattern of host specificities might 424 signify host switching events in the history of the Italian rudivirus assemblage. CRISPR arrays, 425 which contain spacer sequences derived from mobile genetic elements, keep memory of past 426 infections and are commonly used as indicators for matching the uncultivated viruses to their 427 potential hosts [82-84]. Thus, to investigate which hosts were exposed to Italian rudiviruses, we 428 searched the CRISPRdb database [85] for the presence of spacers matching the corresponding viral 429 genomes. Spacer matches were found for all five virus genomes, albeit with different levels of 430 identity. Matches with 100% identity were obtained for ARV2, ARV3, SSRV1 and MRV1 431 (Supplementary Information, Table S3). Spacers matching ARV2 and ARV3 were found in

16

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

432 Saccharolobus solfataricus P1, whereas ARV2 was also targeted by a CRISPR spacer (100% 433 identity) from Metallosphaera sedula DSM 5348. Unexpectedly, MRV1, which infects 434 Metallosphaera species, was matched by spacers from different strains of S. solfataricus. 435 Conversely, SSRV1 infecting S. solfataricus was targeted by multiple spacers from CRISPR arrays 436 of M. sedula DSM 5348 (Supplementary Information, Table S3). These results are consistent with 437 the possibility that in the recent history of Italian rudiviruses, host switching, even across the genus 438 boundary, has been relatively common. 439 440 Revised classification of rudiviruses 441 Of the 19 rudiviruses for which (near) complete genome sequences are available, only three 442 (SIRV1, SIRV2 and ARV1) are officially classified. All three viruses are included in the same 443 genus, Rudivirus. Here, we propose a taxonomic framework for classification of all cultivated and 444 uncultured rudiviruses for which genome sequences are available. As mentioned above, 445 phylogenomic analysis revealed six different clades coinciding with the origin of virus isolation 446 (Figure 6), highlighting considerable diversity of the natural rudivirus population, which has 447 stratified into several assemblages, warranting their classification into distinct, genus-level 448 taxonomic units. To this end, we compared the genome sequences using the Gegenees tool [67], 449 which fragments the genomes and calculates symmetrical identity scores for each pairwise 450 comparison based on BLASTn hits and a genome length. The analysis revealed seven clusters of 451 related genomes, which were generally consistent with those obtained in the phylogenomic 452 analysis (Table S4). Notably, due to considerable sequence divergence, ARV1 falls into a separate 453 cluster from other Italian rudiviruses. Consistently, in the phylogenomic tree, ARV1 forms a sister 454 group to other Italian rudiviruses. Furthermore, among the five Italian rudiviruses, the genome of 455 ARV1 is most divergent, displaying multiple gene and genomic segment inversions compared to 456 the other viruses (Figure 5). Viruses from the seven clades also differ considerably in terms of the 457 variable gene contents. For instance, viruses SIRV1 and SIRV2 isolated in Iceland share 11 genes 458 that are absent from the USA SIRVs [81]. Thus, to acknowledge the differences between the 459 known rudiviruses, we propose to classify them into seven genera: “Icerudivirus” (former 460 Rudivirus,to include SIRV1-SIRV3), “Mexirudivirus” (SMRV1), “Azorudivirus” (SRV), 461 “Itarudivirus” (ARV1), “Hoswirudivirus” (ARV2, ARV3, MRV1, SSRV1; hoswi-, for host 462 switching), “Japarudivirus” (SBRV1) and “Usarudivirus” (SIRV4-SIRV11).

17

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

463 464 DISCUSSION 465 Virus discovery in the “age of metagenomics” is increasingly performed by culture-independent 466 methods [86], whereby viral genomes are sequenced directly from the environment. This is a 467 powerful approach which has already yielded thousands of viral genomes, providing 468 unprecedented insights into virus diversity, environmental distribution and evolution [82,87-90]. 469 The limitation of , however, is that the host species for the sequenced viruses 470 typically remain unknown, even if occasionally predicted. Furthermore, truly novel viruses, which 471 lack similarity to known reference virus genomes, often escape identification and remain in the so- 472 called “viral dark matter” [91-94]. Finally, although many features of viruses can be deduced from 473 the genome sequence alone, different molecular aspects of virus-host interactions remain obscure. 474 Thus, to get a full(er) picture of virus biology and to further expand the knowledge on virus 475 diversity, classical approaches of virus isolation and characterization remain highly valuable. Here, 476 we reported the results of our exploration of the diversity of hyperthermophilic archaeal viruses at 477 the solfataric field in Pisciarelli, Pozzuoli. 478 479 Previous sampling of archaeal viruses in the thermal springs in Pisciarelli led to the isolation of 480 viruses from the families Ampullaviridae, Bicaudaviridae, Lipothrixviridae, Rudiviridae and 481 Tristromaviridae [31,45,69,75,95], whereas those of the families Fuselloviridae, Globuloviridae 482 and Guttaviridae, which we observed in the initial enrichment cultures, have not been previously 483 reported from the sampled Pisciarelli sites. Nevertheless, similar virion morphologies have been 484 observed in high-temperature continental hydrothermal systems from other geographical locations 485 across the globe [29,32,70,96,97], pointing to global distribution of most groups of 486 hyperthermophilic archaeal viruses. However, how these virus communities are structured and 487 whether geographically remote hydrothermal ecosystems undergo virus immigration is not fully 488 understood. 489 490 A previous study has revealed a biogeographic pattern among Sulfolobus islandicus-infecting 491 rudiviruses isolated from hot springs in Iceland and United States [81]. That is, viruses from the 492 same geographical location were more closely related to each other than they were to viruses from 493 other locations and the larger the distance the more divergent the virus genomes were. Similar

18

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

494 observation has been made for the Sulfolobus-infecting spindle-shaped viruses of the family 495 Fuselloviridae [97-99]. By contrast, a study focusing on three relatively closely spaced hot springs 496 in Yellowstone National Park concluded that horizontal virus movement, rather than mutation, is 497 the dominant factor controlling the viral community structure [100]. 498 499 Phylogenomic and comparative genomic analysis of the rudiviruses reported herein and those 500 sequenced previously revealed a strong biogeographic pattern. All rudiviruses formed clades 501 corresponding to the geographical origins of their isolation. This observation strongly suggests 502 that diversification and evolution of rudiviruses is influenced by special confinement within 503 discrete high-temperature continental hydrothermal systems, with little horizontal migration of 504 viral particles over large distances. Consistently, analyses of the CRISPR spacers carried by 505 hyperthermophilic archaea predominantly target local viruses, further indicating geographically 506 defined co-evolution of viruses and their hosts [97,101]. This is in stark contrast with the global 507 architecture of virus communities associated with hyperhalophilic archaea, where genomic 508 similarity between viruses does not correspond to geographical distance [102,103]. Indeed, it has 509 been suggested that hypersaline environments worldwide function like a single habitat, with 510 extensive exchange of microbes and their viruses between remote hypersaline sources [102]. It has 511 been shown that small salt crystals harboring halophilic archaea are carried on bird feathers, 512 suggesting that bird migration is a driving force of haloarchaeal spread across the globe [104]. 513 Obviously, similar route of dissemination is hardly imaginable for microbial communities thriving 514 in acidic high-temperature hydrothermal settings, offering an explanation for the different patterns 515 observed for hyperthermophilic and hyperhalophilic archaeal viruses. Notably, however, it has 516 been suggested that reversible silicification of virus particles, which is conceivable in 517 environments, might promote long-distance host-independent virus dispersal [105]. 518 519 We also show, for the first time, that relatively closely related rudiviruses infect phylogenetically 520 distant hosts, belonging to three different genera. This observation is inconsistent with stable 521 coevolution of rudiviruses with their hosts, but rather points to host switch events in the recent 522 history of Italian rudiviruses. At least in the case of rudivirises, relatively few genetic changes 523 appear to be necessary for gaining the ability to infect a new host. In tailed , host 524 range switches are typically associated with mutations in genes encoding the tail fiber proteins

19

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

525 responsible for host recognition [106]. Several molecular mechanisms underlying mutability of 526 the tail fiber genes have been described, including genetic drift, diversity generating retroelements 527 and phase variation cassettes. The latter two systems have been demonstrated to function also in 528 viruses infecting anaerobic methane-oxidizing (ANME) and hyperhalophilic archaea, respectively 529 [107,108]. Both mechanisms depend on specific enzymes, namely, reverse-transcriptase and 530 invertase. However, neither of the two systems is present in rudiviruses, suggesting that genetic 531 drift is the most likely mechanism responsible for generating diversity in the host recognition 532 module of rudiviruses. 533 534 Analysis of CRISPR spacers from hyperthermophilic archaea confirmed that viruses closely 535 related to those isolated in this study were infecting highly different hosts in the recent past. Indeed, 536 viruses infecting Saccharolobus and Metallosphaera species were targeted by spacers from 537 CRISPR arrays of Metallosphaera and Saccharolobus, respectively. This observation further 538 reinforces a relatively recent host switch event. Notably, this phenomenon appears to be also 539 applicable to rudivirus from other geographical locations. In particular, SBRV1, a rudivirus from 540 a Japanese hot spring [109], was found to be targeted by 521 unique CRISPR spacers associated 541 with all four principal CRISPR repeat sequences present in Sulfolobales [101], suggesting either 542 very broad host range or frequent host switches. Furthermore, we have recently shown that 543 CRISPR targeting is an important factor driving the genome evolution of hyperthermophilic 544 archaeal viruses [101]. Given the presence of multiple CRISPR spacers matching rudivirus 545 genomes, we hypothesize that necessity to switch hosts might be, to a large extent, driven by 546 CRISPR targeting. Notably, matching of CRISPR spacers to protospacers in viral genomes is one 547 of the most widely used approaches of host identification for viruses discovered by metagenomics, 548 with the estimated host genus prediction accuracy of 70–90% [83,84,89]. Our results suggest that 549 in the case of rudiviruses, spacer matching might not provide accurate predictions beyond the rank 550 of family (i.e., ). 551 552 Collectively, we show that terrestrial hydrothermal systems harbor a highly diverse virome 553 represented by multiple families with unique virus morphologies not described in other 554 environments. Genomes of the newly isolated viruses, especially those infecting Pyrobaculum 555 species, remain a rich source of unknown genes, which could be involved in novel mechanisms of

20

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

556 virus-host interactions. Furthermore, our results suggest that global rudivirus communities display 557 biogeographic pattern and diversify into distinct lineages confined to discrete geographical 558 locations. This diversification appears to involve relatively frequent host switching, potentially 559 evoked by host CRISPR-Cas immunity systems. Future studies should focus on characterization 560 of molecular mechanisms allowing rudiviruses to avoid CRISPR targeting by infecting new hosts. 561 562 563 ACKNOWLEDGEMENTS 564 This work was supported by l’Agence Nationale de la Recherche (France) project ENVIRA (to 565 M.K.) and the European Union’s Horizon 2020 research and innovation program under grant 566 agreement 685778, project VIRUS X (to D.P.). D.P.B. is part of the Pasteur – Paris University 567 (PPU) International PhD Program. This project has received funding from the European Union's 568 Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant 569 agreement No 665807. We are also grateful to the Ultrastructural BioImaging (UTechS UBI) unit 570 of Institut Pasteur for access to electron microscopes and Marc Monot from the Biomics Platform 571 of Institut Pasteur for helpful discussions on genome assembly. 572 573 COMPETING INTERESTS 574 The authors declare that they have no conflict of interest. 575

21

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

576 REFERENCES 577 1. Prangishvili D, Bamford DH, Forterre P, Iranzo J, Koonin EV and Krupovic M. The enigmatic 578 archaeal virosphere. Nat Rev Microbiol. 2017; 15:724-739. 579 2. Munson-McGee JH, Snyder JC and Young MJ. Archaeal Viruses from High-Temperature 580 Environments. Genes (Basel). 2018; 9. 581 3. Wang H, Peng N, Shah SA, Huang L and She Q. Archaeal extrachromosomal genetic elements. 582 Microbiol Mol Biol Rev. 2015; 79:117-52. 583 4. Dellas N, Snyder JC, Bolduc B and Young MJ. Archaeal Viruses: Diversity, Replication, and 584 Structure. Annu Rev Virol. 2014; 1:399-426. 585 5. Wang H, Guo Z, Feng H, Chen Y, Chen X, Li Z, Hernandez-Ascencio W, Dai X, Zhang Z, Zheng X et 586 al. Novel Sulfolobus Virus with an Exceptional Capsid Architecture. J Virol. 2018; 92. 587 6. Krupovic M, Quemin ER, Bamford DH, Forterre P and Prangishvili D. Unification of the globally 588 distributed spindle-shaped viruses of the Archaea. J Virol. 2014; 88:2354-8. 589 7. Contursi P, Fusco S, Cannio R and She Q. Molecular biology of fuselloviruses and their satellites. 590 . 2014; 18:473-89. 591 8. Bath C and Dyall-Smith ML. His1, an archaeal virus of the Fuselloviridae family that infects 592 Haloarcula hispanica. J Virol. 1998; 72:9392-5. 593 9. Gorlas A, Koonin EV, Bienvenu N, Prieur D and Geslin C. TPV1, the first virus isolated from the 594 hyperthermophilic genus Thermococcus. Environ Microbiol. 2012; 14:503-16. 595 10. Geslin C, Le Romancer M, Erauso G, Gaillard M, Perrot G and Prieur D. PAV1, the first virus-like 596 particle isolated from a hyperthermophilic euryarchaeote, "Pyrococcus abyssi". J Bacteriol. 597 2003; 185:3888-94. 598 11. Kim JG, Kim SJ, Cvirkaite-Krupovic V, Yu WJ, Gwak JH, Lopez-Perez M, Rodriguez-Valera F, 599 Krupovic M, Cho JC and Rhee SK. Spindle-shaped viruses infect marine ammonia-oxidizing 600 thaumarchaea. Proc Natl Acad Sci U S A. 2019; 116:15645-15650. 601 12. Rice G, Stedman K, Snyder J, Wiedenheft B, Willits D, Brumfield S, McDermott T and Young MJ. 602 Viruses from extreme thermal environments. Proc Natl Acad Sci U S A. 2001; 98:13341-5. 603 13. Liu Y, Ishino S, Ishino Y, Pehau-Arnaudet G, Krupovic M and Prangishvili D. A novel type of 604 polyhedral viruses infecting hyperthermophilic archaea. J Virol. 2017; 91:e00589-17. 605 14. Wagner C, Reddy V, Asturias F, Khoshouei M, Johnson JE, Manrique P, Munson-McGee J, 606 Baumeister W, Lawrence CM and Young MJ. Isolation and characterization of Metallosphaera 607 turreted icosahedral virus, a founding member of a new family of archaeal viruses. J Virol. 2017; 608 91:e00925-17. 609 15. Wang F, Liu Y, Su Z, Osinski T, de Oliveira GAP, Conway JF, Schouten S, Krupovic M, Prangishvili D 610 and Egelman EH. A packing for A-form DNA in an icosahedral virus. Proc Natl Acad Sci U S A. 611 2019. 612 16. Veesler D, Ng TS, Sendamarai AK, Eilers BJ, Lawrence CM, Lok SM, Young MJ, Johnson JE and Fu 613 CY. Atomic structure of the 75 MDa Sulfolobus turreted icosahedral virus 614 determined by CryoEM and X-ray crystallography. Proc Natl Acad Sci U S A. 2013; 110:5504-9. 615 17. Krupovic M and Koonin EV. Multiple origins of viral capsid proteins from cellular ancestors. Proc 616 Natl Acad Sci U S A. 2017; 114:E2401-E2410. 617 18. Liu Y, Osinski T, Wang F, Krupovic M, Schouten S, Kasson P, Prangishvili D and Egelman EH. 618 Structural conservation in a membrane-enveloped filamentous virus infecting a 619 hyperthermophilic . Nat Commun. 2018; 9:3360. 620 19. Hochstein R, Bollschweiler D, Dharmavaram S, Lintner NG, Plitzko JM, Bruinsma R, Engelhardt H, 621 Young MJ, Klug WS and Lawrence CM. Structural studies of Acidianus tailed spindle virus reveal 622 a structural paradigm used in the assembly of spindle-shaped viruses. Proc Natl Acad Sci U S A. 623 2018; 115:2120-2125.

22

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

624 20. Ptchelkine D, Gillum A, Mochizuki T, Lucas-Staat S, Liu Y, Krupovic M, Phillips SEV, Prangishvili D 625 and Huiskonen JT. Unique architecture of thermophilic archaeal virus APBV1 and its genome 626 packaging. Nat Commun. 2017; 8:1436. 627 21. Kasson P, DiMaio F, Yu X, Lucas-Staat S, Krupovic M, Schouten S, Prangishvili D and Egelman EH. 628 Model for a novel membrane envelope in a filamentous hyperthermophilic virus. Elife. 2017; 629 6:e26268. 630 22. DiMaio F, Yu X, Rensen E, Krupovic M, Prangishvili D and Egelman EH. Virology. A virus that 631 infects a encapsidates A-form DNA. Science. 2015; 348:914-7. 632 23. Hong C, Pietila MK, Fu CJ, Schmid MF, Bamford DH and Chiu W. Lemon-shaped halo archaeal 633 virus His1 with uniform tail but variable capsid structure. Proc Natl Acad Sci U S A. 2015; 634 112:2449-54. 635 24. Stedman KM, DeYoung M, Saha M, Sherman MB and Morais MC. Structural insights into the 636 architecture of the hyperthermophilic Fusellovirus SSV1. Virology. 2015; 474:105-9. 637 25. Krupovic M, Cvirkaite-Krupovic V, Iranzo J, Prangishvili D and Koonin EV. Viruses of archaea: 638 Structural, functional, environmental and evolutionary genomics. Virus Res. 2018; 244:181-193. 639 26. Iranzo J, Krupovic M and Koonin EV. The Double-Stranded DNA Virosphere as a Modular 640 Hierarchical Network of Gene Sharing. MBio. 2016; 7:e00978-16. 641 27. Iranzo J, Koonin EV, Prangishvili D and Krupovic M. Bipartite Network Analysis of the Archaeal 642 Virosphere: Evolutionary Connections between Viruses and Capsidless Mobile Elements. J Virol. 643 2016; 90:11043-11055. 644 28. Zhan Z, Zhou J and Huang L. Site-Specific Recombination by SSV2 Integrase: Substrate 645 Requirement and Domain Functions. J Virol. 2015; 89:10934-44. 646 29. Mochizuki T, Sako Y and Prangishvili D. Provirus induction in hyperthermophilic archaea: 647 characterization of pernix spindle-shaped virus 1 and ovoid virus 648 1. J Bacteriol. 2011; 193:5412-9. 649 30. Clore AJ and Stedman KM. The SSV1 viral integrase is not essential. Virology. 2007; 361:103-11. 650 31. Prangishvili D, Vestergaard G, Haring M, Aramayo R, Basta T, Rachel R and Garrett RA. Structural 651 and genomic properties of the hyperthermophilic archaeal virus ATV with an extracellular stage 652 of the reproductive cycle. J Mol Biol. 2006; 359:1203-16. 653 32. Wiedenheft B, Stedman K, Roberto F, Willits D, Gleske AK, Zoeller L, Snyder J, Douglas T and 654 Young M. Comparative genomic analysis of hyperthermophilic archaeal Fuselloviridae viruses. J 655 Virol. 2004; 78:1954-61. 656 33. Fusco S, She Q, Fiorentino G, Bartolucci S and Contursi P. Unravelling the Role of the F55 657 Regulator in the Transition from Lysogeny to UV Induction of Sulfolobus Spindle-Shaped Virus 1. 658 J Virol. 2015; 89:6453-61. 659 34. Schleper C, Kubo K and Zillig W. The particle SSV1 from the extremely thermophilic archaeon 660 Sulfolobus is a virus: demonstration of infectivity and of transfection with viral DNA. Proc Natl 661 Acad Sci U S A. 1992; 89:7645-9. 662 35. Fusco S, Aulitto M, Bartolucci S and Contursi P. A standardized protocol for the UV induction of 663 Sulfolobus spindle-shaped virus 1. Extremophiles. 2015; 19:539-46. 664 36. Quemin ER, Chlanda P, Sachse M, Forterre P, Prangishvili D and Krupovic M. Eukaryotic-like virus 665 budding in archaea. MBio. 2016; 7:e01439-16. 666 37. Bize A, Karlsson EA, Ekefjard K, Quax TE, Pina M, Prevost MC, Forterre P, Tenaillon O, Bernander 667 R and Prangishvili D. A unique virus release mechanism in the Archaea. Proc Natl Acad Sci U S A. 668 2009; 106:11306-11. 669 38. Brumfield SK, Ortmann AC, Ruigrok V, Suci P, Douglas T and Young MJ. Particle assembly and 670 ultrastructural features associated with replication of the lytic archaeal virus sulfolobus turreted 671 icosahedral virus. J Virol. 2009; 83:5964-70.

23

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

672 39. Hurwitz S and Lowenstern JB. Dynamics of the Yellowstone hydrothermal system. Rev Geophys. 673 2014; 52:375-411. 674 40. Munson-McGee JH, Peng S, Dewerff S, Stepanauskas R, Whitaker RJ, Weitz JS and Young MJ. A 675 virus or more in (nearly) every cell: ubiquitous networks of virus-host interactions in extreme 676 environments. ISME J. 2018; 12:1706-1714. 677 41. Snyder JC, Bateson MM, Lavin M and Young MJ. Use of cellular CRISPR (clusters of regularly 678 interspaced short palindromic repeats) spacer-based microarrays for detection of viruses in 679 environmental samples. Appl Environ Microbiol. 2010; 76:7251-8. 680 42. Bolduc B, Wirth JF, Mazurie A and Young MJ. Viral assemblage composition in Yellowstone acidic 681 hot springs assessed by network analysis. ISME J. 2015; 9:2162-77. 682 43. Piochi M, Kilburn CRJ, Di Vito MA, Mormone A, Tramelli A, Troise C and De Natale G. The 683 volcanic and geothermally active Campi Flegrei caldera: an integrated multidisciplinary image of 684 its buried structure. Int Journ Earth Sci. 2014; 103:401-421. 685 44. Piochi M, Mormone A, Balassone G, Strauss H, Troise C and De Natale G. Native sulfur, sulfates 686 and from the active Campi Flegrei volcano (southern Italy): Genetic environments and 687 degassing dynamics revealed by mineralogy and isotope geochemistry. J Volcanol Geotherm 688 Res. 2015; 304:180-193. 689 45. Rensen EI, Mochizuki T, Quemin E, Schouten S, Krupovic M and Prangishvili D. A virus of 690 hyperthermophilic archaea with a unique architecture among DNA viruses. Proc Natl Acad Sci U 691 S A. 2016; 113:2478-83. 692 46. Zillig W, Kletzin A, Schleper C, Holz I, Janekovic D, Hain J, Lanzendörfer M and Kristjansson JK. 693 Screening for Sulfolobales, their and their Viruses in Icelandic Solfataras. Syst Appl 694 Microbiol. 1993; 16:609-628. 695 47. Quemin ER, Pietila MK, Oksanen HM, Forterre P, Rijpstra WI, Schouten S, Bamford DH, 696 Prangishvili D and Krupovic M. Sulfolobus Spindle-Shaped Virus 1 Contains Glycosylated Capsid 697 Proteins, a Cellular Chromatin Protein, and Host-Derived Lipids. J Virol. 2015; 89:11681-91. 698 48. Schneider CA, Rasband WS and Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nat 699 Methods. 2012; 9:671-5. 700 49. Huber R, Sacher M, Vollmann A, Huber H and Rose D. Respiration of arsenate and selenate by 701 hyperthermophilic archaea. Syst Appl Microbiol. 2000; 23:305-14. 702 50. Amo T, Paje ML, Inagaki A, Ezaki S, Atomi H and Imanaka T. Pyrobaculum calidifontis sp. nov., a 703 novel hyperthermophilic archaeon that grows in atmospheric air. Archaea. 2002; 1:113-21. 704 51. Sako Y, Nunoura T and Uchida A. Pyrobaculum oguniense sp. nov., a novel facultatively aerobic 705 and hyperthermophilic archaeon growing at up to 97 degrees C. Int J Syst Evol Microbiol. 2001; 706 51:303-9. 707 52. Bettstetter M, Peng X, Garrett RA and Prangishvili D. AFV1, a novel virus infecting 708 hyperthermophilic archaea of the genus acidianus. Virology. 2003; 315:68-79. 709 53. She Q, Singh RK, Confalonieri F, Zivanovic Y, Allard G, Awayez MJ, Chan-Weiher CC, Clausen IG, 710 Curtis BA, De Moors A et al. The complete genome of the crenarchaeon 711 P2. Proc Natl Acad Sci U S A. 2001; 98:7835-40. 712 54. Guo L, Brugger K, Liu C, Shah SA, Zheng H, Zhu Y, Wang S, Lillestol RK, Chen L, Frank J et al. 713 Genome analyses of Icelandic strains of Sulfolobus islandicus, model for genetic and 714 virus-host interaction studies. J Bacteriol. 2011; 193:1672-80. 715 55. Jaubert C, Danioux C, Oberto J, Cortez D, Bize A, Krupovic M, She Q, Forterre P, Prangishvili D 716 and Sezonov G. Genomics and genetics of Sulfolobus islandicus LAL14/1, a model 717 hyperthermophilic archaeon. Open Biol. 2013; 3:130010.

24

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

718 56. Gudbergsdottir S, Deng L, Chen Z, Jensen JV, Jensen LR, She Q and Garrett RA. Dynamic 719 properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector- 720 borne viral and genes and protospacers. Mol Microbiol. 2011; 79:35-49. 721 57. Chen L, Brugger K, Skovgaard M, Redder P, She Q, Torarinsson E, Greve B, Awayez M, Zibat A, 722 Klenk HP et al. The genome of Sulfolobus acidocaldarius, a model of the 723 Crenarchaeota. J Bacteriol. 2005; 187:4992-9. 724 58. Pausan MR, Csorba C, Singer G, Till H, Schopf V, Santigli E, Klug B, Hogenauer C, Blohs M and 725 Moissl-Eichinger C. Exploring the Archaeome: Detection of Archaeal Signatures in the Human 726 Body. Front Microbiol. 2019; 10:2796. 727 59. Bolger AM, Lohse M and Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. 728 Bioinformatics. 2014; 30:2114-20. 729 60. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, 730 Pham S, Prjibelski AD et al. SPAdes: a new genome assembly algorithm and its applications to 731 single-cell sequencing. J Comput Biol. 2012; 19:455-77. 732 61. Lukashin AV and Borodovsky M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids 733 Res. 1998; 26:1107-15. 734 62. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, 735 Kubal M et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 736 2008; 9:75. 737 63. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. Gapped BLAST 738 and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 739 25:3389-402. 740 64. Söding J, Biegert A and Lupas AN. The HHpred interactive server for protein homology detection 741 and structure prediction. Nucleic Acids Res. 2005; 33:W244-8. 742 65. Krogh A, Larsson B, von Heijne G and Sonnhammer EL. Predicting transmembrane protein 743 topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001; 744 305:567-80. 745 66. Meier-Kolthoff JP and Goker M. VICTOR: genome-based phylogeny and classification of 746 prokaryotic viruses. Bioinformatics. 2017; 33:3396-3404. 747 67. Ågren J, Sundström A, Håfström T and Segerman B. Gegenees: fragmented alignment of multiple 748 genomes for determining phylogenomic distances and genetic signatures unique for specified 749 target groups. PLoS One. 2012; 7:e39107. 750 68. Quemin ER, Lucas S, Daum B, Quax TE, Kuhlbrandt W, Forterre P, Albers SV, Prangishvili D and 751 Krupovic M. First insights into the entry process of hyperthermophilic archaeal viruses. J Virol. 752 2013; 87:13379-85. 753 69. Vestergaard G, Haring M, Peng X, Rachel R, Garrett RA and Prangishvili D. A novel rudivirus, 754 ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology. 2005; 336:83-92. 755 70. Häring M, Peng X, Brugger K, Rachel R, Stetter KO, Garrett RA and Prangishvili D. Morphology 756 and genome organization of the virus PSV of the hyperthermophilic archaeal genera 757 Pyrobaculum and Thermoproteus: a novel virus family, the Globuloviridae. Virology. 2004; 758 323:233-42. 759 71. Ahn DG, Kim SI, Rhee JK, Kim KP, Pan JG and Oh JW. TTSV1, a new virus-like particle isolated 760 from the hyperthermophilic crenarchaeote Thermoproteus tenax. Virology. 2006; 351:280-90. 761 72. Krupovic M, White MF, Forterre P and Prangishvili D. Postcards from the edge: structural 762 genomics of archaeal viruses. Adv Virus Res. 2012; 82:33-62. 763 73. Neumann H, Schwass V, Eckerskorn C and Zillig W. Identification and characterization of the 764 genes encoding three structural proteins of the Thermoproteus tenax virus TTV1. Mol Gen 765 Genet. 1989; 217:105-10.

25

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

766 74. Prangishvili D, Rensen E, Mochizuki T, Krupovic M and Ictv Report C. ICTV Virus 767 Profile: Tristromaviridae. J Gen Virol. 2019; 100:135-136. 768 75. Gudbergsdóttir SR, Menzel P, Krogh A, Young M and Peng X. Novel viral genomes identified from 769 six metagenomes reveal wide distribution of archaeal viruses and high viral diversity in 770 terrestrial hot springs. Environ Microbiol. 2016; 18:863-74. 771 76. Prangishvili D, Koonin EV and Krupovic M. Genomics and biology of Rudiviruses, a model for the 772 study of virus-host interactions in Archaea. Biochem Soc Trans. 2013; 41:443-50. 773 77. He F, Bhoobalan-Chitty Y, Van LB, Kjeldsen AL, Dedola M, Makarova KS, Koonin EV, Brodersen DE 774 and Peng X. Anti-CRISPR proteins encoded by archaeal lytic viruses inhibit subtype I-D immunity. 775 Nat Microbiol. 2018; 3:461-469. 776 78. Quax TE, Krupovic M, Lucas S, Forterre P and Prangishvili D. The Sulfolobus rod-shaped virus 2 777 encodes a prominent structural component of the unique virion release system in Archaea. 778 Virology. 2010; 404:1-4. 779 79. Athukoralage JS, McMahon SA, Zhang C, Grüschow S, Graham S, Krupovic M, Whitaker RJ, 780 Gloster T and White MF. A viral ring nuclease anti-CRISPR subverts type III CRISPR immunity. 781 Nature. 2019; In press. 782 80. Iranzo J, Lobkovsky AE, Wolf YI and Koonin EV. Evolutionary dynamics of the prokaryotic 783 adaptive immunity system CRISPR-Cas in an explicit ecological context. J Bacteriol. 2013; 784 195:3834-44. 785 81. Bautista MA, Black JA, Youngblut ND and Whitaker RJ. Differentiation and Structure in 786 Sulfolobus islandicus Rod-Shaped Virus Populations. Viruses. 2017; 9. 787 82. Roux S, Krupovic M, Daly RA, Borges AL, Nayfach S, Schulz F, Sharrar A, Matheus Carnevali PB, 788 Cheng JF, Ivanova NN et al. Cryptic inoviruses revealed as pervasive in bacteria and archaea 789 across Earth's biomes. Nat Microbiol. 2019; 4:1895-1906. 790 83. Roux S, Adriaenssens EM, Dutilh BE, Koonin EV, Kropinski AM, Krupovic M, Kuhn JH, Lavigne R, 791 Brister JR, Varsani A et al. Minimum Information about an Uncultivated Virus Genome (MIUViG). 792 Nat Biotechnol. 2019; 37:29-37. 793 84. Edwards RA, McNair K, Faust K, Raes J and Dutilh BE. Computational approaches to predict 794 -host relationships. FEMS Microbiol Rev. 2016; 40:258-72. 795 85. Grissa I, Vergnaud G and Pourcel C. The CRISPRdb database and tools to display CRISPRs and to 796 generate dictionaries of spacers and repeats. BMC Bioinformatics. 2007; 8:172. 797 86. Simmonds P, Adams MJ, Benko M, Breitbart M, Brister JR, Carstens EB, Davison AJ, Delwart E, 798 Gorbalenya AE, Harrach B et al. Consensus statement: Virus taxonomy in the age of 799 metagenomics. Nat Rev Microbiol. 2017; 15:161-168. 800 87. Gregory AC, Zayed AA, Conceicao-Neto N, Temperton B, Bolduc B, Alberti A, Ardyna M, 801 Arkhipova K, Carmichael M, Cruaud C et al. Marine DNA Viral Macro- and Microdiversity from 802 Pole to Pole. Cell. 2019; 177:1109-1123 e14. 803 88. Nishimura Y, Watai H, Honda T, Mihara T, Omae K, Roux S, Blanc-Mathieu R, Yamamoto K, 804 Hingamp P, Sako Y et al. Environmental Viral Genomes Shed New Light on Virus-Host 805 Interactions in the Ocean. mSphere. 2017; 2. 806 89. Roux S, Brum JR, Dutilh BE, Sunagawa S, Duhaime MB, Loy A, Poulos BT, Solonenko N, Lara E, 807 Poulain J et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean 808 viruses. Nature. 2016; 537:689-693. 809 90. Paez-Espino D, Eloe-Fadrosh EA, Pavlopoulos GA, Thomas AD, Huntemann M, Mikhailova N, 810 Rubin E, Ivanova NN and Kyrpides NC. Uncovering Earth's virome. Nature. 2016; 536:425-30. 811 91. Reyes A, Semenkovich NP, Whiteson K, Rohwer F and Gordon JI. Going viral: next-generation 812 sequencing applied to phage populations in the human gut. Nat Rev Microbiol. 2012; 10:607-17.

26

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

813 92. Roux S, Hallam SJ, Woyke T and Sullivan MB. Viral dark matter and virus-host interactions 814 resolved from publicly available microbial genomes. Elife. 2015; 4. 815 93. Minot S, Bryson A, Chehoud C, Wu GD, Lewis JD and Bushman FD. Rapid evolution of the human 816 gut virome. Proc Natl Acad Sci U S A. 2013; 110:12450-5. 817 94. Fawaz M, Vijayakumar P, Mishra A, Gandhale PN, Dutta R, Kamble NM, Sudhakar SB, 818 Roychoudhary P, Kumar H, Kulkarni DD et al. Duck gut viral metagenome analysis captures 819 snapshot of viral diversity. Gut Pathog. 2016; 8:30. 820 95. Haring M, Rachel R, Peng X, Garrett RA and Prangishvili D. Viral diversity in hot springs of 821 Pozzuoli, Italy, and characterization of a unique archaeal virus, Acidianus bottle-shaped virus, 822 from a new family, the Ampullaviridae. J Virol. 2005; 79:9904-11. 823 96. Arnold HP, Ziese U and Zillig W. SNDV, a novel virus of the extremely thermophilic and 824 acidophilic archaeon Sulfolobus. Virology. 2000; 272:409-16. 825 97. Pauly MD, Bautista MA, Black JA and Whitaker RJ. Diversified local CRISPR-Cas immunity to 826 viruses of Sulfolobus islandicus. Philos Trans R Soc Lond B Biol Sci. 2019; 374:20180093. 827 98. Held NL and Whitaker RJ. Viral biogeography revealed by signatures in Sulfolobus islandicus 828 genomes. Environ Microbiol. 2009; 11:457-66. 829 99. Stedman KM, Clore A and Combet-Blanc Y. (2006) In Logan, N. A., Lappin-Scott, H. M. and 830 Oyston, P. C. F. (eds.), Prokaryotic Diversity: Mechanisms and Significance. Cambridge University 831 Press, Cambridge, Vol. 66, pp. 131-143. 832 100. Snyder JC, Wiedenheft B, Lavin M, Roberto FF, Spuhler J, Ortmann AC, Douglas T and Young M. 833 Virus movement maintains local virus population diversity. Proc Natl Acad Sci U S A. 2007; 834 104:19102-7. 835 101. Medvedeva S, Liu Y, Koonin EV, Severinov K, Prangishvili D and Krupovic M. Virus-borne mini- 836 CRISPR arrays are involved in interviral conflicts. Nat Commun. 2019; In press. 837 102. Atanasova NS, Roine E, Oren A, Bamford DH and Oksanen HM. Global network of specific virus- 838 host interactions in hypersaline environments. Environ Microbiol. 2012; 14:426-40. 839 103. Atanasova NS, Bamford DH and Oksanen HM. Virus-host interplay in high salt environments. 840 Environ Microbiol Rep. 2016; 8:431-44. 841 104. Kemp B, Tabish E, Wolford A, Jones D, Butler J and Baxter B. The Biogeography of Great Salt Lake 842 Halophilic Archaea: Testing the Hypothesis of Avian Mechanical Carriers. Diversity. 2018; 10:124. 843 105. Laidler JR, Shugart JA, Cady SL, Bahjat KS and Stedman KM. Reversible inactivation and 844 desiccation tolerance of silicified viruses. J Virol. 2013; 87:13927-9. 845 106. De Sordi L, Khanna V and Debarbieux L. The Gut Microbiota Facilitates Drifts in the Genetic 846 Diversity and Infectivity of Bacterial Viruses. Cell Host Microbe. 2017; 22:801-808 e3. 847 107. Paul BG, Bagby SC, Czornyj E, Arambula D, Handa S, Sczyrba A, Ghosh P, Miller JF and Valentine 848 DL. Targeted diversity generation by intraterrestrial archaea and archaeal viruses. Nat Commun. 849 2015; 6:6585. 850 108. Klein R, Rossler N, Iro M, Scholz H and Witte A. Haloarchaeal myovirus phiCh1 harbours a phase 851 variation system for the production of protein variants with distinct cell surface adhesion 852 specificities. Mol Microbiol. 2012; 83:137-50. 853 109. Liu Y, Brandt D, Ishino S, Ishino Y, Koonin EV, Kalinowski J, Krupovic M and Prangishvili D. New 854 archaeal viruses discovered by metagenomic analysis of viral communities in enrichment 855 cultures. Environ Microbiol. 2019; 21:2002-2014. 856 857

27

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

858 FIGURES AND LEGENDS

859 860 Figure 1. Electron micrographs of the VLPs observed in enrichment cultures: (A) fuselloviruses 861 (tailless lemon-shaped virions), (B) bicaudaviruses (large, tailed lemon-shaped virions), (C) 862 ampullaviruses (bottle-shaped virions), (D) rudiviruses (rod-shaped virions), (E) lipothrixviruses 863 (filamentous virions), (F) globuloviruses (spherical enveloped virions), (G) tristromaviruses 864 (filamentous enveloped virions) and (H) guttaviruses (droplet-shaped virions). Samples were 865 negatively stained with 2% (wt/vol) uranyl acetate. Bars. 200 nm. 866

28

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

867 868 Figure 2. Electron micrographs of the five isolated viruses. A. Metallosphaera rod-shaped virus 869 1. B. Acidianus rod-shaped virus 3. C. Saccharolobus solfataricus rod-shaped virus 1.D. 870 Pyrobaculum filamentous virus 2. E. Pyrobaculum spherical virus 2. Samples were negatively 871 stained with 2% (wt/vol) uranyl acetate. Scale bars: 500 nm; in insets: 100 nm. 872 873 874 875 876 877

29

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

878 879 Figure 3. Genome alignment of the three members of the Globuloviridae family. The open reading 880 frames (ORFs) are represented by arrows that indicate the direction of transcription. The terminal 881 inverted repeats (TIRs) are denoted by black bars at the ends of the genomes. Genes encoding the 882 major structural proteins are shown in dark grey. The functional annotations of the predicted ORFs 883 are depicted above/below the corresponding ORF. Homologous genes are connected by shading 884 in grayscale based on the level of amino acid sequence identity. VP, virion protein; wHTH, winged 885 helix-turn-helix domain. 886 887 888

889 890 Figure 4. Genome comparison of the three members of the Tristromaviridae family. The open 891 reading frames (ORFs) are represented by arrows that indicate the direction of transcription. The 892 terminal inverted repeats (TIRs) are denoted by black bars at the ends of the genomes. Genes 893 encoding the major structural proteins are shown in dark grey, whereas the four-gene block 894 discussed in the text is shown in black. The functional annotations of the predicted ORFs are 895 depicted above/below the corresponding ORFs. Homologous genes are connected by shading in 896 grayscale based on the level of amino acid sequence identity. The dotted line represents the 897 incompleteness of the TTV1 genome. GHase, glycoside hydrolase; GTase, glycosyltransferase; 898 TP/VP, virion protein.

30

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

899

900 901 Figure 5. Genome alignment of the Italian rudiviruses. ARV3, MRV1 and SSRV1 are newly 902 isolated members of the Rudiviridae family, whereas ARV1 and ARV2 were reported previously 903 [69,75]. The open reading frames (ORFs) are represented by arrows that indicate the direction of 904 transcription. The terminal inverted repeats (TIRs) are denoted by black bars at the ends of the 905 genomes. The functional annotations of the predicted ORFs are depicted above/below the 906 corresponding ORFs. Homologous genes are connected by shading in grayscale based on the 907 amino acid sequence identity. ATase, acetyltransferase; GHase, glycoside hydrolase; GTase, 908 glycosyltransferase; HJR, Holliday junction resolvase; MCP, major capsid protein; MTase, 909 methyltransferase; SSB, ssDNA binding protein; TFB, Transcription factor B; ThyX, thymidylate 910 synthase; wHTH, winged helix-turn-helix domain. 911

31

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

912 913 Figure 6. Inferred phylogenomic tree of all known members of the Rudiviridae family based on 914 whole genome VICTOR [66] analysis at the amino acid level. The tree is rooted with 915 lipothrixviruses, and the branch length is scaled in terms of the Genome BLAST Distance 916 Phylogeny (GBDP) distance formula D6. Only branch support values >70% are shown. For each 917 genome, the abbreviated virus name, GenBank accession number and host organism (when 918 known) are indicated. Question marks denote that the host is not known. The tree is divided into 919 colored blocks according to the geographical origin of the compared viruses. 920

32

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.16.907410; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

921 Table 1. Host range of the archaeal viruses isolated in this study.

Newly-isolated archaeal virus Host strain MRV1 ARV3 SSRV1 PSV2 PFV2 Metallosphaera sedula POZ202 H - - - - Acidianus brierleyi POZ9 - H - - - Acidianus convivator - - - - - Acidianus hospitalis - + - - - Saccharolobus solfataricus P1 - - - - - Saccharolobus solfataricus P2 - - - - - Saccharolobus solfataricus POZ149 - - H - - Sulfolobus islandicus LAL14/1 - - - - - Sulfolobus islandicus REN2H1 - - - - - Sulfolobus islandicus HVE10/4 - - - - - Sulfolobus islandicus REY15A - - - - - Sulfolobus islandicus ΔC1C2 - - - - - Sufolobus acidocaldarius DSM639 - - - - - Pyrobaculum arsenaticum PZ6 (DSM 13514) - - - - - Pyrobaculum arsenaticum 2GA - - - H H Pyrobaculum calidifontis VA1 (DSM 21063) - - - - - Pyrobaculum oguniense TE7 (DSM 13380) - - - - + 922 H, isolation host; +, supports virus replication.

923

924 Table 2. Genomic features of the hyperthermophilic archaeal viruses isolated in this study.

Virus Host Size (bp) TIR Topology GC% ORFs Accession # MRV1 Metallosphaera sedula POZ202 20269 + linear 34.12 27 MN876843 ARV3 Acidianus brierleyi POZ9 23666 - linear 32.02 33 MN876842 SSRV1 Saccharolobus solfataricus POZ149 26097 - linear 32.31 37 MN876841 PSV2 Pyrobaculum arsenaticum 2GA 18212 + linear 45.01 32 MN876845 PFV2 Pyrobaculum arsenaticum 2GA 17602 + linear 45.32 39 MN876844 925 TIR, terminal inverted repeats; ORFs, open reading frames. 926

33