Structure of the archaeal head-tailed virus HSTV-1 completes the HK97 fold story

Maija K. Pietiläa,b, Pasi Laurinmäkib, Daniel A. Russellc, Ching-Chung Koc, Deborah Jacobs-Serac, Roger W. Hendrixc, Dennis H. Bamforda,b, and Sarah J. Butcherb,1

aDepartment of Biosciences and bInstitute of Biotechnology, University of Helsinki, FI-00014, Helsinki, Finland; and cDepartment of Biological Sciences, Pittsburgh Bacteriophage Institute, University of Pittsburgh, Pittsburgh, PA 15260

Edited by Michael G. Rossmann, Purdue University, West Lafayette, IN, and approved May 13, 2013 (received for review March 16, 2013) It has been proposed that viruses can be divided into a small number particles are most likely nanocompartments for enzyme storage, of structure-based viral lineages. One of these lineages is exemplified not viruses because they contain no nucleic acids (24, 25). by bacterial virus Hong Kong 97 (HK97), which represents the head- To date, only approximately 30 high-resolution struc- tailed dsDNA bacteriophages. Seemingly similar viruses also infect tures from archaeal viruses have been determined, and only 5 are archaea. Here we demonstrate using genomic analysis, electron MCPs. However, none come from head-tailed viruses (26). All cryomicroscopy, and image reconstruction that the major coat protein known archaeal head-tailed viruses infect either extreme halophiles fold of newly isolated archaeal Haloarcula sinaiiensis tailed virus 1 or anaerobic methanogens belonging to the phylum Euryarchaeota has the canonical coat protein fold of HK97. Although it has been (20). Here we set out to screen a previously isolated group of anticipated previously, this is physical evidence that bacterial and haloarchaeal head-tailed viruses (19) to find a suitable candidate archaeal head-tailed viruses share a common architectural principle. for the determination of the MCP fold by electron cryomicro- The HK97-like fold has previously been recognized also in herpesvi- scopy (cryo-EM). A podovirus, Haloarcula sinaiiensis tailed virus 1 ruses, and this study expands the HK97-like lineage to viruses from all (HSTV-1) (19), was chosen because it was also stable at low sa- three domains of life. This is only the second established lineage to linity and could be readily purified. The genome of HSTV-1 was include archaeal, bacterial, and eukaryotic viruses. Thus, our findings sequenced, and the major structural were identified. The support the hypothesis that the last common universal ancestor of 3D reconstruction of the HSTV-1 capsid at subnanometer reso- cellular organisms was infected by a number of different viruses. lution revealed that the virus adopts the HK97-like fold. Thus, the BIOPHYSICS AND HK97-like lineage can be extended to archaeal head-tailed viru- COMPUTATIONAL BIOLOGY archaeal virus | virus evolution | major capsid protein fold ses, suggesting a common ancestor for prokaryotic head-tailed viruses and eukaryotic herpesviruses. very organism is constantly under the attack of viral infections Ebecause viruses outnumber cellular organisms by at least an Results order of magnitude. It has been estimated that viruses kill ap- Virus Production and Morphology. HSTV-1 virus sloppy agar stocks 11 proximately one fifth of the marine biomass on a daily basis (1, 2). had high titers, on average 3 × 10 pfu/mL. Turbidity measure- Furthermore, viruses are the leading cause of infectious diseases ments of the infected cultures demonstrated that the virus is lytic (3). Thus, viruses have a crucial role in controlling the evolution of (Fig. S1). Virus titers in liquid cultures, however, reached only ap- 9 cellular organisms owing to their high abundance and genetic di- proximately 6 × 10 pfu/mL, and for this reason the virus was pu- versity (1). However, cryoelectron microscopic and crystallographic rified directly from the stocks. The specific infectivity of the purified 12 studies have revealed that this genetic diversity may be reduced to HSTV-1 virions was ∼9 × 10 pfu/mg of protein, and the density in a limited number of structure-based viral lineages (2, 4–6). CsCl was ∼1.45 g/mL, which is typical of head-tailed viruses (27). The discovery that a bacterial and an animal virus have the Cryo-EM micrographs of HSTV-1 virions at high and low salinity same major capsid protein (MCP) fold and virion architecture (5, showed that the overall morphotype remained unaltered despite 7) led to the hypothesis that viruses infecting cells that belong to changes in salinity (Fig. 1A). HSTV-1 was inactivated at low salinity, different domains of life (in this case Bacteria and Eukarya) may but the infectivity was regained when the high salinity conditions have a common origin and be related, although there is no de- were restored (Fig. 1B).HSTV-1hadanisometric,genome-filled tectable sequence similarity. Further examination of viral capsid capsid and a short, podovirus-like tail with at least five long fibers protein structures has revealed that this double-β-barrel MCP of attached to the central tail structure (Fig. 1, Inset). tailless icosahedral viruses is also present in viruses infecting ar- chaea (5, 8, 9). These observations have led to the hypothesis of Genome Sequence and Structural Proteins. The genome of HSTV-1 viral structure-based lineages unifying viruses previously consid- contained 53 putative open reading frames (ORFs) (Table S1). As ered unrelated and leading to a paradigm shift in how we see the has been seen in the few head-tailed archaeal viruses sequenced viral universe (2, 4, 5). previously (21, 28, 29), the organization of the genome is remi- A second lineage has been proposed containing head-tailed niscent of the genomes of tailed bacteriophages: the predicted bacterial viruses and herpesviruses sharing the so-called canonical occupy 90% of the genome and are closely spaced and in Hong Kong 97 (HK97)-like MCP fold (4, 10). This fold was first determined for bacteriophage HK97 (10) and has since been dis- covered in seven additional phages and in herpes simplex virus type Author contributions: M.K.P., R.W.H., D.H.B., and S.J.B. designed research; M.K.P., P.L., D.A.R., and C.-C.K. performed research; M.K.P., P.L., D.J.-S., R.W.H., D.H.B., and S.J.B. 1 (11-18). Head-tailed viruses morphologically similar to phages analyzed data; and M.K.P., R.W.H., D.H.B., and S.J.B. wrote the paper. are also known to infect archaeal organisms (19, 20). Comparative The authors declare no conflict of interest. genomics indicate that these archaeal viruses share common genes This article is a PNAS Direct Submission. with bacteriophages for virion assembly and maturation, as well as Data deposition: The sequence reported in this paper has been deposited in the GenBank genome packaging (21, 22). The fundamental issue is whether ar- database (accession no. KC117378). The 3D reconstruction has been deposited with the chaeal head-tailed viruses also share the canonical HK97-like Electron Microscopy Data Bank (accession no. EMD-2279). MCP fold, as suggested by bioinformatics (21, 23). Interestingly, 1To whom correspondence should be addressed. E-mail: sarah.butcher@helsinki.fi. the HK97-like fold has been recognized in icosahedral particles This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. produced by hyperthermophilic archaea (24, 25). However, these 1073/pnas.1303047110/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1303047110 PNAS Early Edition | 1of6 Downloaded by guest on September 29, 2021 Fig. 1. Podovirus HSTV-1 tolerates salinity changes. (A) Cryo-EMs of HSTV-1 at 4-μm underfocus. (I) Virions incubated in 9% (wt/vol) SW containing 1.2 M NaCl, (II) virions in 20 mM Tris·HCl (pH 7.2) buffer, and (III) virions first incubated in 20 mM Tris·HCl (pH 7.2) and then restored in 9% (wt/vol) SW. (Inset) (6-μm underfocus) A particle viewed along the tail and showing possible tail fibers. (Scale bars, 50 nm.) (B) Specific infectivity of HSTV-1 virions at a logarithmic scale. (I, II, and III) High and low salinity conditions as in A.

some cases slightly overlapping (Fig. 2A). The majority of the pre- The HSTV-1 genome is 32,189 bp long. The sequence assembles dicted proteins of HSTV-1 make no sequence matches to the se- as a circle, and there is no indication in the sequence data for quence databases in a BlastP search, and of the ones that do match, discrete ends, which argues that the genome is circularly permuted more than half match “hypothetical proteins” with unknown func- across the population of virions. As 1 we have chosen the tion. The predicted proteins that do match proteins of known first base pair of the large terminase subunit . We identified function are also typical of the tailed bacteriophages, with functions four genes encoding structural proteins and two more as genes such as DNA replication, nucleotide metabolism, and structural encoding enzymes (terminase and protease) required for capsid proteins of the virion. Some of the structural proteins can be assembly and maturation: the terminase large subunit gene (gene 1) identified by a simple BlastP search, and some are found with more and the portal gene (gene 3) are identified directly by the se- extensive searching (PSI-Blast, HHpred). We confirmed a few of quence similarity of their protein products to the corresponding these identifications by N-terminal sequencing of virion compo- proteins of tailed bacteriophages. The portal similarity extends nents (Fig. 2 B and C). Strikingly, the order of the genes for struc- only through the first two-thirds of the sequence; the last one-third tural proteins identified in this way is the same as the order almost is homologous to a minor head protein found in some bacter- always observed for the corresponding genes in the genomes of iophages (gpF in phage Mu; gp7 in phage SPP1), which is encoded tailed bacteriophages. by a gene immediately following the portal gene. The two genes are

Fig. 2. Genome and proteins of HSTV-1. (A) Genome of HSTV-1 is presented with markers spaced at 1-kbp and 100-bp intervals. The predicted genes are shown as boxes either above or below the genome, depending on whether they are rightward or leftward transcribed, respectively. The gene numbers are shown for annotated genes, and putative functions are shown above the genes. Abbreviated functions are histidine-asparagine-histidine homing endonuclease(HNH), proliferating cell nuclear antigen family of sliding clamps (PCNA), and a minichromosome maintenance helicase (MCM). (B)Proteinprofile of the purified virions in a tricine-SDS-polyacrylamide gel stained with Coomassie blue. Numbers on the left indicate molecular mass markers. The identified proteins are indicated by the corresponding gp (gene product) number. (C) N-terminal sequences determined from the protein bands gp12, gp29, and gp14. The first two amino acids of the N-terminal sequence of gp12 and gp29 could not be identified. The theoretical molecular masses of the mature proteins are in parentheses.

2of6 | www.pnas.org/cgi/doi/10.1073/pnas.1303047110 Pietilä et al. Downloaded by guest on September 29, 2021 evidently fused in HSTV-1. The major capsid gene (gene 14) was Gene 19 encodes a putative homing endonuclease. Such endo- identified by N-terminal sequencing of the most prominent virion nuclease genes are usually contained within an intron, which typ- protein revealed by SDS-gel electrophoresis (Fig. 2B). This se- ically disrupts an essential gene. It seems likely that gene 19 is in quence starts at amino acid 107 of the predicted protein, indicating fact part of an intron, inasmuch as it is within an island of low G+C that the major capsid protein is cleaved at this position, presumably content that extends from nucleotide ∼12,650 to ∼13,510. during capsid maturation. The protease is probably the product of gene 13; the sequence-based evidence for this identification is not 3D Structure of the Capsid. We solved the structure of the DNA- strong, but the location of the gene just upstream from the gene filled capsid of HSTV-1 using cryo-EM and 3D image re- encoding its putative target is consistent with the arrangement construction. High salt conditions lower the contrast and conse- found in tailed bacteriophages. Two additional genes (genes 12 quently inhibit high-resolution reconstruction (30). To avoid this, and 29) are identified as encoding virion components on the basis we imaged the virions in low salt conditions, which they could still of the N-terminal sequences of their proteins (Fig. 2 B and C). The tolerate, allowing high-resolution structure determination to 8.9 Å sequence of gene 12 does not identify any known viral proteins, (Fig. 3) (31). but sequence-based structural predictions show a high probability The virion has several layers of highly ordered, concentrically for a protein with a β-helix structure. Gene 12 is unusual in having packed dsDNA with a spacing of ∼23.6 Å (Fig. 3A). The genome a distinctly different G+C content from most of the rest of the packaging density is 0.46 bp/nm3. The diameter of the HSTV-1 genome (46% G+C for gene 12, compared with 60% for the ge- capsid is 560 Å facet to facet and 624 Å vertex to vertex (Fig. nome as a whole). Gene 29 also encodes a moderately abundant 3A). The shell is only 20 Å thick. On the virion surface, HSTV-1 component of the virion. Because the tail fibers are the most has 20 cone-shaped towers (Fig. 3 A and B). These towers are at prominent feature of the virions not otherwise accounted for, we least 138 Å long and 74 Å wide, located at positions of threefold suggest that they are composed of the protein encoded by gene 29. symmetry at the center of each facet. The tower density in the There are five additional plausible functional identifications in reconstruction is considerably lower than that of the rest of the the remainder of the HSTV-1 genome. Gene 48 encodes a puta- capsid, yet the structure is well defined, suggesting that not all tive cytosine methyl transferase and gene 52 a putative adenine tower sites are occupied (Fig. 3 B and C). methyl transferase. We find two likely DNA replication functions, The handedness of the capsid was determined using tilt experi- a member of the proliferating cell nuclear antigen family of ments and HK97 proheads [triangulation number (T-number) = 7 sliding clamps encoded by gene 40 and a member of the mini- laevo] as a control. Capsomers of the virion are arranged on a T = 7

maintenance family of replication helicases encoded laevo lattice, with the capsid protein clustered into flat hexamers and BIOPHYSICS AND

by gene 45. These are the archaeal versions of replication angular pentamers, similar to those seen in HK97. Thus, there are COMPUTATIONAL BIOLOGY functions that are frequently found in bacteriophage genomes. seven nonequivalent gp14 subunits, organized into an ABCDEF

Fig. 3. HSTV-1 capsid structures. (A) A 2.26-Å-thick central section of the HSTV-1 icosahedral reconstruction viewed along an icosahedral twofold axis. Twofold (ellipse), threefold (triangle), and fivefold (pentagon) symmetry axes are indicated. (Scale bar, 10 nm.) (B–D) Capsid isosurface representations along an icosahedral twofold axis. The HSTV-1 capsid is displayed at σ = 0.9 (B) and σ = 3.0 (C and D) threshold. The empty HSTV-1 (D) capsid is shown sectioned with the front half of the capsid removed. The empty capsid was generated by manually removing most of the genome density in UCSF Chimera. The symmetry axes are indicated as in A. The models were colored using radial-depth cueing (bars, 210- to 360-Å radius) in UCSF Chimera (56).

Pietilä et al. PNAS Early Edition | 3of6 Downloaded by guest on September 29, 2021 hexamer and a G5 pentamer using the established HK97 nomen- clature (Fig. 3 B–D) (10). In total there are 415 copies of gp14 in the capsid, because one pentamer is replaced by a portal vertex to which the tail is attached (Fig. 1). There are three main types of contacts between capsomers: the basal domains interact via a strict threefold contact on the threefold axis of symmetry (Fig. 3), a quasi-threefold contact between two hexamers and a pentamer, and a quasi- threefold contact between hexamers connecting near the twofold axis of symmetry. Close inspection of the capsid revealed that the overall topol- ogy of the MCP seemed to be similar to that of HK97. It was possible to directly place the asymmetric unit from the atomic model of HK97 Head II (32) into the reconstruction (Fig. 4A and Movie S1), resulting in the superposition of the major features of each monomer (Fig. S2). We then averaged the monomers within one hexamer together to improve the definition of the monomer (Fig. 4B). The G subunit was noticeably too different to the A–F subunits to be used in the averaging. Independently we generated a homology model of gp14, identified as the MCP (Fig. 2), using the I-TASSER server (33, 34). The best model had an I-TASSER C-score of only −2.81 which is a low confidence value (33). All of the top five models were very similar to the HK97 fold. We took the top model and used rigid body fitting to place the model within the averaged electron density of one monomer. The su- perposition of the homology model (Fig. 4B, blue ribbon) within a single monomer was as good as that with the HK97 MCP (Fig. 4B, red ribbon), thus showing that the homology model is indeed validated by the experimental results, despite the low C-score. According to the structural alignment of HK97 gp5* and the proteolytically cleaved form of gp14, only 33 of 307 amino acids are identical. Gp14 has the two typical mixed α-β domains found in this protein fold, the axial (A) domain and the peripheral (P) domain, along with an extended N terminus and the E loop (Fig. 4 and Fig. S2). The A domains interact around the five- and quasi- sixfold axes within the capsomers, whereas the P domains meet at the three- and quasi-threefold axes (Fig. 4) between capsomers (Fig. 3). The N terminus of one subunit lies parallel to the E-loop of the adjacent subunit. Discussion Fig. 4. HSTV-1 gp14 has the HK97 fold. (Left) Viewed from the capsid in- Our study shows that the podovirus HSTV-1 infecting an ex- H. sinaiiensis terior; (Right) viewed from the capsid exterior. (A) Cryo-EM density of an tremely halophilic archaeon, (19), has the same ca- HSTV-1 asymmetric unit drawn at σ = 3.0 threshold (gray transparent). The nonical HK97-like fold of the MCP as bacterial head-tailed viruses atomic model of HK97 asymmetric unit (PDB ID 1OHG) (32) is shown in red and eukaryotic herpesviruses. Thus, we evolutionarily connect ribbon after rigid body fitting into the density in UCSF Chimera (56). (B) these viruses from all three domains of life. Hence we provide Averaged density of the six segmented subunits from one hexamer shown at structural and genomic evidence that the head-tailed viruses were σ = 3.5 threshold (gray transparent). HK97 chain E in red ribbon (PDB ID evolving before the archaea, bacteria, and eukaryotic domains of 1OHG) (32) and homology model of gp14 in blue ribbon (33, 34) after rigid fi life split approximately 2 to 3 billion years ago (2, 5, 7). body tting into the density in UCSF Chimera (56). (C) Red surface represents a hexamer built from six copies of the averaged subunit. Green wireframe Analysis of the HSTV-1 genome showed conservation with ar- shows the position of the pentameric subunit that has a significantly dif- chaeal and bacterial head-tailed viruses in four groups: genes ferent conformation than the hexameric units and thus was not included in encoding virion structural components (MCPs and portal pro- the averaging. teins), those involved in virion assembly (prohead proteases and terminase proteins), nucleotide metabolism, and DNA replication. Because the genome seems to be circularly permutated, and we One of the very few residues that is conserved is gp5* Arg294 identified both potential portal and terminase genes, HSTV-1 (gp14 Arg311). Arg311 from adjacent subunits form a ring around most likely uses a headful-DNA packaging mechanism. Further- both the quasi-sixfold axis of symmetry and the fivefold axis of more, because the HSTV-1 MCP is proteolytically processed (Fig. symmetry, which could coordinate anions (32). B C 2 and ), the head may undergo maturation whereby the We found that HSTV-1 was stable but reversibly noninfectious N-terminal domain of the MCP is proteolytically removed and at low salinity. Hence HSTV-1 requires high salinity for infection, the capsid undergoes expansion and stabilization as the DNA is most probably for effective adsorption on to the host, the extremely packaged, thus mimicking the HK97 assembly pathway (35). halophilic archaeon, H. sinaiiensis (19). We identified base plate However, one of the crucial steps in HK97 assembly is the for- fi mation of cross-links within the capsid through covalent isopep- tail bers, most likely built of gp29, that are probably involved in tide bonds between Lys169 and Asn356 on adjacent subunits (10, the initial stages of this interaction. Both haloarchaeal myo- and 36). Despite the similar overall conformation of the MCP of HK97 siphoviruses have previously been shown to have a similar de- and HSTV-1, these residues are not conserved when the HSTV-1 pendency on high salinity for infection (37). homology model and HK97 gp5 are aligned (Fig. 4B), and there is HSTV-1 has large, cone-shaped tower structures extending no evidence for such cross-linking to stabilize the HSTV-1 capsid. from the surface (Fig. 3). On the basis of its abundance in the

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1303047110 Pietilä et al. Downloaded by guest on September 29, 2021 virion, we suggest that gp12 is the main component of these Cryo-EM and Image Processing. Freshly purified virus samples were applied on towers. Most of the genome has a much higher G+C content Quantifoil R 2/2 grids, vitrified in liquid ethane as described previously (48), than gene 12, thus we argue that gene 12 joined the genome at an and transferred to a GATAN 626 cryo-holder for observation in an FEI evolutionarily recent time. The recent tower addition may have Tecnai F20 microscope operating at 200 keV at liquid nitrogen tempera- ture. Images from the virus particles in 20 mM Tris·HCl (pH 7.2) were col- helped the virus to adapt to new conditions, such as a new host lected under low-dose conditions on Kodak SO163 film at 62,000× strain, and may also help to stabilize the capsid. magnification, with a range of underfocii of 0.6–3.2 μm. Images from the In the homology modeling described here, the sequence simi- virus particles in 9% (wt/vol) SW or 20 mM Tris·Cl (pH 7.2) were collected larities were too low to give a reliable structural prediction without on a GATAN Ultrascan 4000 charge-coupled device camera at 68,000× the experimental verification that we provided. Viruses evolve magnification at 4.0- or 6.0-μm underfocus. rapidly, and thus the genome or protein sequence similarity may be The negatives were scanned using a PhotoScan TD scanner (Z/I Imaging) detected only between closely related viruses (38). Consequently, with 7-μm sampling and 12-bit grayscale readout. The digitized micrographs structure-based approaches offer an opportunity to reach over were processed, and viral particles were picked as previously described (49). The initial models for the viruses at approximately 30-Å resolution were longer evolutionary distances and to detect connections in structure generated using a random model computation method with a subset of 150 and assembly pathways even between viruses infecting organisms far from focus particles (50). These models were then used as starting maps from different domains of life (2, 4, 5). The final question remains for iterative full orientation searches and origin determinations of all of the how many structure-based viral lineages exist. It has been estimated particles using AUTO3DEM (51) including periodic recentering of the par- that despite the immense number of viruses, there might be only ticles using RobEM. Full contrast transfer function correction was performed a low number of unique viral coat protein folds owing to the limited in AUTO3DEM. The resolution was determined using a Fourier shell corre- protein fold space (4). However, the only way to test the hypothesis lation cutoff of 0.5 (31). A total of 7,115 particles from 164 micrographs were fi is to determine more coat protein folds and viral structures, espe- included in the nal reconstruction. The final reconstruction was sharpened using the program EM-Bfactor as cially from archaeal viruses with novel morphotypes. previously described (52, 53). The B-factor applied was 450.17. The handed- ness was determined using tilt experiments as described previously, with Materials and Methods HK97 proheads as a reference (54, 55). Visualization was done using Uni- Virus Production and Purification. H. sinaiiensis strain ATCC 33800 (39) was versity of California, San Francisco (UCSF) Chimera (56). The empty capsid was used as a host for HSTV-1 (19). Cultures were grown aerobically at 37 °C in generated erasing most of the genome density using the “Volume Eraser” modified growth media containing artificial salt water (SW) (40). A 30% (wt/vol) tool in UCSF Chimera, and the capsid volume was measured by generating an stock solution of SW (http://www.haloarchaea.com/resources/halohandbook/ “Icosahedron Surface” inside the empty capsid and measuring the volume of · Halohandbook_2008_v7.pdf) contains 240 g NaCl, 30 g MgCl2 6H2O, 35 g this surface using the “Measure Volume and Area” tool (56). BIOPHYSICS AND · · · MgSO4 7H2O, 7 g KCl, 5 mL of 1 M CaCl2 2H2O, and 80 mL of 1 M Tris HCl An asymmetric unit from the HK97 Head II atomic model [Protein Data Bank COMPUTATIONAL BIOLOGY (pH 7.2) per liter of water. Virus stocks were prepared as previously de- (PDB) ID 1OHG] (32) was fitted into the EM density using the “fit-in-map” tool scribed (40), except that semiconfluent top-layer agar plates were incubated of UCSF Chimera. One density corresponding to each of the hexamer subunits for 5 d. was then written out using the “Zone” tool to a radius of 6 Å around the fitted For viral infection studies, cells of middle and late exponential cultures of HK97 molecule. The resulting densities were overlaid using the “Fit in map” H. sinaiiensis were infected using a multiplicity of infection of 15, incubated tool and combined into an averaged density using the “Vop” command. UCSF

30 min at 37 °C, and then grown aerobically at 37 °C. Turbidity (A550) was Chimera’s “volume eraser” tool was used to delete outlier density (56). Finally followed, and after cell lysis, the number of viruses in the growth medium arefined model of the asymmetric unit was constructed by replacing each was determined using a plaque assay. hexameric subunit by the averaged density. Independently the structure of HSTV-1 particles were precipitated from the virus stocks using 10% (wt/vol) HSTV-1 gp14 was predicted using the homology modeling tool I-TASSER (33, polyethylene glycol 6000 and purified as previously described for Halorubrum 34). The top 10 template structures that were used by the program were e15 sodomense tailed virus 2 (37). The density of HSTV-1 in CsCl was determined major capsid protein GP7 (PDB ID 3C5B) (57), bacteriophage HK97 Head II (PDB as previously reported (37). To determine the virus stability at low salinity, ID 1OHG, PDB ID 1FH6) (10, 32), bacteriophage HK97 K169Y Head I (PDB ID the virus pellets were treated as previously described (37). 2FS3) (58), and bacteriophage HK97 Prohead I (PDB ID 3P8Q) (59). The best I-TASSER model was fitted in to the average hexamer density using the “Fit-in- ” Genome Sequencing and Protein Analysis. Purified viral genomic DNA was map tool of UCSF Chimera (56). sequenced by the Pittsburgh Bacteriophage Institute at the University of The reconstruction has been deposited in the Electron Microscopy Data Pittsburgh’s Genomics and Proteomics Core Laboratories to a depth of 40-fold Bank, accession code EMD-2279. coverage using 454 technologies. Raw reads were assembled using GS De Novo Assembler (Roche version 1.1); assemblies were then quality controlled ACKNOWLEDGMENTS. We thank Sari Korhonen and Eevakaisa Vesanen for using Consed, version 16 (41). Finished sequences were analyzed and anno- excellent technical assistance; Gunilla Rönnholm for protein sequencing; and the Biocenter Finland National Cryo Electron Microscopy Unit and Protein tated in genome editors, including DNAMaster (http://cobamide2.bio.pitt. Chemistry Facility, Institute of Biotechnology, Helsinki University, and the edu), Glimmer (42), GeneMark (43), tRNA ScanSE (44), and Aragorn (45), to CSC-IT Center for Science Ltd. for providing services. The University of Helsinki identify genome features. The nucleotide sequence has been deposited in provided support to the European Union Strategy Forum on Research Infra- GenBank under accession number KC117378. structures Instruct Centre for Virus Production and Purification used in this Protein concentrations were determined using the Coomassie blue method study. Graham Hatfull encouraged this collaboration and supported this work (46) and bovine serum albumin as a standard. The purified virions were through his Howard Hughes Medical Institute Professorship and National Institutes of Health Grant GM51975 (to R.W.H. and Graham Hatfull). This analyzed by modified tricine-SDS/PAGE with 4% (wt/vol) and 14% (wt/vol) research was also supported by Academy Professor (Academy of Finland) acrylamide concentration in the stacking and separation gels, respectively Funding Grants 255342 and 256518 (to D.H.B.); Academy of Finland Grants (47). Protein N-terminal sequences were determined as previously described 129684 (Centre of Excellence in Virus Research 2006–2011, to S.J.B. and (40) in the Protein Chemistry Core Facility of the Institute of Biotechnology, D.H.B.) and 139178 (to S.J.B.); and National Institutes of Health Grant University of Helsinki. R01GM47795 (to R.W.H.).

1. Suttle CA (2007) Marine viruses—major players in the global ecosystem. Nat Rev 6. Krupovic M, Bamford DH (2010) Order to the viral universe. J Virol 84(24): Microbiol 5(10):801–812. 12476–12479. 2. Bamford DH (2003) Do viruses form lineages across different domains of life? Res 7. Benson SD, Bamford JK, Bamford DH, Burnett RM (1999) Viral evolution revealed by Microbiol 154(4):231–236. bacteriophage PRD1 and human adenovirus coat protein structures. Cell 98(6): 3. Yang X, Yang H, Zhou G, Zhao GP (2008) Infectious disease in the genomic era. Annu 825–833. Rev Genomics Hum Genet 9:21–48. 8. Khayat R, et al. (2005) Structure of an archaeal virus capsid protein reveals a common 4. Abrescia NGA, Bamford DH, Grimes JM, Stuart DI (2012) Structure unifies the viral ancestry to eukaryotic and bacterial viruses. Proc Natl Acad Sci USA 102(52): universe. Annu Rev Biochem 81:795–822. 18944–18949. 5. Benson SD, Bamford JK, Bamford DH, Burnett RM (2004) Does common archi- 9. Rice G, et al. (2004) The structure of a thermophilic archaeal virus shows a double- tecture reveal a viral lineage spanning all three domains of life? Mol Cell 16(5): stranded DNA viral capsid type that spans all domains of life. Proc Natl Acad Sci USA 673–685. 101(20):7716–7720.

Pietilä et al. PNAS Early Edition | 5of6 Downloaded by guest on September 29, 2021 10. Wikoff WR, et al. (2000) Topologically linked protein rings in the bacteriophage HK97 35. Hendrix RW, Johnson JE (2012) Bacteriophage HK97 capsid assembly and maturation. capsid. Science 289(5487):2129–2133. Adv Exp Med Biol 726:351–363. 11. Jiang W, et al. (2003) Coat protein fold and maturation transition of bacteriophage 36. Duda RL, et al. (1995) Structural transitions during bacteriophage HK97 head as- P22 seen at subnanometer resolutions. Nat Struct Biol 10(2):131–135. sembly. J Mol Biol 247(4):618–635. 12. Fokine A, et al. (2005) Structural and functional similarities between the capsid pro- 37. Pietilä MK, et al. (2013) Insights into head-tailed viruses infecting extremely halophilic teins of bacteriophages T4 and HK97 point to a common ancestry. Proc Natl Acad Sci archaea. J Virol 87(6):3248–3260. USA 102(20):7163–7168. 38. Duffy S, Shackelton LA, Holmes EC (2008) Rates of evolutionary change in viruses: 13. Jiang W, et al. (2006) Structure of epsilon15 bacteriophage reveals genome organi- Patterns and determinants. Nat Rev Genet 9(4):267–276. zation and DNA packaging/injection apparatus. Nature 439(7076):612–616. 39. Javor B, Requadt C, Stoeckenius W (1982) Box-shaped halophilic bacteria. J Bacteriol 14. Baker ML, Jiang W, Rixon FJ, Chiu W (2005) Common ancestry of herpesviruses and 151(3):1532–1542. tailed DNA bacteriophages. J Virol 79(23):14967–14970. 40. Pietilä MK, Roine E, Paulin L, Kalkkinen N, Bamford DH (2009) An ssDNA virus in- 15. Morais MC, et al. (2005) Conservation of the capsid structure in tailed dsDNA bac- fecting archaea: A new lineage of viruses with a membrane envelope. Mol Microbiol – teriophages: The pseudoatomic structure of phi29. Mol Cell 18(2):149 159. 72(2):307–319. 16. Effantin G, Boulanger P, Neumann E, Letellier L, Conway JF (2006) Bacteriophage T5 41. Gordon D, Abajian C, Green P (1998) Consed: A Graphical Tool for Sequence Finishing. structure reveals similarities with HK97 and T4 suggesting evolutionary relationships. Genome Research 8:195–202. – J Mol Biol 361(5):993 1002. 42. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene 17. Agirrezabala X, et al. (2007) Quasi-atomic model of bacteriophage t7 procapsid shell: identification with GLIMMER. Nucleic Acids Res 27(23):4636–4641. – Insights into the structure and evolution of a basic fold. Structure 15(4):461 472. 43. Borodovsky M, McIninch J (1993) Recognition of genes in DNA sequence with am- 18. Lander GC, et al. (2008) Bacteriophage lambda stabilization by auxiliary protein gpD: biguities. Biosystems 30(1-3):161–171. Timing, location, and mechanism of attachment determined by cryo-EM. Structure 44. Lowe TM, Eddy SR (1997) tRNAscan-SE: A program for improved detection of transfer – 16(9):1399 1406. RNA genes in genomic sequence. Nucleic Acids Res 25(5):955–964. 19. Atanasova NS, Roine E, Oren A, Bamford DH, Oksanen HM (2012) Global network of 45. Laslett D, Canback B (2004) ARAGORN, a program to detect tRNA genes and tmRNA specific virus-host interactions in hypersaline environments. Environ Microbiol 14(2): genes in nucleotide sequences. Nucleic Acids Res 32(1):11–16. – 426 440. 46. Bradford MM (1976) A rapid and sensitive method for the quantitation of microgram 20. Pina M, Bize A, Forterre P, Prangishvili D (2011) The archeoviruses. FEMS Microbiol quantities of protein utilizing the principle of protein-dye binding. Anal Biochem Rev 35(6):1035–1054. 72:248–254. 21. Krupovic M, Forterre P, Bamford DH (2010) Comparative analysis of the mosaic ge- 47. Schägger H, von Jagow G (1987) Tricine-sodium dodecyl sulfate-polyacrylamide gel nomes of tailed archaeal viruses and proviruses suggests common themes for virion electrophoresis for the separation of proteins in the range from 1 to 100 kDa. Anal architecture and assembly with tailed viruses of bacteria. J Mol Biol 397(1):144–160. Biochem 166(2):368–379. 22. Prangishvili D, Garrett RA, Koonin EV (2006) Evolutionary genomics of archaeal vi- 48. Adrian M, Dubochet J, Lepault J, McDowall AW (1984) Cryo-electron microscopy of ruses: Unique viral genomes in the third domain of life. Virus Res 117(1):52–67. viruses. Nature 308(5954):32–36. 23. Krupovic M, Spang A, Gribaldo S, Forterre P, Schleper C (2011) A thaumarchaeal 49. Seitsonen J, et al. (2010) Interaction of alphaVbeta3 and alphaVbeta6 integrins with provirus testifies for an ancient association of tailed viruses with archaea. Biochem human parechovirus 1. J Virol 84(17):8509–8519. Soc Trans 39(1):82–88. 50. Yan X, Dryden KA, Tang J, Baker TS (2007) Ab initio random model method facilitates 24. Akita F, et al. (2007) The crystal structure of a virus-like particle from the hyper- 3D reconstruction of icosahedral particles. J Struct Biol 157(1):211–225. thermophilic archaeon Pyrococcus furiosus provides insight into the evolution of vi- 51. Yan X, Sinkovits RS, Baker TS (2007) AUTO3DEM—an automated and high through- ruses. J Mol Biol 368(5):1469–1483. put program for image reconstruction of icosahedral particles. J Struct Biol 157(1): 25. Heinemann J, et al. (2011) Fossil record of an archaeal HK97-like provirus. Virology – 417(2):362–368. 73 82. 26. Krupovic M, White MF, Forterre P, Prangishvili D (2012) Postcards from the edge: 52. Fernández JJ (2008) High performance computing in structural determination by – Structural genomics of archaeal viruses. Adv Virus Res 82:33–62. electron cryomicroscopy. J Struct Biol 164(1):1 6. 27. Ackermann HW (1998) Tailed bacteriophages: the order caudovirales. Adv Virus Res 53. Rosenthal PB, Henderson R (2003) Optimal determination of particle orientation, 51:135–201. absolute hand, and contrast loss in single-particle electron cryomicroscopy. J Mol Biol – 28. Prangishvili D, Forterre P, Garrett RA (2006) Viruses of the Archaea: A unifying view. 333(4):721 745. Nat Rev Microbiol 4(11):837–848. 54. Cheng N, et al. (2002) Handedness of the herpes simplex virus capsid and procapsid. – 29. Krupovic M, Prangishvili D, Hendrix RW, Bamford DH (2011) Genomics of bacterial J Virol 76(15):7855 7859. and archaeal viruses: Dynamics within the prokaryotic virosphere. Microbiol Mol Biol 55. Huiskonen JT, Kivelä HM, Bamford DH, Butcher SJ (2004) The PM2 virion has a novel Rev 75(4):610–635. organization with an internal membrane and pentameric receptor binding spikes. 30. Jäälinoja HT, et al. (2008) Structure and host-cell interaction of SH1, a membrane- Nat Struct Mol Biol 11(9):850–856. containing, halophilic euryarchaeal virus. Proc Natl Acad Sci USA 105(23):8008–8013. 56. Pettersen EF, et al. (2004) UCSF Chimera—a visualization system for exploratory re- 31. van Heel M, Schatz M (2005) Fourier shell correlation threshold criteria. J Struct Biol search and analysis. J Comput Chem 25(13):1605–1612. 151(3):250–262. 57. Jiang W, et al. (2008) Backbone structure of the infectious epsilon15 virus capsid re- 32. Helgstrand C, et al. (2003) The refined structure of a protein catenane: the HK97 vealed by electron cryomicroscopy. Nature 451(7182):1130–1134. bacteriophage capsid at 3.44 A resolution. J Mol Biol 334(5):885–899. 58. Gan L, et al. (2006) Capsid conformational sampling in HK97 maturation visualized by 33. Roy A, Kucukural A, Zhang Y (2010) I-TASSER: A unified platform for automated X-ray crystallography and cryo-EM. Structure 14(11):1655–1665. protein structure and function prediction. Nat Protoc 5(4):725–738. 59. Huang RK, et al. (2011) The Prohead-I structure of bacteriophage HK97: Implications 34. Zhang Y (2007) Template-based modeling and free modeling by I-TASSER in CASP7. for scaffold-mediated control of particle assembly and maturation. J Mol Biol 408(3): Proteins 69(Suppl 8):108–117. 541–554.

6of6 | www.pnas.org/cgi/doi/10.1073/pnas.1303047110 Pietilä et al. Downloaded by guest on September 29, 2021