Virology 375 (2008) 292–300 www.elsevier.com/locate/yviro

Archaeal proviruses TKV4 and MVV extend the PRD1-adenovirus lineage to the phylum Euryarchaeota

Mart Krupovič ⁎, Dennis H. Bamford

Department of Biological and Environmental Sciences, Biocenter 2, P.O. Box 56 (Viikinkaari 5), FIN-00014 University of Helsinki, Finland
Institute of Biotechnology, Biocenter 2, P.O. Box 56 (Viikinkaari 5), FIN-00014 University of Helsinki, Finland

Received 17 December 2007; returned to author for revision 28 January 2008; accepted 30 January 2008
Available online 4 March 2008

Abstract

The viral lineage hypothesis, which predicts a common origin for viruses that infect hosts residing in different domains of life, gains more support as data on viral structures accumulate. One such lineage is the PRD1-adenovirus lineage, which unites icosahedral dsDNA viruses with large facets and a double β-barrel trimer coat protein. This lineage is represented by a number of viruses infecting bacteria and eukaryotes. However, only one member of the lineage, Sulfolobus turreted icosahedral virus, infecting a crenarchaeal host, has been identified in the domain Archaea. In this study we characterize the genomic sequences of two archaeal proviruses, TKV4 and MVV, integrated into the 5′- and 3′-distal regions of tRNA genes of the euryarchaeal species Thermococcus kodakaraensis KOD1 and Methanococcus voltae A3, respectively. Bioinformatic approaches allowed placement of TKV4 and MVV into the PRD1-adenovirus lineage, thus extending the lineage to the second archaeal phylum, Euryarchaeota.

© 2008 Elsevier Inc. All rights reserved.

Keywords: Virus evolution; Archaeal viruses; Comparative genomics; PRD1-adenovirus lineage

⁎ Corresponding author. Viikki Biocenter, P.O. Box 56 (Viikinkaari 5), FIN-00014, University of Helsinki, Finland. Fax: +358 9 191 59098.
E-mail addresses: [email protected] (M. Krupovič), [email protected] (D.H. Bamford).
0042-6822/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.virol.2008.01.043
M. Krupovič, D.H. Bamford / Virology 375 (2008) 292–300

Introduction

Relationships between viruses that may have a common origin but diverged early in evolution cannot easily be traced by standard sequence analysis techniques, which are usually only effective for comparison of closely related entities. However, recent advances in structural analysis of viruses led to a realization that viruses considered to be unrelated, infecting evolutionarily distant hosts, can be unexpectedly similar (Bamford et al., 2005a). The similarity extends from general principles of the virion architecture to the topology of the major capsid proteins (MCPs). It is clear that over evolutionary time viruses have exchanged genes and groups of genes horizontally, with the result that individual viruses are mosaics of genes with different evolutionary histories. This means that it is not possible to describe a lineage for viruses that applies to all or even most of the genes in the viruses' genomes. However, with this caveat, we have found it useful to speak of lineages of viruses in terms of the lineages of the genes specifying virion structure. The virion-specifying genes constitute the sole part of the virus' genome that uniquely identifies it as a virus rather than some other sort of genetic element such as a plasmid. We take this part of the virus as its most characteristic feature and call it the viral “self”. It is thus likely that such “self” determinants are inherited vertically (Bamford, 2003). On the other hand, viral genes encoding proteins involved in the interaction of a virus with its host are likely to be transferred between unrelated viruses via lateral gene flow and, consequently, belong to the second group of viral determinants, the “nonself” category. This category includes such determinants as replication, transcription regulation, recombination, lysis, and viral entry functions (Bamford et al., 2005a; Krupovič and Bamford, 2007).

Viruses sharing related “self” elements can be grouped into lineages (Bamford, 2003; Bamford et al., 2005a; Benson et al., 2004). One such viral lineage is the PRD1-adenovirus lineage, which unites icosahedral tailless viruses with dsDNA genomes. The core structural features of these viruses include trimeric MCPs (whose subunits have a double β-barrel fold) arrayed as triangular plates to form the icosahedral virus coat (Bamford et al., 2005a). Under the icosahedral protein shell, members of this lineage usually possess an internal membrane, the only known exception being adenovirus, which lacks lipids among its structural components. In addition to the common MCP, members of the PRD1-adenovirus lineage share a characteristic ATPase, exemplified by the packaging ATPase P9 of bacteriophage PRD1 (Strömsten et al., 2005). The MCP and the ATPase seem to be the only two genes that are shared throughout the lineage.

The PRD1-adenovirus lineage spans all three domains of life (Benson et al., 1999, 2004; Khayat et al., 2005; Nandhagopal et al., 2002). In the domain Bacteria, the lineage is represented by the phages PRD1 and PM2, which infect enterobacteria and marine bacteria, respectively (Benson et al., 1999; Abrescia et al., submitted), and a group of Bam35-like phages, infecting Gram-positive Bacillus species (Benson et al., 2004; Laurinmäki et al., 2005). The eukaryotic branch of the lineage contains adenoviruses, phycodnaviruses, iridoviruses, and possibly […] and poxviruses (Benson et al., 1999, 2004; Hyun et al., 2007; Nandhagopal et al., 2002; Yan et al., 2005).

The third domain of life, Archaea, is divided into two major phyla, Euryarchaeota and Crenarchaeota (Woese et al., 1990). Only two archaeal spherical viruses, Sulfolobus turreted icosahedral virus (STIV) and SH1, have been characterized (Bamford et al., 2005b; Maaty et al., 2006; Porter et al., 2005; Rice et al., 2004). STIV infects a crenarchaeal host, while SH1 is a halovirus infecting the euryarchaeon Haloarcula hispanica. Both virions possess an internal membrane covered by an icosahedral capsid (Bamford et al., 2005b; Maaty et al., 2006) and therefore share general features common to the majority of the PRD1-adenovirus lineage members. While the high-resolution structure of the MCP of STIV confirmed that it is a genuine member of the lineage (Khayat et al., 2005), recent structural analysis of the SH1 virion by electron microscopy revealed that it has a novel capsid arrangement, distinct from the canonical one seen in members of the PRD1-adenovirus lineage (Jäälinoja et al., submitted for publication). Therefore, the question of whether the PRD1-adenovirus lineage can be extended to the phylum Euryarchaeota in the domain Archaea remains open.

Various parasitic elements, such as viruses and transposable elements, usually constitute a considerable fraction of cellular genomes. Therefore, genome analysis of a cellular organism provides information not only on the organism itself, but might also reveal interesting evolutionary connections between the parasitic elements it harbors. This proved to be true when a number of complete bacterial genome sequences were analyzed through the eyes of virologists (Casjens, 2003; Hendrix et al., 1999; Krupovič and Bamford, 2007). In this study we identified and analyzed two euryarchaeal proviruses residing in Thermococcus kodakaraensis KOD1 and Methanococcus voltae A3 and show that they are most likely related to membrane-containing viruses belonging to the PRD1-adenovirus lineage.

Results and discussion

The double β-barrel major capsid protein (MCP) and the packaging ATPase are invariably present in all members of the PRD1-adenovirus lineage. Intuitively, these “self” genes are the best seed candidates when trying to find new viruses of the same lineage using bioinformatic tools. This approach was successfully applied to identify thirteen putative prophages (viral genomes integrated into the bacterial chromosome) closely related to the virulent icosahedral membrane-containing bacteriophage PM2 (Krupovič and Bamford, 2007). Here we used the same strategy to extend the archaeal branch of the PRD1-adenovirus lineage.

The MCP of Sulfolobus turreted icosahedral virus (STIV), which infects the acidophilic hyperthermophilic crenarchaeon Sulfolobus solfataricus (Rice et al., 2004), was used as a query in the PSI-BLAST search (Altschul et al., 1997). The only sequence that returned with a considerable E-value (1.5e-2) was the hypothetical protein TK1353 (accession no. YP_183766) from T. kodakaraensis KOD1. After the second iteration a hypothetical protein (accession no. EDP40293) from M. voltae A3 was identified (E=9e-17).

We next examined the adjacent genomic regions of T. kodakaraensis KOD1 and M. voltae A3 for the presence of the second “self” determinant, the putative packaging ATPase-coding gene. Indeed, in both cases open reading frames (ORFs) encoding putative ATPases were detected. Detailed analysis of the two putative viral regions is presented below.

Integration sites of putative proviruses TKV4 and MVV

Archaeal integrases can be divided into two types based on the integration mechanisms (She et al., 2004). A characteristic of the first type (SSV1-type) integrases is that, upon integration of the circular genetic element (virus genome or plasmid) into the host chromosome, the integrase gene is partitioned into longer Int (C) and shorter Int (N) fragments. The second type comprises enzymes that maintain an intact integrase-coding gene after integration. In both cases, the att sites are usually highly similar or identical to the 3′-distal regions of a tRNA gene, which is therefore used as the integration site (Reiter et al., 1989). For type I integrases the att site is located inside the viral integrase gene, while those for type II are adjacent to the integrase gene (She et al., 2004). Upon integration of the circular genetic element, the att sequence flanks the inserted element as a direct repeat.

T. kodakaraensis KOD1 is a hyperthermophilic euryarchaeon isolated from a solfatara (102 °C, pH 5.8) in Japan (Morikawa et al., 1994). Its genome has been sequenced, and four regions, TKV1–TKV4, which are flanked with fragments of an integrase gene, have been predicted to be of viral origin (Fukui et al., 2005). Attachment (att) sites of TKV1–TKV3 were analyzed and suggested to be related to those previously identified for Sulfolobus shibatae virus SSV1 and pRN family plasmids from Sulfolobus. Interestingly, the att site for TKV4 was predicted to be distinct from those for TKV1–TKV3 or related elements in other archaea, suggesting a genetic deviation of TKV4 from the other elements (Fukui et al., 2005). Unfortunately, detailed analysis of those regions has not been presented due to lack of homology to known genes. The putative MCP and ATPase genes identified in the T. kodakaraensis KOD1 genome turned out to be located inside the TKV4 region.
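The direct-repeat signature of an integrated element described above (an att sequence flanking the insert on both sides) can be illustrated with a minimal sketch. This is not the procedure used in the study, which relied on Vector NTI and manual inspection. The function name `find_direct_repeats` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_direct_repeats(seq: str, k: int) -> dict[str, list[int]]:
    """Return every k-mer occurring more than once, with its 0-based start positions."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Hypothetical toy "chromosome": an att-like 8-mer flanks an inserted element.
att = "GGTTCGAA"
host_left, element, host_right = "ACGTACCA", "TTTTCCCCAAAAGG", "CATGCATA"
chromosome = host_left + att + element + att + host_right

repeats = find_direct_repeats(chromosome, len(att))
first, second = repeats[att]
# Length of the sequence enclosed between the two repeat copies.
enclosed_length = second - first - len(att)
print(repeats, enclosed_length)
```

On a real genome the repeat length is not known in advance, so one would typically start from the 3′- or 5′-distal part of a tRNA gene and search for its second copy rather than enumerate all k-mers.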

We carried out a thorough analysis of the att sites of the TKV4 element. As reported originally by Fukui et al. (2005), the Int (N) fragment overlaps the tRNALeu gene. However, it was not possible to identify the second att site inside the region coding for the Int (C). Therefore, we scanned a region upstream of the annotated Int (C) fragment for the presence of a sequence similar to the Int (N)/tRNALeu overlapping region. Indeed, 132 bp upstream of the predicted start site for Int (C) we identified an alternative in-frame start point (position 1182705 in the T. kodakaraensis KOD1 genome, accession no. NC_006624). A 42 bp sequence in the newly predicted 5′-distal region of the Int (C) was identical to the Int (N)/tRNALeu overlapping region (Fig. 1). Interestingly, during the TKV4 integration, the site-specific recombination seems to have occurred between the viral genome and the 5′-distal region of the tRNA gene, not the 3′-distal region as in all other reported cases. Fig. 1 reconstructs a plausible scheme for TKV4 (ancestor virus) integration into the genome of T. kodakaraensis KOD1.

Fig. 1. A plausible route for integration of the TKV4 virus into the genome of T. kodakaraensis KOD1. Predicted genes are depicted as arrows, indicating the direction of transcription. Black shading indicates either conserved ORFs or ORFs with a hypothetical function; gray shading designates ORFs encoding putative membrane-associated proteins; and white shading indicates ORFs with no similarity and no assigned function. The identified attachment site (att) is shown and the anticodon of the tRNALeu is underlined. GLDC, glycine decarboxylase gene; NA-binding, nucleic acid-binding; MCP, major capsid protein; Int. N- and C-frag., N- and C-terminal fragments of the integrase, respectively.

M. voltae A3 is a mesophilic methanogenic euryarchaeon isolated from a salt marsh in the United States (Whitman et al., 1986). The genetic content of the regions surrounding the two “self” genes, encoding the putative MCP and ATPase, was analyzed for the presence of other viral sequences and possible virus integration sites. A putative integrase-coding ORF was identified 7781 nucleotides away from the MCP gene. In the PSI-BLAST search the predicted gene product gave the best hit (E=1e-67) to the integrase (NP_247341) of the previously described provirus proMJF1 of Methanococcus jannaschii DSM 2661 (Makino et al., 1999). When PSI-BLAST was executed against viral sequences at NCBI, the best match (E=2e-9) was to the XerD-like integrase (YP_001467975) of the Thermus phage P74-26. The putative integrase gene of M. voltae A3 was adjacent to the tRNASer gene. Interestingly, proMJF1 was also proposed to use the tRNASer gene for integration (Makino et al., 1999). Knowing that att sites are often located within tRNA genes and that the att sequences flank the integrated provirus, we checked whether sequences similar to the 5′- or 3′-distal region of the tRNASer can be found elsewhere in the M. voltae A3 genome. Surprisingly, a 54-nucleotide sequence identical to the 3′-distal region of the tRNASer gene was identified three times within a 37,948 bp region, dividing it into two subregions of 12,039 bp and 25,909 bp in length (Fig. 2A). The putative MCP and ATPase genes in the M. voltae A3 genome were located inside the shorter region, which we refer to as MVV.

M. voltae A3 has previously been shown to spontaneously produce lemon-shaped virus-like particles (VLPs). Although the exact nucleotide sequence of the circular ∼23 kb-long dsDNA of A3 VLP has not been determined, the restriction map was constructed (Wood et al., 1989). In addition, hybridization experiments revealed that A3 VLP DNA is integrated in the chromosome of M. voltae A3 in a site-specific manner. Interestingly, the second region of M. voltae A3, adjacent to MVV and flanked with the putative att sites, shared a nearly identical restriction pattern and was of a comparable size to the A3 VLP DNA (Fig. 2B). This suggests that this region encodes determinants needed to produce the previously observed A3 VLPs (Wood et al., 1989). Downstream of the third att site we identified a 3712 bp region which was 90% identical (at the nucleotide level) to a genomic region of A3 VLP. The fragment encompassed genes for the replicase and the integrase and is likely a result of a genomic duplication event (Fig. 2B). On the basis of hybridization experiments, Wood et al. (1989) reported that only 80% of the VLP-specific DNA could be eliminated in a curing experiment. However, the presence of the duplicated region, which contains the same restriction fragment identified by those authors, suggests that A3 VLP is an active virus, capable of leaving the chromosome of the host and forming virions.

The presence of three identical att sites is puzzling. A probable scenario for their presence assumes that MVV and A3 VLP viruses utilized the same tRNASer for integration. A3 VLP integrated into the right att site (att 2 in Fig. 2B) of the preintegrated MVV genome. This possibly led to the inactivation of the MVV provirus. This is supported by the observation that the subjection of M. voltae A3 cultures to various stress factors (UV irradiation, temperature shift, hydrogen starvation, mitomycin C or acridine orange induction) did not result in the production of any detectable viruses (Wood et al., 1989).

Fig. 2. Genomic organization of the putative proviruses MVV and A3 VLP. (A) Schematic representation of the genomic region of Methanococcus voltae A3 containing proviruses MVV and A3 VLP. Three identical att sites are indicated as black rectangles. The att sequence is shown and the anticodon of the tRNASer is underlined. Numbers under the arrows denote the length of MVV and A3 VLP proviruses. (B) Predicted genes are depicted and colored as in Fig. 1. In silico predicted restriction map of A3 VLP provirus is shown. EcoRI recognition site which is missing in the restriction map constructed by Wood et al. (1989) is underlined, while the one not identified in our analysis is shown in gray.

Analysis of the TKV4 and MVV sequences

TKV4 and MVV span 18,818 bp and 12,039 bp of the T. kodakaraensis KOD1 (genome accession no. NC_006624) and M. voltae A3 (WGS fragment accession no. ABHB01000001) genomes, respectively. TKV4 contains 33 and MVV 20 open reading frames (ORFs) encoding peptides at least 50 amino acids long. Only a few of the predicted gene products showed significant sequence similarity to known proteins. This is a common characteristic of archaeal viruses (Prangishvili et al., 2006). Nevertheless, PSI-BLAST analysis and functional domain prediction, in addition to the integrase (see above), allowed identification of five and four putative genes in TKV4 and MVV, respectively, for which viral counterparts were present in public databases. Among the six predicted protein products of TKV4, four had counterparts in viruses infecting Gram-positive bacteria, while two genes were specific to archaeal viruses only (integrase and a homolog [gpTK1349, YP_183762; E=5.2e-8] of the gp89 from Pyrococcus abyssi virus 1). In addition, one putative gene product (gpTK1373, YP_183786) was found to share similarity (E=1e-11) with a hypothetical protein (YP_001325418) from Methanococcus aeolicus Nankai-3.

Major capsid proteins

As mentioned above, putative MCPs of TKV4 (YP_183766) and MVV (EDP40293) share sequence similarity with the MCP of Sulfolobus turreted icosahedral virus (STIV), which infects the acidophilic hyperthermophilic crenarchaeon S. solfataricus (Rice et al., 2004). To obtain further support for this observation we submitted the putative MCP sequences of TKV4 and MVV to the Structure Prediction MetaServer (Ginalski et al., 2003). The structure of STIV MCP (Khayat et al., 2005) was the only template obtained for structural modeling with a significant score (73.22 for TKV4 and 114.38 for MVV; scores above 50 are assumed to be significant and correspond to a prediction accuracy of above 90% [Ginalski et al., 2003]). The sequences of TKV4 and MVV align very well with the MCP of STIV: blocks of identical and similar residues are distributed evenly throughout the alignment (Fig. 3A). The pI values for TKV4, MVV and STIV proteins are also very similar: 4.96, 5.22 and 4.89, respectively. We also modeled the structure of the corresponding TKV4 protein using the MCP of STIV as a template (Fig. 3B). The model generated was nearly identical to the structure of the MCP of STIV (compare Fig. 3B and C). The few insertions in the STIV structure with respect to that of TKV4 were mainly in the loop regions (Fig. 3A and C). The model was further verified, along with the X-ray structure of STIV MCP, using the MolProbity program (Lovell et al., 2003). The Ramachandran plot statistics for the two structures were very similar. Of all STIV MCP residues, 95.6% were in favoured positions, while 100% were in allowed positions. For TKV4 gpTK1353, 95.1% and 99.4% of all residues were in favoured and allowed positions, respectively. The good stereochemical quality of the model indicates that gpTK1353 has the potential to adopt the same fold as the MCP of STIV, i.e. the double β-barrel fold, and further suggests that the two proteins are functionally equivalent. Therefore, it is reasonable to assume that gpTK1353 and consequently the corresponding sequence of MVV represent the MCP of TKV4 and MVV, respectively.

Fig. 3. Comparison of the major capsid protein of STIV with those of TKV4 and MVV. (A) A sequence alignment of the corresponding proteins (STIV, GenBank accession number YP_025022; TKV4, YP_183766; MVV, EDP40293) with identical residues boxed in black and similar residues in gray. The secondary structure determined from the X-ray structure of STIV B345 (PDB code 2bbd) is shown above the alignment with α helices, β strands, and turns represented by red rectangles, blue arrows, and yellow bulges, respectively. (B) Three-dimensional model of the putative major capsid protein of provirus TKV4. The model is based on the X-ray structure of STIV B345, using the alignment shown in panel (A) as a guide. The modeled structure is colored according to the sequence similarity (blosum30 matrix) with the corresponding protein of STIV. (C) The STIV B345 X-ray model showing the insertions with respect to the TKV4 sequence (white).

The MCP, the main building block of the virion, bears the information on the overall architecture of the virus. Therefore, it is conceivable that TKV4, MVV and STIV virions have the same morphology. The STIV virion consists of an icosahedrally organized proteinaceous capsid surrounding a lipid membrane, which encloses the dsDNA genome (Maaty et al., 2006; Rice et al., 2004). Such virion characteristics as well as the fold of the MCP are common in the members of the PRD1-adenovirus lineage (Bamford et al., 2005a).
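The distinction between identical and similar positions that an alignment such as Fig. 3A displays can be made concrete with a short sketch. The residue groups below are a simplified assumption for illustration, not the substitution matrix used in the paper, and the two aligned fragments are hypothetical toy strings rather than real MCP sequences.

```python
# Simplified conservative-substitution groups (an illustrative assumption).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                  set("ST"), set("NQ"), set("AG")]


def classify_column(a: str, b: str) -> str:
    """Classify one alignment column as gap, identical, similar or different."""
    if a == "-" or b == "-":
        return "gap"
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


def alignment_stats(row_a: str, row_b: str) -> dict[str, int]:
    """Count column classes for two rows of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    counts = {"identical": 0, "similar": 0, "different": 0, "gap": 0}
    for a, b in zip(row_a.upper(), row_b.upper()):
        counts[classify_column(a, b)] += 1
    return counts


def percent_identity(row_a: str, row_b: str) -> float:
    """Identity over ungapped columns, in percent."""
    counts = alignment_stats(row_a, row_b)
    ungapped = counts["identical"] + counts["similar"] + counts["different"]
    return 100.0 * counts["identical"] / ungapped if ungapped else 0.0


# Hypothetical aligned fragments.
row_a = "MKV-LSDE"
row_b = "MRVALT-E"
stats = alignment_stats(row_a, row_b)
identity = percent_identity(row_a, row_b)
print(stats, round(identity, 1))
```

Percent identity can be defined over different denominators (ungapped columns, alignment length, or the shorter sequence); the choice of ungapped columns here is one common convention, not necessarily the one behind any figure in this article.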

Interestingly, among the low statistical value (E>1) results of the second PSI-BLAST iteration (MCP of STIV as a seed) we were able to identify the MCP of Bacillus bacteriophage GIL16, a close relative of phage Bam35 (Verheust et al., 2005). It should be noted that with the STIV MCP sequence alone, the walk from archaeal viruses belonging to the PRD1-adenovirus lineage to the bacterial viruses in the same lineage is not possible. This means that the addition of the gpTK1353 sequence into the PSI-BLAST iteration led to the construction of a new profile which was sensitive enough to pick up the distantly related MCP sequence of GIL16. As expected, manual addition of the GIL16 MCP sequence to the next iteration resulted in the pulling down of all members of the Tectiviridae family infecting Gram-positive hosts (Bam35, GIL01, pBClin15), but not the ones infecting Gram-negative bacteria. This further confirms that TKV4 is an authentic member of the PRD1-adenovirus lineage. Moreover, this suggests that TKV4 might be a missing link in the successive walk from bacterial to archaeal members of this viral lineage.

Packaging ATPases

In addition to the common MCP, members of the PRD1-adenovirus lineage share a characteristic ATPase (Strömsten et al., 2005). The MCP and the ATPase seem to be the only two genes that are shared throughout the lineage and are referred to as the viral “self”. The viral “self” hypothesis predicts that features responsible for the formation of a functional virion are common to all members of the lineage and are inherited from the common ancestor (discussed in Bamford (2003), Bamford et al. (2005a), and Krupovič and Bamford (2007)).

Indeed, when gpTK1356 of TKV4 was used as a query, the PSI-BLAST search returned a list of ATPases including those of viral origin. When PSI-BLAST was executed against viral sequences at NCBI, the best match (E=7e-4) was to the putative packaging ATPase of the membrane-containing bacteriophage Bam35, infecting Gram-positive bacteria (Strömsten et al., 2005). Similarly, upstream of the putative MCP gene of MVV we identified an ORF coding for a putative ATPase (accession no. EDP40294). The best viral match was to the putative packaging ATPase of PM2 (NP_049900, E=1e-5).

Alignment of the putative packaging ATPases of TKV4 and MVV with corresponding sequences from previously established members of the PRD1-adenovirus lineage is shown in Fig. 4A. In addition to the conserved Walker A and B motifs, putative packaging ATPases of membrane-containing viruses, infecting hosts from all three domains of life, contain a third motif, the P9/A32-specific motif (Strömsten et al., 2005). This motif is also conserved in the putative packaging ATPases of TKV4 and MVV (Fig. 4A), suggesting that TKV4 and MVV originated from viruses with a membrane. This is in agreement with the finding that 39% (13/33) and 30% (6/20) of putative TKV4 and MVV proteins, respectively, were predicted to be associated with a membrane.
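As an illustration of how a conserved motif such as Walker A can be located in a protein sequence, the sketch below scans for the classical P-loop consensus G-x(4)-G-K-[S/T] with a regular expression. This is only a rough stand-in for the alignment-based comparison reported here: the P9/A32-specific motif is not modeled, the function name `find_walker_a` is hypothetical, and the toy sequence is invented rather than taken from any of the ATPases discussed.

```python
import re

# Classical Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein.upper())]


# Hypothetical toy sequence, not a real ATPase.
toy = "MSLKAIVGPSGSGKSTLLNAIDE"
hits = find_walker_a(toy)
print(hits)

# The membrane-protein fractions quoted in the text follow from simple ratios.
tkv4_fraction = round(100 * 13 / 33)
mvv_fraction = round(100 * 6 / 20)
print(tkv4_fraction, mvv_fraction)
```

A pattern match of this kind only flags candidates; short consensus motifs occur by chance, so supporting evidence such as motif spacing and profile-based similarity remains necessary.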

Fig. 4. (A) Comparison of the putative ATPases from proviruses TKV4 and MVV with corresponding sequences from membrane-containing dsDNA viruses of bacterial or archaeal origin. For simplicity only the Walker A and B motifs and the P9/A32-specific motif (Strömsten et al., 2005) are shown (boxed). Numbers between the boxes indicate the spacing between the motifs. Identical residues are boxed in black and similar residues are in gray. GenBank accession numbers of the aligned sequences: TKV4 (YP_183769), MVV (EDP40294), Bam35 (NP_943760), pBClin15 (NP_829897), PRD1 (AAX45927), PM2 (NP_049900), STIV (AAS89100). (B) Alignment of the three conserved motifs (I–III) of superfamily I rolling circle replication proteins (Ilyina and Koonin, 1992) with corresponding motifs from the putative replication protein of the MVV provirus. GenBank accession numbers of the aligned sequences: provirus MVV (EDP40280), Pseudoalteromonas corticovirus PM2 (AAD43543), Escherichia coli microviruses ϕX174 (NP_040703), NC29 (YP_512470) and α3 (NP_039590), Chlamydophila abortus microvirus Clp4 (YP_338244), Bdellovibrio bacteriovorus microvirus ϕMH2K (NP_073537), Xanthomonas campestris inovirus ϕLf (AAC54630), E. coli myovirus P2 (NP_046795). (C) Alignment of genomic regions containing genes coding for the packaging ATPases and the double β-barrel major capsid proteins (MCP) of membrane-containing dsDNA viruses of bacterial (PM2, PRD1, Bam35) and archaeal (STIV) origins with the corresponding regions of proviruses TKV4 and MVV. Sequences are aligned to the ATPase-coding gene.

Replication proteins

MCM proteins are DNA-dependent ATPases required for the initiation of archaeal and eukaryotic DNA replication (Maiorano et al., 2006). A gene coding for an MCM family protein is present in the genome of TKV4 and is homologous to a large number of archaeal MCM proteins, as analyzed by PSI-BLAST. Interestingly, when gpTK1361 (YP_183774) was used as a seed and PSI-BLAST was executed against viral sequences only, two other viral proteins were identified. The best match was to the MCM family protein (YP_919062) from haloarchaeal virus BJ1 (E=1e-53), while the second best match was to the MCM domain of the protein from Bacillus phage phBC6A51 (NP_852497; E=7e-5). The latter protein of phage phBC6A51 consists of two domains: an N-terminal (residues 86–189) primase domain (Pfam entry: DNA_primase_S) similar to that found in the small subunit of archaeal and eukaryotic DNA primases, and a C-terminal (residues 629–855) MCM domain. As also noticed by McGeoch and Bell (2005), this observation is intriguing since both domains are believed to be archaea/eukaryote-specific.

The MVV genome is most probably replicated by the rolling circle mechanism. We identified an ORF located upstream of the putative integrase gene and potentially encoding a 493 aa-long protein (accession no. EDP40280). When the protein sequence was used as a seed in the PSI-BLAST search, only two archaeal proteins with unknown function were identified as possible homologues with a significant E-value. However, after three iterations a number of rolling circle replication (RCR) initiation proteins from […] were identified among the search results (E=9e-14–9e-66). Three motifs, I–III, are common to RCR proteins from diverse subgroups, but only the superfamily I members contain two active tyrosine residues within motif III, whereas the others contain only one (Ilyina and Koonin, 1992). We identified all three motifs in the sequence of the putative replication protein of MVV (Fig. 4B). The third motif contained both conserved tyrosine residues, suggesting that the MVV replication protein belongs to superfamily I of RCR proteins. The same replication strategy is employed by bacteriophage PM2, a genuine member of the PRD1-adenovirus lineage (Espejo et al., 1971; Männistö et al., 1999). It is interesting to note that the two viral elements, TKV4 and MVV, presumably sharing the same overall architectural principle, use different replication strategies, suggesting that viral “self” elements and the replication systems are uncoupled. This is in agreement with our previous finding that corticoviral PM2-like prophages encode at least four nonorthologous replication proteins (Krupovič and Bamford, 2007).

Transcription factors

SpoVT/AbrB-like transcriptional regulators have been previously found to be encoded in the genomes of two crenarchaeal viruses, STSV1 and SIFV (Prangishvili et al., 2006). An exhaustive PSI-BLAST search (4 iterations, 0.05 as an E-value cut-off; BLOSUM62 matrix) using gpTK1372 (YP_183785) of TKV4 as a query led to the identification of not only the predicted transcription factor of STSV1, but also SpoVT/AbrB-like transcriptional regulators of Bacillus phage phBC6A51 (accession no. NP_852518) and Geobacillus phage GSVE2 (accession no. YP_001285838).

MVV, on the other hand, seems to utilize a MarR-like winged-helix transcriptional regulator (accession no. EDP40286), as revealed by the protein domain prediction program InterProScan (E=7.5e-9). The protein shares significant sequence similarity with a number of transcription factors from other methanococci (30–50% identity). MarR-like transcriptional regulators have also been found in the genomes of crenarchaeal fuselloviruses (Kraft et al., 2004) and rudiviruses (Vestergaard et al., 2005). Indeed, after the second PSI-BLAST iteration hits were obtained to ORF E150 of Sulfolobus spindle-shaped virus Ragged Hills (NP_963938; E=1e-5) and the MarR family transcription factor of Sulfolobus spindle-shaped virus 4 (YP_001552198, E=1e-4).

Organization of the viral “self” genes

As mentioned above, genes coding for the MCP and the packaging ATPase are the only two genes that are present in genomes of all members of the PRD1-adenovirus lineage. Interestingly, in prokaryotic members of this lineage (bacterial and archaeal viruses) those two genes are always found close to each other, with the ATPase gene being upstream of the MCP-coding gene (Fig. 4C). TKV4 and MVV also follow this pattern, further indicating that these two proviruses belong to the PRD1-adenovirus lineage. It is tempting to speculate that such an organization is maintained to decrease the likelihood of separation of the viral “self” genes by recombination. Eukaryotic members of the lineage, such as adenoviruses, phycodnaviruses, iridoviruses, African swine fever virus (ASFV), and Mimivirus (Benson et al., 2004; Strömsten et al., 2005), possess much larger genomes, and the distance between the two genes is usually larger than it is in the prokaryotic relatives. Nonetheless, the putative packaging ATPase- and the MCP-coding genes of ASFV are spatially separated by only two genes, although the MCP-coding gene is situated upstream of the ATPase-coding gene.

Conclusions

In this study we analyzed genomic sequences of two proviruses, TKV4 and MVV, residing in the genomes of the hyperthermophilic euryarchaeon T. kodakaraensis KOD1 and the methanogenic euryarchaeon M. voltae A3, respectively. We propose a scheme in which the circular dsDNA genome of the TKV4 virus integrates into the chromosome of T. kodakaraensis KOD1, leading to the genomic organization of the current organism (Fig. 1). Temperate viruses, including archaeal ones, usually integrate into the 3′-distal region of various tRNA genes, while TKV4 uses the 5′-distal region of tRNALeu as the att site. In addition to TKV4 and MVV we also identified a region, adjacent to the MVV genomic sequence, which we predict to represent the integrated form of the previously described A3 VLP (Wood et al., 1989). Comparative analysis of the viral sequences showed that TKV4 and MVV share several genes with archaeal and bacterial viruses. Interestingly, TKV4 is one of the few examples of viruses where the genome replication apparently relies on proteins of the MCM family. Although it is not currently possible to tell with certainty whether any of these prophages is inducible, we predict that MVV replicated its genome via a rolling circle mechanism, as all three conserved motifs of the superfamily I rolling circle replication proteins were identified in the putative MVV replication protein. We were also able to identify genes coding for the MCPs of TKV4 and MVV and modeled the three-dimensional structure of the former protein, using the MCP of STIV as a template. The stereochemical quality of the model supports the sequence analysis-based prediction of the function for this protein. Finally, the presence of the double β-barrel MCP in combination with conservation of the putative packaging ATPase, common to all internal-membrane-containing viruses, allowed us to place TKV4 and MVV into the PRD1-adenovirus lineage. Up to now, STIV, which infects a crenarchaeal host, was the only representative of the PRD1-adenovirus lineage in the domain Archaea. Proviruses TKV4 and MVV extend the lineage to the phylum Euryarchaeota.

Materials and methods

Genomic sequence analysis of proviruses TKV4 and MVV

The genome sequence of T. kodakaraensis KOD1 (Fukui et al., 2005; accession no. NC_006624) and the shotgun sequence containing the MVV provirus (accession no. ABHB01000001) were downloaded from the GenBank database. The genomic sequences were analyzed using Vector NTI Suite 8.0 (InforMax Inc). Possible homologues were searched for using PSI-BLAST […]

[…] them was chosen on the basis of having the best stereochemical quality. The latter was validated using MolProbity (Lovell et al., 2003) at http://molprobity.biochem.duke.edu/. The structural superposition of the generated model and the X-ray structure of the STIV MCP was performed using the STAMP algorithm (Russell and Barton, 1992), and the results were visualized with the VMD program (Humphrey et al., 1996).

Acknowledgments

We are grateful to Dr. Janne Ravantti for the useful discussions and his help with the modeling. We also sincerely thank the knowledgeable unknown reviewers for their brilliant suggestions. This work was supported by the Academy of Finland grants 1213467 (Finnish Center of Excellence Program [2006–2011]), 1213992 and 1210253 to D.H.B. MK is supported by the Viikki Graduate School in Biosciences.

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Bamford, D.H., 2003. Do viruses form lineages across different domains of life? Res. Microbiol. 154, 231–236.
Bamford, D.H., Grimes, J.M., Stuart, D.I., 2005a. What does structure tell us about virus evolution? Curr. Opin. Struct. Biol. 15, 655–663.
Bamford, D.H., Ravantti, J.J., Rönnholm, G., Laurinavičius, S., Kukkaro, P., Dyall-Smith, M., Somerharju, P., Kalkkinen, N., Bamford, J.K., 2005b. Constituents of SH1, a novel lipid-containing virus infecting the halophilic euryarchaeon Haloarcula hispanica. J. Virol. 79, 9097–9107.
Benson, S.D., Bamford, J.K., Bamford, D.H., Burnett, R.M., 1999. Viral evolution revealed by bacteriophage PRD1 and human adenovirus coat protein structures. Cell 98, 825–833.
(BLOSUM62 matrix, 0.05 as an E-value cut-off) (Altschul et al., Benson, S.D., Bamford, J.K., Bamford, D.H., Burnett, R.M., 2004. Does 1997) in the nonredundant protein database at NCBI. Pro- common architecture reveal a viral lineage spanning all three domains of – tein domains and membrane spanning regions were predicted life? Mol. Cell 16, 673 685. Casjens, S., 2003. Prophages and bacterial genomics: what have we learned so using InterProScan (http://www.ebi.ac.uk/InterProScan/index. far? Mol. Microbiol. 49, 277–300. html) and TMpred (http://www.ch.embnet.org/software/ Espejo, R.T., Canelo, E.S., Sinsheimer, R.L., 1971. Replication of bacteriophage TMPRED_form.html), respectively. Sequence alignments were PM2 deoxyribonucleic acid: a closed circular double-stranded molecule. created using CLUSTALW (Thompson et al., 1994). tRNA genes J. Mol. Biol. 56, 597–621. were analyzed using tRNAscan-SE (http://lowelab.ucsc.edu/ Fukui, T., Atomi, H., Kanai, T., Matsumi, R., Fujiwara, S., Imanaka, T., 2005. Complete genome sequence of the hyperthermophilic archaeon Thermo- tRNAscan-SE/). coccus kodakaraensis KOD1 and comparison with Pyrococcus genomes. Genome Res. 15, 352–363. Prediction, modeling and verification of the three-dimensional Ginalski, K., Elofsson, A., Fischer, D., Rychlewski, L., 2003. 3D-Jury: a simple structure approach to improve protein structure predictions. Bioinformatics 19, 1015–1018. Hendrix, R.W., Smith, M.C., Burns, R.N., Ford, M.E., Hatfull, G.F., 1999. BioInfoBank MetaServer (Ginalski et al., 2003), which Evolutionary relationships among diverse bacteriophages and prophages: all derives and scores consensus predictions from the results of the world's a phage. Proc. Natl. Acad. Sci. U. S. A. 96, 2192–2197. various primary structure prediction servers, was used for Humphrey, W., Dalke, A., Schulten, K., 1996. VMD: visual molecular dynamics. prediction of the tertiary structure of the putative major capsid J. Mol. Graph. 14, 33–38. 
proteins (MCPs) of TKV4 and MVV. The sequence of the STIV Hyun, J.K., Coulibaly, F., Turner, A.P., Baker, E.N., Mercer, A.A., Mitra, A.K., 2007. The structure of a putative scaffolding protein of immature poxvirus MCP (accession no. YP_025022), for which structural informa- particles as determined by electron microscopy suggests similarity with tion is available (PDB code 2bbd), was aligned with the capsid proteins of large icosahedral DNAviruses. J. Virol. 81, 11075–11083. corresponding protein sequences of TKV4 (accession no. Ilyina, T.V., Koonin, E.V., 1992. Conserved sequence motifs in the initiator YP_183766) and MVV (accession no. EDP40293) using proteins for rolling circle DNA replication encoded by diverse replicons from – CLUSTALW (Thompson et al., 1994). The resulting alignment eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res. 20, 3279 3285. Khayat, R., Tang, L., Larson, E.T., Lawrence, C.M., Young, M., Johnson, J.E., was utilized to build a three-dimensional model of the putative 2005. Structure of an archaeal virus capsid protein reveals a common TKV4 MCP using the MODELLER program (Marti-Renom ancestry to eukaryotic and bacterial viruses. Proc. Natl. Acad. Sci. U. S. A. et al., 2000). Ten variants of the model were generated and one of 102, 18944–18949. 300 M. Krupovič, D.H. Bamford / Virology 375 (2008) 292–300

Kraft, P., Oeckinghaus, A., Kümmel, D., Gauss, G.H., Gilmore, J., Wiedenheft, B., Young, M., Lawrence, C.M., 2004. Crystal structure of F-93 from Sulfolobus spindle-shaped virus 1, a winged-helix DNA binding protein. J. Virol. 78, 11544–11550.
Krupovič, M., Bamford, D.H., 2007. Putative prophages related to lytic tailless marine dsDNA phage PM2 are widespread in the genomes of aquatic bacteria. BMC Genomics 8, 236.
Laurinmaki, P.A., Huiskonen, J.T., Bamford, D.H., Butcher, S.J., 2005. Membrane proteins modulate the bilayer curvature in the bacterial virus Bam35. Structure 13, 1819–1828.
Lovell, S.C., Davis, I.W., Arendall III, W.B., de Bakker, P.I., Word, J.M., Prisant, M.G., Richardson, J.S., Richardson, D.C., 2003. Structure validation by Calpha geometry: phi, psi and Cbeta deviation. Proteins 50, 437–450.
Maaty, W.S., Ortmann, A.C., Dlakić, M., Schulstad, K., Hilmer, J.K., Liepold, L., Weidenheft, B., Khayat, R., Douglas, T., Young, M.J., Bothner, B., 2006. Characterization of the archaeal thermophile Sulfolobus turreted icosahedral virus validates an evolutionary link among double-stranded DNA viruses from all domains of life. J. Virol. 80, 7625–7635.
Maiorano, D., Lutzmann, M., Méchali, M., 2006. MCM proteins and DNA replication. Curr. Opin. Cell Biol. 18, 130–136.
Makino, S., Amano, N., Koike, H., Suzuki, M., 1999. Prophages inserted in archaebacterial genomes. Proc. Jpn. Acad., Ser. B 75, 166–171.
Männistö, R.H., Kivelä, H.M., Paulin, L., Bamford, D.H., Bamford, J.K., 1999. The complete genome sequence of PM2, the first lipid-containing bacterial virus to be isolated. Virology 262, 355–363.
Marti-Renom, M.A., Stuart, A.C., Fiser, A., Sanchez, R., Melo, F., Sali, A., 2000. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29, 291–325.
McGeoch, A.T., Bell, S.D., 2005. Eukaryotic/archaeal primase and MCM proteins encoded in a bacteriophage genome. Cell 120, 167–168.
Morikawa, M., Izawa, Y., Rashid, N., Hoaki, T., Imanaka, T., 1994. Purification and characterization of a thermostable thiol protease from a newly isolated hyperthermophilic Pyrococcus sp. Appl. Environ. Microbiol. 60, 4559–4566.
Nandhagopal, N., Simpson, A.A., Gurnon, J.R., Yan, X., Baker, T.S., Graves, M.V., Van Etten, J.L., Rossmann, M.G., 2002. The structure and evolution of the major capsid protein of a large, lipid-containing DNA virus. Proc. Natl. Acad. Sci. U. S. A. 99, 14758–14763.
Porter, K., Kukkaro, P., Bamford, J.K., Bath, C., Kivelä, H.M., Dyall-Smith, M.L., Bamford, D.H., 2005. SH1: a novel, spherical halovirus isolated from an Australian hypersaline lake. Virology 335, 22–33.
Prangishvili, D., Garrett, R.A., Koonin, E.V., 2006. Evolutionary genomics of archaeal viruses: unique viral genomes in the third domain of life. Virus Res. 117, 52–67.
Reiter, W.D., Palm, P., Yeats, S., 1989. Transfer RNA genes frequently serve as integration sites for prokaryotic genetic elements. Nucleic Acids Res. 17, 1907–1914.
Rice, G., Tang, L., Stedman, K., Roberto, F., Spuhler, J., Gillitzer, E., Johnson, J.E., Douglas, T., Young, M., 2004. The structure of a thermophilic archaeal virus shows a double-stranded DNA viral capsid type that spans all domains of life. Proc. Natl. Acad. Sci. U. S. A. 101, 7716–7720.
Russell, R.B., Barton, G.J., 1992. Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels. Proteins 14, 309–323.
She, Q., Shen, B., Chen, L., 2004. Archaeal integrases and mechanisms of gene capture. Biochem. Soc. Trans. 32, 222–226.
Strömsten, N.J., Bamford, D.H., Bamford, J.K., 2005. In vitro DNA packaging of PRD1: a common mechanism for internal-membrane viruses. J. Mol. Biol. 348, 617–629.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Verheust, C., Fornelos, N., Mahillon, J., 2005. GIL16, a new Gram-positive tectiviral phage related to the Bacillus thuringiensis GIL01 and the Bacillus cereus pBClin15 elements. J. Bacteriol. 187, 1966–1973.
Vestergaard, G., Häring, M., Peng, X., Rachel, R., Garrett, R.A., Prangishvili, D., 2005. A novel rudivirus, ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology 336, 83–92.
Whitman, W.B., Shieh, J., Sohn, S., Caras, D.S., Premachandran, U., 1986. Isolation and characterization of 22 mesophilic methanococci. Syst. Appl. Microbiol. 7, 235–240.
Woese, C.R., Kandler, O., Wheelis, M.L., 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U. S. A. 87, 4576–4579.
Wood, A.G., Whitman, W.B., Konisky, J., 1989. Isolation and characterization of an archaebacterial viruslike particle from Methanococcus voltae A3. J. Bacteriol. 171, 93–98.
Yan, X., Chipman, P.R., Castberg, T., Bratbak, G., Baker, T.S., 2005. The marine algal virus PpV01 has an icosahedral capsid with T = 219 quasisymmetry. J. Virol. 79, 9236–9243.