Yutin and Koonin Virology Journal 2012, 9:161 http://www.virologyj.com/content/9/1/161

RESEARCH Open Access Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA of Natalya Yutin and Eugene V Koonin*

Abstract Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic group that consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive comparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50 conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryotic viruses. Results: We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrained tree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the core genes have been independently acquired from different sources by different NCLDV lineages whereas for the majority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDV was demonstrated. Conclusions: A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexity and diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherence between the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a common ancestral although the set of ancestral genes might be smaller than previously inferred from patterns of gene presence-absence.

Background of genes encoding responsible for many func- Viruses are ubiquitous, obligate, intracellular parasites of tions essential for virus reproduction. all cellular life forms that rely on the host cell translation One of the largest viral divisions that seem to be system, metabolism and, in many cases, the replication monophyletic includes 6 recognized families and a 7th and transcription systems, for their reproduction. There candidate family of viruses with large DNA is no evidence that all viruses have a monophyletic ori- that infect diverse eukaryotes and are collectively known gin, at least not under the traditional concept of mono- as Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) phyly. Indeed, not a single gene is conserved in the [3-6]. The formally recognized NCLDV families are Pox- genomes of all known viruses although a small group of viridae, Asfarviridae, , , Phycod- “viral hallmark genes” encoding some of the key proteins naviridae, and ; in addition, the recently involved in genome replication and virion structure for- discovered and the related Lausannevirus mation are shared by large, diverse subsets of viruses could not be assigned to any of the 6 families, and are [1,2]. However, several large groups of viruses infecting likely to become founding members of a new family diverse hosts do appear to share common ancestry in [7,8]. Hereinafter we speak of 7 NCLDV families for the the strict sense, that is, to have evolved from the same sake of simplicity. ancestral virus, as indicated by the conservation of sets By far the most thoroughly studied group of the NCLDV are the , the family of animal viruses * Correspondence: [email protected] that include a major human pathogen, the smallpox National Center for Biotechnology Information, National Library of Medicine, virus, important animal pathogens, such as rabbit National Institutes of Health, Bethesda, MD 20894, USA

© 2012 Yutin and Koonin; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Yutin and Koonin Virology Journal 2012, 9:161 Page 2 of 18 http://www.virologyj.com/content/9/1/161

myxoma virus, as well as vaccinia virus (VACV), one of between the viral families. Moreover, for some of the the best characterized models of molecular biology [9]. NCLDV genes with important functions in virus Another family of the NCLDV that has recently become reproduction, indications of complex evolutionary his- a major focus of attention is the Mimiviridae that tories have been obtained. The showcase for such evolu- includes giant viruses infecting and probably tionary complexity is the viral DNA ligase which is algae [10-13]. The genome of the prototype virus of this represented by two distantly related forms across the family, Acanthamoeba polyphaga [14], NCLDV families. A maximum likelihood reconstruction slightly exceeds one megabase (Mb), and other related based on the presence-absence of conserved genes in viruses possess even larger genomes [15,16], so the the viral genomes has implied that one of the two forms, Mimiviridae are undisputed genome size record holders the ATP-dependent DNA ligase, was the ancestral form in the virosphere. Indeed, in terms of genome size and that was present in the genome of the prototype complexity, the NCLDV eclipse numerous parasitic bac- NCLDV but was replaced by the distantly related teria, and approach the simplest free-living prokaryotes. NAD-dependent ligase in several viral lineages. How- The NCLDV infect animals and diverse unicellular ever, when the reconstruction was supplemented by eukaryotes, and either replicate exclusively in the cyto- phylogenetic analysis of the two forms of DNA ligase, plasm of the host cells, or encompass both cytoplasmic the opposite conclusion has been reached, namely that and nuclear stages in their life cycle. Most of the the NAD –dependent ligase was the ancestral form in NCLDV do not strongly depend on the host replication NCLDV that was displaced by the ATP-dependent lig- or transcription systems for completing their replication ase on several independent occasions [18]. This change [9,17]. The autonomous life style of the NCLDV is sup- in perspective occurred because phylogenetic analysis ported by a set of conserved proteins that are encoded indicated that the ATP-dependent ligases from different in the viral genomes and mediate most of the processes lineages of the NCLDV clustered with distinct groups of essential for viral reproduction. These essential, con- eukaryotic homologs whereas the NAD–dependent served proteins include DNA polymerases, helicases, ligases of the NCLDV appeared to be monophyletic. In and primases responsible for DNA replication, RNA the same vein, complex phylogenies suggestive of mul- polymerase subunits and transcription factors that func- tiple have been observed for tion in transcription initiation and elongation, Holliday several core NCLDV genes such as thymidine kinase or junction resolvases and topoisomerases involved in gen- the two subunits of ribonucleotide reductase [19]. ome DNA processing and maturation, ATPase pumps Taken together, these findings imply that some of the mediating DNA packaging, molecular chaperones apparently conserved genes of the NCLDV might actu- involved in assembly and capsid proteins them- ally have complex histories which could include inde- selves [3-5]. Although several viral hallmark genes are pendent (convergent) acquisition of these genes from shared by NCLDV and other large DNA viruses, such as different cellular organisms as well as displacement of herpesviruses, baculoviruses and some bacteriophages ancestral viral gene by homologs of cellular provenance. [2], the conservation of the large set of core genes clearly This line of reasoning prompted us to perform a com- demarcates the NCLDV as a distinct, most likely mono- prehensive phylogenetic analysis of the set of the puta- phyletic class of viruses [4,6]. More specifically, recon- tive ancestral NCLDV genes. Here we present the results structions of the ancestral NCLDV genome composition of this analysis which suggest that, although the exist- using maximum parsimony and maximum likelihood ence of a common ancestor of the NCLDV is beyond methods have delineated a set of approximately 50 genes reasonable doubt, most of the conserved NCLDV genes that are inferred to have been responsible for the key indeed had complex evolutionary histories. functions in the reproduction of the last common ances- tor of the NCLDV [5]. Results The core, presumably ancestral set of the NCLDV Approach and rationale genes was delineated using sequence similarity-based Maximum likelihood reconstruction of the gene reper- methods that have been previously employed for identi- toire of the putative ancestral NCLDV has revealed a fication of clusters of orthologous genes in diverse cellu- core consisting of approximately 50 viral genes (Table 1). lar life forms. The comparative genomic analysis Most of these genes are present in various subsets of the underlying these reconstructions was deliberately limited NCLDV genomes rather than all genomes (Additional to the NCLDV genomes to simplify the analysis and to file 1) but high likelihoods of ancestral provenance have facilitate detection of distant relationships between viral been estimated for each of them, with the absences ac- proteins. Indeed, some of the core NCLDV proteins, cordingly attributed to lineage-specific gene loss. We such as for example the packaging ATPases and the di- constructed multiple alignments and position-specific sulfide chaperones, show only weak sequence similarity scoring matrices for each of the ancestral NCLDV genes Yutin and Koonin Virology Journal 2012, 9:161 Page 3 of 18 http://www.virologyj.com/content/9/1/161

Table 1 Evolutionary scenarios for the ancestral NCLDV genes NCVOG [3,4] Vaccinia Functional # of viral NCVOG annotation Evolutionary scenario from constrained virus gene category families/ tree analysis numbera genomes NCVOG0038 E9L DNA replication, 6/45 DNA polymerase elongation Monophyletic NCLDV, common origin with recombination subunit family B baculoviruses and possibly herpesviruses and repair NCVOG0023 D5R DNA replication, 6/45 D5-like helicase-primase Monophyletic NCLDV but displaced by a recombination bacteriophage homolog in phycodnaviruses. and repair NCVOG1060 G5R DNA replication, 5/35 FLAP-like endonuclease Possibly monophyletic NCLDV although recombination XPG (cd00128) displacement by a phage homolog in and repair poxviruses cannot be ruled out; loss in phycodnaviruses and subsequent acquisition of a eukaryotic homolog by E. huxlei virus NCVOG0036 H6R DNA replication, 3/23 DNA topoisomerase IB Possibly monophyletic NCLDV recombination and repair NCVOG0037 None DNA replication, 6/15 DNA topoisomerase II Polyphyletic NCLDV, gene acquired recombination from different eukaryotes and repair NCVOG0035 None DNA replication, 3/7 NAD + dependent DNA Monophyletic NCLDV recombination ligase (smart00532) and repair NCVOG0034 A50R DNA replication, 4/19 ATP-dependent Polyphyletic NCLDV, gene acquired from recombination DNA ligase different eukaryotes and repair (pfam01068, PRK01109) NCVOG0278 A22R DNA replication, 5/36 RuvC, Holliday junction Insufficient sequence recombination resolvases (HJRs); cl00243. conservation for reliable phylogenetic analysis and repair Extended Pox_A22, Poxvirus A22 family (pfam04848). Marseille virus lacks C-term conserved positions. NCVOG0004_APa None DNA replication, 4/6 AP (apurinic) Strongly supported monophyly of NCLDV recombination endonuclease family 2 and repair NCVOG0004_NTa None DNA replication, 3/4 Nucleotidyltransferase Polyphyletic NCLDV, gene acquired from recombination (DNA polymerase different eukaryotes and repair beta family) NCVOG1192 None DNA replication, 4/13 YqaJ viral recombinase Possibly monophyletic NCLDV recombination family (pfam09588) and repair

3 NCVOG0024 None DNA replication, /4 Superfamily II helicase Limited sequence similarity between proteins recombination related to herpesvirus from different NCLDV; probable polyphyletic and repair replicative helicase origin; possibly not an ancestral NCLDV gene (origin-binding protein UL9), pfam03121 NCVOG1115 D4R DNA replication, 3/23 Uracil-DNA glycosylase Present in only a few NCLDV; uncertain, recombination monophyly of the NCLDV cannot be ruled out and repair NCVOG0274 J6R Transcription 6/36 DNA-directed RNA Displacement of ancestral NCLDV gene in and RNA polymerase subunit mimivirus and ASFV with eukaryotic RNAP2 and processing alpha RNAP1 subunit genes, respectively; loss in most phycodnaviruses. Ancestral NCLDV gene possibly derived from RNAP I NCVOG0271 A24R Transcription and 6/36 DNA-directed RNA Possibly polyphyletic NCLDV, with one subset RNA processing polymerase derived from RNAP1. and another from RNAP2; subunit beta likely more recent displacement with RNAP2 in mimivirus; however, monophyly of NCLDV cannot be ruled out; loss in most phycodnaviruses. Yutin and Koonin Virology Journal 2012, 9:161 Page 4 of 18 http://www.virologyj.com/content/9/1/161

Table 1 Evolutionary scenarios for the ancestral NCLDV genes (Continued) NCVOG0273 G5.5R Transcription and 5/15 DNA-directed RNA Uncertain, not enough sequence conservation RNA processing polymerase subunit 5 for reliable phylogenetic analysis NCVOG1164 A1L Transcription and 6/44 A1L transcription Uncertain, not enough sequence conservation RNA processing factor/late transcription for reliable phylogenetic analysis factor VLTF-2 NCVOG0262 A2L Transcription and 6/45 Poxvirus Late No obvious homologs outside NCLDV RNA processing Transcription Factor (monophyly of NCLDV by default). VLTF3 like NCVOG0261 A7L Transcription and 5/35 Poxvirus early No obvious homologs outside NCLDV RNA processing transcription factor (monophyly of NCLDV by default). (VETF), large subunit (pfam04441) NCVOG0272 E4L Transcription and 6/39 Transcription factor S-II Uncertain, not enough sequence conservation RNA processing (TFIIS)-domain- for reliable phylogenetic analysis containing protein NCVOG1127 One Transcription and 4/11 transcription initiation Strongly supported monophyly of NCLDV RNA processing factor IIB NCVOG0076 A18R Transcription and 6/38 DNA or RNA helicases of Monophyletic NCLDV except for displacement RNA processing superfamily II (COG1061) with a eukaryotic homolog in one phycodnavirus and acquisition of a distinct eukaryotic paralog in ASFV NCVOG0267 I8R Transcription and 3/23 RNA-helicase DExH-NPH-II Monophyletic NCLDV; independent losses in RNA processing several NCLDV lineages NCVOG1117 D1R Transcription and 6/33 mRNA capping enzyme Complex, variable domain architecture; apparent RNA processing large subunit monophyly of NCLDV for the conserved methyltransferase domain; guanylyltransferases of apparent distinct eukaryotic origin in a single iridovirus and a single phycodnavirus NCVOG0236 D9R, D10R Transcription and 6/29 Nudix hydrolase Uncertain, not enough sequence conservation for RNA processing reliable phylogenetic analysis NCVOG1088 None Transcription and 3/13 RNA ligase Monophyletic NCLDV; common origin with RNA processing baculovirus and possibly bacteriophage homologs - J3R Transcription and 2/16 PolyA polymerase, Present only in poxviruses and one mimivirus; RNA processing regulatory subunit however, monophyly strongly supported; possible ancestral gene NCVOG0276 F4L Nucleotide 6/29 Ribonucleotide reductase Polyphyletic NCLDV, complex evolutionary metabolism small subunit scenario; ancestral status uncertain; trees similar for the large and small subunits NCVOG1353 I4L Nucleotide 6/24 ribonucleoside diphosphate Polyphyletic NCLDV, complex evolutionary metabolism reductase, large subunit scenario; ancestral status uncertain; trees similar for the large and small subunits NCVOG0319 J2R Nucleotide 5/20 Thymidine kinase Most likely polyphyletic NCLDV although metabolism monophyly could not be statistically rejected NCVOG0320 A48R Nucleotide 4/21 Thymidylate kinase Most likely polyphyletic NCLDV although metabolism monophyly could not be statistically rejected NCVOG1068 F2L Nucleotide 4/30 dUTPase (cl00493) Polyphyletic NCLDV, monophyly rejected metabolism NCVOG0022 D13L Virion structure and 6/45 NCLDV major capsid No homologs outside NCLDV – monophyletic morphogenesis protein NCLDV by default NCVOG0249 A32L Virion structure and 6/45 A32-like packaging ATPase Only distant homologs outside NCLDV – morphogenesis monophyletic NCLDV by default NCVOG0211 F9L, L1R Virion structure and 4/34 myristylated envelope No homologs outside NCLDV – monophyletic morphogenesis protein NCLDV by default NCVOG1122 G9R, J5L, Virion structure and 3/31 Myristylated protein; No homologs outside NCLDV – monophyletic A16L morphogenesis pfam03003, DUF230 NCLDV by default NCVOG0052 E10R Virion structure and 6/44 disulfide (thiol) Monophyletic NCLDV morphogenesis oxidoreductase/isomerase; Erv1 / Alr family (pfam04777) Yutin and Koonin Virology Journal 2012, 9:161 Page 5 of 18 http://www.virologyj.com/content/9/1/161

Table 1 Evolutionary scenarios for the ancestral NCLDV genes (Continued) NCVOG0256 H3L Virion structure and 2/22 envelope protein, Apparent monophyletic NCLDV but represented morphogenesis glycosyltransferase only in poxviruses and ; independent acquisition cannot be ruled out NCVOG0040 H1L Signal transduction, 4/30 cd00127, DSPc, Dual Most likely independent acquisitions by regulation specificity phosphatases different NCLDV (DSP); Ser/Thr and Tyr protein phosphatases NCVOG0330 VACWR012, Signal transduction, 5/26 RING-finger-containing E3 Most likely independent acquisition by different VACWR207 regulation ubiquitin ligase groups of the NCLDV; probably not an ancestral (COG5432: RAD18) gene NCVOG0329 Signal transduction, 2/3 UBCc, Ubiquitin-conjugating Present only in mimivirus and ASFV; independent regulation enzyme E2 (cd00195) acquisitions from eukaryotes, NCLDV monophyly rejected; not an ancestral gene NCVOG0246 Signal transduction, 3/4 Ulp1-like protease/ Uncertain, highly diverged NCLDV sequences regulation deubiquitinating enzyme NCVOG0009 Virus-host 3/4 pfam00653: BIR (Baculovirus Most likely independent acquisition by interactions Inhibitor of apoptosis different groups of the NCLDV; probably not protein Repeat) domain an ancestral gene NCVOG0012 A33R, A44R, Virus-host 3/20 C-type lectin: smart00034, Poorly conserved sequence, most likely A40R interactions cd03594, cd03593, pfam00059, independent acquisitions by different groups cd00037, pfam05966 of NCLDV; probably not an ancestral gene NCVOG1361 Uncharacterized 6/11 T5orf172 domain (pfam10544) Different domain architectures; most likely independent acquisition by different groups of the NCLDV; probably not an ancestral gene NCVOG1360 VACWR011, Uncharacterized 3/18 KilA domain (pfam04383); Different domain architectures; most likely VACWR208 always present at protein independent acquisition by different groups N-termini except for of the NCLDV; probably not an ancestral gene mimiviruses. In some proteins fused to the a RING-finger domain. NCVOG1424 Uncharacterized 3/6 Uncharacterized domain; Different domain architectures; most likely found downstream of KilA, independent acquisition by different groups BRO, and MSV199 domains. of the NCLDV; probably not an ancestral gene Also found in some baculoviruses. NCVOG0010 Uncharacterized 4/11 pfam02498: Bro-N; BRO Different domain architectures; most likely family, N-terminal domain: independent acquisition by different groups This family includes the of the NCLDV; probably not an ancestral gene N-terminus of baculovirus BRO and ALI motif proteins. NCVOG0059 Uncharacterized 2/3 FtsJ-like methyltransferase Present only in mimivirus and ASFV; monophyly family proteins (pfam01728) cannot be ruled out aThe systematic gene names are for the Copenhagen strain of Vaccinia virus except for the names starting with VACWR which are from the WR strain because the respective genes are missing/disrupted in the Copenhagen strain. and used these to search for homologs from cellular NCLDV. When justified, additional constrained trees, organisms and other viruses. The resulting sequence sets with topologies corresponding to plausible evolutionary were clustered to include all NCLDV proteins along with scenarios, were analyzed for individual genes. Below we representatives of as many major lineages of cellular present the results of the constrained phylogenetic tree organisms as possible but also to trim the sets down to a analysis of the inferred ancestral NCLDV genes. size amenable to detailed phylogenetic analysis (see Methods for the details). For the sequence sets thus Genes involved in genome replication, recombination and selected, ML phylogenetic trees were constructed and repair examined using the constrained tree approach (see The DNA replication of the NCLDV is largely independ- Methods). Specifically, for all trees in which the NCLDV ent of the host replication, and accordingly, genes en- did not appear as a single strongly supported clade, the coding protein components of the complex replication likelihood of the original tree was compared with the machinery represent a major part of the NCLDV gene likelihood of a tree constrained for monophyly of core (Table 1). This is the largest group in the ancestral Yutin and Koonin Virology Journal 2012, 9:161 Page 6 of 18 http://www.virologyj.com/content/9/1/161

NCLDV gene set that includes 13 genes two of which, analogous protein from a bacteriophage in the ancestor the DNA polymerase (DNAP) and the primase-helicase, of phycodnaviruses. This appears to be a typical case of are central and indispensable to replication and are xenologous gene displacement [25]). shared by all viruses of this class (it has been claimed The third gene involved in replication that is present that the primase-helicase was missing in some phycod- in a large majority of the NCLDV encodes a FLAP nu- naviruses [20]; however, in the process of NCVOG con- clease, an enzyme that is also ubiquitous in cellular life struction [5], this gene was identified in all complete forms and removes single-stranded overhangs from rep- NCLDV genomes, the high sequence divergence in some lication and recombination intermediates. Nucleases of of the viruses notwithstanding). The B family of DNAPs, this family are essential for replication and in particular to which the NCLDV polymerases belong, includes the recombination in poxviruses [26] but can also assume main replicative polymerases of all and eukar- other functions such as mRNA degradation as is the yotes as well as many bacterial, archaeal and eukaryotic case in herpesviruses [27]. The phylogenetic tree of viruses. The unconstrained phylogenetic tree for the FLAP nucleases did not include an NCLDV clade. In- DNAP failed to recover an NCLDV clade. Instead, the stead, poxviruses clustered with bacterial homologs, the majority of the NCLDV DNAPs along with the DNAPs only representative in phycodnaviruses (Emiliana huxlei of herpesviruses formed a clade with eukaryotic DNAP virus) placed within the eukaryotic branch, and the rest delta and zeta; a sister group to this large branch was a of the NCLDV formed a clade with the herpesvirus clade that consisted of poxvirus, asfarvirus and baculo- homologs (Figure 1D). However, a constrained tree with virus DNAPs (Figure 1A). However, constrained trees in an NCLDV clade (excluding the Emiliana huxlei virus) which monophyly of the NCLDV was enforced could could not be statistically rejected (Additional file 2). The not be rejected by statistical tests (Additional file 2). placement of E. huxlei virus within the eukaryotic sub- Moreover, among the tested trees, the tree in which the tree was supported by high boostrap values, and mono- NCLDV clade also included baculoviruses (as the sister phyly of this virus with the rest of the NCLDV was group to asfarviruses) and herpesviruses as the sister weakly supported (AU value <0.1) even if not firmly group to the composite NCLDV-baculovirus clade, had rejected (Additional file 2). At present we cannot confi- the highest associated likelihood although none of the dently conclude whether the poxvirus FLAP nuclease is analyzed tree topologies could be statistically rejected monophyletic with those of other NCLDV or evolved (Figure 1B and Additional file 2). Thus, the results of the through displacement of the ancestral form with a phage constrained tree analysis are compatible with the mono- homolog. However, even the simplest evolutionary sce- phyly of the DNAPs of all NCLDV and moreover of all nario for this gene involves loss of the ancestral gene in large DNA viruses of eukaryotes. This conclusion con- phycodnaviruses followed by regain of the eukaryotic trasts the results of several previous phylogenetic ana- homolog by the E. huxlei virus. lyses that failed to recover an NCLDV clade but did not The remaining genes in this group are less common in report attempts to analyze constrained trees [21-23]. It NCLDV although the ML reconstruction mapped them seems likely that the apparent non-monophyly of the to the last common ancestor; the implication of the in- NCLDV in previous studies was caused by branch length ferred ancestral status of these genes is that they have effects, in particular acceleration of evolution in some of been lost on multiple occasions during the evolution of the viruses resulting in long branch attraction [24] (note the NCLDV. Topoisomerase IB is represented in pox- the branch length, in particular for Poxviridae in viruses, one phycodnavirus and mimiviruses. In the Figure 1A,B) and possibly other artifacts of phylogenetic phylogenetic tree, poxviruses form a clade with the phy- analysis. codnavirus but the position of the mimivirus is uncertain Phylogenetic analysis of the second hallmark viral rep- (Figure 2A). The monophyly of the NCLDV could not lication gene, the primase-helicase, revealed a strongly be statistically rejected, so for this gene there is no supported clade that included 6 of the 7 NCLDV fam- convincing indication of displacement in any of the ilies: phycodnaviruses belonged to a different, also well- NCLDV. supported branch together with diverse bacteriophage A greater number of NCLDV encode Topoisomerase and bacterial (most likely, prophage) homologs II that is unrelated to Topoisomerase IB. The phylogen- (Figure 1C). In this case, the constrained tree topology etic tree of Topo II (Figure 2B) did not include an that included an NCLDV clade was rejected at a statisti- NCLDV clade, and monophyly of the NCLDV could be cally significant level (Additional file 2). Thus, the result statistically rejected, with the implication of independent of the phylogenetic analysis of the NCLDV primase- acquisition of the Topo II gene from eukaryotes in helicase appears to be best compatible with the presence mimiviruses and from a prokaryotic source in crocodile of this gene in the ancestral NCLDV followed by poxvirus, with the provenance of this gene in the rest of displacement with a homologous and functionally the NCLDV remaining uncertain (Additional file 2). In Yutin and Koonin Virology Journal 2012, 9:161 Page 7 of 18 http://www.virologyj.com/content/9/1/161

ABDNA polymerase DNA polymerase, constrained 100 100 Archaea B1 Archaea B1 99 100 89 Archaea B3 82 91 100 Archaea B3 100 Eukaryotic alpha 100 Eukaryotic alpha 72 70 100 100 52 Poxviridae Eukaryotic delta 100 100 93 77 Eukaryotic zeta 100 100 Asfarviridae Poxviridae 100 100 Eukaryotic delta 94 Baculoviridae 89 57 100 100 Eukaryotic zeta Asfarviridae 86 100 100 91 Ascoviridae 86 Ascoviridae 94 94 67 98 Iridoviridae 84 Iridoviridae 78 60 Marseillevirus Marseillevirus 100 72 58 Mimiviridae 80 Mimiviridae 61 60 76 54 Phycodnaviridae 96 100 Phycodnaviridae Phycodnaviridae (Phaeovirus) 100 100 Herpesviridae

0.2 0.2 CDPrimase-helicase FLAP endonuclease

88 99 Archaea 100 Bacteria 81 Bacteria/phages 90 91 100 100 93 Poxviridae 59 Phycodnaviridae Ea Entdi167391034 q2 Emihu73852488 69 Bf Lacre148543897 El Enccu19173066 77 El Macmu109105914 100 Bacteria/phages 87 E9 Arath42570539 Bp Desps51244031 E8 Phatr219125938 61 Ew Triva123335114 52 Bp Dessa242279615 Ec Tetth146171182 Ek Leibr154339662 100 99 Irido/Ascoviridae Herpesviridae 89 99 Ascoviridae 52 88 Marseillevirus 78 Iridoviridae 94 97 58 96 Mimiviridae Marseillevirus 65 63 69 Asfarviridae environmental sequences 92 85 environmental sequences 100 Poxviridae Mimivirus 54 environmental sequences

0.5 0.2 Figure 1 Phylogenetic trees of ancestral NCLDV genes involved in DNA replication. (A). DNA polymerase, the original ML tree. (B). DNA polymerase, constrained tree, monophyly of NCLDV, baculoviruses and herpesviruses enforced. (C). Primase-helicase. (D). FLAP endonuclease. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated. Species abbreviations: Arath, Arabidopsis thaliana; Desps, Desulfotalea psychrophila LSv54; Dessa, Desulfovibrio salexigens DSM 2638; Emihu, virus 86; Enccu, Encephalitozoon cuniculi GB-M1; Entdi, Entamoeba dispar SAW760; Lacre, Lactobacillus reuteri DSM 20016; Leibr, Leishmania braziliensis MHOM/BR/75/M2904; Macmu, Macaca mulatta; Phatr, Phaeodactylum tricornutum CCAP 1055/1; Tetth, Tetrahymena thermophila; Triva, Trichomonas vaginalis G3; Bf, Firmicutes; Bp, Proteobacteria; E8, Stramenopiles; E9, Viridiplantae; Ea, Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; Ew, Parabasalidea; q2, . Yutin and Koonin Virology Journal 2012, 9:161 Page 8 of 18 http://www.virologyj.com/content/9/1/161

ABTopoisomerase IB DNA Topoisomerase II 88 Eukaryores 95 Bacteria 88 54 Mimivirus Mimivirus 57 50 env142094046 98 Poxviridae 100 85 100 q6 Ostvi163955216 89 100 q6 Ostta290343645 50 env142904322 env136436120 q1 Parbu157953897 50 environmental sequences 80 env144065368 45 100 Eukaryotes 100 l1 Invir109287964 71 96 At Nitma161528830 l2 Invir15078758 87 Marseillevirus 0.2 q2 Emihu73852918 c1 Afrsw9628219 C AP endonuclease 2 u1 Crovi115531754 100 100 u2 Melsa9631347 Archaea, Bacteria 52 u2 Amsmo9964524 c1 Afrsw9628240 0.2 51 env134945261 78 68 env139705312 D Nucleotidyltransferase env142958695 68 Mimivirus 85 Mimivirus 74 100 env143874461 env143530885 53 env138728848 54 env136260467 98 91 Marseillevirus 76 Eukaryotes env136708238 75 98 env134864780 env143744333 Bacteria, Archaea, Eukaryotes 92 environmental sequences u2 Amsmo9964524 82 100 0.2 u2 Melsa9631347 env136814837 E YqaJ-like recombinase env136279503 c1 Afrsw9628204 56 E9 Arath30697546 51 100 E9 Sorbi242073512 97 Archaea, Bacteria 64 E9 Phypa168007250 Bp Ruesp99080624 67 0.2 99 E9 Micsp255088687 83 E9 Ostlu145350267 70 q1 Parbu9631735 65 env142361341 49 env143316185 85 q6 Ostvi163955119 55 c1 Afrsw9628216 60 environmental sequences 87 79 55 environmental sequences 74 70 Mimivirus 99 Eukaryotes 100 Bacteria, phahes

76 env135373769 88 u2 Amsmo9964554 99 env144017490 q1 Parbu9632034 q3 Ectsi13242536 Figure 2 (See legend on next page.) Yutin and Koonin Virology Journal 2012, 9:161 Page 9 of 18 http://www.virologyj.com/content/9/1/161

(See figure on previous page.) Figure 2 Phylogenetic trees of ancestral NCLDV genes involved in DNA replication, recombination and repair. (A). Topoisomerase IB. (B). DNA Topoisomerase II. (C). AP endonuclease 2. (D). Nucleotidyltransferase. (E). YqaJ-like recombinase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations: Invir, Invertebrate iridescent virus; Crovi, Crocodilepox virus; Afrsw, African swine fever virus; Amsmo, Amsacta moorei entomopoxvirus 'L'; Arath, Arabidopsis thaliana; Ectsi, Ectocarpus siliculosus virus 1; Emihu, Emiliania huxleyi virus 86; Ostlu, Ostreococcus lucimarinus CCE9901; Ostta, Ostreococcus tauri virus 1; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus FR483; Phypa, Physcomitrella patens subsp. patens; Melsa, Melanoplus sanguinipes entomopoxvirus; Micsp, Micromonas sp. RCC299; Nitma, Nitrosopumilus maritimus SCM1; Ruesp, Ruegeria sp. TM1040; Sorbi, Sorghum bicolor; At, Thaumarchaeota; Bp, Proteobacteria; E9, Viridiplantae; c1, Asfarviridae; l1, Chloriridovirus; l2, Iridovirus; q1, Chlorovirus; q2, Coccolithovirus; q3, Phaeovirus; q6, unclassified Phycodnaviridae; u1, Chordopoxvirinae; u2, Entomopoxvirinae. principle, the opposite direction of gene transfer, from and this conclusion is supported by our present ana- specific groups of NCLDV to the respective cellular life lysis (Figure 3A,B). However, for the alpha subunit of forms, cannot be ruled out. However, it appears excep- the RNAP, the constrained tree in which the NCLDV tionally unlikely that, for example, a diverse assortment form a clade, with the exception of the mimivirus of bacteria and archaea received the Topo II gene specif- and ASFV, actually had the highest likelihood ically from the crocodile poxvirus lineage. (Figure 3C; Additional file 2). Notably, in this tree Phylogenetic analysis of apurinic endonuclease 2 the mimivirus gene clustered with the eukaryotic (AP2), a repair enzyme represented in a diverse subset RNAP 2 whereas ASFV clustered with RNAP 1 of NCLDV, shows unequivocal support for the mono- (Figure 3C), suggestive of two independent displace- phyly of the NCLDV, assuming that the numerous homolo- ments of the ancestral NCLDV gene. Similar results gous environmental sequences belong to uncharacterized were obtained for the RNAP beta subunit: although members of the respective NCLDV groups (Figure 2C). In in this case the original tree with polyphyletic entomopoxviruses, AP2 is fused with a nucleotidyltransfer- NCLDV had the greatest likelihood, the tree with an ase (a member of the DNA polymerase beta family) NCLDV clade excluding mimivirus and ASFV was whereas ASFV and mimivirus possess a distinct gene en- nearly as well supported (Additional file 2). As in the coding a nucleotidyltransferase. However, the phylogenetic case of the alpha subunit, in the RNAP beta subunit tree of nucleotidyltransferases strongly suggests independ- tree the mimivirus gene clustered with the eukaryotic ent acquisitions of this gene by several NCLDV lineages RNAP 2 whereas ASFV clustered with RNAP 1 (Figure 2D). (Figure 3B). The phylogenetic tree of the YqaJ family recombinase, Three additional RNAP subunits/transcription factors an enzyme whose role in NCLDV replication and/or re- were too divergent to reconstruct reliable phylogenetic pair remains uncertain, is best compatible with multiple trees (Table 1). However, for Transcription factor TFIIB, acquisitions from bacteriophages although monophyly of monophyly of the NCLDV was strongly supported the NCLDV could not be ruled out (Figure 2E and (Figure 3D). Additional file 2). Phylogenetic analysis of Superfamily 2 helicases hom- The unusual case of two distinct DNA ligases was ana- ologous to Vaccinia A18 protein (a helicase involved in lyzed in detail previously [18]. Surprisingly, the ATP- late transcription [30]) revealed a strongly supported dependent ligase that is most common among the NCLDV clade, with two NCLDV genes placed in other NCLDV shows clear signs of polyphyletic origin with parts of the tree (Figure 4A). These two anomalies in- multiple, independent acquisitions by several virus clude the E. huxlei phycodnavirus which fell within a lineages whereas the less common NAD-dependent lig- bacteriophage cluster and one of the two members of ase appears to be monophyletic, and probably ancestral. this helicase family encoded by ASFV. Interestingly, this A re-analysis performed in the course of this work using ASFV protein clustered in the tree and shared domain an up to date set of sequences provided strong support organization with homologs from Kinetoplastida, sug- for this conclusion (Additional file 3). gesting the possibility of gene acquisition from unicellu- lar eukaryotes by an ancestral asfarvirus. Monophyly of Genes involved in transcription and mRNA maturation this Asfarvirus protein with other NCLDV was strongly All the NCLDV, with the exception of the majority of rejected (Additional file 2). Thus, two Asfarvirus pro- phycodnaviruses, encode two large subunits of the teins of this family likely have different origins: one is DNA-dependent RNA polymerase (RNAP) that are also bona fide NCLDV protein whereas another was acquired universally conserved in all cellular life forms. It has from a host. been reported that in phylogenetic trees the NCLDV Another family of helicases implicated in transcription RNAP subunits come across as polyphyletic [28,29], includes homologs of Vaccinia D6 and D11 which are Yutin and Koonin Virology Journal 2012, 9:161 Page 10 of 18 http://www.virologyj.com/content/9/1/161

ABRNA polymerase alpha subunit RNA polymerase beta subunit 79 Eukaryotes II 81 91 Eukaryotes II 100 El Hydma221110129 98 93 Mimivirus 100 El Hydma221110135 env144053945 Mimivirus 84 52 100 Asco/Iridoviridae 100 Asco/Iridoviridae 57 65 58 q2 Emihu73852908 92 Marseillevirus 94 82 Marseillevirus q2 Emihu73852534 100 Bacteria 100 100 71 Eukaryotes III 77 100 92 Archaea Archaea 100 100 Eukaryotes III 69 Poxviridae 100 100 Bacteria 53 Poxviridae 100 100 Eukaryotes I Eukaryotic I 74 c1 Afrsw9628206 c1 Afrsw9628161

0.2 0.1 CDRNA polymerase alpha subunit, Transcription factor TFIIB constrained 79 q1 Acatu155371663 100 q1 Parbu155370233 84 Eukaryotes II q1 ParNY157952458 99 El Hydma221110135 c1 Afrsw9628176 100 env135257836 Mimivirus 87 73 Marseillevirus 76 68 83 Asco/Iridoviridae 100 env143819106 q6 Ostvi163955150 100 Poxviridae 100 86 environmental sequences 92 Marseillevirus 100 53 Mimivirus q2 Emihu73852534 90 79 67 environmental sequences c1 Afrsw9628206 100 98 environmental sequences 100 Eukaryotes I 94 Eukaryotes 100 Eukaryotes III 85 51 env143237987 Archaea Bb Psyto91218585 74 100 Bacteria 100 74 Archaea

0.2 0.2 Figure 3 Phylogenetic trees of ancestral NCLDV genes involved in transcription. (A). RNA polymerase alpha subunit. (B). RNA polymerase beta subunit. (C). RNA polymerase alpha subunit, constrained tree with an NCLDV clade excluding mimivirus and ASFV. (D). Transcription factor TFIIB. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations: Acatu, Acanthocystis turfacea Chlorella virus 1; Afrsw, African swine fever virus; Emihu, Emiliania huxleyi virus 86; Hydma, Hydra magnipapillata; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus FR483; ParNY, Paramecium bursaria Chlorella virus NY2A; Psyto, Psychroflexus torquis ATCC 700755; Bb, Bacteroidetes; El, Opisthokonta; c1, Asfarviridae; q1, Chlorovirus; q2, Coccolithovirus; q6, unclassified Phycodnaviridae. present in ASFV and in mimivirus. Thus, the presence exclusion of the single representative in iridoviruses that of an ancestral gene of this family in the NCLDV ances- appears to be an independent acquisition of a eukaryotic tor implies losses in several groups of viruses. Phylogen- homolog subsequent to the loss of the ancestral NCLDV etic analysis of this gene strongly supports the gene in iridoviruses (Figure 4C,D). monophyly of the NCLDV (Figure 4B). The RNA ligase is an RNA processing enzyme [31] The capping enzyme of the NCLDV is a complex, that is represented in by conserved orthologs irido- three-domain protein. The N-terminal phosphatase do- viruses, ascoviruses, and Marseillevirus and by a more main is too divergent for phylogenetic analysis. Phylogen- distant homolog in ASFV; the precise role of this en- etic analysis of the other two domains, guanylyltransferase zyme in the NCLDV reproduction remains unclear. In and methyltransferase, identifies an NCLDV clade, to the the phylogenetic tree, the NCLDV ligases of the NCLDV Yutin and Koonin Virology Journal 2012, 9:161 Page 11 of 18 http://www.virologyj.com/content/9/1/161

A A18-like helicase B D6/D11-like helicase

81 env134898567 100 83 env143049744 93 Poxviridae 91 env136380008 Mimivirus Mimivirus 99 96 98 env136133804 c1 Afrsw9628180 100 env144077465 Mimivirus q6 Ostvi163955040 74 Eukaryotes q3 Ectsi13242538 93 93 52 100 96 q1 Acatu155371590 53 Bacteria q1 Parbu9631722 100 Poxviridae 100 Bacteria 81 Iridoviridae 96 Eukaryotes 75 Marseillevirus c1 Afrsw9628229 100 environmental sequences 0.2 99 62 Ek Leiin146094415 100 c1 Afrsw9628148 Ek Trycr71407144 D Capping enzyme 94 77 El Phano169616097 65 Ec Plavi156081817 (methyltransferase) 82 Ea Entdi167392006 100 53 q2 Emihu73852574 58 env135954485 98 95 Bacteria 100 env136245388 97 71 Archaea 83 90 Mimivirus Bacteria 100 96 Poxviridae Bacteria 66 Marseillevirus 96 Bacteria 67 58 env136185156 99 env136595843 0.2 c1 Afrsw9628208 99 q6 Ostvi163955066 C Capping enzyme 100 env142418996 (guanylyltransferase) 80 env142908014 60 E9 Phypa168007236 env136139714 53 env135440874 93 100 c1 Afrsw9628208 Eukaryotes 68 env135039429 Mimivirus 67 76 Marseillevirus 0.2 83 environmental sequences 99 Poxviridae 100 Ea Dicdi66805201 Ea Entdi167392533 E RNA ligase 96 El Enccu85014195 91 El Entbi169806196 q2 Emihu73852927 54 74 env136191089 74 Asco/Iridoviridae 84 q6 Ostvi290343628 91 82 environmental sequences 53 env143320606 Marseillevirus 99 env136743363 70 100 q1 Acatu155371666 q1 Parbu9631672 environmental sequences l4 Infsp19881469 96 El Caeel71980521 62 env144204233 54 El Bruma170577180 c1 Afrsw9628169 El Anoga158292641 90 71 Bacteria/phages 100 El Trica91083171 El Strpu72005733 98 99 Baculoviridae 92 Eukaryotes 96 Eukaryotes b1 Trini116326771 Eukaryotes 54 El Monbr167535971 89 env144139321 99 Eukaryotes env140405543 100 Eukaryotes

0.2 0.2 Figure 4 (See legend on next page.) Yutin and Koonin Virology Journal 2012, 9:161 Page 12 of 18 http://www.virologyj.com/content/9/1/161

(See figure on previous page.) Figure 4 Phylogenetic trees of ancestral NCLDV genes involved in mRNA processing/maturation. (A). Superfamily 2 helicases homologous to Vaccinia A18. (B). Superfamily 2 helicases homologous to Vaccinia D6 and D11. (C). Capping enzyme (guanylyltransferase domain). (D). Capping enzyme (methyltransferase domain). (E). RNA ligase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations: Phypa, Physcomitrella patens subsp. patens; Dicdi, Dictyostelium discoideum AX4; Entdi, Entamoeba dispar SAW760; Plavi, Plasmodium vivax SaI-1; Leiin, Leishmania infantum JPCM5; Trycr, Trypanosoma cruzi strain CL Brener; Anoga, Anopheles gambiae str. PEST; Bruma, Brugia malayi; Caeel, Caenorhabditis elegans; Enccu, Encephalitozoon cuniculi GB-M1; Entbi, Enterocytozoon bieneusi H348; Monbr, Monosiga brevicollis MX1; Phano, Phaeosphaeria nodorum SN15; Strpu, Strongylocentrotus purpuratus; Trica, Tribolium castaneum; Trini, Trichoplusia ni ascovirus 2c; Afrsw, African swine fever virus; Infsp, Infectious spleen and kidney necrosis virus; Acatu, Acanthocystis turfacea Chlorella virus 1; Parbu, Paramecium bursaria Chlorella virus FR483; Emihu, Emiliania huxleyi virus 86; Ectsi, Ectocarpus siliculosus virus 1; Ostvi, Ostreococcus virus OsV5; E9, Viridiplantae; Ea, Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; c1, Asfarviridae; l4, Megalocytivirus; q1, Chlorovirus; q2, Coccolithovirus; q3, Phaeovirus; q6, unclassified Phycodnaviridae. form a well-supported clade (along with many uncharac- Thymidine kinase (TK) is another major enzyme of terized environmental sequences), with the exception of dNTP biosynthesis that is present in a large subset of the Trichoplusia ni ascovirus which belonged to a clade the NCLDV. Phylogenetic analysis reveals three distinct with baculovirus and diverse bacteriophages (Figure 4E). clusters of viral TKs (Figure 6A); a constrained tree con- The constrained tree with an NCLDV clade was confi- taining an NCLDV clade could not be statistically dently statistically rejected (Additional file 2) indicating rejected but nevertheless had a lower likelihood than the that in the Trichoplusia ni ascovirus the RNA ligase original tree (Additional file 2). Qualitatively similar genes was displaced by a homolog from a baculovirus. results were obtained for Thymidylate kinase (TMPK), the Finally, the previously investigated case of the small, second enzyme of thymidylate biosynthesis (Figure 6B). regulatory subunit of polyA polymerase is unusual in ThedUTPase,anenzymethatfunctionsattheinterfaceof that this gene is present only in poxviruses which also nucleotide metabolism and repair, is present in poxviruses, encode the large, catalytic subunit, and in a single mimi- iridoviruses and many phycodnaviruses, but showed clear virus strain which lacks the catalytic subunit. Despite signs of polyphyletic origin, including statistically solid re- this sparse representation, the NCLDV small subunits jection of NCLDV monophyly (Figure 6C, Additional seem to form a strongly supported clade, suggestive of file 2). Similar results have been reported from a pre- the possibility that this gene was present in the ancestral vious phylogenetic analysis for some of the genes en- NCLDV [15]. coding enzymes of nucleotide metabolism [19]. The majority of the sequences of putative ancestral virion proteins and proteins involved in virus mor- Genes for enzymes of nucleotide metabolism phogenesis and virus-host interactions were too diver- Most of the NCLDV encode varying sets of enzymes gent to construct reliable phylogenetic trees. The only involved in metabolism of deoxyribonucleotides. These exception was the disulfide isomerase enzyme for genes are not strictly essential for virus reproduction, which monophyly of the NCLDV (including numer- given that knockouts are typically viable in cell cultures, ous environmental sequences that presumably belong but tend to be important in vivo. The most common en- to uncharacterized viruses) could be demonstrated zyme in this group is ribonucleotide reductase (RR) (Figure 7). which consists of a large and a small subunits encoded by two distinct genes. The phylogenetic trees of both the Discussion large and the small subunits show complex topologies Evolutionary reconstruction using patterns of gene that are incompatible with monophyly of the NCLDV presence-absence in viral genomes has led to the conclu- (Figure 5A,B). Moreover, for both subunits, poxviruses sion that the NCLDV represent a monophyletic group of and iridoviruses show clear, distinct affinities, with viruses that evolved from a common ancestor which was eukaryotic and bacteriophage homologs, respectively. a virus with genomic complexity comparable to that of The provenance of the RR in the other NCLDV is less the extant NCLDV. Approximately 50 genes that are clear. On the whole, it seems to be a safe conclusion that conserved in different subsets of the NCLDV have been the RR of the NCLDV are not monophyletic but rather assigned to the ancestral virus genome (Table 1), and it have evolved in a complex evolutionary scenario that appears likely that the ancestral virus additionally involved multiple acquisitions, losses and displacement. encompassed many lineage-specific genes. In this work Therefore, the results of the ML reconstruction notwith- we went beyond the phyletic patterns by analyzing phy- standing, it is difficult to ascertain whether the last com- logenies of the inferred ancestral NCLDV genes along mon ancestor of the NCLDV encoded RR. with their homologs from cellular organisms. In almost Yutin and Koonin Virology Journal 2012, 9:161 Page 13 of 18 http://www.virologyj.com/content/9/1/161

A Ribonucleotide reductase, large B Ribonucleotide reductase, small subunit subunit

Bp Vibha269963258 71 100 q3 Felsp197322443 55 q3 Ectsi13242650 98 q3 Ectsi13242599 65 El Lodel149238952 env135287498 92 E8 Thaps224001624 89 86 env144060058 Eukaryotes 87 79 100 q6 Ostvi163955126 Eukaryotes env144051596 91 Ea Dicdi66822029 100 Ek Leibr154340341 Ec Parte145537171 Ek Trybr71755889 52 q2 Emihu73852496 env144142045 51 84 57 Ek Leibr154337607 86 Ec Thepa71027825 95 82 Ec Parte145497945 Ek Trybr74025548 Ec Crymu209878422 Bacteria 96 92 9759 Poxviridae 78 env144044197 83 El Acypi193608391 71 env143932091 100 env143433140 61 100 q6 Ostvi163955147 93 Mimivirus 96 env142408079 El Ustma71020483 91 env142452833 59 E8 Phatr219110871 98 89 98 q3 Felsp197322445 env136160125 q2 Emihu73852902 54 100 q1 Parbu9632043 100 69 Baculoviridae q1 Acatu155370982 78 85 61 z3 Helze22788802 environmental sequences 84 97 vi Musdo187903102 Mimivirus 97 Marseillevirus 99 Eukaryotes k1 Cyphe131840167 85 Baculoviridae 85 k1 Anghe282174153 95 o1 Shrwh17158276 72 Eukaryotes 61 8447 88 98 q1 Acatu155371048 76 Eukaryotes 100 q1 ParFR155370864 84 98 Poxviridae q1 Parbu9632159 66 c1 Afrsw9628153 u1 Canvi40556174 100 Bp Frano118497573 75 z3 Helze22788780 env136094176 93 56 l4 Infsp19881429 78 env136607526 80 k1 Anghe282174133 env142360592 env135039280 100 92 env142151739 Bp Phoda269105083 100 l3 Lymch48843724 61 c1 Afrsw9628152 100 l3 Lymdi13358415 87 100 l5 Ambti45686076 Marseillevirus 100 l5 Frovi49237335 100 l5 Ambti45686047 l5 Singr56692701 100 l5 Frovi49237365 env143333574 100 96 l5 Singr56692684 l1 Aedta109287943 76 100 l3 Lymdi13358428 99 98 l2 Invir15078798 l3 Lymdi51869996 env142551493 100 100 Bp Frano118497575 90 env143288148 env144017272 97 l1 Invir109287926 99 Bacteria 94 l2 Invir15079087 67 93 Bacteria Bacteria 0.1

0.1 Figure 5 Phylogenetic trees of the ancestral NCLDV genes for ribonucleotide reductase subunits. (A). Ribonucleotide reductase, large subunit. (B). Ribonucleotide reductase, small subunit. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations: Acatu, Acanthocystis turfacea Chlorella virus 1; Acypi, Acyrthosiphon pisum; Aedta, Invertebrate iridescent virus 3; Afrsw, African swine fever virus; Ambti, Ambystoma tigrinum virus; Anghe, Anguillid herpesvirus 1; Canvi, Canarypox virus; Crymu, Cryptosporidium muris RN66; Cyphe, Cyprinid herpesvirus 3; Dicdi, Dictyostelium discoideum AX4; Ectsi, Ectocarpus siliculosus virus 1; Emihu, Emiliania huxleyi virus 86; Felsp, Feldmannia species virus; Frano, Francisella novicida U112; Frovi, Frog virus 3; Helze, Heliothis zea virus 1; Infsp, Infectious spleen and kidney necrosis virus; l1_Invir, Invertebrate iridescent virus 3; l2_Invir, Invertebrate iridescent virus 6; Leibr, Leishmania braziliensis MHOM/BR/75/M2904; Lodel, Lodderomyces elongisporus NRRL YB-4239; Lymch, Lymphocystis disease virus - isolate China; Lymdi, Lymphocystis disease virus 1; Musdo, Musca domestica salivary gland hypertrophy virus; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus NY2A; ParFR, Paramecium bursaria Chlorella virus FR483; Parte, Paramecium tetraurelia strain d4-2; Phatr, Phaeodactylum tricornutum CCAP 1055/1; Phoda, Photobacterium damselae subsp. damselae CIP 102761; Shrwh, Shrimp virus; Singr, Singapore grouper iridovirus; Thaps, Thalassiosira pseudonana CCMP1335; Thepa, Theileria parva strain Muguga; Trybr, Trypanosoma brucei; Ustma, Ustilago maydis 521; Vibha, Vibrio harveyi 1DA3; Bp, Proteobacteria; E8, Stramenopiles; Ea, Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; c1, Asfarviridae; k1, ; l1, Chloriridovirus; l2, Iridovirus; l3, Lymphocystivirus; l4, Megalocytivirus; l5, Ranavirus; o1, Nimaviridae; q1, Chlorovirus; q2, Coccolithovirus; q3, Phaeovirus; q6, unclassified Phycodnaviridae; u1, Chordopoxvirinae; vi, Nudivirus; z3, unclassified dsDNA viruses. Yutin and Koonin Virology Journal 2012, 9:161 Page 14 of 18 http://www.virologyj.com/content/9/1/161

A Thymidine kinase B Thymidylate kinase 60 80 Eukaryotes 76 98 Poxviridae 87 El Lacth255712129 80 El Cryne58265708 52 60 Eukaryotes l2 Invir15078963 72 100 Eukaryotes 72 55 Ec Tetth118351574 100 Poxviridae 70 El Brafl260796149 91 q6 Ostvi163955199 97 u1 Canvi40556108 88 env136422265 El Homsa42544174 95 env142443153 98 E9 Arath145334853 69 83 env144065395 87 64 83 Ea Dicdi66804303 env143975708 E9 Ostlu145348882 82 79 env144021532 100 u1 Ectvi22164753 env142459479 51 84 u1 Varvi9627681 env136229246 u1 Vacvi66275971 Marseillevirus 71 54 Bp Psesy289676128 97 env142470004 Bb Chipi256419682 env136062509 62 97 100 Bf Desha89894606 100 Eukaryotes (plants) 50 Bf Clobo226947280 95 c1 Afrsw9628158 93 q2 Emihu73852905 o1 Shrwh17158497 51 84 Ek Trycr71423920 Mimivirus Ek Leima157865253 57 u2 Amsmo9964330 98 c1 Afrsw9628143 Bacteria El Caebr268573866 Asco/Iridoviridae 72 0.1 Phycodnaviridae environmental sequences C dUTPase 100 environmental sequences env142414951 81 74 95 E9 Phypa167999518 81 k1 Icthe9626826 51 Ea Dicdi66800487 l4 Infsp19881437 vi Musdo187903106 97 El Apime48097825 94 l2 Invir15079149 97 Ec Parte145528967 71 El Nasvi156541544 54 Ec Tetth118389890 77 El Cioin198419421 55 u1 Canvi40556137 61 99 74 l5 Singr56692686 u1 Fowvi9634729 72 85 El Homsa70906444 u1 Fowvi9634821 env138940489 a2 Fowad9628836 82 93 u1 Canvi40556151 94 86 a2 Fowad9633174 86 Eukaryotes 100 l5 Ambti45686051 68 67 l5 Frovi49237361 Bacteria/phages 70 d2 Spoli148368872 Eukaryotes z3 Helze22788776 70 75 u2 Amsmo9964316 67 0.2 88 Poxviruses

85 El Caeel71988561 d2 Agrse46309437 74 Ea Enthi67470091 env138142700 env143826648 68 env142423093 72 env143987160 96 69 q6 Ostta290343671 99 env143982771 env134673310 100 Bacteria Bp Rhosp146280038 88 env143370438 53 q2 Emihu73852871 99 q1 Acatu155371447 85 q1 Parbu157953045 86 Eukaryotes

0.1 Figure 6 (See legend on next page.) Yutin and Koonin Virology Journal 2012, 9:161 Page 15 of 18 http://www.virologyj.com/content/9/1/161

(See figure on previous page.) Figure 6 Phylogenetic trees of ancestral NCLDV genes encoding enzymes of nucleotide metabolism. (A). Thymidine kinase. (B). Thymidylate kinase. (C). dUTPase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations: Acatu, Acanthocystis turfacea Chlorella virus 1; Afrsw, African swine fever virus; Agrse, Agrotis segetum granulovirus; Ambti, Ambystoma tigrinum virus; Amsmo, Amsacta moorei entomopoxvirus 'L'; Apime, Apis mellifera; Arath, Arabidopsis thaliana; Brafl, Branchiostoma floridae; Caebr, Caenorhabditis briggsae; Caeel, Caenorhabditis elegans; Canvi, Canarypox virus; Chipi, Chitinophaga pinensis DSM 2588; Cioin, Ciona intestinalis; Clobo, Clostridium botulinum A2 str. Kyoto; Cryne, Cryptococcus neoformans var. neoformans JEC21; Desha, Desulfitobacterium hafniense Y51; Dicdi, Dictyostelium discoideum AX4; Ectvi, Ectromelia virus; Emihu, Emiliania huxleyi virus 86; Enthi, Entamoeba histolytica HM-1:IMSS; Fowad, Fowl adenovirus A; Fowvi, Fowlpox virus; Frovi, Frog virus 3; Helze, Heliothis zea virus 1; Homsa, Homo sapiens; Icthe, Ictalurid herpesvirus 1; Infsp, Infectious spleen and kidney necrosis virus; l2_Invir, Invertebrate iridescent virus 6; Lacth, Lachancea thermotolerans; Leima, Leishmania major; Musdo, Musca domestica salivary gland hypertrophy virus; Nasvi, Nasonia vitripennis; Ostlu, Ostreococcus lucimarinus CCE9901; Ostta, Ostreococcus tauri virus 1; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus NY2A; Parte, Paramecium tetraurelia strain d4-2; Phypa, Physcomitrella patens subsp. patens; Psesy, Pseudomonas syringae pv. syringae FF5; Rhosp, Rhodobacter sphaeroides ATCC 17025; Shrwh, Shrimp white spot syndrome virus; Singr, Singapore grouper iridovirus; Spoli, Spodoptera litura granulovirus; Tetth, Tetrahymena thermophila; Trybr, Trypanosoma brucei; Vacvi, Vaccinia virus; Varvi, Variola virus; Bb, Bacteroidetes; Bf, Firmicutes; Bp, Proteobacteria; E9, Viridiplantae; Ea, Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; a2, ; c1, Asfarviridae; d2, Baculoviridae; k1, Herpesvirales; l2, Iridovirus; l4, Megalocytivirus; l5, Ranavirus; o1, Nimaviridae; q1, Chlorovirus; q2, Coccolithovirus; q6, unclassified Phycodnaviridae; u1, Chordopoxvirinae; u2, Entomopoxvirinae; vi, Nudivirus; z3, unclassified dsDNA viruses. all cases where the information content of the respective Thus, the present analysis reveals layers of hidden multiple sequence alignments was sufficient for phylo- complexity in the history of the conserved gene core of genetic analysis, deviations from the simple pattern of the NCLDV that are not apparent from the analysis of vertical evolution were observed (summarized in patterns of gene presence-absence alone [5]. Although Table 1). Strikingly, phylogenetic trees for most of the deviations from simple vertical evolution probably oc- conserved NCLDV genes failed to show an NCLDV curred in the history of almost all core genes, the clade although it was not always possible to reject the results do not invalidate the conclusion on the evolu- monophyly of the NCLDV at a statistically significant tion of all known NCLDV from a single ancestral virus. level. The high prevalence of gene loss and xenologous gene The results of phylogenetic analysis of viral genes have displacement notwithstanding, for most of the core to be interpreted with caution given the generally fast genes, these events affected the evolution of only a few evolution of viral genomes and the ensuing possibility of lineages, consistent with the origin of all NCLDV from artifacts such as long branch attraction. Therefore, we a common ancestor, followed by isolated events compli- considered the monophyly of the NCLDV to be the most cating the evolutionary scenarios. However, for several appropriate null hypothesis and generally made conclu- genes that have been included in the core set on the sions on more complex evolutionary scenarios only basis of gene presence-absence patterns, such as the when this hypothesis could be rejected using conserva- enzymes of DNA precursor metabolism, multiple tive statistical tests of tree topology. Nevertheless, even sources are apparent, so that the ancestral status of with this conservative approach, phylogenomic analysis these genes becomes uncertain. suggests that evolution of the core NCLDV genes, with Phylogenetic analysis of the core NCDLV genes reveals only a few likely exceptions (DNA polymerase, disulfide multiple affinities with genes from eukaryotes, bacteria isomerase and several other genes; see Table 1), included and bacteriophages. The acquisition of genes from not only multiple gene losses that were apparent already eukaryotic hosts (that might not be the same for ances- from the examination of the phyletic patterns, but also tral and extant viruses) is not surprising. However, gene multiple cases of xenologous gene displacement [25], i.e. transfer to NCLDV, in particular those infecting unicel- displacement of the ancestral gene by a homologous lular eukaryotes, from bacteria and bacteriophages is (and functionally analogous) gene from a different plausible as well given that diverse parasites and sym- source such as bacteriophage or eukaryote. On some bionts often coexist within the same eukaryotic host. In- occasions, when a gene is missing in a large group of deed, in , with their large cells and phagocytic viruses (such as phycodnaviruses) except for a minority life style, such coexistence is the norm, making these of members of that group, in which it is phylogenetically organisms veritable ‘melting pots’ of virus evolution distinct from homologs in other NCLDV, the sequence [7,32]. of events leading to displacement can be inferred as loss The general trend seems to be that bona fide essential followed by regain. In other cases, such as RR and TK, genes, are rarely displaced because for these, intermedi- the course of evolution seems to have been too complex ates lacking the gene are most like non-viable, so if to reconstruct the specific scenario. intermediate forms existed, they should have encoded Yutin and Koonin Virology Journal 2012, 9:161 Page 16 of 18 http://www.virologyj.com/content/9/1/161

lineages seems to be the rule rather than an exception. Protein disulfide isomerase This pattern can be linked to the viability of evolutionary intermediates lacking the respective genes, at least in the 95 Marseillevirus short term. 60 env136543793 91 In addition to cases of xenologous gene displacement, 81 Iridoviridae environmental sequences phylogenies of a few core genes point to evolutionary env140203035 81 links with other large DNA viruses, such as herpes- env136808647 viruses and baculoviruses, as well as bacteriophages. 72 55 q1 ParAR157953760 env136606018 These observations are compatible with the virus world env136840138 concept under which viruses are linked through complex 84 68 Mimivirus networks of evolutionary connections at the level of in- 87 55 environmental sequences dividual genes and in some cases gene modules, and are 67 Mimivirus also involved in extensive gene exchange with cellular env136256796 72 life forms. environmental sequences 99 69 q3 Felsp197322406 67 env141253368 Methods env134921190 88 The NCLDV protein sequences were extracted from the 75 q3 Ectsi13242631 env134459082 RefSeq database (NCBI, NIH, Bethesda) [34]. The most 96 environmental sequences recently sequenced NCLDV genomes, namely the Cafe- 59 env139731803 teria roenbergensis virus [35], the megavirus [16], both of 99 74 l2 Invir15079059 the Mimiviridae family, and Lausannevirus [8], a close l1 Aedta109287974 relative of Marseille virus, were not included. For each 78 94 Ascoviridae 55 cluster of orthologous NCLDV genes (NCVOG) that has 65 c1 Afrsw9628181 93 environmental sequences been mapped to the last common ancestor of the 60 100 Poxviridae NCLDV [5], the following procedure was applied. A rep- 80 q2 Emihu73852597 resentative NCVOG sequence set was constructed by env141323596 clustering the complete collection of the respective Eukaryotes protein sequences using the Blastclust program (ftp://ftp. ncbi.nih.gov/blast/documents/blastclust.html) and select- 0.2 ing a representative from each cluster of closely related Figure 7 Phylogenetic tree of an ancestral NCLDV gene sequences). Two BLASTP runs, one against the Refseq encoding an enzyme involved in virion morphogenesis: protein database and the other one against the environmental disulfide isomerase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation (env_nr) database at the NCBI, were performed for and the gene identification numbers are indicated; env stands for each representative sequence with the e-value cutoff of sequences retrieved from env_nr database. Species abbreviations: 0.1. This liberal cutoff was used in order to incorpor- Afrsw, African swine fever virus; Aedta, Invertebrate iridescent virus 3; ate highly diverged homologs. To eliminate potential Invir, Invertebrate iridescent virus; ParAR, Paramecium bursaria false positives, all alignments were examined case by Chlorella virus AR158; Emihu, Emiliania huxleyi virus 86; Ectsi, Ectocarpus siliculosus virus 1; Felsp, Feldmannia species virus; c1, case for the conservation of domain architecture and Asfarviridae; l1, Chloriridovirus; l2, Iridovirus; q1, Chlorovirus; q2, presence of diagnostic motifs. All the sequences from Coccolithovirus; q3, Phaeovirus. the given NCVOG, the top 20–30 Refseq hits from each domain of the NCBI Taxonomy (Eukaryota, Bac- teria, Archaea, and Viruses), and top 10 environmental two forms of the respective genes, with the original gene hits for each query were combined, and nearly identi- subsequently lost and the xenologous gene retained. cal sequences were eliminated using Blastclust. The This evolutionary scenario might be rare due to the con- resulting sequences were aligned using MUSCLE [36]; straints imposed by the requirements for the formation gapped columns (more than 30% of gaps) and columns of multisubunit complexes (under the complexity hy- with low information content were removed from the pothesis of Lake and colleagues [33]), e.g. the replisome. alignment [37]. A preliminary tree was constructed However, the clear-cut xenologous displacement of the using PhyML [38], with the following parameters: primase-helicase gene in phycodnaviruses shows that WAG substitution matrix; four relative substitution these obstacles are not insurmountable. In contrast, for rate categories; the fraction of invariable sites and genes that are not strictly essential but are beneficial in the alpha parameter of the gamma distribution of most virus-host systems, such as the precursor biosyn- site-specific evolution rates) were automatically thesis enzymes, parallel loss and regain in multiple selected by PhyML. From this preliminary tree, a Yutin and Koonin Virology Journal 2012, 9:161 Page 17 of 18 http://www.virologyj.com/content/9/1/161

subset of sequences representing each clade were 9. Moss B: Poxviridae: the viruses and their replication.InFields Virology, selected and re-aligned. In the next step, 8 PhyML Volume 2. Edited by Knipe DM, Howley PM. Philadelphia: Lippincott Williams & Wilkins; 2007:2905–2946. runs were performed for each alignment, with 8 sub- 10. Claverie JM, Abergel C, Ogata H: Mimivirus. Curr Top Microbiol Immunol stitution models (Blosum62, Dayhoff, JTT, DCMut, 2009, 328:89–121. 11. Raoult D, Boyer M: Amoebae as genitors and reservoirs of giant viruses. RtREV, CpREV, VT, and WAG). The best topology Intervirology 2010, 53(5):321–329. was chosen by the maximum log likelihood among 12. Colson P, Raoult D: Gene repertoire of amoeba-associated giant viruses. the 8 trees. Bootstrap branch supports (expressed in Intervirology 2010, 53(5):330–343. 13. Van Etten JL, Lane LC, Dunigan DD: DNA viruses: the really big ones Expected-Likelihood Weights, ELW) were calculated (giruses). Annu Rev Microbiol 2010, 64:83–99. using TreeFinder [39] for the best PhyML tree using 14. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, the best matrix. For detailed phylogenetic analysis, Suzan M, Claverie JM: The 1.2-megabase genome sequence of Mimivirus. Science 2004, 306(5700):1344–1350. whenever appropriate, alternative (constrained) top- 15. Colson P, Yutin N, Shabalina SA, Robert C, Fournous G, La Scola B, Raoult D, ology ML trees were constructed using TreeFinder Koonin EV: Viruses with more than 1,000 genes: , a new Acanthamoeba polyphaga mimivirus strain, and reannotation of [39], with the substitution model found to be the Mimivirus genes. Genome Biol Evol 2011, 3:737–742. best for a given alignment in the first-round analysis. 16. Arslan D, Legendre M, Seltzer V, Abergel C, Claverie JM: Distant Mimivirus Tree topologies were compared with TreeFinder relative with a larger genome highlights the fundamental features of Megaviridae. Proc Natl Acad Sci U S A 2011, 108(42):17486–17491. using the approximately unbiased (AU) test P value 17. Van Etten JL: Unusual life style of giant chlorella viruses. Annu Rev Genet [40]. 2003, 37:153–195. 18. Yutin N, Koonin EV: EvolutionofDNAligasesofnucleo-cytoplasmiclarge DNA viruses of eukaryotes: a case of hidden complexity. Biol Direct 2009, 4:51. Additional files 19. Filee J, Pouget N, Chandler M: Phylogenetic evidence for extensive lateral acquisition of cellular genes by Nucleocytoplasmic large DNA viruses. BMC Evol Biol 2008, 8:320. Additional file 1: The phyletic patterns of absence-presence of the 20. Weynberg KD, Allen MJ, Ashelford K, Scanlan DJ, Wilson WH: From small NCLDV core genes in the sequenced NCLDV genomes. hosts come big viruses: the complete genome of a second Ostreococcus Additional file 2: Results of statistical analysis of constrained trees tauri virus, OtV-1. Environ Microbiol 2009, 11(11):2821–2839. for the core NCLDV genes. 21. Filee J, Forterre P, Laurent J: The role played by viruses in the evolution of Additional file 3: Phylogenetic trees for the DNA ligases of the their hosts: a view based on informational protein phylogenies. Res NCLDV. A. ATP-dependent ligases. B. NAD-dependent ligases. Microbiol 2003, 154(4):237–243. 22. Monier A, Claverie JM, Ogata H: Taxonomic distribution of large DNA viruses in the sea. Genome Biol 2008, 9(7):R106. Competing interests 23. Ogata H, Toyoda K, Tomaru Y, Nakayama N, Shirai Y, Claverie JM, Nagasaki K: The authors declare that they have no competing interests. Remarkable sequence similarity between the dinoflagellate-infecting marine girus and the terrestrial pathogen African swine fever virus. Virol J 2009, 6:178. Authors’ contributions 24. Felsenstein J: Inferring Phylogenies. Sunderland, MA: Sinauer Associates; 2004. EVK initiated and designed the study; NY collected and analyzed data; EVK 25. Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in wrote the manuscript that was read, edited and approved by both authors. prokaryotes: quantification and classification. Annu Rev Microbiol 2001, 55:709–742. Acknowledgements 26. Senkevich TG, Koonin EV, Moss B: Predicted poxvirus FEN1-like nuclease The authors’ research is supported by the DHHS intramural program (NIH, required for homologous recombination, double-strand break repair and national Library of Medicine). full-size genome formation. Proc Natl Acad Sci U S A 2009, 106(42):17921–17926. Received: 10 February 2012 Accepted: 8 August 2012 27. Everly DN Jr, Feng P, Mian IS, Read GS: mRNA degradation by the virion Published: 14 August 2012 host shutoff (Vhs) protein of herpes simplex virus: genetic and biochemical evidence that Vhs is a nuclease. J Virol 2002, References 76(17):8560–8571. 1. Forterre P: The origin of viruses and their possible roles in major 28. Sonntag KC, Darai G: Evolution of viral DNA-dependent RNA polymerases. evolutionary transitions. Virus Res 2006, 117(1):5–16. Virus Genes 1995, 11(2–3):271–284. 2. Koonin EV, Senkevich TG, Dolja VV: The ancient virus world and evolution 29. Lane WJ, Darst SA: Molecular evolution of multisubunit RNA polymerases: of cells. Biol Direct 2006, 1(1):29. sequence analysis. J Mol Biol 2010, 395(4):671–685. 3. Iyer LM, Aravind L, Koonin EV: Common origin of four diverse families of 30. Lackner CA, Condit RC: Vaccinia virus gene A18R DNA helicase is a large eukaryotic DNA viruses. J Virol 2001, 75(23):11720–11734. transcript release factor. J Biol Chem 2000, 275(2):1485–1494. 4. Iyer LM, Balaji S, Koonin EV, Aravind L: Evolutionary genomics of 31. Pascal JM: DNA and RNA ligases: structural variations and shared nucleo-cytoplasmic large DNA viruses. Virus Res 2006, 117(1):156–184. mechanisms. Curr Opin Struct Biol 2008, 18(1):96–105. 5. Yutin N, Wolf YI, Raoult D, Koonin EV: Eukaryotic large nucleo-cytoplasmic 32. Moliner C, Fournier PE, Raoult D: Genome analysis of microorganisms DNA viruses: clusters of orthologous genes and reconstruction of viral living in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev genome evolution. Virol J 2009, 6:223. 2010, 34:281–294. 6. Koonin EV, Yutin N: Origin and evolution of eukaryotic large 33. Jain R, Rivera MC, Lake JA: Horizontal gene transfer among genomes: the nucleo-cytoplasmic DNA viruses. Intervirology 2010, 53(5):284–292. complexity hypothesis. Proc Natl Acad Sci U S A 1999, 96(7):3801–3806. 7. Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, 34. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Azza S, Sun S, Rossmann MG, et al: Giant Marseillevirus highlights the role Chetvernin V, Church DM, Dicuccio M, Federhen S, et al: Database of amoebae as a melting pot in emergence of chimeric microorganisms. resources of the National Center for Biotechnology Information. Nucleic Proc Natl Acad Sci U S A 2009, 106(51):21848–21853. Acids Res 2012, 40(Database issue):D13–D25. 8. Thomas V, Bertelli C, Collyn F, Casson N, Telenti A, Goesmann A, Croxatto A, 35. Fischer MG, Allen MJ, Wilson WH, Suttle CA: Giant virus with a remarkable Greub G: Lausannevirus, a giant amoebal virus encoding histone complement of genes infects marine zooplankton. Proc Natl Acad Sci doublets. Environ Microbiol 2011, 13(6):1454–1466. USA2010, 107(45):19508–19513. Yutin and Koonin Virology Journal 2012, 9:161 Page 18 of 18 http://www.virologyj.com/content/9/1/161

36. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792–1797. 37. Yutin N, Makarova KS, Mekhedov SL, Wolf YI, Koonin EV: The deep archaeal roots of eukaryotes. Mol Biol Evol 2008, 25(8):1619–1630. 38. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696–704. 39. Jobb G, von Haeseler A, Strimmer K: TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 2004, 4:18. 40. Shimodaira H: An approximately unbiased test of phylogenetic tree selection. Syst Biol 2002, 51(3):492–508.

doi:10.1186/1743-422X-9-161 Cite this article as: Yutin and Koonin: Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes. Virology Journal 2012 9:161.

Submit your next manuscript to BioMed Central and take full advantage of:

• Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit