Evolution of double-stranded DNA viruses of eukaryotes: from bacteriophages to transposons to giant viruses
Eugene V. Koonin, Mart Krupovic, Natalya Yutin

To cite this version:

Eugene V. Koonin, Mart Krupovic, Natalya Yutin. Evolution of double-stranded DNA viruses of eukaryotes: from bacteriophages to transposons to giant viruses. Annals of the New York Academy of Sciences, Wiley, 2015, DNA Habitats and Their RNA Inhabitants, 1341 (1), pp. 10–24. doi: 10.1111/nyas.12728. HAL Id: pasteur-01977390

HAL Id: pasteur-01977390 https://hal-pasteur.archives-ouvertes.fr/pasteur-01977390 Submitted on 10 Jan 2019


Distributed under a Creative Commons Attribution-NonCommercial 4.0 International License

Ann. N.Y. Acad. Sci. ISSN 0077-8923

ANNALS OF THE NEW YORK ACADEMY OF SCIENCES Issue: DNA Habitats and Their RNA Inhabitants

Evolution of double-stranded DNA viruses of eukaryotes: from bacteriophages to transposons to giant viruses

Eugene V. Koonin,1 Mart Krupovic,2 and Natalya Yutin1

1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland. 2 Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Paris, France

Address for correspondence: Eugene V. Koonin, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894. [email protected]

Diverse eukaryotes including animals and protists are hosts to a broad variety of viruses with double-stranded (ds) DNA genomes, from the largest known viruses, such as pandoraviruses and mimiviruses, to tiny polyomaviruses. Recent comparative genomic analyses have revealed many evolutionary connections between dsDNA viruses of eukaryotes, bacteriophages, transposable elements, and linear DNA plasmids. These findings provide an evolutionary scenario that derives several major groups of eukaryotic dsDNA viruses, including the proposed order “Megavirales,” adenoviruses, and virophages from a group of large virus-like transposons known as Polintons (Mavericks). The Polintons have been recently shown to encode two capsid proteins, suggesting that these elements lead a dual lifestyle with both a transposon and a viral phase and should perhaps more appropriately be named polintoviruses. Here, we describe the recently identified evolutionary relationships between bacteriophages of the family Tectiviridae, polintoviruses, adenoviruses, virophages, large and giant DNA viruses of eukaryotes of the proposed order “Megavirales,” and linear mitochondrial and cytoplasmic plasmids. We outline an evolutionary scenario under which the polintoviruses were the first group of eukaryotic dsDNA viruses that evolved from bacteriophages and became the ancestors of most large DNA viruses of eukaryotes and a variety of other selfish elements. Distinct lines of origin are detectable only for herpesviruses (from a different bacteriophage root) and polyoma/papillomaviruses (from single-stranded DNA viruses and ultimately from plasmids). Phylogenomic analysis of giant viruses provides compelling evidence of their independent origins from smaller members of the putative order “Megavirales,” refuting the speculations on the evolution of these viruses from an extinct fourth domain of cellular life.

Keywords: Polintons; Megavirales; virus evolution; capsid proteins;

doi: 10.1111/nyas.12728

Ann. N.Y. Acad. Sci. 1341 (2015) 10–24 © 2015 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals Inc. on behalf of The New York Academy of Sciences. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

Introduction

Viruses are the most common and abundant biological entities on earth. Virome studies consistently show that in marine, soil, and animal-associated environments, the number of virus particles typically is 10–100 times greater than the number of cells.1–3 Viruses and/or other selfish elements, such as transposons and plasmids, parasitize or enter symbiotic relationships with all cellular life forms, with the possible exception of some extremely reduced intracellular parasites.4 The virus world displays an enormous diversity of genome structures and sizes (viral genomes consist of single-stranded (ss) or double-stranded (ds) RNA or DNA and range in size from less than 2 kb to over 2 Mb), replication and expression mechanisms, and virion structure.5,6 Furthermore, the overall number of distinct genes present in the genomes of selfish elements appears to substantially exceed the number of genes in all cellular life forms. Viral genes typically evolve much faster than genes of cellular organisms, and as a result, most of the genetic diversity on earth is probably concentrated in the virus world.2,4,7

Viruses and related genetic elements do not share a single common ancestor: indeed, there are no genes that would be conserved in all or even in the majority of viral genomes.8,9 However, viruses and other selfish genetic elements form a complex network of evolutionary relationships in which genomes are linked through different sets of shared genes.6 This evolutionary network apparently emerged owing to extensive gene exchange, often between widely different elements, as well as parallel acquisition of homologous genes from the hosts. Viruses with large genomes contain numerous genes acquired from the hosts at different stages of evolution. However, a small group of virus hallmark genes that encode key proteins involved in genome replication and virion formation, most notably capsid proteins, are represented in a broad variety of elements, comprising a substantial fraction of the edges in the evolutionary network.6,8,9 Virus hallmark genes do not have obvious ancestors in cellular life forms, suggesting that some types of virus-like elements evolved at a precellular stage of the evolution of life.10

The viromes (i.e., the compendia of all viruses and virus-like elements) of prokaryotes (bacteria and archaea) and eukaryotes are dramatically different.6,11 In prokaryotes, the great majority of viruses have dsDNA genomes, mostly within the range of 10–100 kb. The second most abundant class includes small ssDNA viruses. Retroelements comprise a small minority (no retroviruses are known), whereas RNA viruses are rare.

In stark contrast to bacteria and archaea, eukaryotes are hosts to numerous, enormously diverse RNA viruses, as well as retroelements and retroviruses.12,13 Compared to RNA viruses and retroelements, ssDNA and dsDNA viruses and mobile elements are less diverse and less abundant in eukaryotes, although both of these classes of selfish elements have been found in most eukaryotes.6 The dsDNA viruses of eukaryotes have recently enjoyed much attention and even publicity beyond academic circles, thanks to the unexpected discovery of giant viruses, with physical dimensions of the particles and genome sizes exceeding those of many bacteria and archaea and some parasitic unicellular eukaryotes.14–17 In parallel with the study of giant viruses, and in part stimulated by this discovery, breakthroughs in phylogenomic analysis of eukaryotic dsDNA viruses have been achieved in the last few years, leading to the emergence of detailed, well-supported scenarios for the evolution of the major groups of these viruses. Here, we focus on the key results from these evolutionary genomic studies and discuss the unresolved problems in the evolution of eukaryotic dsDNA viruses.a

a This review is largely based on two recent articles (Refs. 28 and 78).

Diversity of dsDNA viruses in eukaryotes

Altogether, eukaryotes are hosts to 18 recognized families of dsDNA viruses that infect a broad spectrum of unicellular and multicellular organisms, and many unclassified viruses, spanning almost the entire range of viral genome sizes, from approximately 4 kb to almost 2.5 Mb (Table 1). By far the largest and most common group of DNA viruses in eukaryotes consists of seven families of large viruses (including giant mimiviruses and pandoraviruses, with genomes in the Mb range) that share a common viral ancestry, as indicated by the conservation of approximately 50 (inferred) ancestral genes.15,18,19 This assemblage of virus families is known as the nucleocytoplasmic large DNA viruses (NCLDV), or more recently, the proposed order “Megavirales.”20 Notably, it is only the (putative) order “Megavirales” that encompasses viruses infecting a broad diversity of unicellular and multicellular eukaryotes; the other recognized families of eukaryotic dsDNA viruses are limited in their spread to individual eukaryotic kingdoms, primarily animals (Table 1).

The giant viruses of the family Mimiviridae are themselves “infected” with a distinct class of viruses, known as virophages, that reproduce within the giant virus “factories” inside the host cells and depend on the giant virus for their replication.21–24 Recently, an evolutionary link has been identified between the virophages and large eukaryotic dsDNA transposons of the Polinton/Maverick family (hereafter referred to as Polintons).25,26 The Polintons are integrated within the genomes of diverse unicellular protists and animals, suggesting an ancient origin, perhaps coincident with the origin of eukaryotes, as well as substantial evolutionary success. Unexpectedly, we have recently found that the majority of the Polintons encode two proteins homologous to the typical capsid proteins of other viruses, strongly suggesting that, at least under some conditions, these transposons actually produce virions that could infect new hosts.27 Thus, Polintons, which we believe should be more properly denoted


Table 1. The major groups of eukaryotic dsDNA virusesa

Virus family | Genome size range (kb) | Host range | Comments

Proposed order “Megavirales”
  Poxviridae | 130–375 | Animals |
  Asfarviridae | 165–190 | Mammals, protists |
  Iridoviridae | 140–303 | Animals, protists (?) |
  Ascoviridae | 150–190 | Insects |
  Marseilleviridae | 346–386 | Amoebae |
  Phycodnaviridae | 154–407 | Algae, other protists (?) |
  Mimiviridae | 370–1,259 | Amoebae, algae, other protists |
  Pandoraviruses | 1,908–2,473 | Amoebae | Currently unclassified but likely to become a new family
  Pithovirus | 600 | Amoebae | Currently unclassified but likely to become a new family

Polintoviruses | 15–20 | Vertebrates, insects, protists | Currently unclassified but likely to become a new family once the existence of virions is validated
Adenoviridae | 26–48 | Vertebrates |
Virophages | 17–26 | Satellites/parasites of protist-infecting mimiviruses | Currently unclassified but likely to become a new family

Order Herpesvirales
  Alloherpesviridae | 134–295 | Vertebrates |
  Herpesviridae | 125–241 | Vertebrates |
  Malacoherpesviridae | 207 | Molluscs |

Baculoviridae | 80–180 | Insects |
Hytrosaviridae | 120–190 | Insects |
Nimaviridae | 300 | Crustacea |
Nudiviridae | 125–220 | Insects |
Polydnaviridae | 150–500 | Insects | Encapsidated genomes of polydnaviruses consist of multiple dsDNA circles of variable size. The genes encoding the constituents of viral particles are permanently integrated into the insect genome and are not packaged
Papillomaviridae | 6.8–8.4 | Mammals |
Polyomaviridae | 4.7–5.4 | Mammals |

a When available, the data are from the latest report of the International Committee on Taxonomy of Viruses (ICTV).

polintoviruses, appear to lead a dual lifestyle combining features of viruses and transposons. Although polintoviruses remain to be identified experimentally, in the rest of this paper, we refer to the elements that encode the predicted capsid proteins as polintoviruses, while reserving the designation Polintons for those variants of these elements that appear to have lost the genes for the capsid proteins.


These findings prompted us to investigate in detail the evolutionary connections between the Polintons and other viruses and mobile elements.28 The results of this analysis, which we discuss in the first part of the present review, point to polintoviruses as the first group of eukaryotic dsDNA viruses that evolved from prokaryotic viral ancestors and played a central role in the evolution of eukaryotic viruses and selfish elements.

Polintoviruses at the root of eukaryotic dsDNA virus, transposon, and plasmid evolution

The genomes of Polintons are large by transposon standards, 15–20 kb, and encode arrays of diverse proteins. However, several proteins are shared by all Polintons, including protein-primed type B DNA polymerase (pPolB), RVE family integrase, FtsK-like DNA-packaging ATPase, and an adenovirus-type cysteine protease implicated in the maturation of capsid proteins.26,29–31 Polintons are flanked by terminal inverted repeats (TIRs) and, given the universal presence of the pPolB gene, are thought to replicate via the protein-primed mechanism. The terminal protein involved in DNA replication initiation has not been experimentally identified but is predicted to be the N-terminal domain of the polymerase.32 The presence of genes for two capsid proteins, the putative virion-maturation protease and the genome-packaging ATPase, that are components of the virion morphogenesis apparatus in the members of the “Megavirales” implies that Polintons are virus-like transposons. Initially, however, no genes for capsid proteins were identified in the Polintons, rendering the presence of genes for proteins involved in capsid maturation enigmatic. However, recent extensive searches for distant sequence similarity have resulted in the identification of two capsid protein genes, one for the major protein of icosahedral capsids (double jelly-roll protein) and the other one for the minor penton protein.27 The two proteins are essential and, in principle, sufficient to form icosahedral capsids. This finding explains the presence of other virion-maturation proteins that, with the capsid proteins, constitute a distinct structural–morphogenetic module, and suggests that Polintons are actually polintoviruses, a novel group of dsDNA viruses that can exist also in the form of transposon-like elements that are integrated with the host genomes and transmitted vertically across host generations. Some groups of polintoviruses have lost capsid protein genes and apparently became bona fide transposons, for which the name Polintons can be retained. Polintoviruses are the second major group of eukaryotic dsDNA viruses, after the “Megavirales,” that is represented in widely diverse unicellular eukaryotes along with many animals.28 Polintons/polintoviruses are missing in only one of the five major eukaryotic superkingdoms, namely Archaeplastida, which includes land plants as well as red and green algae.

Polintoviruses share homologous genes and gene blocks with many other viruses, transposons, and plasmids (Fig. 1). In the network of evolutionary connections between all these groups of mobile elements where the edges are homologous genes, polintoviruses represent the central hub that shares the maximum number of genes with other nodes.26,28 Of special interest are the multiple connections between bacteriophages of the family Tectiviridae, polintoviruses, and the Mavirus virophage, which all share four genes encoding two capsid proteins, DNA-packaging ATPase, and protein-primed DNA polymerase (pPolB) (Fig. 1). Polintoviruses share two additional genes with Mavirus, namely those for the capsid maturation protease and the RVE integrase; the rest of the virophages share the capsid proteins, ATPase, and protease, but lack pPolB and the integrase. The adenoviruses join the network through pPolB, the two capsid proteins, and the protease, and the much larger “Megavirales” connect through the capsid proteins, the ATPase, and the protease. Thus, the morphogenetic module is the common denominator that joins all these diverse families of viruses into a polintovirus-centered assemblage. The yeast linear cytoplasmic plasmids bridge the elements with protein-primed replication and the much more complex “Megavirales”: these plasmids obviously lack the morphogenetic module but encode pPolB along with four key proteins that are conserved in most of the “Megavirales,” namely the two largest subunits (the β- and β′-subunits) of multisubunit RNA polymerase, D11-like helicase, and the mRNA-capping enzyme, which show specific evolutionary relationship to the homologues from “Megavirales” (Fig. 1). This suite of proteins is believed to be essential for the cytoplasmic replication of DNA genomes.
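The gene-sharing relationships described in this section can be written down as a small network in which each group of elements is a node and the genes it shares with polintoviruses form the edge. The Python sketch below is an illustration only: the gene sets are transcribed from the text of this section rather than from the data behind Fig. 1, each group is reduced to the few genes named here, and the names GENE_CONTENT and shared_with are hypothetical.

```python
# Toy gene-sharing network. Gene sets are a simplification transcribed from
# the text (assumption: only the genes named in this section are included).
CAPSID = {"major capsid protein", "minor capsid protein"}

GENE_CONTENT = {
    "polintoviruses": CAPSID | {"packaging ATPase", "pPolB",
                                "maturation protease", "RVE integrase"},
    "Tectiviridae": CAPSID | {"packaging ATPase", "pPolB"},
    "Mavirus": CAPSID | {"packaging ATPase", "pPolB",
                         "maturation protease", "RVE integrase"},
    "other virophages": CAPSID | {"packaging ATPase", "maturation protease"},
    "adenoviruses": CAPSID | {"pPolB", "maturation protease"},
    "Megavirales": CAPSID | {"packaging ATPase", "maturation protease",
                             "RNAP beta subunit", "RNAP beta' subunit",
                             "D11-like helicase", "mRNA-capping enzyme"},
    "cytoplasmic plasmids": {"pPolB", "RNAP beta subunit",
                             "RNAP beta' subunit", "D11-like helicase",
                             "mRNA-capping enzyme"},
}


def shared_with(hub, gene_content=GENE_CONTENT):
    """Return, for every other group, the set of genes it shares with `hub`."""
    hub_genes = gene_content[hub]
    return {group: genes & hub_genes
            for group, genes in gene_content.items()
            if group != hub}


# Edges of the network as seen from the polintovirus node.
EDGES = shared_with("polintoviruses")

if __name__ == "__main__":
    for group, genes in sorted(EDGES.items(), key=lambda kv: -len(kv[1])):
        print(f"{group}: {len(genes)} shared ({', '.join(sorted(genes))})")
```

In this simplified representation, the cytoplasmic plasmids connect to polintoviruses through pPolB alone, whereas their link to the “Megavirales” runs through the RNA polymerase subunits, the helicase, and the capping enzyme, which matches the bridging role described above.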


Figure 1. Gene sharing between polintoviruses/Polintons and other groups of viruses, plasmids, and transposons. Homologous genes are color coded and the color key is provided at the bottom of the figure. Hatched regions in the pPolB genes indicate the position of the (predicted) terminal protein domains. Hatching is also used to indicate the gene encoding the distinct adenoviral genome packaging ATPase IVa2. P1_DY, polintovirus 1 of Drosophila yakuba; P3_TC, polintovirus 3 of Tribolium castaneum; P1_TV, polinton 1 of Trichomonas vaginalis. Modified from Ref. 28.

From the connections between the polintoviruses and various other groups of viruses and plasmids, we have inferred a unifying scenario under which polintoviruses were the first group of eukaryotic dsDNA viruses and the font of eukaryotic virus, transposon, and plasmid evolution (Fig. 2).28 Polintoviruses appear to have evolved from bacteriophages of the family Tectiviridae, at the onset of eukaryogenesis. This ancestral bacteriophage most likely entered the protoeukaryotic host along with the α-proteobacterial endosymbiont that subsequently gave rise to the mitochondria (Fig. 2). This route of evolution is compatible with the presence of linear pPolB-encoding plasmids in fungal mitochondria; moreover, in phylogenetic trees of pPolB, the primary split is observed between the pPolBs of mitochondrial plasmids and the rest of the eukaryotic plasmids and viruses.28

The key difference between tectiviruses and polintoviruses is that the latter encompass genes for the Ulp1-like cysteine protease and the RVE family integrase. Both of these genes could have been acquired by the tectiviral ancestor of polintoviruses in a single event of recombination with a eukaryotic Ginger 1–like transposon.33 Indeed, in this group of transposons, the integrase and protease domains are fused in the same polypeptide (Fig. 1). Some tectiviruses persist in bacteria in a linear plasmid form that is conducive to recombination.34 The Ulp1-like deubiquitinases are characteristic eukaryotic enzymes, whereas bacteria encode only distantly related cysteine proteases, suggesting that the ancestor of polintoviruses had already acquired the protease and integrase genes in the (proto)eukaryotic host. The protease was adopted for capsid maturation and retained in all major virus lineages emerging from polintoviruses, including virophages, adenoviruses, and “Megavirales,” although some viruses in the latter group have lost this gene.28,35 The acquisition of the RVE integrase gene was a pivotal event in the evolution of the polintoviruses, endowing them with the ability to lead two alternative lifestyles, those typical of transposable elements and bona fide viruses. Such duality is also embraced by certain bacteriophages and eukaryotic Ty1-copia (also known as pseudoviruses) and Ty3-gypsy retrotransposons (metaviruses).6,11,36 Conceivably, the ability to persist in the integrated form in the host genome and the ensuing flexibility of parasite–host interaction strategies played a key role in the further diversification and spread of polintoviruses (Polintons) and their derivatives in eukaryotes.

Figure 2. The polintovirus-centered scenario of evolution for eukaryotic dsDNA viruses and plasmids. INT, RVE family integrase; RNAP, DNA-dependent RNA polymerase; PRO, cysteine protease; PolB, family B DNA polymerase. Green color of the bidnavirus virion indicates that the capsid protein is unrelated to that of polintoviruses. See text for details. Modified from Ref. 28.

Some polintoviruses have given up the virus lifestyle altogether by losing the genes involved in virion formation and becoming pure transposons (Polintons).27 A striking example is the extraordinary expansion of capsid-less Polintons in Trichomonas vaginalis, where they constitute up to 30% of the host genome.30 Adenoviruses followed the opposite evolutionary route, losing the integrase gene together

with the integration/transposition ability (Fig. 2) and thereby committing to a strictly viral reproduction strategy. Polintoviruses also contributed the pPolB gene to the evolution of a remarkable family of ssDNA viruses, the Bidnaviridae, which emerged as a result of extensive gene shuffling between four groups of selfish elements.32

Most importantly, polintoviruses seem to have played a key role in the emergence of the “Megavirales,” the most abundant and diverse group of eukaryotic dsDNA viruses. The “Megavirales” apparently inherited from the polintoviruses the virion morphogenetic module, consisting of the DJR major capsid protein (MCP), the genome-packaging ATPase, the maturation protease, and likely also the minor capsid (penton) protein (Fig. 1). Notably, among all the numerous DJR proteins, the major capsid protein of the polintoviruses is most closely related to those of phycodnaviruses,27 suggesting a direct evolutionary connection between polintoviruses and the “Megavirales.” The packaging ATPases and the maturation proteases show high levels of sequence divergence, complicating definitive phylogenetic analysis. However, the topologies of the respective phylogenetic trees are at least compatible with the existence of such a link.26

Polintoviruses reside in the nucleus of the host cell and, accordingly, rely on host enzymes for transcription. A key event for the emergence of the “Megavirales” was the escape from the nucleus, concomitant with the acquisition of the RNAP and the capping apparatus from the host. The escaped element that would replicate in the cytoplasm using the ancestral Polinton pPolB spawned two groups of mobile elements (Fig. 2), namely cytoplasmic plasmids (so far found only in fungi) and the “Megavirales,” which share the unique trifunctional capping enzyme, RNAP, and D11-like helicase (Fig. 1). The cytoplasmic plasmids retain pPolB but have lost the morphogenetic module, thus succumbing to the exclusive intracellular lifestyle.

By contrast, the “Megavirales” evolved via the route of increasing complexity and autonomy. The essential events in the evolution of “Megavirales” from the cytoplasmic polintovirus-like ancestor include the replacement of pPolB with the RNA/DNA-primed PolB and acquisition of the D5-like helicase–primase (Fig. 2). It appears likely that pPolB, which, by definition, starts DNA replication at the end of the genome, cannot ensure efficient replication of genomes above a certain threshold (probably about 45 kb, as in the adenoviruses), owing to the lack of a dedicated primase that would make multiple internal primers along the genome to ensure the completion of lagging strand synthesis. Thus, to sustain the reproduction of larger dsDNA genomes, the ability to prime synthesis on multiple sites in the genome appears to be required. Some polintoviruses encode divergent D5-like primases–helicases (Fig. 1) which often cluster with “Megavirales” in phylogenetic trees,26 suggesting that “Megavirales” inherited this key enzyme from Polintons. Several other genes that are common and probably ancestral in the “Megavirales” are also shared with various Polintons.26 The PolB of “Megavirales” was probably acquired from the eukaryotic host,37 replacing the ancestral pPolB and, together with the primase–helicase, providing an opportunity for almost unlimited genome expansion. This expansion, which involved massive acquisition of genes not only from eukaryotes but also from bacteria,38 was so dramatic that the genes apparently inherited from the polintoviruses constitute only a tiny fraction of the gene complement of the giant viruses. Nevertheless, a few losses notwithstanding, these proteins of polintovirus descent make up the morphogenetic module of the “Megavirales” as well as important parts of the replication apparatus.

The virophages retain many features of polintoviruses but followed a different strategy to adapt to reproduction in the cytoplasm of the host cells. Instead of encoding their own machinery for cytoplasmic propagation, these viruses evolved to parasitize their giant relatives by exploiting the transcription apparatus and other functions of the giant viruses.22–25

Evolution of eukaryotic dsDNA viruses beyond the polintovirus-related assemblage

There are 10 families of eukaryotic dsDNA viruses that do not show clear evolutionary relationship to the polintoviruses or the “Megavirales” (Table 1). These viruses are characterized by rather narrow host ranges, most of them being found only in animals. The evolutionary-genomic analysis of these viruses has not yet been performed in a comprehensive manner, as was done in the case of “Megavirales.” Nevertheless, a brief overview of what has become apparent is instructive.

Five families of large eukaryotic dsDNA viruses (Baculoviridae, Hytrosaviridae, Nimaviridae, Nudiviridae, and Polydnaviridae) have been isolated exclusively from arthropods, mainly insects. These viruses, especially the latter three families, encode highly diverged protein sequences and at this time are represented by only a handful of virus species (a single species in the case of Nimaviridae), hampering phylogenomic analysis. All these viruses seem to compose a monophyletic group that shares several signature genes not found in other viruses.39,40 Polydnaviruses are an extremely unusual group of viruses that propagate vertically, with the virus genomes permanently integrated in the genomes of the insect hosts. Nevertheless, phylogenetic analysis of the retained viral genes indicates that polydnaviruses are highly derived descendants of nudiviruses.41 Preliminary phylogenetic study of highly conserved genes, such as the DNA polymerase, RNAP subunits, helicase–primase, and thiol oxidoreductase, suggests that this entire assemblage of viruses could be a highly derived offshoot of the “Megavirales.”42 However, a comprehensive phylogenomic analysis of these very interesting viruses remains to be performed.

The highly diverse order Herpesvirales is of special interest from an evolutionary standpoint because, in this case, a distinct bacteriophage connection to the morphogenetic module is traceable. The bacteriophages involved, namely members of the order Caudovirales, are unrelated to the tectiviruses that apparently gave rise to the polintovirus-related majority of eukaryotic dsDNA viruses (Figs. 1 and 2). Herpesviruses share with the Caudovirales homologous major capsid proteins of the HK97 fold that are unrelated to the jelly-roll fold present in the capsid proteins of most icosahedral viruses (including the polintovirus-centered assemblage), terminases (packaging ATPase–nucleases), and capsid maturation proteases.43–46 The inheritance of the morphogenesis module consisting of a capsid protein, ATPases, and protease from a bacteriophage closely parallels the similar evolutionary route of the polintovirus ancestor, but the actual proteins involved are unrelated (or in the case of the ATPase, distantly related). It appears somewhat paradoxical that the most common bacteriophages of the order Caudovirales gave rise to a single (even if diverse) family of eukaryotic dsDNA viruses, whereas the narrowly spread tectiviruses seeded the bulk of eukaryotic dsDNA virus evolution. Conceivably, the key event for the success of the polintoviruses that defined the wide spread of their derivatives was the acquisition of the integrase (see above). Furthermore, the fact that herpesviruses (so far) are limited to animal hosts might indicate that this virus family appeared relatively late in the course of eukaryotic evolution, with the ancestor bacteriophage coming not from the protomitochondrion but from a distinct (perhaps transient) bacterial symbiont of early animals.

Finally, the two families of dsDNA viruses with tiny, circular genomes, Papillomaviridae and Polyomaviridae, appear to share an origin that is completely distinct from the origins of all larger dsDNA viruses of eukaryotes. The large replicative protein of these viruses, known as the large T antigen in polyomaviruses and the E1 protein in papillomaviruses, is homologous to the replication proteins of ssDNA viruses, such as parvoviruses and geminiviruses. This large protein has a typical domain architecture consisting of a superfamily 3 helicase and a rolling-circle replication-initiation endonuclease that, however, is inactivated in papillomaviruses and polyomaviruses, concomitant with the switch from rolling circle to the “θ-like” replication mode.47,48 Thus, the small dsDNA viruses of eukaryotes are apparently derivatives of ssDNA viruses that themselves evolved via recombination of bacterial rolling circle–replicating plasmids and ssRNA viruses.49

Origin of giant viruses: dramatic inflation of viral genomes, not degeneration of cells from a fourth domain of life

The discovery of giant viruses (sometimes called “giruses”) infecting protists, pioneered by the isolation of Acanthamoeba polyphaga mimivirus, is one of the most unexpected and certainly the most widely publicized breakthroughs in virology in decades.14,22,50–54 The giant viruses defy the textbook definition of viruses as “filterable” infectious agents because their enormous virions do not pass bacterial filters, and once and for all invalidate the separation between viruses and cellular life forms based on size. Not only are the particles of giant viruses bigger than the cells of numerous bacteria and archaea, but also the genomes of pandoraviruses, the current virus world record


Table 2. Origin of universal cellular genes in giant viruses

Genomes analyzed (table columns, in order): Acanthamoeba polyphaga mimivirus; Acanthamoeba castellanii mamavirus; Acanthamoeba polyphaga lentillevirus; Acanthamoeba polyphaga moumouvirus; Moumouvirus; Moumouvirus Monve; Moumouvirus goulette; Courdo11 virus; Terra1 virus; Megavirus chiliensis; Megavirus courdo7; Megavirus courdo11; Cafeteria roenbergensis virus BV-PW1; Pandoravirus salinus; Pandoravirus dulcis; Pithovirus sibericum

Gene/protein | Genomes marked Y (of 16) | Outcome of phylogenetic analysis
RNA polymerase, subunit α | 16 | Inside eukaryotes, no 4th domain
RNA polymerase, subunit β | 16 | Inside eukaryotes, no 4th domain
Arginyl-tRNA synthetase | 10 | Inside eukaryotes, no 4th domain
Aspartyl/asparaginyl-tRNA synthetase | 5 | Inside bacterial/eukaryotic group
Cysteinyl-tRNA synthetase | 10 | Inside eukaryotes, no 4th domain
Isoleucyl-tRNA synthetase | 6 | Sister group to eukaryotes; formally compatible with 4th domain hypothesis
Methionyl-tRNA synthetase | 9 | Inside eukaryotes, no 4th domain
Tryptophanyl-tRNA synthetase | 3 | Inside eukaryotes; no 4th domain; different eukaryotic origins in mimiviruses and pandoraviruses
Tyrosyl-tRNA synthetase | 14 | Inside eukaryotes; no 4th domain; different eukaryotic origins in mimiviruses and pandoraviruses
Pseudouridine synthase | 1 | Inside bacteria, no 4th domain
Peptide chain release factor 1 (eRF1) | 5 | Inside eukaryotes; no 4th domain; different eukaryotic origins in mimiviruses and pandoraviruses
Putative translation initiation inhibitor, yjgF family | 2 | Inside bacteria, no 4th domain
Putative translation factor (SUA5) | 1 | Inside eukaryotes, no 4th domain
Translation elongation factor EF-1α (GTPase) | 10 | Inside eukaryotes, no 4th domain; polyphyletic origin among mimiviruses
Translation initiation factor 1 (eIF-1/SUI1) | 9 | Inside eukaryotes, no 4th domain; polyphyletic origin within the family Mimiviridae

18 Ann. N.Y. Acad. Sci. 1341 (2015) 10–24 C 2015 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals Inc. on behalf of The New York Academy of Sciences. Koonin et al. Evolution of dsDNA viruses of eukaryotes

Table 2. Continued

Gene/protein Acanthamoeba polyphaga mimivirus Acanthamoeba castellanii mamavirus Acanthamoeba polyphaga lentillevirus Acanthamoeba polyphaga moumouvirus Moumouvirus Moumouvirus Monve Moumouvirus goulette Courdo11 virus Terra1 v irus Megavirus chiliensis Megavirus courdo7 Megavirus courdo11 Cafeteria roenbergensis virus BV-PW1 Pandoravirus salinus Pandoravirus dulcis PithovirusOutcome sibericum of phylogenetic analysis

Translation initiation factor 2 Y Inside eukaryotes, no 4th domain (IF-2; GTPase) Translation initiation factor 2, ␣ Y Inside eukaryotes, no 4th domain subunit (eIF-2␣) Translation initiation factor 2, ␤ Y Inside eukaryotes, no 4th domain subunit (eIF-2␤)/eIF-5 N-terminal domain Translation initiation factor 2, ␥ Y Inside eukaryotes, no 4th domain subunit (eIF-2␥; GTPase) Translation initiation factor 3, Y Inside eukaryotes, no 4th domain subunit g (eIF-3g) Translation initiation factor 4F, YYYY YY YYYYYY Insideeukaryotes,no4th cap-binding subunit (eIF-4E) domain; polyphyletic origin and related cap-binding within the “Megavirales” proteins Translation initiation factor 4F, Y Y Y Y Y Y Insideeukaryotes,no4th helicase subunit (eIF-4A) and domain; polyphyletic origin related helicases within the family Mimiviridae The data are from Ref. 78. holders at approximately 2.5 Mb, are substan- eukaryotes.14,23,52,55–57 The fourth-domain concept tially larger and richer in gene content than many is sometimes presented as a general idea and on bacterial and archaeal genomes, from both parasites other occasions as a specific hypothesis on the evo- and free-living microbes, and even the genomes of lution of viruses.58 In general terms, the claim that some parasitic eukaryotes, such as Microsporidia.16 giant viruses represent a fourth domain of life sim- The recent identification of pandoraviruses and the ply refers to the cell-like character of these viruses pithovirus,17 which are not only huge, by viral stan- in terms of size of the virions and genomes and, dards, but also possess a novel, asymmetrical virion in addition, to the observation that many genes of structure, shows that the actual diversity of giant these viruses have no detectable homologs and so viruses remains to be uncovered. could have come from some unknown source. 
In The “cell-like” features of giant viruses prompted this general form, the fourth-domain concept does several researchers to propose the striking idea not make any falsifiable predictions and, accord- that giant viruses represent a “fourth domain ingly, is not a scientific hypothesis sensu Popper. of life” that is distinct from but comparable to In contrast, the specific fourth-domain hypothe- the three cellular domains, bacteria, archaea, and sis drives directly from the original definition of the


three domains of cellular life. These three domains, Bacteria, Archaea, and Eukaryota, are the three major trunks in the unrooted phylogenetic tree of 16S ribosomal RNA,59–64 which is topologically consistent with the phylogenies of most of the other (nearly) universal genes that encode primarily components of the translation and the core transcription machineries.65–67 Unlike other viruses, the giant viruses encode several proteins that are universal among cellular life forms, in particular translation system components, such as aminoacyl-tRNA synthetases and translation factors (Table 2). The presence of these universal genes allows one to formally include the giant viruses in the “tree of life” that constitutes the basis of the three-domain system.14

The outcome of the phylogenetic analysis of the universal genes of giant viruses is (at least, in principle) readily interpretable: the placement of the viral genes outside the three traditional domains of cellular life is compatible with the fourth-domain hypothesis, whereas their reliable placement inside any of the three domains would falsify this hypothesis. Several analyses, starting with the original study of the mimivirus genome, have reported phylogenetic trees that appeared compatible with the fourth-domain hypothesis.14,68–70 However, there is an inherent problem with such observations. The point is that rapid evolution of viral genes, a common phenomenon that is likely to have occurred in the history of the giant viruses, especially immediately following the acquisition of the respective genes from the host, can obscure their affinity with homologs from cellular organisms within one of the recognized domains. This effect of nonuniform evolutionary rates is a pervasive problem in the analysis of deep phylogenies.71 Indeed, a subsequent reanalysis of the phylogenies of several universal genes present in giant viruses has failed to find support for the fourth-domain hypothesis.72

Figure 3. Origin of giant viruses. The schematic tree shows the phylogeny of the genes that are conserved across the “Megavirales,” with the three independently evolved groups of giant viruses highlighted in red. The numbers at internal branches indicate the estimated number of genes inferred to have been gained at the respective stage of evolution. The data are from Ref. 78.
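The logic of the placement test (a viral gene nested inside one cellular domain falsifies the fourth-domain hypothesis, whereas a gene branching as sister to an entire domain remains formally compatible with it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy rooted trees, the `Domain:name` leaf labels, and the helper names `leaves`, `sister_leaves`, and `classify` are hypothetical and do not reproduce the phylogenetic methods used in the cited studies, which rely on statistically supported trees.

```python
from typing import List, Optional, Union

# A toy rooted tree: a leaf is a 'Domain:name' string, an internal node is a tuple.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf labels under a node."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def domain(leaf: str) -> str:
    """Hypothetical labeling convention: the prefix before ':' is the domain tag."""
    return leaf.split(":", 1)[0]


def sister_leaves(tree: Tree, target: str) -> Optional[List[str]]:
    """Leaves of the sister group(s) of `target`, or None if `target` is absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == target:
            return [leaf for j, sib in enumerate(tree) if j != i
                    for leaf in leaves(sib)]
    for child in tree:
        found = sister_leaves(child, target)
        if found is not None:
            return found
    return None


def classify(tree: Tree, viral_leaf: str) -> str:
    """Classify the placement of a viral gene relative to the cellular domains."""
    sisters = sister_leaves(tree, viral_leaf)
    if sisters is None:
        raise ValueError(f"{viral_leaf} not found in tree")
    # Ignore other viral sequences; look only at cellular sister lineages.
    sister_domains = {domain(leaf) for leaf in sisters if domain(leaf) != "Vir"}
    if len(sister_domains) != 1:
        return "outside or unresolved"
    (dom,) = sister_domains
    total = sum(1 for leaf in leaves(tree) if domain(leaf) == dom)
    in_sister = sum(1 for leaf in sisters if domain(leaf) == dom)
    # Nested: some members of the domain branch off outside the sister group.
    return f"inside {dom}" if in_sister < total else f"sister to {dom}"


# Viral gene nested among eukaryotic homologs (falsifies the fourth-domain hypothesis).
nested_tree = (("Bac:E_coli", "Bac:B_subtilis"),
               (("Arc:Sulfolobus", "Arc:Methanococcus"),
                (("Euk:Yeast", ("Euk:Amoeba", "Vir:Mimivirus")), "Euk:Human")))

# Viral gene as sister to all eukaryotes (formally compatible with a fourth domain).
sister_tree = (("Bac:E_coli", "Bac:B_subtilis"),
               (("Arc:Sulfolobus", "Arc:Methanococcus"),
                ("Vir:Mimivirus", ("Euk:Yeast", ("Euk:Amoeba", "Euk:Human")))))

print(classify(nested_tree, "Vir:Mimivirus"))  # inside Euk
print(classify(sister_tree, "Vir:Mimivirus"))  # sister to Euk
```

The sketch deliberately ignores branch support and rate heterogeneity, which, as noted above, are exactly the factors that can make a fast-evolving viral gene appear to fall outside its true domain.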


Notwithstanding their unusual size, high genetic complexity, and the presence of some universal cellular genes, all giant viruses contain the set of core genes that define the NCLDV15,18,73 or the proposed order “Megavirales.”20 Evolutionary reconstructions have mapped about 50 genes encoding essential viral functions to the putative common ancestor of the “Megavirales,” although some of these putative ancestral genes have been lost in certain groups of viruses.15,37,74 Importantly, this ancestral gene set does not include genes for components of the translation system or any other genes that might be considered suggestive of a cellular nature of the common ancestor of the “Megavirales” implied by the fourth-domain hypothesis.

Phylogenetic analysis of the genes that are conserved across the “Megavirales” reveals apparent evolutionary relationships between giant and smaller viruses. Specifically, the mimiviruses cluster with the so-called organic lake phycodnaviruses and Phaeocystis globosa viruses,75,76 pandoraviruses cluster with phycodnaviruses, in particular coccolithoviruses,77 and pithovirus clusters with marseilleviruses and iridoviruses17 (Fig. 3). Together with reconstructions of gene-repertoire evolution from the phyletic patterns (matrices of gene presence and absence) of “Megavirales” genes, these relationships suggest that different groups of giant viruses independently evolved from smaller ancestral viruses.

The observation that giant viruses encompass many ancestral genes shared with the smaller members of “Megavirales” constrains the fourth-domain hypothesis to a specific version under which a viral ancestor of the giant viruses reproduced in a host that belonged to a fourth domain of cellular life and acquired numerous genes, including some that are universal in cellular life forms. In this scenario, after the fourth cellular domain went extinct, the giant viruses remained the only surviving “living fossils” of their original hosts.

In a recent study, we sought to formally and comprehensively test this specific fourth-domain hypothesis through phylogenetic analysis of all universal genes identified in giant viruses.38 The results of this phylogenomic analysis effectively falsify the fourth-domain hypothesis by showing that the great majority of universal genes in giant viruses confidently cluster within the eukaryotic domain of the tree of life (Table 2). Furthermore, in cases where the same universal gene was detected in distantly related giant viruses, the respective genes typically were polyphyletic (e.g., tyrosyl-tRNA synthetase in mimiviruses and pandoraviruses; Table 2). Thus, these genes have been acquired by the giant viruses from the eukaryotic hosts at different stages of evolution, not inherited from a fourth domain of cellular life.

A complementary phylogenetic analysis of the genes that are conserved across the “Megavirales” clearly reveals the roots of the giant viruses (Fig. 3). Combined with probabilistic reconstruction of ancestral gene sets,78 the phylogeny of the core genes shows that evolution of the “Megavirales” was generally characterized by genome expansion, which was pushed to the extreme in at least three independently evolved groups of giant viruses: (1) Mimiviridae that are a distant sister group of the Phycodnaviridae, (2) pandoraviruses that evolved from a common ancestor with coccolithoviruses within the Phycodnaviridae, and (3) pithoviruses that evolved from a common ancestor with Iridoviridae and Marseilleviridae.17,38 Discovery of additional groups of giant viruses appears quite likely. Remarkably, the ancestral icosahedral capsid constructed from DJR MCPs was replaced with less regular, ovoid, or brick-shaped virions in several groups within the “Megavirales,” namely in ascoviruses, poxviruses, pandoraviruses, and pithoviruses, indicating that all aspects of the viral life cycle are prone to dramatic change.

Conclusions

The world of viruses and selfish elements is enormously diverse, even when one considers only viruses with a certain type of genomic nucleic acid (dsDNA) that infect hosts from only one domain of cellular life (eukaryotes). The recent discovery of giant viruses infecting protists shows that we are still far from saturation in the study of viral diversity. Nevertheless, paraphrasing Einstein, the most astonishing finding about the virus world is that its history appears to be tractable. As discussed in this paper, we are now close to having coherent scenarios for the evolution of all major groups of dsDNA viruses of eukaryotes and perhaps deciphering certain general principles of this evolution. One such key principle is the unity of the evolutionary processes among viruses and capsid-less selfish elements.6 However, the equally fundamental,

complementary principle is the evolutionary persistence of viral morphogenetic modules.46,78 Certainly, the present reconstructions of the history of the virus world are schematic and only cover a small number of “hallmark” genes.8 However, these genes encode most of the key functions in viral morphogenesis, replication, and expression, whereas the rest of the genes, of which giant viruses possess thousands, are involved in various aspects of virus–host interaction and widely differ between viruses. For all dsDNA viruses of eukaryotes, we can detect clear ancestry among viruses (and in some cases, plasmids) of bacteria, an observation that is compatible with the key role of the (proto)mitochondrial endosymbiont (and possibly other bacterial symbionts) in the evolution of eukaryotes.79,80 Some of the most exciting ideas on viral evolution, such as the fourth-domain hypothesis, do not survive phylogenomic scrutiny. However, the emerging picture of virus evolution is hardly less remarkable.

Acknowledgments

E.V.K. and N.Y. are supported by intramural funds of the U.S. Department of Health and Human Services (to the National Library of Medicine).

Conflicts of interest

The authors declare no conflicts of interest.

References

1. Edwards, R.A. & F. Rohwer. 2005. Viral metagenomics. Nat. Rev. Microbiol. 3: 504–510.
2. Rohwer, F. 2003. Global phage diversity. Cell 113: 141.
3. Suttle, C.A. 2007. Marine viruses: major players in the global ecosystem. Nat. Rev. Microbiol. 5: 801–812.
4. Koonin, E.V. & V.V. Dolja. 2013. A virocentric perspective on the evolution of life. Curr. Opin. Virol. 3: 546–557.
5. Krupovic, M. & D.H. Bamford. 2010. Order to the viral universe. J. Virol. 84: 12476–12479.
6. Koonin, E.V. & V.V. Dolja. 2014. Virus world as an evolutionary network of viruses and capsidless selfish elements. Microbiol. Mol. Biol. Rev. 78: 278–303.
7. Kristensen, D.M. et al. 2013. Orthologous gene clusters and taxon signature genes for viruses of prokaryotes. J. Bacteriol. 195: 941–950.
8. Koonin, E.V., T.G. Senkevich & V.V. Dolja. 2006. The ancient Virus World and evolution of cells. Biol. Direct. 1: 29.
9. Holmes, E.C. 2011. What does virus evolution tell us about virus origins? J. Virol. 85: 5247–5251.
10. Koonin, E.V. 2009. On the origin of cells and viruses: primordial virus world scenario. Ann. N. Y. Acad. Sci. 1178: 47–64.
11. Krupovic, M. et al. 2011. Genomics of bacterial and archaeal viruses: dynamics within the prokaryotic virosphere. Microbiol. Mol. Biol. Rev. 75: 610–635.
12. Kazazian, H.H., Jr. 2004. Mobile elements: drivers of genome evolution. Science 303: 1626–1632.
13. Goodier, J.L. & H.H. Kazazian, Jr. 2008. Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell 135: 23–35.
14. Raoult, D. et al. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306: 1344–1350.
15. Koonin, E.V. & N. Yutin. 2010. Origin and evolution of eukaryotic large nucleo-cytoplasmic DNA viruses. Intervirology 53: 284–292.
16. Philippe, N. et al. 2013. Pandoraviruses: amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science 341: 281–286.
17. Legendre, M. et al. 2014. Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology. Proc. Natl. Acad. Sci. U.S.A. 111: 4274–4279.
18. Iyer, L.M., L. Aravind & E.V. Koonin. 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75: 11720–11734.
19. Iyer, L.M. et al. 2006. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res. 117: 156–184.
20. Colson, P. et al. 2013. “Megavirales,” a proposed new order for eukaryotic nucleocytoplasmic large DNA viruses. Arch. Virol. 158: 2517–2521.
21. La Scola, B. et al. 2008. The virophage as a unique parasite of the giant mimivirus. Nature 455: 100–104.
22. Claverie, J.M. & C. Abergel. 2009. Mimivirus and its virophage. Annu. Rev. Genet. 43: 49–66.
23. Desnues, C., M. Boyer & D. Raoult. 2012. Sputnik, a virophage infecting the viral domain of life. Adv. Virus Res. 82: 63–89.
24. Krupovic, M. & V. Cvirkaite-Krupovic. 2011. Virophages or satellite viruses? Nat. Rev. Microbiol. 9: 762–763.
25. Fischer, M.G. & C.A. Suttle. 2011. A virophage at the origin of large DNA transposons. Science 332: 231–234.
26. Yutin, N., D. Raoult & E.V. Koonin. 2013. Virophages, polintons, and transpovirons: a complex evolutionary network of diverse selfish genetic elements with different reproduction strategies. Virol. J. 10: 158.
27. Krupovic, M., D.H. Bamford & E.V. Koonin. 2014. Conservation of major and minor jelly-roll capsid proteins in Polinton (Maverick) transposons suggests that they are bona fide viruses. Biol. Direct. 9: 6.
28. Krupovic, M. & E.V. Koonin. 2015. Polintons: the hotbed of eukaryotic virus, transposon and plasmid evolution. Nat. Rev. Microbiol. 13: 105–115.
29. Kapitonov, V.V. & J. Jurka. 2006. Self-synthesizing DNA transposons in eukaryotes. Proc. Natl. Acad. Sci. U.S.A. 103: 4540–4545.
30. Pritham, E.J., T. Putliwala & C. Feschotte. 2007. Mavericks, a novel class of giant transposable elements widespread in eukaryotes and related to DNA viruses. Gene 390: 3–17.
31. Jurka, J. et al. 2007. Repetitive sequences in complex genomes: structure and evolution. Annu. Rev. Genomics Hum. Genet. 8: 241–259.


32. Krupovic, M. & E.V. Koonin. 2014. Evolution of eukaryotic single-stranded DNA viruses of the Bidnaviridae family from genes of four other groups of widely different viruses. Sci. Rep. 4: 5347.
33. Bao, W., V.V. Kapitonov & J. Jurka. 2010. Ginger DNA transposons in eukaryotes and their evolutionary relationships with long terminal repeat retrotransposons. Mob. DNA. 1: 3.
34. Gillis, A. & J. Mahillon. 2014. Phages preying on Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis: past, present and future. Viruses 6: 2623–2672.
35. Yutin, N. & E.V. Koonin. 2012. Hidden evolutionary complexity of nucleo-cytoplasmic large DNA viruses of eukaryotes. Virol. J. 9: 161.
36. Sandmeyer, S.B. & T.M. Menees. 1996. Morphogenesis at the retrotransposon-retrovirus interface: gypsy and copia families in yeast and Drosophila. Curr. Top Microbiol. Immunol. 214: 261–296.
37. Filée, J., P. Forterre, T. Sen-Lin & J. Laurent. 2002. Evolution of DNA polymerase families: evidences for multiple gene exchange between cellular and viral proteins. J. Mol. Evol. 54: 763–773.
38. Yutin, N., Y.I. Wolf & E.V. Koonin. 2014. Origin of giant viruses from smaller DNA viruses not from a fourth domain of cellular life. Virology 466–467: 38–52.
39. Wang, Y. & J.A. Jehle. 2009. Nudiviruses and other large, double-stranded circular DNA viruses of invertebrates: new insights on an old topic. J. Invertebr. Pathol. 101: 187–193.
40. Jehle, J.A., A.M. Abd-Alla & Y. Wang. 2013. Phylogeny and evolution of Hytrosaviridae. J. Invertebr. Pathol. 112(Suppl.): S62–S67.
41. Theze, J. et al. 2011. Paleozoic origin of insect large dsDNA viruses. Proc. Natl. Acad. Sci. U.S.A. 108: 15931–15935.
42. Wang, Y., O.R.P. Bininda-Emonds & J.A. Jehle. 2012. “Nudivirus genomics and phylogeny.” In Molecular Structure, Diversity, Gene Expression Mechanisms and Host-Virus Interactions. M. Garcia, Ed. Rijeka: InTech pp. 33–52.
43. Selvarajan Sigamani, S. et al. 2013. The structure of the DNA-packaging terminase pUL15 nuclease domain suggests an evolutionary lineage among eukaryotic and prokaryotic viruses. J. Virol. 87: 7140–7148.
44. Pietilä, M.K. et al. 2013. Structure of the archaeal head-tailed virus HSTV-1 completes the HK97 fold story. Proc. Natl. Acad. Sci. U.S.A. 110: 10604–10609.
45. Rixon, F.J. & M.F. Schmid. 2014. Structural similarities in DNA packaging and delivery apparatuses in Herpesvirus and dsDNA bacteriophages. Curr. Opin. Virol. 5: 105–110.
46. Krupovic, M. & D.H. Bamford. 2011. Double-stranded DNA viruses: 20 families and only five different architectural principles for virion assembly. Curr. Opin. Virol. 1: 118–124.
47. Ilyina, T.V. & E.V. Koonin. 1992. Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res. 20: 3279–3285.
48. Iyer, L.M. et al. 2005. Origin and evolution of the archaeo-eukaryotic primase superfamily and related palm-domain proteins: structural insights and new members. Nucleic Acids Res. 33: 3875–3896.
49. Krupovic, M. 2013. Networks of evolutionary interactions underlying the polyphyletic origin of ssDNA viruses. Curr. Opin. Virol. 3: 578–586.
50. La Scola, B. et al. 2003. A giant virus in amoebae. Science 299: 2033.
51. Koonin, E.V. 2005. Virology: Gulliver among the Lilliputians. Curr. Biol. 15: R167–R169.
52. Claverie, J.M. et al. 2006. Mimivirus and the emerging concept of “giant” virus. Virus Res. 117: 133–144.
53. Van Etten, J.L. 2011. Another really, really big virus. Viruses 3: 32–46.
54. Van Etten, J.L., L.C. Lane & D.D. Dunigan. 2010. DNA viruses: the really big ones (giruses). Annu. Rev. Microbiol. 64: 83–99.
55. Colson, P. et al. 2011. Viruses with more than 1000 genes: Mamavirus, a new Acanthamoeba polyphaga mimivirus strain, and reannotation of Mimivirus genes. Genome Biol. Evol. 3: 737–742.
56. Yoosuf, N. et al. 2012. Related giant viruses in distant locations and different habitats: Acanthamoeba polyphaga moumouvirus represents a third lineage of the Mimiviridae that is close to the megavirus lineage. Genome Biol. Evol. 4: 1324–1330.
57. Legendre, M. et al. 2012. Genomics of Megavirus and the elusive fourth domain of Life. Commun. Integr. Biol. 5: 102–106.
58. Forterre, P., M. Krupovic & D. Prangishvili. 2014. Cellular domains and viral lineages. Trends Microbiol. 22: 554–558.
59. Woese, C.R. 1987. Bacterial evolution. Microbiol. Rev. 51: 221–271.
60. Woese, C.R. & G.E. Fox. 1977. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl. Acad. Sci. U.S.A. 74: 5088–5090.
61. Woese, C.R., O. Kandler & M.L. Wheelis. 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U.S.A. 87: 4576–4579.
62. Woese, C.R., L.J. Magrum & G.E. Fox. 1978. Archaebacteria. J. Mol. Evol. 11: 245–251.
63. Pace, N.R. 1997. A molecular view of microbial diversity and the biosphere. Science 276: 734–740.
64. Pace, N.R. 2006. Time for a change. Nature 441: 289.
65. Brown, J.R. & W.F. Doolittle. 1997. Archaea and the prokaryote-to-eukaryote transition. Microbiol. Mol. Biol. Rev. 61: 456–502.
66. Brown, J.R. 2001. Genomic and phylogenetic perspectives on the evolution of prokaryotes. Syst. Biol. 50: 497–512.
67. Puigbo, P., Y.I. Wolf & E.V. Koonin. 2009. Search for a ‘Tree of Life’ in the thicket of the phylogenetic forest. J. Biol. 8: 59.
68. Colson, P. et al. 2012. Reclassification of giant viruses composing a fourth domain of life in the new order Megavirales. Intervirology 55: 321–332.
69. Colson, P. et al. 2011. The giant Cafeteria roenbergensis virus that infects a widespread marine phagocytic protist is a new member of the fourth domain of Life. PLoS One 6: e18935.
70. Nasir, A., K.M. Kim & G. Caetano-Anolles. 2012. Giant viruses coexisted with the cellular ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya. BMC Evol. Biol. 12: 156.
71. Felsenstein, J. 2004. Inferring Phylogenies. Sunderland, MA: Sinauer Associates.
72. Williams, T.A., T.M. Embley & E. Heinz. 2011. Informational gene phylogenies do not support a fourth domain of life for nucleocytoplasmic large DNA viruses. PLoS One 6: e21080.
73. Iyer, L.M. et al. 2006. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res. 117: 156–184.
74. Yutin, N. et al. 2009. Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol. J. 6: 223.
75. Yutin, N. et al. 2013. Mimiviridae: clusters of orthologous genes, reconstruction of gene repertoire evolution and proposed expansion of the giant virus family. Virol. J. 10: 106.
76. Santini, S. et al. 2013. Genome of Phaeocystis globosa virus PgV-16T highlights the common ancestry of the largest known DNA viruses infecting eukaryotes. Proc. Natl. Acad. Sci. U.S.A. 110: 10800–10805.
77. Yutin, N. & E.V. Koonin. 2013. Pandoraviruses are highly derived phycodnaviruses. Biol. Direct. 8: 25.
78. Yutin, N., Y.I. Wolf & E.V. Koonin. 2014. Origin of giant viruses from smaller DNA viruses not from a fourth domain of cellular life. Virology 466–467: 38–52.
79. Krupovic, M. & D.H. Bamford. 2008. Virus evolution: how far does the double beta-barrel viral lineage extend? Nat. Rev. Microbiol. 6: 941–948.
80. Embley, T.M. & W. Martin. 2006. Eukaryotic evolution, changes and challenges. Nature 440: 623–630.
81. Martin, W. & E.V. Koonin. 2006. Introns and the origin of nucleus-cytosol compartmentation. Nature 440: 41–45.
