
PERSPECTIVES

The LUCA and its complex virome

Mart Krupovic, Valerian V. Dolja and Eugene V. Koonin

Abstract | The last universal cellular ancestor (LUCA) is the most recent population of organisms from which all cellular life on Earth descends. The reconstruction of the genome and phenotype of the LUCA is a major challenge in evolutionary biology. Given that all life forms are associated with viruses and/or other mobile genetic elements, there is no doubt that the LUCA was a host to viruses. Here, by projecting back in time using the extant distribution of viruses across the two primary domains of life, bacteria and archaea, and tracing the evolutionary histories of some key virus genes, we attempt a reconstruction of the LUCA virome. Even a conservative version of this reconstruction suggests a remarkably complex virome that already included the main groups of extant viruses of bacteria and archaea. We further present evidence of extensive virus evolution antedating the LUCA. The presence of a highly complex virome implies the substantial genomic and pan-genomic complexity of the LUCA itself.

Viruses and other mobile genetic elements (MGEs) are involved in parasitic or symbiotic relationships with all cellular life forms1–5, and theoretical models indicate that the emergence of such selfish elements is an intrinsic feature of replicator systems6–11. Thus, genetic parasites must have been inalienable components of life from its very beginnings. Unlike cellular life forms, viruses employ all existing types of nucleic acids as replicating genomes packaged into virions. This diversity of the replication and expression strategies has been captured in a systematic form in the ‘Baltimore classification’ of viruses12.

Recently, we undertook a comprehensive reappraisal of the findings of virus phylogenomics to assess the evolutionary status of each of the Baltimore classification groups13. This synthesis culminated in the identification of four realms (the highest rank in virus taxonomy) of viruses that are monophyletic with respect to their core gene sets and partially overlap with the Baltimore classification: Riboviria, Monodnaviria, Duplodnaviria and Varidnaviria. Riboviria includes viruses with positive-sense, negative-sense and double-stranded RNA (dsRNA) genomes as well as reverse-transcribing viruses with RNA and DNA genomes. Members of this realm are unified by the homologous RNA-dependent RNA polymerases (RdRPs) and reverse transcriptases (RTs). Monodnaviria includes single-stranded DNA (ssDNA) viruses together with small double-stranded DNA (dsDNA) viruses (papillomaviruses and polyomaviruses) that are unified by the distinct endonuclease (or its inactivated derivative) involved in the initiation of genome replication. Duplodnaviria includes tailed dsDNA bacterial and archaeal viruses along with herpesviruses that are unified by the distinct morphogenetic module consisting of HK97-fold major capsid proteins (MCPs), homologous genome packaging ATPases-nucleases (terminases), portal proteins and capsid maturation proteases. Varidnaviria is an enormously diverse assemblage of viruses infecting bacteria, archaea and eukaryotes that are unified by the vertical jelly-roll MCPs (most groups possess double jelly-roll (DJR) MCPs but some have a single jelly-roll (SJR) MCP that is the likely ancestral form) along with a distinct type of genome packaging ATPase present in most constituent groups. This megataxonomy of viruses has been recently formally adopted by the International Committee for the Taxonomy of Viruses14,15. Apart from the four monophyletic realms, several groups of viruses remain unaffiliated in the emergent megataxonomy, most notably the diverse dsDNA viruses of hyperthermophilic archaea that form several distinct, seemingly unrelated groups16–18.

In another recent synthesis, we examined the origins of the replication and structural modules of viruses and posited a ‘chimeric’ scenario of evolution19. Under this model, the replication machineries of each of the four realms derive from the primordial pool of genetic elements, whereas the major virion structural proteins were acquired from cellular hosts at different stages of evolution, giving rise to bona fide viruses. In this Perspective article, we combine this recent work with observations on the host ranges of viruses in each of the four realms, along with deeper reconstructions of virus evolution, to tentatively infer the composition of the virome of the last universal cellular ancestor (LUCA; also referred to as the last universal common ancestor of cellular organisms).

The LUCA

Evidently, to make any meaningful inferences regarding the viruses that infected the LUCA, we must have at least a general notion of the characteristics of this ancestral life form. Considerable efforts have been undertaken over the years to deduce the genetic composition and biological features of the LUCA from comparative genome analyses combined with biological reasoning20–22. These inferences are challenged by the complex evolutionary histories of most genes (with partial exception for the core components of the translation and transcription systems) that involved extensive horizontal transfer and non-orthologous displacement23–25. Nevertheless, on the strength of combined evidence, it appears likely that the LUCA was a prokaryote-like organism (that is, like bacteria or archaea) of considerable genomic and organizational complexity20,26–28. Formal reconstructions of the ancestral gene repertoires based on maximum parsimony and maximum likelihood approaches assign several hundred genes to the LUCA that are responsible for most of the core processes characteristic of prokaryotic cells22,27,29, perhaps making it comparable to the simplest extant free-living bacteria and archaea (~1,000 genes or even more complex given that the accessory gene repertoire is not amenable to a straightforward

Nature Reviews | Perspectives

reconstruction). However, the nature of the replication and membrane machineries of the LUCA remains unclear owing to the drastic differences between the respective systems of bacteria and archaea, the two primary domains of life30–33.

The fact that the replicative DNA polymerases of bacteria, archaea and eukaryotes are not homologous has prompted ideas of an RNA-based LUCA31,34,35. However, the recent discovery of the structural similarity between the catalytic cores of the archaeal replicative family D DNA polymerase (PolD) and the universal DNA-directed RNA polymerase36,37 implies a common origin of replication and transcription and suggests an ‘archaeal-like’ replication machinery in the LUCA, with PolD serving as the replicative DNA polymerase38. Evolutionary reconstructions point to a fairly complex replication apparatus in the LUCA, with processive replication aided by the sliding clamp (proliferating cell nuclear antigen), clamp loader, replicative helicase and the ssDNA-binding protein. Similarly, the transcription system of the LUCA can be inferred to have already included a multisubunit RNA polymerase with the duplicated large subunits, some smaller subunits and multiple transcription regulators39.

The membranes of archaea and bacteria consist of different types of phospholipids, namely isoprenoid ethers and fatty acid esters, respectively, with different chiralities of the glycerol-phosphate moiety33. Although the possibility of a membrane-less LUCA has been discussed40,41, the general considerations on the essentiality of compartmentalization and the universal conservation of certain key membrane-associated components, such as the signal recognition particle, leave little doubt that the LUCA had membrane-bound cells42; however, the nature of the membrane in the LUCA remains uncertain. Phylogenomic analyses indicate that the LUCA encoded the biosynthetic pathways for both bacterial and archaeal phospholipids, implying an ancestral mixed membrane, with subsequent differentiation30–33. Notably, preliminary data suggest that bacteria of the Fibrobacteres–Chlorobi–Bacteroidetes group superphylum and related candidate phyla encode a complete pathway for archaeal membrane lipid biosynthesis, in addition to the bacterial fatty acid membrane pathway, suggesting that certain contemporary bacterial lineages have mixed heterochiral membranes43. Such a possibility is consistent with the results of recent experiments demonstrating the viability of bacteria with an engineered, mixed archaeal–bacterial membrane44. The same considerations apply to the cell wall, which is represented by the peptidoglycan in most bacteria45,46 and the proteinaceous S-layer in archaea and some bacteria47. Importantly, however, in this case, the possibility of a wall-less LUCA cannot be dismissed.

The genetic composition of modern prokaryotes is best described in terms of the pangenome, that is, the entirety of the genes that are found in organisms with closely related core genomes that are traditionally considered to constitute a species48–50. The accessory genes that are present in each strain in addition to the core genome and collectively account for the bulk of the pangenome include diverse anti-parasite defence systems, genes involved in inter-microbial conflicts, such as antibiotic production and resistance, and integrated MGEs. Given that genetic parasites are intrinsic components of any replicator system, this pangenome structure should necessarily have been established at the earliest stages of cellular evolution. Thus, although important features of the LUCA remain to be clarified, we can conclude with reasonable confidence that it was a prokaryotic population with a pangenomic complexity comparable to that of the extant archaea and bacteria. The attempt at reconstructing the LUCA virome that we undertake here provides some insights into the pangenome of the LUCA that we discuss in the final section.

LUCA and the four viral realms

Examination of the host ranges of viruses in each of the four realms13 and, in particular, assessment of the relationships between bacterial and archaeal members allows us to make inferences on the composition of the LUCA virome. Widespread groups with a clear dichotomy between archaeal and bacterial viruses are the best candidates for components of the virome of the LUCA. We start by mapping the major groups of viruses to the evolutionary trees of bacterial and archaeal hosts51 (omitting eukaryotes as a derived domain of life that emerged at a later stage of evolution and is hence irrelevant as far as the LUCA is concerned52), thereby inferring their likely presence or absence in the LUCA virome. The results of this reconstruction (Figs 1,2) suggest that the LUCA virome was dominated by dsDNA viruses. More specifically, several groups of tailed dsDNA viruses (Duplodnaviria) were assigned to the LUCA virome, indicating that (at least) this realm of viruses had already reached considerable diversity prior to the radiation of archaea and bacteria (Fig. 3). All viruses of this realm share homologous MCPs (HK97-fold), large and small terminase subunits, prohead maturation proteases and portal proteins, indicating that their morphogenetic modules are monophyletic53–58. Duplodnaviruses are broadly distributed among both bacteria and archaea (Figs 1,2) and, crucially, comparative genomic analyses suggest that the archaeal and bacterial viruses within Duplodnaviria, on a broad scale, have coevolved with their respective hosts59 (see discussion below).

Tailed bacteriophages are nearly universal among bacteria60. In archaea, duplodnaviruses or related proviruses (virus genome integrated into the cellular chromosome) have been detected in many mesophilic as well as extremophilic lineages of the phyla Euryarchaeota and Thaumarchaeota56,61. Furthermore, HK97-fold MCPs were identified in uncultivated archaea of the proposed phyla Aenigmarchaeota, Altiarchaeota, Nanoarchaeota, Micrarchaeota, Iainarchaeota and Asgardarchaeota (Fig. 2). However, given the potential artefacts associated with the binning of contigs from environmental genomics projects, the host assignment for these (pro)viruses should be considered with utmost caution. Nevertheless, the distribution of tailed archaeal duplodnaviruses appears to encompass highly diverse environments, mirroring the situation of their bacterial relatives and consistent with the presence of this group in the LUCA virome.

Although it is difficult to precisely map specific groups of duplodnaviruses to the LUCA virome, the presence of viruses with short tails (podovirus morphology), long non-contractile tails (siphovirus morphology) and contractile tails (myovirus morphology) in both bacteria and archaea implies that all these morphologies were already represented in the LUCA virome (Fig. 3). The alternative possibility, namely that all three major groups of duplodnaviruses (that is, siphoviruses, myoviruses and podoviruses) were transferred between bacteria and archaea at later stages of evolution, cannot be formally excluded but appears less parsimonious. Furthermore, the observation that many bacterial members of the Duplodnaviria encode archaeal-like genome replication modules62, which are not homologous to the bacterial functional counterparts, also argues in favour of the origin of this virus

www.nature.com/nrmicro Perspectives

[Figure 1 image not reproduced: phylogenetic tree of bacterial phyla (for example, Firmicutes, Cyanobacteria, Actinobacteriota, Bacteroidota and Patescibacteria), with rings labelled Duplodnaviria, Varidnaviria, Tubulavirales and Other; key: Duplodnaviria, Varidnaviria, Cystoviridae, Tubulavirales, Leviviridae.]

Fig. 1 | Distribution of known viruses across the evolutionary tree of bacteria. The figure shows the latest phylogenetic tree of bacteria, with all phyla indicated, and the major groups of viruses known to infect members of these phyla. Virus groups are represented by symbols depicting the corresponding virions. Coloured and open symbols represent virus isolates and virus genomes or putative proviruses, respectively. The symbols are arranged on four concentric rings: the innermost ring depicts the distribution of members of the realm Duplodnaviria (families Myoviridae, Podoviridae, Siphoviridae, Ackermannviridae and Herelleviridae); the second ring shows members of the realm Varidnaviria (families Tectiviridae, Corticoviridae, Finnlakeviridae, Sphaerolipoviridae and Autolykiviridae); the third ring shows members of the order Tubulavirales (families Inoviridae and Plectroviridae); and the fourth ring includes all other virus groups, namely Microviridae, Leviviridae, Cystoviridae and Plasmaviridae. The phylogeny and taxonomic nomenclature were retrieved from the Genome Taxonomy Database51 and visualized with AnnoTree109. The information on virus distribution for virus isolates with completely sequenced genomes was obtained from GenBank. The provirus distribution was retrieved from previously published work on Duplodnaviria110, Tubulavirales68 and Varidnaviria63,64,111,112. Supplementary Data 1 shows the known virus–host associations across the domains Bacteria and Archaea. In the spreadsheet, a genus name is indicated if a virus is known to infect (or be associated as a provirus with) any member of the phylum or class of Bacteria and Archaea, respectively.
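The presence/absence reasoning applied to these host distributions can be illustrated with a minimal Python sketch. It is not the authors' procedure: the class `VirusGroup`, the field `transfer_donor` and the function `infer_origin` are hypothetical names introduced here, and the rule is a deliberately crude simplification. A group found in both domains is tentatively traced to the LUCA unless later cross-domain transfer is judged the better explanation; a group found in one domain is traced no deeper than that domain's last common ancestor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VirusGroup:
    """Hypothetical record of a virus group's known host distribution."""
    name: str
    in_bacteria: bool  # isolates or proviruses known in bacteria
    in_archaea: bool   # isolates or proviruses known in archaea
    # Set to "bacteria" or "archaea" when later horizontal transfer from
    # that domain is judged the better explanation (an assumption supplied
    # by the user, not computed here).
    transfer_donor: Optional[str] = None


def infer_origin(group: VirusGroup) -> str:
    """Return the deepest ancestor the group is tentatively traced to."""
    if group.in_bacteria and group.in_archaea:
        if group.transfer_donor == "bacteria":
            return "LBCA or later"
        if group.transfer_donor == "archaea":
            return "LACA or later"
        # Present in both domains with no sign of later transfer:
        # inheritance from the common ancestor is the simplest reading.
        return "LUCA"
    if group.in_bacteria:
        return "LBCA or later"
    if group.in_archaea:
        return "LACA or later"
    return "unresolved"


# Illustrative inputs loosely following the distributions discussed in the text.
EXAMPLES = [
    VirusGroup("Duplodnaviria", in_bacteria=True, in_archaea=True),
    VirusGroup("Tubulavirales", in_bacteria=True, in_archaea=True,
               transfer_donor="bacteria"),
    VirusGroup("Spindle-shaped viruses", in_bacteria=False, in_archaea=True),
]

for g in EXAMPLES:
    print(f"{g.name}: {infer_origin(g)}")
```

The sketch ignores what the article treats as decisive evidence in practice, such as the breadth of the host range within a domain and phylogenetic signals of virus–host coevolution, so its output is only a first-pass label.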

Nature Reviews | Microbiology Perspectives

[Figure 2 image not reproduced: phylogenetic tree of archaeal classes and orders (for example, Halobacteria, Thermococci, Thermoprotei, Nitrososphaeria and Lokiarchaeia), with rings labelled Duplodnaviria, Varidnaviria and Portogloboviridae, Fusello-like viruses and Other; key: Duplodnaviria, spindle-shaped viruses, Varidnaviria and Portogloboviridae, Tubulavirales?, Ampullaviridae, ‘Tokiviricetes’, Ovaliviridae, MSV, Guttaviridae.]

Fig. 2 | Distribution of known viruses across the evolutionary tree of archaea. The figure shows the latest phylogenetic tree of archaea, with major classes or orders indicated, and the major groups of viruses known to infect members of these taxa. Virus groups are represented by symbols depicting the corresponding virions. Coloured and open symbols represent virus isolates and virus genomes or putative proviruses, respectively. The symbols are arranged on four concentric rings: the innermost ring depicts the distribution of members of the realm Duplodnaviria; the second ring shows members of the realm Varidnaviria (families Turriviridae and Sphaerolipoviridae) and family Portogloboviridae; the third ring shows spindle-shaped viruses (families Fuselloviridae, Halspiviridae and Thaspiviridae and unclassified viruses of Thermococcales); and the fourth ring consists of all other virus groups, including the candidate class ‘Tokiviricetes’ (Rudiviridae, Lipothrixviridae and Tristromaviridae)113 and families Ampullaviridae, Ovaliviridae, Bicaudaviridae, Guttaviridae, Globuloviridae, Clavaviridae, Spiraviridae and Pleolipoviridae, and unclassified Methanosarcina spherical virus (MSV)114. Putative proviruses related to bacterial members of the Tubulavirales were identified in some archaeal genomes68 and are indicated with open symbols. The phylogeny and taxonomic nomenclature were retrieved from the Genome Taxonomy Database51 and visualized with AnnoTree109. The information on virus distribution for virus isolates with completely sequenced genomes was obtained from GenBank. The provirus distribution was retrieved from previously published work on Duplodnaviria56,66, Varidnaviria63,66 and Pleolipoviridae115,116. Additional information was obtained by performing BLASTP searches117 queried with the major capsid proteins of the corresponding viruses against the archaeal genome database at the NCBI. Supplementary Data 1 shows the known virus–host associations across the domains Bacteria and Archaea. In the spreadsheet, a genus name is indicated if a virus is known to infect (or be associated as a provirus with) any member of the phylum or class of Bacteria and Archaea, respectively.

group antedating the archaeal–bacterial divide.

The second realm of dsDNA viruses, Varidnaviria, is represented in prokaryotes by four families of bacterial viruses (Tectiviridae, Corticoviridae, Autolykiviridae and Finnlakeviridae), one family of archaeal viruses (Turriviridae) and the family Sphaerolipoviridae, in which different genera include viruses infecting either bacteria or archaea. However, mining metagenomic data for homologues of the DJR MCP using sensitive computational methods resulted in the discovery of a vast diversity of previously unknown viruses of this realm that, in all likelihood, infect prokaryotes63,64. Actual host assignments are still pending, but some of these virus genomes were found in geothermal habitats, strongly suggesting archaeal hosts63,64. Perhaps even more informative has been the analysis of bacterial and archaeal genomes for the presence of proviruses encoding DJR MCPs, which has substantially expanded the reach of Varidnaviria in both prokaryotic domains65–67. Phylogenetic analysis of the concatenated DJR MCP and genome packaging ATPases of archaeal varidnaviruses suggested coevolution of this group of viruses with the major archaeal lineages rather than recent horizontal transfer from bacteria66. Thus, most likely, the LUCA virome also included multiple groups of dsDNA viruses with vertical (both single and double) jelly-roll MCPs (Fig. 3). Furthermore, reconstruction of DJR MCP evolution sheds light on the pre-LUCA stages of virus evolution as discussed in the next section.

Among the ssDNA viruses (realm Monodnaviria), only members of a single order, Tubulavirales (until recently known as the family Inoviridae), consisting of filamentous or rod-shaped viruses, appear to be hosted by both bacteria and archaea. However, whereas tubulaviruses are ubiquitous in bacteria, their association with archaea was inferred from putative proviruses present in several archaeal lineages, namely methanogens and aenigmarchaea68. Such distribution has been judged best compatible with horizontal virus transfer from bacteria to archaea68. Given their ubiquity in bacteria, the origin of filamentous bacteriophages concomitantly with or soon after the emergence of the last bacterial common ancestor (LBCA) appears likely, whereas their presence in the LUCA cannot be ruled out either (Fig. 3). Similarly, microviruses with icosahedral capsids and circular ssDNA genomes are nearly ubiquitous in the environment and are genetically highly diverse69–71. Although for the vast majority of these viruses the hosts are unknown, the few known isolates infect broadly diverse bacteria from five different phyla (Fig. 1). It is thus likely that microviruses have a long-standing evolutionary history in bacteria, which probably dates back at least to the LBCA (Fig. 3).

In the extant biosphere, RNA viruses dominate the eukaryotic virome but are rare in bacteria (compared with DNA viruses) and unknown in archaea72. Bacterial RNA viruses are represented by two families, the positive-sense RNA Leviviridae and dsRNA Cystoviridae60,73. The host range of experimentally identified members of both families is limited to a narrow range of bacteria (almost exclusively Proteobacteria). However, recent metagenomics efforts have drastically expanded the known diversity of leviviruses, indicating that their share in the prokaryotic virome had been substantially under-appreciated74,75.

Reverse-transcribing viruses are conspicuously confined to eukaryotes although prokaryotes carry a substantial diversity of non-packaging (that is, non-viral) retroelements, for example, group II introns76,77. The extant distribution of the viruses of the realm Riboviria, with its drastic display of eukaryotic over prokaryotic host ranges, might appear paradoxical given the broadly accepted RNA world concept of the origin of life78–80, implying the early origin of RdRP and RT and, as a consequence, the primordial status of RNA viruses. The origin of leviviruses within bacteria is best compatible with their currently characterized distribution (Fig. 1) and is a distinct possibility. However, given the lack of obvious direct ancestors of the RdRP among RTs of bacterial retroelements and the ever expanding diversity of leviviruses revealed through metagenomics74,75, we consider that the origin of levivirus ancestors at the pre-LUCA stage of evolution and their presence in the virome of the LUCA cannot be ruled out, even if not supported by currently available data. Conceivably, at the LUCA stage and later, primordial RNA viruses were losing the evolutionary competition with the more efficient dsDNA viruses and went extinct in many lines of descent, including archaea. Under this scenario, the renaissance of the RNA viruses occurred only in eukaryotes, arguably due to the combination of barriers for DNA virus replication created by the nucleus and the emergence of the cytosolic endomembrane system that became a niche favourable to RNA virus reproduction72. Furthermore, unlike the LUCA, for which most evolutionary reconstructions suggest a mesophilic or a moderate thermophilic lifestyle81,82, the last common ancestors of bacteria and archaea are inferred to have been thermophiles or hyperthermophiles83,84. Extremely high temperatures might be restrictive for the propagation of RNA viruses and thus could represent a bottleneck associated with the demise of the ancestral RNA virome (and potentially explain why RNA viruses are unknown in archaea)85. The family Cystoviridae, which includes dsRNA viruses, has an even narrower host range than the leviviruses, suggesting a later origin. Thus, of the realm Riboviria, positive-sense RNA viruses are a putative component of the LUCA virome, whereas dsRNA viruses, negative-sense RNA viruses and all reverse-transcribing viruses appear to be subsequent additions to the virus world, the latter two taxa emerging only in eukaryotes.

The ancestral status of many archaea-specific virus groups is difficult to ascertain. However, some monophyletic virus assemblages, such as those with spindle-shaped virions16,86, infect hosts from all major archaeal lineages (Fig. 2) and thus can be traced to the last archaeal common ancestor. Therefore, their presence in the LUCA virome, with subsequent loss in the bacterial lineage, cannot be ruled out either.

Virus evolution before the LUCA

Likely cellular ancestors are identifiable for many major virion proteins on the basis of phylogenomic analyses of the corresponding protein families87. The reconstruction of the evolutionary paths from ancestral host


proteins to viral capsids sheds light on the early stages of evolution of both realms of dsDNA viruses (Fig. 4). The DJR MCP of the Varidnaviria appears to be a unique virus feature, with no potential cellular ancestors detected. By contrast, the SJR MCP of numerous RNA viruses that was also acquired by ssDNA viruses through recombination can be traced to ancestral cellular carbohydrate-binding proteins, with several probable points of entry into the virus world87. Thus, the DJR MCP, in all likelihood, evolved from the SJR MCP early in the evolution of viruses. Remarkably, apparent evolutionary intermediates are detectable in two virus families. Viruses in the family Sphaerolipoviridae encode two ‘vertically’ oriented SJR MCPs that are likely to represent the ancestral duplication preceding the fusion that gave rise to the DJR MCP88–90. The recently discovered archaeal dsDNA viruses in the family Portogloboviridae91 contain one SJR MCP92 and thus appear to represent an even earlier evolutionary intermediate (Fig. 4). Indeed, structural comparisons of the SJR MCPs from RNA and DNA viruses show that the portoglobovirus MCP is most closely related to the MCPs of sphaerolipoviruses92. Combined with the inferred presence in the LUCA virome of multiple groups of Varidnaviria, the discovery of the intermediate MCP forms in capsids of extant viruses implies extensive evolution of varidnaviruses predating the LUCA. The families Portogloboviridae and Sphaerolipoviridae appear to be relics of the pre-LUCA evolution of varidnaviruses and, accordingly, must have been part of the LUCA virome.

For the members of the second realm of dsDNA viruses, Duplodnaviria, no cellular ancestor was detected in the dedicated comparative analyses of the sequences and structures of virion proteins87. However, a recent structural comparison has shown that the main scaffold of the HK97-like MCP belongs to the strand-helix-strand-strand (SHS2) fold (with the insertion of an additional, uncharacterized domain of the DUF1884 (PF08967) family93) and appears to be specifically related to the dodecin family of the SHS2-fold proteins94. Dodecins are widespread proteins in bacteria and archaea that form dodecameric compartments involved in flavin sequestration and storage95 and are thus plausible ancestors for the HK97-fold MCP. Although, in this case, there are no detectable evolutionary intermediates among viruses, the inferred presence of multiple groups of duplodnaviruses in the LUCA virome implies that the recruitment of dodecin and the insertion of DUF1884 are ancient events. Consistently, viruses with short tails (podovirus morphology), long non-contractile tails (siphovirus morphology) and long contractile tails (myovirus morphology) are all found in both bacteria and archaea, indicating that the morphogenetic toolkit of viruses with HK97-fold MCPs attained considerable versatility in the pre-LUCA era.

Virus replication modules

Each virus genome includes two major functional modules, one for virion formation (morphogenetic module) and one for genome replication96. The two modules rarely display congruent histories over long evolutionary spans and are instead exchanged horizontally between different groups of viruses through recombination, continuously producing new virus lineages.

[Figure 3 image not reproduced: schematic in which the LUCA bifurcates into the LBCA (Bacteria) and the LACA (Archaea); virus groups in the key are Tubulavirales, DJR MCP viruses, Sphaerolipoviridae, Microviridae, Leviviridae, Duplodnaviria, Plasmaviridae, Cystoviridae, Bicaudaviridae, Pleolipoviridae, Globuloviridae, Ovaliviridae, MSV, Guttaviridae, spindle-shaped viruses, Portogloboviridae, Spiraviridae, Ampullaviridae, ‘Tokiviricetes’ and Clavaviridae.]

Fig. 3 | Reconstruction of the LUCA virome from the divergence of the bacterial and archaeal viromes. This figure shows the hypothetical complex virome of the last universal cellular ancestor (LUCA) as reconstructed from the distribution of viruses among extant phyla of bacteria and archaea. Also schematically depicted are the split of the LUCA virome into the viromes of the last bacterial common ancestor (LBCA) and the last archaeal common ancestor (LACA) as well as the subsequent diversification that resulted in the extant viromes. The divergence of bacteria and archaea from the LUCA is depicted as a bifurcation. Viruses predicted to be associated with the LBCA and the LACA are indicated next to the corresponding grey spheres. Dotted arrows indicate the possibility that the respective viruses might have been represented in the LUCA virome. DJR, double jelly-roll; MCP, major capsid proteins; MSV, Methanosarcina spherical virus.


[Figure 4 image not reproduced. Varidnaviria panel: cellular SJR proteins (TNF, nucleoplasmin, Pro-Pd, CBM); acquisition of a cellular SJR gene by an MGE gives the SJR MCP (Portogloboviridae; SPV1); duplication of the SJR mcp gene gives SJR MCP1 and SJR MCP2 (Sphaerolipoviridae; P23-77); fusion of two SJR mcp genes gives the DJR MCP (DJR MCP viruses; PM2). Duplodnaviria panel: insertion of DUF1884 into dodecin and acquisition of the HK97 MCP by an MGE. Source label: (pre-)LUCA cellular genome.]

Fig. 4 | Evolution of double-stranded DNA viruses antedating the LUCA. The figure shows the origin of single and double jelly-roll (SJR and DJR, respectively) and HK97-fold major capsid proteins (MCPs) from cellular ancestors. The capture of the vertical SJR MCP precipitated the emergence of the virus realm Varidnaviria, whereas the acquisition of the HK97 MCP gave rise to the realm Duplodnaviria. Major evolutionary events are described next to the corresponding arrows. The likely cellular ancestors of the MCPs are shown with thick coloured lines, and the structures of the similarly coloured corresponding proteins are shown next to them: TNF superfamily protein (Protein Data Bank (PDB) ID: 2hey); nucleoplasmin (PDB ID: 1nlq); P domain of a subtilisin-like protease (Pro-Pd) (PDB ID: 3afg); carbohydrate-binding module (CBM) (PDB ID: 4d3l); dodecin family protein (PDB ID: 3qkb); and DUF1884 family protein (PDB ID: 2pk8). The SJR and DJR MCPs and the corresponding virion symbols of members of the realm Varidnaviria are coloured with different shades of blue. MCPs of duplodnaviruses are represented by the gp5 protein of bacteriophage HK97 (PDB ID: 1ohg); Portogloboviridae is represented by VP4 of Sulfolobus polyhedral virus 1 (SPV1; PDB ID: 6oj0); Sphaerolipoviridae is represented by a heterodimer of MCPs, VP16 and VP17, of Thermus bacteriophage P23-77 (PDB ID: 3zn6); and DJR MCP viruses are represented by the P2 protein of Pseudoalteromonas bacteriophage PM2 (family Corticoviridae; PDB ID: 2vvf). LUCA, last universal cellular ancestor; MGE, mobile genetic elements.

In the previous sections, we show that the morphogenetic modules including the vertical jelly-roll and HK97-fold MCPs can be traced to the LUCA virome.

One of the most widespread replication modules in the virosphere is the rolling circle replication endonuclease (RCRE) of the HUH superfamily97. Homologous RCREs are encoded by viruses with SJR and DJR MCPs, HK97-like MCPs and morphologically diverse ssDNA viruses and are also found in many families of bacterial and archaeal plasmids and transposons98. Thus, RCRE can be confidently assigned to the LUCA virome or mobilome (that is, all the MGEs of the LUCA).

Protein-primed family B DNA polymerases (pPolBs) represent another replication module with a broad distribution spanning several families of viruses and non-viral MGEs62. pPolB is present in bacteria-infecting members of the realms Duplodnaviria (phi29-like podoviruses) and Varidnaviria (Tectiviridae, Autolykiviridae and diverse varidnavirus genomes identified in metagenomic data) as well as in several families of archaeal viruses (Halspiviridae, Thaspiviridae, Ovaliviridae and Pleolipoviridae). In phylogenetic analyses, pPolBs split into two separate clades corresponding to bacterial and archaeal viruses99,100, strongly suggesting that they have coevolved with bacterial and archaeal lineages ever since their divergence from the LUCA.

Two other key replication proteins that are among the most common in bacterial and archaeal viruses and MGEs are primases of the archaeo-eukaryotic primase (AEP) superfamily101,102 and superfamily 3 helicases (S3H)62. Whereas S3H are exclusive to viruses and MGEs, the viral AEP form specific families that are not closely related to the cellular homologues. Notably, bacteria do not employ AEP for primer synthesis, and thus bacterial viruses could not have recruited this protein from their hosts. Thus, AEP and S3H, along with RCRE and pPolB, appear to represent major components of the replication modules of the LUCA virome.

More generally, contemporary duplodnaviruses display a remarkable diversity of genome replication modules, from minimalist initiators that recruit cellular DNA replisomes for viral genome replication to near-complete virus-encoded DNA replication machineries62,103. In many cases, these DNA replication proteins do not have close cellular homologues, suggesting a long evolutionary history within the virus world. Notably, some of the phage proteins, such as helicase loaders, have replaced their cellular counterparts at the onset of certain bacterial lineages for the replication of cellular chromosomes104. Although some tailed bacterial dsDNA viruses encode replication factors of apparent bacterial origin, in archaeal duplodnaviruses57, the proteins involved in informational processes, including components of the genome replication machinery, DNA repair and RNA metabolism, are of archaeal type, with none

Nature Reviews | Microbiology Perspectives

of the known archaeal viruses encoding components of the bacterial-type replication machinery61,62. Finally, tailed archaeal viruses carry archaeal or eukaryotic-like promoters105,106, consistent with the fact that none of the known archaeal viruses encode RNA polymerases16,107, further pointing to long-term coevolution with the hosts. These considerations argue against (recent) horizontal transfers of duplodnaviruses between bacteria and archaea accounting for the observed distribution of these viruses, even though some such transfers might have occurred. Thus, analyses of duplodnavirus and varidnavirus genome replication modules complement those of the morphogenetic modules and suggest extensive divergence of both groups of viruses in the pre-LUCA era.

Conclusions

The informal reconstructions attempted here suggest a remarkably diverse, complex LUCA virome. This ancestral virome was likely dominated by dsDNA viruses from the realms Duplodnaviria and Varidnaviria. In addition, two groups of ssDNA viruses (realm Monodnaviria), namely Microviridae and Tubulavirales, can be traced to the LBCA, whereas spindle-shaped viruses most likely infected the last archaeal common ancestor. The possibility that these virus groups were present in the LUCA virome but were subsequently lost in one of the two primary domains cannot be dismissed. The point of origin of the extant bacterial positive-sense RNA viruses (realm Riboviria) remains uncertain, with both bacterial and primordial origins remaining viable scenarios. Further virus prospecting efforts could shed light on the history of these viruses. Although the inferred LUCA virome in all likelihood did not include members of many extant groups of viruses of prokaryotes, its apparent complexity seems to exceed the typical complexity of well-characterized viromes of bacterial or archaeal species. These observations imply that the LUCA was not a homogenous microbial population but rather a community of diverse microorganisms, with a shared gene core that was inherited by all descendant life-forms and a diversified pangenome that included various genes involved in virus–host interactions, in particular multiple defence systems.

According to the 'chimeric' scenario of virus origins, different groups of viruses evolved through recruitment of cellular proteins as virion components19. Here, we present evidence that, contingent on our mapping of both duplodnaviruses and varidnaviruses to the LUCA virome, several such events occurred in the earliest phase of the evolution of life, from the primordial pool of replicators to the LUCA. Moreover, virus evolution during that early era went through multiple, distinct stages as demonstrated by the reconstructed histories of the capsid proteins of the two realms of dsDNA viruses. The cellular SJR-containing carbohydrate-binding or nucleoplasmin-like proteins (the ancestors of the varidnavirus DJR MCPs) and the dodecins (the ancestors of the duplodnavirus MCPs) belong to expansive protein families that have already undergone substantial diversifying evolution prior to the origins of the two realms of viruses. The respective protein families do not belong to the universal core of cellular life, so their apparent pre-LUCA diversification further emphasizes the substantial pangenomic, organizational and functional complexity of the LUCA. This conclusion is indeed compatible with the previous inferences on the LUCA made from the analysis of coalescence in different families of ancient genes, namely that a common ancestor containing all the genes shared by the three domains of life has never existed108.

Straightforward thinking on the LUCA virome might have envisaged it as a domain of RNA viruses descending from the primordial RNA world. However, the reconstructions suggest otherwise, indicating that the LUCA was similar to the extant prokaryotes with respect to the repertoire of viruses it hosted. These findings do not defy the RNA world scenario but mesh well with the conclusion that DNA viruses have evolved and diversified extensively already in the pre-LUCA era. The RNA viruses, after all, might have been the first to emerge but, by the time the LUCA lived, they had already been largely supplanted by the more efficient DNA virosphere.

Mart Krupovic1✉, Valerian V. Dolja2 and Eugene V. Koonin3✉
1Archaeal Virology Unit, Institut Pasteur, Paris, France.
2Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA.
3National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA.
✉e-mail: [email protected]; [email protected]
https://doi.org/10.1038/s41579-020-0408-x
Published online xx xx xxxx

1. Forterre, P. & Prangishvili, D. The great billion-year war between ribosome- and capsid-encoding organisms (cells and viruses) as the major source of evolutionary novelties. Ann. N. Y. Acad. Sci. 1178, 65–77 (2009).
2. Koonin, E. V., Senkevich, T. G. & Dolja, V. V. The ancient virus world and evolution of cells. Biol. Direct 1, 29 (2006).
3. Moreira, D. & Lopez-Garcia, P. Ten reasons to exclude viruses from the tree of life. Nat. Rev. Microbiol. 7, 306–311 (2009).
4. Koonin, E. V. & Dolja, V. V. A virocentric perspective on the evolution of life. Curr. Opin. Virol. 3, 546–557 (2013).
5. Raoult, D. & Forterre, P. Redefining viruses: lessons from Mimivirus. Nat. Rev. Microbiol. 6, 315–319 (2008).
6. Koonin, E. V., Wolf, Y. I. & Katsnelson, M. I. Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states. Biol. Direct 12, 31 (2017).
7. Szathmary, E. & Demeter, L. Group selection of early replicators and the origin of life. J. Theor. Biol. 128, 463–486 (1987).
8. Takeuchi, N. & Hogeweg, P. The role of complex formation and deleterious mutations for the stability of RNA-like replicator systems. J. Mol. Evol. 65, 668–686 (2007).
9. Takeuchi, N. & Hogeweg, P. Evolutionary dynamics of RNA-like replicator systems: a bioinformatic approach to the origin of life. Phys. Life Rev. 9, 219–263 (2012).
10. Eigen, M. The origin of genetic information: viruses as models. Gene 135, 37–47 (1993).
11. Eigen, M. Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58, 465–523 (1971).
12. Baltimore, D. Expression of animal virus genomes. Bacteriol. Rev. 35, 235–241 (1971).
13. Koonin, E. V. et al. Global organization and proposed megataxonomy of the virus world. Microbiol. Mol. Biol. Rev. 84, e00061-19 (2020).
14. International Committee on Taxonomy of Viruses Executive Committee. The new scope of virus taxonomy: partitioning the virosphere into 15 hierarchical ranks. Nat. Microbiol. 5, 668–674 (2020).
15. Siddell, S. G. et al. Additional changes to taxonomy ratified in a special vote by the international committee on taxonomy of viruses (October 2018). Arch. Virol. 164, 943–946 (2019).
16. Prangishvili, D. et al. The enigmatic archaeal virosphere. Nat. Rev. Microbiol. 15, 724–739 (2017).
17. Munson-McGee, J. H., Snyder, J. C. & Young, M. J. Archaeal viruses from high-temperature environments. Genes 9, E128 (2018).
18. Iranzo, J., Koonin, E. V., Prangishvili, D. & Krupovic, M. Bipartite network analysis of the archaeal virosphere: evolutionary connections between viruses and capsidless mobile elements. J. Virol. 90, 11043–11055 (2016).
19. Krupovic, M., Dolja, V. V. & Koonin, E. V. Origin of viruses: primordial replicators recruiting capsids from hosts. Nat. Rev. Microbiol. 17, 449–458 (2019).
20. Koonin, E. V. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. 1, 127–136 (2003).
21. Glansdorff, N., Xu, Y. & Labedan, B. The last universal common ancestor: emergence, constitution and genetic legacy of an elusive forerunner. Biol. Direct 3, 29 (2008).
22. Weiss, M. C. et al. The physiology and habitat of the last universal common ancestor. Nat. Microbiol. 1, 16116 (2016).
23. Doolittle, W. F. Uprooting the tree of life. Sci. Am. 282, 90–95 (2000).
24. Puigbo, P., Wolf, Y. I. & Koonin, E. V. Search for a 'Tree of Life' in the thicket of the phylogenetic forest. J. Biol. 8, 59 (2009).
25. Berkemer, S. J. & McGlynn, S. E. A new analysis of archaea-bacteria domain separation: variable phylogenetic distance and the tempo of early evolution. Mol. Biol. Evol. https://doi.org/10.1093/molbev/msaa089 (2020).
26. Fournier, G. P., Andam, C. P. & Gogarten, J. P. Ancient horizontal gene transfer and the last common ancestors. BMC Evol. Biol. 15, 70 (2015).
27. Ouzounis, C. A., Kunin, V., Darzentas, N. & Goldovsky, L. A minimal estimate for the gene content of the last universal common ancestor–exobiology from a terrestrial perspective. Res. Microbiol. 157, 57–68 (2006).
28. Gogarten, J. P. & Taiz, L. Evolution of proton pumping ATPases: rooting the tree of life. Photosynth. Res. 33, 137–146 (1992).
29. Mirkin, B. G., Fenner, T. I., Galperin, M. Y. & Koonin, E. V. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene

www.nature.com/nrmicro Perspectives

transfer in the evolution of prokaryotes. BMC Evol. Biol. 3, 2 (2003).
30. Brown, J. R. & Doolittle, W. F. Archaea and the prokaryote-to-eukaryote transition. Microbiol. Mol. Biol. Rev. 61, 456–502 (1997).
31. Leipe, D. D., Aravind, L. & Koonin, E. V. Did DNA replication evolve twice independently? Nucleic Acids Res. 27, 3389–3401 (1999).
32. Pereto, J., Lopez-Garcia, P. & Moreira, D. Ancestral lipid biosynthesis and early membrane evolution. Trends Biochem. Sci. 29, 469–477 (2004).
33. Lombard, J., Lopez-Garcia, P. & Moreira, D. The early evolution of lipid membranes and the three domains of life. Nat. Rev. Microbiol. 10, 507–515 (2012).
34. Mushegian, A. R. & Koonin, E. V. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc. Natl Acad. Sci. USA 93, 10268–10273 (1996).
35. Forterre, P. Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: a hypothesis for the origin of cellular domain. Proc. Natl Acad. Sci. USA 103, 3669–3674 (2006).
36. Raia, P. et al. Structure of the DP1-DP2 PolD complex bound with DNA and its implications for the evolutionary history of DNA and RNA polymerases. PLoS Biol. 17, e3000122 (2019).
37. Sauguet, L. The extended "two-barrel" polymerases superfamily: structure, function and evolution. J. Mol. Biol. 431, 4167–4183 (2019).
38. Koonin, E. V., Krupovic, M., Ishino, S. & Ishino, Y. The replication machinery of LUCA: common origin of DNA replication and transcription. BMC Biol. 18, 61 (2020).
39. Werner, F. & Grohmann, D. Evolution of multisubunit RNA polymerases in the three domains of life. Nat. Rev. Microbiol. 9, 85–98 (2011).
40. Martin, W. & Russell, M. J. On the origins of cells: a hypothesis for the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Philos. Trans. R. Soc. Lond. B Biol. Sci. 358, 59–83 (2003).
41. Koonin, E. V. & Martin, W. On the origin of genomes and cells within inorganic compartments. Trends Genet. 21, 647–654 (2005).
42. Mulkidjanian, A. Y., Galperin, M. Y. & Koonin, E. V. Co-evolution of primordial membranes and membrane proteins. Trends Biochem. Sci. 34, 206–215 (2009).
43. Villanueva, L. et al. Bridging the divide: bacteria synthesizing archaeal membrane lipids. Preprint at bioRxiv https://doi.org/10.1101/448035 (2018).
44. Caforio, A. et al. Converting Escherichia coli into an archaebacterium with a hybrid heterochiral membrane. Proc. Natl Acad. Sci. USA 115, 3704–3709 (2018).
45. Rajagopal, M. & Walker, S. Envelope structures of gram-positive bacteria. Curr. Top. Microbiol. Immunol. 404, 1–44 (2017).
46. Auer, G. K. & Weibel, D. B. Bacterial cell mechanics. Biochemistry 56, 3710–3724 (2017).
47. Sleytr, U. B., Schuster, B., Egelseer, E. M. & Pum, D. S-layers: principles and applications. FEMS Microbiol. Rev. 38, 823–864 (2014).
48. Vernikos, G., Medini, D., Riley, D. R. & Tettelin, H. Ten years of pan-genome analyses. Curr. Opin. Microbiol. 23, 148–154 (2015).
49. Medini, D., Donati, C., Tettelin, H., Masignani, V. & Rappuoli, R. The microbial pan-genome. Curr. Opin. Genet. Dev. 15, 589–594 (2005).
50. Puigbo, P., Lobkovsky, A. E., Kristensen, D. M., Wolf, Y. I. & Koonin, E. V. Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes. BMC Biol. 12, 66 (2014).
51. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).
52. Embley, T. M. & Martin, W. Eukaryotic evolution, changes and challenges. Nature 440, 623–630 (2006).
53. Kristensen, D. M., Cai, X. & Mushegian, A. Evolutionarily conserved orthologous families in phages are relatively rare in their prokaryotic hosts. J. Bacteriol. 193, 1806–1814 (2011).
54. Iranzo, J., Krupovic, M. & Koonin, E. V. The double-stranded DNA virosphere as a modular hierarchical network of gene sharing. mBio 7, e00978-16 (2016).
55. Duda, R. L. & Teschke, C. M. The amazing HK97 fold: versatile results of modest differences. Curr. Opin. Virol. 36, 9–16 (2019).
56. Krupovic, M., Forterre, P. & Bamford, D. H. Comparative analysis of the mosaic genomes of tailed archaeal viruses and proviruses suggests common themes for virion architecture and assembly with tailed viruses of bacteria. J. Mol. Biol. 397, 144–160 (2010).
57. Sencilo, A. et al. Snapshot of haloarchaeal tailed virus genomes. RNA Biol. 10, 803–816 (2013).
58. Cheng, H., Shen, N., Pei, J. & Grishin, N. V. Double-stranded DNA bacteriophage prohead protease is homologous to herpesvirus protease. Protein Sci. 13, 2260–2269 (2004).
59. Low, S. J., Dzunkova, M., Chaumeil, P. A., Parks, D. H. & Hugenholtz, P. Evaluation of a concatenated protein phylogeny for classification of tailed double-stranded DNA viruses belonging to the order Caudovirales. Nat. Microbiol. 4, 1306–1315 (2019).
60. Dion, M. B., Oechslin, F. & Moineau, S. Phage diversity, genomics and phylogeny. Nat. Rev. Microbiol. 18, 125–138 (2020).
61. Philosof, A. et al. Novel abundant oceanic viruses of uncultured marine group II euryarchaeota. Curr. Biol. 27, 1362–1368 (2017).
62. Kazlauskas, D., Krupovic, M. & Venclovas, C. The logic of DNA replication in double-stranded DNA viruses: insights from global analysis of viral genomes. Nucleic Acids Res. 44, 4551–4564 (2016).
63. Yutin, N., Bäckström, D., Ettema, T. J. G., Krupovic, M. & Koonin, E. V. Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis. Virol. J. 15, 67 (2018).
64. Kauffman, K. M. et al. A major lineage of non-tailed dsDNA viruses as unrecognized killers of marine bacteria. Nature 554, 118–122 (2018).
65. Krupovic, M. & Bamford, D. H. Archaeal proviruses TKV4 and MVV extend the PRD1-adenovirus lineage to the phylum euryarchaeota. Virology 375, 292–300 (2008).
66. Krupovic, M. et al. Integrated mobile genetic elements in Thaumarchaeota. Environ. Microbiol. 21, 2056–2078 (2019).
67. Jalasvuori, M. & Koskinen, K. Extending the hosts of Tectiviridae into four additional genera of Gram-positive bacteria and more diverse Bacillus species. Virology 518, 136–142 (2018).
68. Roux, S. et al. Cryptic inoviruses revealed as pervasive in bacteria and archaea across earth's biomes. Nat. Microbiol. 4, 1895–1906 (2019).
69. Roux, S., Krupovic, M., Poulet, A., Debroas, D. & Enault, F. Evolution and diversity of the microviridae viral family through a collection of 81 new complete genomes assembled from virome reads. PLoS One 7, e40418 (2012).
70. Creasy, A., Rosario, K., Leigh, B. A., Dishaw, L. J. & Breitbart, M. Unprecedented diversity of ssDNA phages from the family Microviridae detected within the gut of a protochordate model organism (Ciona robusta). Viruses 10, 404 (2018).
71. Tisza, M. J. et al. Discovery of several thousand highly diverse circular DNA viruses. eLife 9, e51971 (2020).
72. Koonin, E. V., Dolja, V. V. & Krupovic, M. Origins and evolution of viruses of eukaryotes: the ultimate modularity. Virology 479–480, 2–25 (2015).
73. Wolf, Y. I. et al. Origins and evolution of the global RNA virome. mBio 9, e02329-18 (2018).
74. Krishnamurthy, S. R., Janowski, A. B., Zhao, G., Barouch, D. & Wang, D. Hyperexpansion of RNA bacteriophage diversity. PLoS Biol. 14, e1002409 (2016).
75. Callanan, J. et al. Expansion of known ssRNA phage genomes: from tens to over a thousand. Sci. Adv. 6, eaay5981 (2020).
76. Gladyshev, E. A. & Arkhipova, I. R. A widespread class of reverse transcriptase-related cellular genes. Proc. Natl Acad. Sci. USA 108, 20311–20316 (2011).
77. Zimmerly, S. & Wu, L. An unexplored diversity of reverse transcriptases in bacteria. Microbiol. Spectr. 3, MDNA3-0058-2014 https://doi.org/10.1128/microbiolspec.MDNA3-0058-2014 (2015).
78. Gilbert, W. The origin of life: the RNA world. Nature 319, 618 (1986).
79. Joyce, G. F. The antiquity of RNA-based evolution. Nature 418, 214–221 (2002).
80. Bernhardt, H. S. The RNA world hypothesis: the worst theory of the early evolution of life (except for all the others)(a). Biol. Direct 7, 23 (2012).
81. Catchpole, R. J. & Forterre, P. The evolution of reverse gyrase suggests a nonhyperthermophilic last universal common ancestor. Mol. Biol. Evol. 36, 2737–2747 (2019).
82. Cantine, M. D. & Fournier, G. P. Environmental adaptation from the origin of life to the last universal common ancestor. Orig. Life Evol. Biosph. 48, 35–54 (2018).
83. Boussau, B., Blanquart, S., Necsulea, A., Lartillot, N. & Gouy, M. Parallel adaptations to high temperatures in the Archaean eon. Nature 456, 942–945 (2008).
84. Groussin, M. & Gouy, M. Adaptation to environmental temperature is a major determinant of molecular evolutionary rates in archaea. Mol. Biol. Evol. 28, 2661–2674 (2011).
85. Forterre, P. The common ancestor of archaea and eukarya was not an archaeon. Archaea 2013, 372396 (2013).
86. Krupovic, M., Quemin, E. R., Bamford, D. H., Forterre, P. & Prangishvili, D. Unification of the globally distributed spindle-shaped viruses of the Archaea. J. Virol. 88, 2354–2358 (2014).
87. Krupovic, M. & Koonin, E. V. Multiple origins of viral capsid proteins from cellular ancestors. Proc. Natl Acad. Sci. USA 114, E2401–E2410 (2017).
88. Santos-Perez, I. et al. Structural basis for assembly of vertical single β-barrel viruses. Nat. Commun. 10, 1184 (2019).
89. De Colibus, L. et al. Assembly of complex viruses exemplified by a halophilic euryarchaeal virus. Nat. Commun. 10, 1456 (2019).
90. Rissanen, I. et al. Bacteriophage P23-77 capsid protein structures reveal the archetype of an ancient branch from a major virus lineage. Structure 21, 718–726 (2013).
91. Liu, Y. et al. A novel type of polyhedral viruses infecting hyperthermophilic archaea. J. Virol. 91, e00589-17 (2017).
92. Wang, F. et al. A packing for A-form DNA in an icosahedral virus. Proc. Natl Acad. Sci. USA 116, 22591–22597 (2019).
93. Kelley, L. L. et al. Structure of the hypothetical protein PF0899 from pyrococcus furiosus at 1.85 Å resolution. Acta Crystallogr. Sect. F. Struct. Biol. Cryst. Commun. 63, 549–552 (2007).
94. Holm, L. DALI and the persistence of protein shape. Protein Sci. 29, 128–140 (2020).
95. Grininger, M., Zeth, K. & Oesterhelt, D. Dodecins: a family of lumichrome binding proteins. J. Mol. Biol. 357, 842–857 (2006).
96. Krupovic, M. & Bamford, D. H. Order to the viral universe. J. Virol. 84, 12476–12479 (2010).
97. Ilyina, T. V. & Koonin, E. V. Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res. 20, 3279–3285 (1992).
98. Kazlauskas, D., Varsani, A., Koonin, E. V. & Krupovic, M. Multiple origins of prokaryotic and eukaryotic single-stranded DNA viruses from bacterial and archaeal plasmids. Nat. Commun. 10, 3425 (2019).
99. Redrejo-Rodríguez, M. et al. Primer-independent DNA synthesis by a family B DNA polymerase from self-replicating mobile genetic elements. Cell Rep. 21, 1574–1587 (2017).
100. Kim, J. G. et al. Spindle-shaped viruses infect marine ammonia-oxidizing thaumarchaea. Proc. Natl Acad. Sci. USA 116, 15645–15650 (2019).
101. Iyer, L. M., Koonin, E. V., Leipe, D. D. & Aravind, L. Origin and evolution of the archaeo-eukaryotic primase superfamily and related palm-domain proteins: structural insights and new members. Nucleic Acids Res. 33, 3875–3896 (2005).
102. Kazlauskas, D. et al. Novel families of archaeo-eukaryotic primases associated with mobile genetic elements of bacteria and archaea. J. Mol. Biol. 430, 737–750 (2018).
103. Weigel, C. & Seitz, H. Bacteriophage replication modules. FEMS Microbiol. Rev. 30, 321–381 (2006).
104. Brezellec, P. et al. Domestication of lambda phage genes into a putative third type of replicative helicase matchmaker. Genome Biol. Evol. 9, 1561–1566 (2017).
105. Pfister, P., Wasserfallen, A., Stettler, R. & Leisinger, T. Molecular analysis of methanobacterium phage psiM2. Mol. Microbiol. 30, 233–244 (1998).
106. Iro, M. et al. The lysogenic region of virus phiCh1: identification of a repressor-operator system and determination of its activity in halophilic Archaea. Extremophiles 11, 383–396 (2007).
107. Dellas, N., Snyder, J. C., Bolduc, B. & Young, M. J. Archaeal viruses: diversity, replication, and structure. Annu. Rev. Virol. 1, 399–426 (2014).
108. Zhaxybayeva, O. & Gogarten, J. P. Cladogenesis, coalescence and the evolution of the three domains of life. Trends Genet. 20, 182–187 (2004).
109. Mendler, K. et al. AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Res. 47, 4442–4448 (2019).


110. Roux, S., Hallam, S. J., Woyke, T. & Sullivan, M. B. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. eLife 4, e08490 (2015).
111. Krupovic, M. & Bamford, D. H. Putative prophages related to lytic tailless marine dsDNA phage PM2 are widespread in the genomes of aquatic bacteria. BMC Genomics 8, 236 (2007).
112. Pawlowski, A., Rissanen, I., Bamford, J. K., Krupovic, M. & Jalasvuori, M. Gammasphaerolipovirus, a newly proposed bacteriophage genus, unifies viruses of halophilic archaea and thermophilic bacteria within the novel family Sphaerolipoviridae. Arch. Virol. 159, 1541–1554 (2014).
113. Wang, F. et al. Structure of a filamentous virus uncovers familial ties within the archaeal virosphere. Virus Evol. 6, veaa023 (2020).
114. Weidenbach, K. et al. Methanosarcina spherical virus, a novel archaeal lytic virus targeting methanosarcina strains. J. Virol. 91, e00955-17 (2017).
115. Liu, Y. et al. Identification and characterization of SNJ2, the first temperate pleolipovirus integrating into the genome of the SNJ1-lysogenic archaeal strain. Mol. Microbiol. 98, 1002–1020 (2015).
116. Wang, J. et al. A novel family of tyrosine integrases encoded by the temperate pleolipovirus SNJ2. Nucleic Acids Res. 46, 2521–2536 (2018).
117. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

Acknowledgements
E.V.K. is supported by the funds of the Intramural Research Program of the National Institutes of Health of the USA. M.K. was supported by l'Agence Nationale de la Recherche grant ANR-17-CE15-0005-01.

Author contributions
M.K. and E.V.K. researched data for article. M.K., V.V.D. and E.V.K. substantially contributed to discussion of content. M.K. and E.V.K. wrote the manuscript. M.K., V.V.D. and E.V.K. reviewed and edited the manuscript before submission.

Competing interests
The authors declare no competing interests.

Peer review information
Nature Reviews Microbiology thanks J. P. Gogarten, P. López-García and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information
Supplementary information is available for this paper at https://doi.org/10.1038/s41579-020-0408-x.

© Springer Nature Limited 2020
