<<

Yaravirus: A novel 80-nm infecting castellanii

Paulo V. M. Borattoa,b,1, Graziele P. Oliveiraa,b,1, Talita B. Machadoa, Ana Cláudia S. P. Andradea, Jean-Pierre Baudoinb,c, Thomas Klosed, Frederik Schulze, Saïd Azzab,c, Philippe Decloquementb,c, Eric Chabrièreb,c, Philippe Colsonb,c, Anthony Levasseurb,c, Bernard La Scolab,c,2, and Jônatas S. Abrahãoa,2

aLaboratório de Vírus, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil; bMicrobes, Evolution, Phylogeny and Infection, Aix-Marseille Université UM63, Institut de Recherche pour le Développement 198, Assistance Publique–Hôpitaux de Marseille, 13005 Marseille, France; cInstitut Hospitalo-Universitaire Méditerranée Infection, Faculté de Médecine, 13005 Marseille, France; dDepartment of Biological Sciences, Purdue University, West Lafayette, IN 47907; and eDepartment of Energy Joint Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

Edited by James L. Van Etten, University of Nebraska-Lincoln, Lincoln, NE, and approved June 2, 2020 (received for review January 29, 2020) Here we report the discovery of Yaravirus, a lineage of amoebal others. NCDLVs have dsDNA and were proposed to virus with a puzzling origin and evolution. Yaravirus presents share a monophyletic origin based on criteria that include the 80-nm-sized particles and a 44,924-bp dsDNA genome encoding sharing of a set of ancestral vertically inherited (17, 18). for 74 predicted . Yaravirus genome annotation showed From this handful set of genes, a core cluster is found to be that none of its genes matched with sequences of known organ- present in almost all members of the NCLDVs, being composed isms at the level; at the level, six predicted by five distinct genes, namely a DNA polymerase B, a proteins had distant matches in the nr database. Complimentary primase-helicase, a packaging ATPase, a factor, and prediction of three-dimensional structures indicated possible func- a major (MCP) for which the double jelly-roll tion of 17 proteins in total. Furthermore, we were not able to (DJR) fold constitutes the main protein architectural (19, retrieve viral genomes closely related to Yaravirus in 8,535 publicly 20). Recently, the International Committee on of available metagenomes spanning diverse around the globe. (ICTV) brought forward a proposal for megataxonomy of The Yaravirus genome also contained six types of tRNAs that did not viruses (21). The DJR major capsid protein (MCP) supermodule match commonly used codons. Proteomics revealed that Yaravirus of DNA viruses includes NCLDVs and other icosahedral viruses particles contain 26 viral proteins, one of which potentially represent- that infect prokaryotes and . In addition to the signa- ing a divergent major capsid protein (MCP) with a predicted double ture DJR-MCPs, the majority of these viruses also encode for jelly-roll . Structure-guided phylogeny of MCP suggests that additional single jelly-roll minor capsid proteins (e.g., penton Yaravirus groups together with the MCPs of Pleurochrysis endemic proteins) and genome packaging ATPases of the FtsK-HerA su- viruses. Yaravirus expands our knowledge of the diversity of DNA perfamily. viruses. The phylogenetic distance between Yaravirus and all other According to this proposal, the evolutionary conservation of viruses highlights our still preliminary assessment of the genomic the three genes of the morphogenetic module in the DJR-MCP diversity of eukaryotic viruses, reinforcing the need for the isolation of new viruses of . Significance Yaravirus | ORFan | NCLDV | | capsid Most of the known viruses of have been seen to share iral evolution and classification have been subjects of an some features that eventually prompted authors to classify Vintense debate, especially after the discovery of giant viruses them into common evolutionary groups. Here we describe that infect protists (1–4). These viruses are predominantly Yaravirus, an entity that could represent either the first iso- characterized by the large size of their virions and genomes lated virus of Acanthamoeba spp. out of the group of NCLDVs or, in an alternative scenario, a distant and extremely reduced encoding hundreds to thousands of proteins, of which a large virus of this group. Contrary to what is observed in other iso- proportion currently remain without homologs in public se- lated viruses of amoeba, Yaravirus does not have a large/giant quence databases (5–10). These coding sequences are commonly particleoracomplexgenome,butatthesametimecarriesa referred as ORFans, and due to the lack of phylogenetic in- number of previously undescribed genes, including one encoding formation, their origin and function still represent a mystery – a divergent major capsid protein. Metagenomic approaches also (11 14). Strikingly, the increasing number of available viral ge- testified for the rarity of Yaravirus in the environment. nomes demonstrates that there is a huge set and great diversity of genes without homologs in current databases, which needs to Author contributions: F.S., P.D., E.C., P.C., A.L., B.L.S., and J.S.A. designed research; be further explored (11). Importantly, many amoebal virus P.V.M.B., G.P.O., T.B.M., A.C.S.P.A., J.-P.B., T.K., F.S., S.A., P.D., P.C., A.L., B.L.S., and ORFan genes have already been proven to be functional, being J.S.A. performed research; E.C., B.L.S., and J.S.A. contributed new reagents/analytic tools; P.V.M.B., G.P.O., J.-P.B., T.K., F.S., S.A., P.D., E.C., P.C., A.L., B.L.S., and J.S.A. analyzed data; expressed and encoding for components of the viral particles (6, and P.V.M.B., G.P.O., T.K., F.S., P.C., A.L., B.L.S., and J.S.A. wrote the paper. 15). However, the large set of ORFans makes it difficult to The authors declare no competing interest. predict the biology of viruses discovered through cultivation- This article is a PNAS Direct Submission. independent methods, such as metagenomics analysis, reinforc- Published under the PNAS license. ing the need for the complementary isolation and experimental Data deposition: Yaravirus genome data have been deposited at GenBank (https://www. characterization of new viruses. All currently known isolated ncbi.nlm.nih.gov/genbank/) under accession number MT293574. amoebal viruses are related to nucleocytoplasmic large DNA 1P.V.M.B. and G.P.O. contributed equally to this work. viruses (NCLDVs) (16). This group comprises families of eukaryotic 2To whom correspondence may be addressed. Email: [email protected] or viruses (, Asfarviridae, , , Phycodna- [email protected]. viridae, , and ) as well as other amoebal This article contains supporting information online at https://www.pnas.org/lookup/suppl/ virus lineages including pithoviruses, , molliviruses, doi:10.1073/pnas.2001637117/-/DCSupplemental. medusaviruses, pacmanviruses, faustoviruses, klosneuviruses, and

www.pnas.org/cgi/doi/10.1073/pnas.2001637117 PNAS Latest Articles | 1of8 Downloaded by guest on September 25, 2021 supermodule, to the exclusion of all other viruses, justifies the plasma membrane, suggesting the participation of a host re- establishment of a named Varidnaviria (21). Although ceptor in to internalize the virions (Fig. 1B, SI Appendix, some NCLDVs, as pandoraviruses, seem to have lost the DJR- Fig. S2, blue arrows, and Movie S1). The replication cycle is then MCP gene, their genome harbors a large set of genes that sup- followed by the incorporation of individual or grouped Yaravirus port their classification into NCLDVs (and Varidnaviria, conse- particles inside endocytic vesicles, which, in a later stage of in- quently). Here we describe the discovery of Yaravirus, an fection, are found next to a region occupied by the nucleus amoeba virus with a puzzling origin and evolution. This virus has (Fig. 1 B–D and SI Appendix, Fig. S2, red arrows). The viral a genome that mainly consists of a near full set of genes that are factory then takes place and completely develops into its mature ORFans. Yaravirus could represent either the first isolated virus form, replacing the region formerly occupied by the nucleus of Acanthamoeba spp. out of the group of NCLDVs, inaugu- and recruiting mitochondria around its boundaries, likely to rating a group belonging to Varidnaviria, or, in an alternative optimize the availability of energy to construct the virions evolutive scenario, a distant and extremely reduced virus of this (Fig. 1E). The step corresponding to viral morphogenesis hap- group. Viral particles have a size of 80 nm, escaping the concept pens similarly as how it is observed for other viruses of amoeba. of large and giant viruses. Thus, Yaravirus expands our knowl- First, it starts by the appearance of small crescents in the edge about viral diversity and evolution. electron-lucent region of the factory (Fig. 1E and 2A). Next, step by step, the virions gain an icosahedral symmetry by the se- Results quential addition of more than one layer of protein or mem- Yaravirus Isolation and Replication Cycle. A prospecting study was branous components around its structure (Fig. 2A). The conducted by collecting samples of muddy water from creeks of constructed virions, with a capsid still empty, start then to mi- an artificial urban lake called Pampulha, located at the city of grate to the periphery of the viral factory, where there is the Belo Horizonte, Brazil. Here, by using a protocol of direct in- accumulation of corpuscular electron-dense material (Figs. 1E oculation of water samples on cultures of Acanthamoeba cas- and 2 B and C, SI Appendix, Fig. S3 A–C, and Movie S2). These tellanii (Neff strain, ATCC 30010), we have managed to isolate enucleations are scattered throughout the periphery of the in- an amoebal virus that we named Yaravirus brasiliensis,asa fected cell and seem to represent different regions or morpho- tribute to an important character (Yara, the mother of waters) of genesis points where the final step for Yaravirus maturation the mythological stories of the Tupi-Guarani indigenous tribes occurs. In these regions, the capsid of Yaravirus is filled with (22). Negative staining revealed the presence of small icosahe- electron-dense material and the virus is finally ready to be re- dral particles on the supernatant of infected amoebal cells, leased (Fig. 2C, red arrow, and SI Appendix, Fig. S4). Sometimes measuring about 80 nm in diameter (Fig. 1A). Cryoelectron it is also possible to observe several particles of Yaravirus being microscopy images of purified particles suggest that Yaravirus packed in the interior of vesicle-like structures, suggesting a particles present two capsid shells, as previously described for potential release by exocytosis, as observed for other viruses of , although future studies are needed to confirm this amoeba (24, 25) (Fig. 2D and Movie S3). Most of the viral (23) (SI Appendix, Fig. S1). shedding, however, still occurs by lysis of the amoebal cell, fol- At the beginning of infection in A. castellanii, Yaravirus par- lowed by the release of Yaravirus particles, which later reach the ticles are found attached to the outside part of the amoebal supernatant of the infected culture, or sometimes might get

Fig. 1. Yaravirus particle and the beginning of the viral cycle. (A) Negative staining of an isolated Yaravirus virion. (Scale bar: 100 nm.) (B) Transmission electron microscopy (TEM) representing the beginning of the viral cycle, in which one particle is associated to the host cell membrane and the second one was already incorporated by the amoeba inside an endocytic vesicle. (Scale bar: 200 nm.) (C) Detailed image of an incorporated Yaravirus particle in the interior of an endocytic vesicle. (Scale bar: 100 nm.) (D) Viral uptake by the amoeba may occur individually but also in groups of particles, as observed in the micrograph. (Scale bar: 250 nm.) (E) The viral factory completely develops, occupying the nuclear region and recruiting mitochondria around it. Two different regions can be distinct: an electron-lucent region where the virions are assembled as empty shells and a second region formed by several electron-dense points where the genome is packaged inside the particles. (Scale bar: 500 nm.)

2of8 | www.pnas.org/cgi/doi/10.1073/pnas.2001637117 Boratto et al. Downloaded by guest on September 25, 2021 A CD

B

E

F

Fig. 2. Yaravirus morphogenesis and release. (A) The virions are assembled by the addition of more than one layer of protein or membranous components around its structure. (Scale bar: 70 nm.) (B) The particles then start to migrate to the periphery of the cell, where there is the presence of several electron- dense points that function as morphogenetic structures to package the DNA inside the Yaravirus particles (regions inside dashed lines). (Scale bar: 1,000 nm.) (C) Detailed image of the morphogenetic regions where the DNA (red arrow) is incorporated inside the Yaravirus virion (black arrow). (Scale bar: 150 nm.)(D) Sometimes, the final step of is marked by the particles being packaged inside vesicle-like structures, suggesting a potential release by exocytosis. (Scale bar: 500 nm.) (E) Most of the particles, however, are released by cellular lysis and have a high affinity to the membranes of cellular debris. MICROBIOLOGY (Scale bar: 150 nm.) (F) Graph comparing concomitantly the decrease of host cell numbers (red bars) with the increase of Yaravirus genome during the infection (black line). Replication of viral genome was measured by qPCR and calculated by delta-delta Ct.

attached to the debris of the cellular membranes (Fig. 2E). We conserved promoter, as opposed to what is observed in many have also evaluated Yaravirus replication by concomitantly in- other NCLDV members (26). vestigating the decrease of the host cell numbers together with By considering only the portions of genome that are part of the increase of viral genome during infection. Interestingly, coding regions, Yaravirus also has a similar coding capacity as during the first hours of infection, the A. castellanii cultures seem observed in other viruses when their genome was first annotated, to progressively grow until 24 h.p.i., showing a fastidious char- ∼90% (SI Appendix, Fig. S6B). Surprisingly, Yaravirus genome acter for Yaravirus replication (Fig. 2F). The cells then start to annotation showed that none of its genes matched with se- suffer lysis induced by the virus only after 72 h.p.i. (Fig. 2F). On quences of known when we compared them at the the same level, from 96 h.p.i. to 7 d post infection, there is no nucleotide level. When we looked for homology at the amino change of the detection levels of Yaravirus genome and the lysis acid levels, we found that only two predicted proteins had hits in seems to stop, and the remaining trophozoites turn to cysts. the Pfam-A database and, in total, six had distant matches in the nr database. Therefore, considering the same criteria that have ’ Genome. Sequencing of the Yaravirus genome has shown the been used to analyze other giant viruses genomes, about 90% = presence of a double-stranded DNA molecule with a length of (n 68) of the Yaravirus predicted genes are ORFans. The six 44,924 bp and harboring a total of 74 predicted genes (Fig. 3A). genes whose product has some homology with known protein Although two of these genes (73 and 74) seemed to be truncated sequences (Fig. 3A and Table 1) are homologous to fragments of and do not start with codons belonging to the methionine amino proteins predicted to have different functions, such as an exo- acid, both were detected in the proteomic analysis (Yaravirus nuclease/recombinase bacterial protein (gene 2; best hit, Tim- onella senegalensis), a hypothetical protein (gene 3; best hit, A. Proteomics). Despite a smaller genome than other viruses of castellanii), a hypothetical protein (gene 28; best hit, Acytoste- amoeba, Yaravirus encodes for six tRNA genes: tRNA-Ser (gct), lium subglobosum LB1, a ), a packaging ATPase tRNA-Ser (tga), tRNA-Cys (gca), tRNA-Asn (gtt), tRNA-His (gene 40; best hit, Pleurochrysis endemic virus), a conserved hy- A D SI Appendix (gtg), and tRNA-Ile (aat) (Fig. 3 and and , pothetical protein (gene 46; best hit, Melbournevirus, a marseil- Tables S1 and S2 and Dataset S1). All of them are colocated on levirus strain), and a bifunctional DNA primase/polymerase an intergenic region between genes 29 and 30 (Fig. 3 A and D). (gene 69; best hit, Marinobacter sp., Alphaproteobacteria; Ta- In contrast to tRNA genes in Tupanviruses, we did not observe a ble 1). Complimentary prediction of three-dimensional struc- correlation between the Yaravirus tRNA isoacceptors and the tures of these proteins indicated a potential function of 11 more codons most frequently used by the virus or its A. castellanii host genes (Table 2 and Yaravirus Proteomics). If we consider an (SI Appendix, Fig. S5). The genome has a GC content of 57.9%, additional 11 genes whose functions were predicted by structural which is one of the highest found in any amoebal virus discov- analyses, the Yaravirus proportion of ORFans would be 80% ered to date (SI Appendix, Fig. S6A). When analyzed gene by (n = 57; still a high number, considering that, usually, in other gene, Yaravirus has a spectrum of GC content that varies be- studies involving giant viruses, methodologies based on protein tween 46% and 65% (SI Appendix, Table S3). The analysis of the structures are not used for this accounting) (27, 28). Phylogenetic intergenic regions of the genome (46% GC content) did not analyses were then performed for genes 02, 40, 46, and 69 after reveal any enriched sequence motifs that might indicate a aligning them with protein sequences of similar function belonging

Boratto et al. PNAS Latest Articles | 3of8 Downloaded by guest on September 25, 2021 to different members of the virosphere and to organisms of the ATPase, the Yaravrus branched within the Mimiviridae as part three cellular domains of . Other identified genes (Table 2) did of a highly supported made up by its distant metagenomic nothaveenoughgeneticinformationtobeincludedinaphyloge- relatives and Pleurochrysis sp. endemic viruses (Fig. 4 and SI netic analysis. For analysis corresponding to the putative exo- Appendix, Fig. S11). In contrast to known members of the nuclease/recombinase (gene 02), three major groups were observed Mimiviridae, viral contigs and viral genomes in this clade fea- to construct the morphology of the tree. Yaravirus was placed in tured a high GC content, with up to 62%. Adding sequences of one of those , clustering with some members of Eukarya, and ATPases to this dataset resulted in a specifically with stony coral and insects (SI Appendix,Fig.S7). slightly altered topology, but the position of Yaravirus remained Analyses of gene 40 (virion packing ATPase) revealed that Yar- stable, not grouping with these additional elements (SI Appendix, avirusclusteredinapolyphyleticbranch, with members belonging Fig. S12). We also searched for proteins similar to the Yaravirus to Mimiviridae family, (although many of these sequences putative MCP but were not able to retrieve closely related se- seem to represent misclassified NCLDVs from metagenome- quences in the metagenomic data. In parallel, we used 18 hidden assembled genomes), and Pleurochrysis sp. endemic virus 1a and Markov models to detect MCPs in 235 NCLDV reference ge- 2(SI Appendix,Fig.S8). For the phylogenetic analysis corre- nomes (plus contigs of three Pleurochrysis endemic viruses). Af- sponding to gene 46 (hypothetical protein conserved in Marseille- ter dereplicating the hits by clustering at 90% sequence similarity virus), Yaravirus was clustered with strains (SI using accurate mode in cd-hit, we prepared a structure-guided Appendix,Fig.S9). For the last tree, representing analysis for gene alignment with Expresso in T-Coffee (using PDB structures), and 69 (bifunctional DNA primase/polymerase), we have observed that also a conventional alignment with mafft-ginsi (–unalignlevel 0.8– Yaravirus was clustered with members of eukaryotes corresponding allowshift). Both alignments returned surprisingly good results for to the Streblomastix and Phytophthora groups (SI Appendix,Fig. Yaravirus MCP, and then phylogenetic trees were calculated with S10). However, it should be noted that, in a previous study, the iq-tree LG+F+R8. Yaravirus MCP was shown to group together authors detected sequences of genes among the Phy- with the MCPs of Pleurochrysis endemic viruses in a well-supported tophthora parasitic strain INRA-310 genome (29). After all those clade; however, in contrast to the ATPase tree, they were affiliated analyses, it is important to note that, although Yaravirus has some with (chlorella viruses in particular) and not with genes with representatives in the genome of other organisms, their the Mimiviridae (Fig. 5 and SI Appendix, Figs. S13 and S14 and homology with orthologs is very low (25.24 to 44.12%), highlighting Supplementary Files). that Yaravirus genome content is essentially divergent among the Finally, trying to investigate a potential relationship between other members of the virosphere (Table 1). Yaravirus and different members of the Pleurochrysis sp. en- In order to detect sequences related to Yaravirus, we surveyed demic virus group, we have looked for protein similarities be- 8,535 publicly available metagenomes in the IMG/M database tween these organisms. Comparisons were made using the that have been generated from samples from diverse habitats BLASTp database and have suggested some ortholog candi- across our planet (30). We discovered distant homologs of the dates, but with a low percentage of protein identity (around 24 to Yaravirus ATPase (NCVOG0249) with an amino acid homology 33%). For Pleurochrysis sp. endemic virus 1a and Pleurochrysis sp. of up to 33.9% in the metagenomic data, while the closest ho- endemic virus 1b (Genbank accession codes KY131436 and molog in the NCBI nr database was that of Pleurochrysis sp. KY203336, respectively), there is some similarity with genes 28, endemic virus 1a, with 33.1%. In a phylogenetic tree of the viral 40, and 41 of Yaravirus. For Pleurochrysis sp. endemic virus 2

Fig. 3. Yaravirus genome features. (A) Circular representation of Yaravirus genome highlighting the predicted ORFs (arrows). Red arrows represent ORFs predicted by analyses of similarity of amino acid sequences with information regarding their best hits. Black dots indicate ORFs encoding predicted proteins whose functions were suggested by HHPred (structural analyses). Yellow stars indicate proteins found in virion proteomics. (B) The percentage of ORFan genes among the complete genome of different viruses of amoeba is represented by the graph with red scale bars. (C) The graph with greenish scale bars represents the absolute number of genes with homologs in databases (non-ORFan genes) for each of the same amoebal viruses previously analyzed. (D)Allof the six Yaravirus predicted tRNAs, as well as their corresponding sequences, are pictured with information about their anticodon (in parentheses), their nucleotide length, the percentage of GC content, and the position in the intergenic regions of genes 29 and 30. Regarding the asterisk, note that the number of ORFan/non-ORFan genes represented in C and D do not take into account the structural annotation made by the HHpred servers, as for most of the amoebal viruses represented in the graphs.

4of8 | www.pnas.org/cgi/doi/10.1073/pnas.2001637117 Boratto et al. Downloaded by guest on September 25, 2021 Table 1. Yaravirus genes with similarity on current databases and their best-hits (BLASTp) Yaravirus gene ID Best hit Total score Query cover, % E-value Identity, % Annotation of best BLASTp hit

2 Timonella senegalensis WP_019148817.1 67.0 87 1e-09 27.88 Exonuclease/recombinase 3 A. castellanii XP_004339080.1 51.6 80 1e-06 44.12 Hypothetical protein 28 Acytostelium subglobosum LB1 57.8 85 4e-07 27.59 Hypothetical protein XP_012747655.1 40 Pleurochrysis sp. endemic virus 1a 121 66 4e-28 33.05 Virion packaging ATPase AUD57256.1 46 Melbournevirus YP_009094634.1 181 88 5e-47 32.63 Hypothetical protein conserved in marseilleviruses 69 Marinobacter sp. MAB50943.1 106 35 8e-20 25.24 Bifunctional DNA primase/polymerase

(Genbank accession code KY346835), two genes are related to sequences represented by genes 41 and 46, we observed frag- gene 28 in Yaravirus and two others are similar to genes 40 and ments of protein resembling the three-dimensional structure of 02. Finally, for Pleurochrysis sp. endemic virus unk (accession the capsid of other viruses (Dataset S2 and Table 2). With a code KY203337), we have found similarity with gene 69 of confidence of 97%, a relevant portion (65%) of gene 41 was Yaravirus. found to have a structural similarity with the double jelly-roll domain of the MCP of the Paramecium bursaria Chlorella virus Yaravirus Proteomics. As aforementioned, most Yaravirus pro- type 1 (Dataset S2). Gene 46-encoded predicted protein was teins had no detectable homologs in public databases. This pe- found to have structural similarity with bacterial secreted protein culiarity prompted us to have a closer look at the proteins pcsB and tail needle protein (Table 2 and Dataset S2), a portion responsible to form the mature particles of Yaravirus. Proteo- composed of a long alpha-helix. The function of protein encoded mics revealed a total of 26 viral proteins present in purified by gene 46 remains to be investigated. Therefore, we were not particles (SI Appendix, Table S4). We then analyzed the pre- able to convincingly find any minor capsid protein. It is also

dicted three-dimensional structures of those 26 proteins by using important to note that sequences represented by gene 46 are the MICROBIOLOGY three platforms for domain comparison, the HHpred, the same described earlier to be highly conserved in marseilleviruses Phyre2, and the Swiss-model tools (30–37). Only five sequences and in medusaviruses (Table 1). Finally, the last two proteins (genes 11, 12, 28, 41, and 46) were observed to have structural observed in the proteome that had matches with structural de- features similar to known proteins (Table 2 and Dataset 2). That posits in public databases are represented by genes 11 and 12, means that about 80% of its virion proteome consists of the first one predicted by HHpred to encode for an adiponectin ORFans. It is important to mention that the same approach (in and the second one for a protein called cerebellin-1 (Table 2). silico structure prediction) has been used in parallel to evaluate all of the 74 predicted genes on the genome of Yaravirus, Discussion resulting in total in the discovery of 17 gene-encoded products In recent years, amoebal large and giant viruses have frequently with structural resemblance to other proteins in public databases been found around the world (5–10, 22, 24, 27, 38–41). Here, (Table 2). Proteomics data revealed that the most abundant we describe Yaravirus brasiliensis, an 80-nm-sized virus with a proteins in the viral particles corresponded to genes 41, 46, and genome containing a notable proportion of genes that have 51 (from most to least abundant; SI Appendix, Table S4). While, never been observed before. Using standard protocols, our very for the third highest expressed protein, we were not able to find first genetic analysis was unable to find any recognizable se- any structural candidates with known biological function, for quences of capsid or other classical viral genes in Yaravirus. This

Table 2. Annotation of Yaravirus proteins based on the predicted tridimensional structure of the proteins coded by the virus Gene Best hit Functional prediction Probability E-value

2 Laribacter hongkongensis Exonuclease 99.98 2.5e-30 3 Eubacterium eligens Uncharacterized protein 94.49 0.88 11 Homo sapiens Adiponetctin 95.76 0.096 12 Rattus norvegicus Cerebellin-1; synapse protein 97.54 5,00e-03 17 Rattus norvegicus prkc apoptosis wt1 regulator protein 95.86 0.17 26 Homo sapiens Retinoblastoma-binding protein 6 94.71 0.026 28 Faba Bean Necrotic Yellows Replication-associated protein; endonuclease 95.07 0.11 40 Sulfolobus turreted icosahedral virus 2 Genome packaging NTPase B204; FtsK-HerA superfamily 99.67 7.9e-15 41 Singapore grouper iridovirus Major capsid protein 98.89 3.1e-8 43 Arabidopsis thaliana HY5 96.6 0.016 46 Enterobacteria phage HK620 DNA stabilization protein; tail needle, viral genome-ejection 86.58 0.033 54 Haloarcula marismortui Ribosome 50S 90.84 0.81 57 H. sapiens BEN domain-containing protein 3 86.72 0.43 58 Oryctolagus cuniculus Potential copper-transporting atpase 91.25 0.62 67 Xenopus laevis DNA-(apurinic or apyrimidinic site) lyase 92.31 0.45 69 Bovine papillomavirus type 1 Replication protein E1/DNA Complex; DNA helicase, AAA+, ATPase 99.18 4.5e-10 70 Holliday junction resolvase 99.85 9.2e-20

Note: The bold genes represent proteins observed on the viral proteomics.

Boratto et al. PNAS Latest Articles | 5of8 Downloaded by guest on September 25, 2021 the perspective of a selective pressure forcing to maintain these genes in such a small genome when compared to larger viruses of amoeba. Even more interestingly, none of the isoacceptors re- lated to the Yaravirus tRNAs correspond to codons of amino acids abundantly used by the virus or the amoeba. Considering the fastidious infection cycle of Yaravirus in Acanthamoeba,itis conceivable that, in , a different might act as the preferred host of Yaravirus. Some genes were found to be shared between Yaravirus and members of the Pleurochrysis endemic virus group; however, the low coverage supporting their putative orthologs make difficult a close relationship between these organisms. Most members of the to-date isolated giant viruses of amoeba show a capsid specially composed by copies of an MCP related to the D13L of Vaccinia virus (15, 45). Pandoraviruses are an ex- ception, as they seem to lack a protein shell to protect their genomes (28). Interestingly, even some of the amoeba hosts of these viruses may carry copies of MCP genes, suggesting possible between virus and host (46, 47). By the structure-guided alignment and analysis of the con- structed phylogenetic trees, Yaravirus does seem to share a common origin of MCP capsid with other NCLDVs, even though their sequences show an incredibly low similarity (48). Taken together, we can conclude that Yaravirus represents a divergent lineage of viruses isolated from A. castellanii. The large amount of unknown proteins encoded by Yaravirus reflects the variability existing in the viral world and the astonishing coding potential of new viral genomes yet to be discovered. Methods Origin of Samples and Viral Isolation. In 2017, searching to isolate novel variants of viruses infecting , we collected samples of muddy water from a creek of Lake Pampulha, an artificial lagoon located at the city of Belo Horizonte, Brazil (19 51 0.60S and 43 58 18.90W). As soon as they were Fig. 4. Phylogenetic position of Yaravirus and related viral sequences in the collected, the samples were quickly taken to our lab and stored at 4°C until × 4 Mimiviridae based on the viral ATPase (NCVOG0249). The Yaravirus ATPase they were further processed. Following the protocol, 4 10 amoebas of the is highlighted in yellow. Branch support is indicated as colored circles for A. castellanii Neff strain (ATCC 30010) were seeded in each well of a 96-well μ support values of 90 or below. The tree is rooted at the Poxviruses. (Scale plate, inoculating to each one a volume of around 100 L of the collected bar: substitutions per site.) GC content of viral genomes and contigs con- samples, originally diluted 1:10 in PBS buffer. The plates were then taining NCVOG0249 is shown together with the average GC content of collapsed clades. In addition, environmental origin and assembly sizes of Yaravirus and related viral contigs and genomes are shown.

is a relevant feature to highlight the importance of studies re- lated to the isolation of new viral samples, as, by following the current metagenomic protocols for viral detection, Yaravirus would not even be recognized as a viral agent (42, 43). According to our knowledge, Yaravirus represents the first virus isolated in Acanthamoeba spp. that is potentially not part of the complex group of NCLDVs. Several characteristics unite previously discovered amoebal viruses: large-sized virions, ge- nomes coding for hundreds to thousands of genes, and pre- sumably a monophyletic origin that is reflected in the presence of a set of about 20 most likely vertically inherited genes (17, 18). None of these features are present in Yaravirus, and that makes it potentially the first isolate of a novel bona fide group of amoebal virus. Of course, we cannot exclude the possibility that Yaravirus may represent a reduced NCLDV, presenting highly divergent or even absent NCLDV hallmark proteins. Recently, a similar case was described for three small crustacean viruses (44). However, despite their reduced genome when compared to Fig. 5. Structure-guided phylogeny based on hmmsearch employed to other members of the NCLDV, an important number of hall- identify the Yaravirus MCP in 235 NCDLV reference genomes using specific mark genes were shared with this group, differently as observed hidden Markov models. The alignments were made with Expresso in the software T-Coffee, using PDB structures. After, the phylogenetic trees were for Yaravirus (44). In this not less exciting scenario, supported by built using IQ-tree (v1.6.12; ref. 61) with LG+F+R8 based on the built-in the analysis regarding the ATPase and MCP phylogenetic trees, model select feature (62) and 1,000 ultrafast bootstrap replicates (63). The Yaravirus would represent the to-date smallest member of the Yaravirus MCP is highlighted in yellow. Branch support is indicated as col- NCLDVs, both in particle and . The presence of six ored circles for support values of 90 or below. The tree is unrooted. (Scale copies of tRNAs in Yaravirus also impresses when analyzed by bar: substitutions per site.)

6of8 | www.pnas.org/cgi/doi/10.1073/pnas.2001637117 Boratto et al. Downloaded by guest on September 25, 2021 incubated for 7 d at 32°C and observed daily for the appearance of cyto- sequences were downloaded from the NCBI database and analyzed by using pathic effect, which may indicate a probable viral infection. All of the con- the software Artemis 18.0.3. The percentage of GC content and GC skew tent from the wells was then collected and submitted to three processes of have also been analyzed by using the same software. Transfer RNA (tRNA) freezing and thawing and analysis of the possible isolates by negative sequences were identified using the ARAGORN tool. Phylogenetic analyses staining technique. By the end, the collected content was submitted to another were performed for the six proteins of Yaravirus holding similarities with two blind passages in fresh cultures of amoeba, but this time in 25-cm2 Nunc other organisms on the NCBI database (Table 1). By using the ClustalW tool Cell Culture Treated Flasks with Filter Caps (Thermo Fisher Scientific) containing in the Mega 10.0.5 software program, amino acid sequences of these Yar- around 1 million amoebal cells. After viral isolation, all of the following ex- avirus proteins were previously aligned with the corresponding sequences of periments were made by infecting A. castellanii cells in a low multiplicity of representatives of the virosphere and from other cellular organisms be- infection (MOI), given the Yaravirus’s fastidious replication cycle. longing to the three domains of life. The analysis involved 55 amino acid sequences for gene 40, 49 amino acid sequences for gene 02, 13 amino acid Transmission Electron Microscopy (TEM), TEM Tomography, Cryo-Electron sequences for gene 46, and 34 amino acid sequences for gene 69. The per- Microscopy. For resin embedding and transmission electron microscopy centage of trees in which the associated taxa clustered together is shown (TEM), A. castellanii cells infected with Yaravirus were fixed at 20 h post- next to the branches. Initial tree(s) for the heuristic search were obtained infection with 2.5% glutaraldehyde in 0.1 M sodium cacodylate buffer. Cells automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of were washed three times with a solution of 0.2 M saccharose in 0.1 M so- pairwise distances estimated using a JTT model, and then selecting the to-

dium cacodylate. Cells were postfixed for 1 h with 1% OsO4 diluted in 0.2 M pology with superior log likelihood value. The tree is drawn to scale, with potassium hexa-cyanoferrate (III)/0.1 M sodium cacodylate. After washes branch lengths measured in the number of substitutions per site. All of the with distilled water, cells were gradually dehydrated with ethanol by suc- trees were constructed by using the maximum likelihood evolution method, cessive 10-min baths in 30, 50, 70, 96, 100, and 100% ethanol. with the JTT matrix-based model and a bootstrap of 1,000 replicates (54). Substitution was achieved by successively placing the cells in 25, 50, and 75% Epon solutions for 15 min. Cells were placed for 1 h in 100% Epon Yaravirus Proteomics. In order to identify the proteins that make up Yaravirus 2 6 solution and in fresh Epon 100% overnight at room temperature. Poly- particles, thirty 75-cm cell culture flasks (Nunc), containing 7 × 10 A. cas- merization took place with cells in fresh 100% Epon for 48 h at 60°. Ultrathin tellanii cells per flask, were infected with the isolated virus, and the cyto- 70- or 300-nm-thick sections were cut with a UC7 ultramicrotome (Leica) and pathic effect was followed up to 7 d.p.i. After severe amoebal lysis, the placed on HR25 300 Mesh Copper/Rhodium grids (TAAB). Ultrathin sections content was collected and submitted to a first round of centrifugation, at were contrasted according to Reynolds (49). Electron micrographs were 1,500 × g for 10 min, looking to pellet the cell debris from the virus present obtained on a Tecnai G20 TEM operated at 200 keV equipped with a 4,096 × on the supernatant. Then, this viral portion was submitted to a second round 4,096-pixel resolution Eagle camera (FEI). For tomography, gold nano- of centrifugation, and the virus was concentrated by ultracentrifugation at particles 10 nm in diameter (ref. 741957; Sigma-Aldrich) were deposited on 100,000 × g for 2 h. To finish, viral pellet was then prepared for a two- both faces of the sections prior to contrasting. Tomography tilt series were dimensional gel electrophoresis and analysis by matrix-assisted laser de- acquired on the G20 Cryo TEM (FEI) with the Explore 3D software (FEI) for tilt sorption/ionization and liquid chromatography-tandem mass spectrometry MICROBIOLOGY ranges of 110° with 1° increments. The mean applied defocus was −2 μm. as described before by Reteno and colleagues (55). The magnification ranged between 3,500 and 29,000 with pixel sizes be- tween 3.13 and 0.37 nm, respectively. The image size was 40,962 pixels. The Metagenomic Survey. The Yaravirus ATPase (NCVOG0249) and the putative tilt series were aligned using ETomo from the IMOD software package major capsid protein (MCP) were used to query 8,535 publicly available (University of Colorado) by cross-correlation (50). The tomograms were metagenomes in the IMG/M database (30) using diamond BLASTp (v0.9.25.126; reconstructed using the weighted back-projection algorithm in ETomo from ref. 56). Resulting protein hits with more than 30% query and subject coverage IMOD. The average thickness of the obtained tomograms was 268.40 ± and an e-value of at least 1e-5 were extracted from the metagenomic data. In 64 nm (n = 16). Fiji/ImageJ (NIH) was used for making tomography movies parallel, hmmsearch (version 3.1b2; hmmer.org) was employed to identify and (51). For cryoelectron microscopy assays, the supernatant of infected cultures extract ATPases (NCVOG0249) and MCPs (multiMCP) from 235 NCDLV reference of A. castellanii was collected after 7 d postinfection and submitted to a first genomes using specific hidden Markov models (https://bitbucket.org/berkeley- round of centrifugation at 1,500 × g for 10 min, looking to pellet the cell lab/mtg-gv-exp/). Extracted proteins were then combined with the Yaravirus debris from the virus present on the supernatant. Next, the portion con- queries. To remove most redundant sequences, the MCP data set was clustered taining the Yaravirus was then submitted to a second round of centrifuga- in cd-hit (57) at an amino acid similarity level of 90% using the accurate mode. tion, and the virus was concentrated by ultracentrifugation at 100,000 × g ATPases were then aligned with MAFFT-linsi (v7.294b; ref. 58) and MCPs in for 2 h. The following steps were previously described by Klose et al. (23). Expresso (59) (most accurate mode in t-coffee v_13.41.0.28, alignment guided Briefly, 3 μL of virus solution was placed on glow-discharged C-Flat 2/2 grids by PDB structures), and the resulting alignments trimmed with trimal (v1.4, -gt (EMS) and plunge-frozen into liquid ethane using a Gatan Cryoplunge 3. 0.1; ref. 60). Phylogenetic trees were built using IQ-tree (v1.6.12; ref. 61) with Samples were then imaged on a Talos F200C (ThermoFisher Scientific) LG+F+R5 (ATPase) and LG+F+R8 (MCP) based on the built-in model select fea- equipped with a Ceta camera (ThermoFisher Scientific). ture (62) and 1,000 ultrafast bootstrap replicates (63). Phylogenetic trees were visualized with iTol [v5.8 (61)] and ete3 (64). Genome Sequence and Analysis. The Yaravirus genome was sequenced two times by using the Illumina MiSeq platform (Illumina) with the paired-end Data Availability. Data for the Yaravirus genome have been deposited to application. The generated reads were then assembled de novo by using the GenBank under accession number MT293574. software ABYSS and SPADES, with the resulting contigs ordered by the Python-based CONTIGuator.py software. After, gene predictions were made ACKNOWLEDGMENTS. We thank our colleagues from IHU (Aix Marseille by using the GeneMarkS tool (52). The functional annotation for the Yar- University) and from Laboratório de Vírus (Universidade Federal de Minas Ger- avirus predicted proteins was made through searches against the GenBank ais) for their assistance, especially Said Mougari, Issam Hasni, Lina Barrassi, NCBI nonredundant protein sequence database (nr), considering as homol- Priscilla Jardot, Erna Kroon, Claudio Bonjardim, Paulo Ferreira, Giliane Trindade, − ogous proteins only the sequences that presented an e-value < 1 × 10 3. The and Betania Drumond. In addition, we thank the Méditerranée Infection Foun- annotation was refined by comparing the in silico-predicted protein struc- dation, Centro de Microscopia da Universidade Federal de Minas Gerais tures of Yaravirus with domains of proteins present in different databases (UFMG), Pro Reitorias de Pesquisa e Pós-graduação da UFMG, Programa de Pós-graduação em Microbiologia da UFMG, CNPq (Conselho Nacional de Desen- using three platforms: the HHpred, the Phyre2, and the Swiss-model tools volvimento Científico e Tecnológico), CAPES (Coordenação de Aperfeiçoa- – (30 37). It is important to note that, for the HHpred, we considered true- mento de Pessoal de Nível Superior), and FAPEMIG (Fundação de Amparo à positive protein structures only matches that had a probability >80% and Pesquisa do estado de Minas Gerais) for their financial support. J.S.A. is a CNPq e-value ∼1, as suggested by Söding and colleagues (37). For the qPCR assays, researcher. B.L.S., J.S.A., P.C., P.V.M.B. and G.P.O. are members of a CAPES- the increase in genome replication was assessed in cultures of A. castellanii Comitê Francês de Avaliação da Cooperação Universitária project. This work cells infected by Yaravirus in different time points (H4, H6, H8, H12, H24, was supported by the Agence Nationale de la Recherche (ANR), including the ‘ ’ ’ ’ H72, H96, and H168), using primers which were constructed based on the Programme d investissement d avenir under the reference Méditerranée In- sequence of the gene 69 of Yaravirus (primers, 5′TGCAGCAAGTCGGTCAA- fection 10-1AHU-03 and European funding FEDER PRIMI. The work that was conducted by F.S. at the US Department of Energy Joint Genome Institute, a GAT3′ and 5′AACTTCCACATGCGAAACGC3′). Conditions used in the assay DOE Office of Science User Facility, is supported under Contract DE-AC02–05 were previously described (53). CH11231. It made use of resources of the National Energy Research Scientific The amino acid and codon usage data were compared to those presented Computing Center, which is also supported by the DOE Office of Science under by A. castellanii and by different strains of amoebal viruses. For this, the Contract DE-AC02–05CH11231.

Boratto et al. PNAS Latest Articles | 7of8 Downloaded by guest on September 25, 2021 1. J. Guglielmini, A. C. Woo, M. Krupovic, P. Forterre, M. Gaia, Diversification of giant 34. P. Benkert, M. Biasini, T. Schwede, Toward the estimation of the absolute quality of and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. Proc. individual models. Bioinformatics 27, 343–350 (2011). Natl. Acad. Sci. U.S.A. 116, 19585–19592 (2019). 35. M. Bertoni, F. Kiefer, M. Biasini, L. Bordoli, T. Schwede, Modeling protein quaternary 2. E. V. Koonin, N. Yutin, Multiple evolutionary origins of giant viruses. F1000 Res. 7, structure of homo- and hetero-oligomers beyond binary interactions by homology. 1840 (2018). Sci. Rep. 7, 10480 (2017). 3. P. Colson et al., Ancestrality and mosaicism of giant viruses supporting the definition 36. L. A. Kelley, S. Mezulis, C. M. Yates, M. N. Wass, M. J. Sternberg, The Phyre2 web of the fourth TRUC of microbes. Front. Microbiol. 9, 2668 (2018). portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015). 4. P. Colson, Y. Ominami, A. Hisada, B. La Scola, D. Raoult, Giant escape 37. J. Soding, A. Biegert, A. N. Lupas, The HHpred interactive server for protein homology many canonical criteria of the virus definition. Clin. Microbiol. Infect. 25, 147–154 detection and structure prediction. Nucleic Acids Res. 33, W244–W248 (2005). (2019). 38. M. Boyer et al., Giant Marseillevirus highlights the role of amoebae as a melting pot 5. J. Abrahão et al., Tailed giant possesses the most complete translational in emergence of chimeric microorganisms. Proc. Natl. Acad. Sci. U.S.A. 106, apparatus of the known virosphere. Nat. Commun. 9, 749 (2018). 21848–21853 (2009). 6. M. Legendre et al., In-depth study of sibericum, a new 30,000-y-old giant 39. P. V. Boratto et al., Niemeyer virus: A new mimivirus group A isolate harboring a set virus infecting Acanthamoeba. Proc. Natl. Acad. Sci. U.S.A. 112, E5327–E5335 (2015). of duplicated aminoacyl-tRNA synthetase genes. Front. Microbiol. 6, 1256 (2015). 7. J. Andreani et al., Pacmanvirus, a new giant icosahedral virus at the crossroads be- 40. R. A. L. Rodrigues et al., Morphologic and genomic analyses of new isolates reveal a tween Asfarviridae and Faustoviruses. J. Virol. 91, e00212-17 (2017). second lineage of cedratviruses. J. Virol. 92, e00372-18 (2018). 8. J. Andreani et al., Cedratvirus, a double-cork structured , is a distant rela- 41. L. K. D. S. Silva et al., Cedratvirus getuliensis replication cycle: An in-depth morpho- tive of pithoviruses. Viruses 8, 300 (2016). logical analysis. Sci. Rep. 8, 4000 (2018). 9. L. H. Bajrai et al., Kaumoebavirus, a new virus that clusters with faustoviruses and 42. Y. Liang et al., Metagenomic analysis of the diversity of DNA viruses in the surface Asfarviridae. Viruses 8, 278 (2016). and deep sea of the South China sea. Front. Microbiol. 10, 1951 (2019). 10. F. Schulz et al., Giant virus diversity and host interactions through global meta- 43. D. De Corte et al., Viral communities in the global deep ocean conveyor belt assessed – genomics. Nature 578, 432 436 (2020). by targeted viromics. Front. Microbiol. 10, 1801 (2019). 11. M. Boyer, G. Gimenez, M. Suzan-Monti, D. Raoult, Classification and determination of 44. K. Subramaniam et al., A new family of DNA viruses causing disease in Crustaceans possible origins of ORFans through analysis of nucleocytoplasmic large DNA viruses. from diverse aquatic biomes. MBio 11, e02938-19 (2020). – Intervirology 53, 310 320 (2010). 45. S. W. Wilhelm et al., A student’s guide to giant viruses infecting small eukaryotes: 12. N. Siew, D. Fischer, Twenty thousand ORFan microbial protein families for the bi- From Acanthamoeba to Zooxanthellae. Viruses 9, 46 (2017). – ologist? Structure 11,7 9 (2003). 46. N. Chelkha et al., A phylogenomic study of Acanthamoeba polyphaga draft genome 13. Y. Yin, D. Fischer, Identification and investigation of ORFans in the viral world. BMC sequences suggests genetic exchanges with giant viruses. Front. Microbiol. 9, 2098 Genomics 9, 24 (2008). (2018). 14. N. Siew, D. Fischer, Unravelling the ORFan puzzle. Comp. Funct. Genomics 4, 432–441 47. F. Maumus, G. Blanc, Study of gene trafficking between Acanthamoeba and giant (2003). viruses suggests an undiscovered family of amoeba-infecting viruses. Genome Biol. 15. P. Renesto et al., Mimivirus giant particles incorporate a large fraction of anonymous Evol. 8, 3351–3363 (2016). and unique gene products. J. Virol. 80, 11678–11685 (2006). 48. M. Krupovic, D. H. Bamford, Double-stranded DNA viruses: 20 families and only five 16. N. A. Khan, Acanthamoeba: Biology and Pathogenesis (Caister Academic Press, ed. 2, different architectural principles for virion assembly. Curr. Opin. Virol. 1, 118–124 2015). (2011). 17. E. V. Koonin, N. Yutin, Evolution of the large nucleocytoplasmic DNA viruses of eu- 49. E. S. Reynolds, The use of lead citrate at high pH as an electron-opaque stain in karyotes and convergent origins of viral gigantism. Adv. Virus Res. 103, 167–202 electron microscopy. J. Cell Biol. 17, 208–212 (1963). (2019). 50. J. R. Kremer, D. N. Mastronarde, J. R. McIntosh, Computer visualization of three- 18. L. M. Iyer, S. Balaji, E. V. Koonin, L. Aravind, Evolutionary genomics of nucleo- dimensional image data using IMOD. J. Struct. Biol. 116,71–76 (1996). cytoplasmic large DNA viruses. Virus Res. 117, 156–184 (2006). 51. J. Schindelin et al., Fiji: An open-source platform for biological-image analysis. Nat. 19. M. Krupovic, E. V. Koonin, Multiple origins of viral capsid proteins from cellular an- Methods 9, 676–682 (2012). cestors. Proc. Natl. Acad. Sci. U.S.A. 114, E2401–E2410 (2017). 52. J. Besemer, A. Lomsadze, M. Borodovsky, GeneMarkS: A self-training method for 20. N. Yutin, Y. I. Wolf, D. Raoult, E. V. Koonin, Eukaryotic large nucleo-cytoplasmic DNA prediction of gene starts in microbial genomes. Implications for finding sequence viruses: Clusters of orthologous genes and reconstruction of viral genome evolution. motifs in regulatory regions. Nucleic Acids Res. 29, 2607–2618 (2001). Virol. J. 6, 223 (2009). 53. L. H. Bajrai et al., Isolation of yasminevirus, the first member of klosneuvirinae iso- 21. E. V. Koonin et al., Global organization and proposed megataxonomy of the virus lated in coculture with vermamoeba vermiformis, demonstrates an extended arsenal world. Microbiol. Mol. Biol. Rev. 84, e00061-19 (2020). 22. A. C. D. S. P. Andrade et al., Ubiquitous giants: A plethora of giant viruses found in of translational apparatus components. J. Virol. 94, e01534-19 (2019). Brazil and Antarctica. Virol. J. 15, 22 (2018). 54. D. T. Jones, W. R. Taylor, J. M. Thornton, The rapid generation of mutation data – 23. T. Klose et al., Structure of faustovirus, a large dsDNA virus. Proc. Natl. Acad. Sci. matrices from protein sequences. Comput. Appl. Biosci. 8, 275 282 (1992). U.S.A. 113, 6206–6211 (2016). 55. D. G. Reteno et al., Faustovirus, an asfarvirus-related new lineage of giant viruses – 24. A. C. D. S. Pereira Andrade et al., New isolates of pandoraviruses: Contribution to the infecting amoebae. J. Virol. 89, 6585 6594 (2015). study of replication cycle steps. J. Virol. 93, e01942-18 (2019). 56. B. Buchfink, C. Xie, D. H. Huson, Fast and sensitive protein alignment using DI- – 25. M. Legendre et al., Diversity and evolution of the emerging Pandoraviridae family. AMOND. Nat. Methods 12,59 60 (2015). Nat. Commun. 9, 2285 (2018). 57. L. Fu, B. Niu, Z. Zhu, S. Wu, W. Li, CD-HIT: Accelerated for clustering the next- – 26. G. P. Oliveira et al., Promoter motifs in NCLDVs: An evolutionary perspective. Viruses generation sequencing data. Bioinformatics 28, 3150 3152 (2012). 9, 16 (2017). 58. K. Katoh, D. M. Standley, A simple method to control over-alignment in the MAFFT – 27. D. Raoult et al., The 1.2-megabase genome sequence of Mimivirus. Science 306, multiple sequence alignment program. Bioinformatics 32, 1933 1942 (2016). 1344–1350 (2004). 59. P. Tommaso et al., T-Coffee: A web server for the multiple sequence alignment of 28. N. Philippe et al., Pandoraviruses: Amoeba viruses with genomes up to 2.5 Mb protein and RNA sequences using structural information and homology extension. reaching that of parasitic eukaryotes. Science 341, 281–286 (2013). Nucleic Acids Res. 39, W13–W17 (2011). 29. V. Sharma, P. Colson, R. Giorgi, P. Pontarotti, D. Raoult, DNA-dependent RNA poly- 60. S. Capella-Gutiérrez, J. M. Silla-Martínez, T. Gabaldón, trimAl: A tool for automated merase detects hidden giant viruses in published databanks. Genome Biol. Evol. 6, alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1603–1610 (2014). 1972–1973 (2009). 30. I. A. Chen et al., IMG/M v.5.0: An integrated data management and comparative 61. L. T. Nguyen, H. A. Schmidt, A. von Haeseler, B. Q. Minh, IQ-TREE: A fast and effective analysis system for microbial genomes and microbiomes. Nucleic Acids Res. 47, stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. D666–D677 (2019). 32, 268–274 (2015). 31. A. Waterhouse et al., SWISS-MODEL: Homology modelling of protein structures and 62. S. Kalyaanamoorthy, B. Q. Minh, T. K. F. Wong, A. von Haeseler, L. S. Jermiin, Mod- complexes. Nucleic Acids Res. 46, W296–W303 (2018). elFinder: Fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 32. S. Bienert et al., The SWISS-MODEL Repository-new features and functionality. Nu- 587–589 (2017). cleic Acids Res. 45, D313–D319 (2017). 63. D. T. Hoang, O. Chernomor, A. von Haeseler, B. Q. Minh, L. S. Vinh, UFBoot2: Im- 33. N. Guex, M. C. Peitsch, T. Schwede, Automated comparative protein structure mod- proving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018). eling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electropho- 64. J. Huerta-Cepas, F. Serra, P. Bork, ETE 3: Reconstruction, analysis, and visualization of resis 30 (suppl. 1), S162–S173 (2009). phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).

8of8 | www.pnas.org/cgi/doi/10.1073/pnas.2001637117 Boratto et al. Downloaded by guest on September 25, 2021