Exploring the Caudovirales: Evaluation of their Internal Classification and Potential Relationships with the Tectiviridae

Juan S. Andrade-Martínez (1,2), Alejandro Reyes (1,2,3)

1. Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia.
2. Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá, Colombia.
3. Center for Genome Sciences and Systems Biology, Department of Pathology and Immunology, Washington University in Saint Louis, Saint Louis, MO, 63108, USA.

Abstract

The Caudovirales are the most abundant dsDNA viruses, infecting both Bacteria and Archaea. Recently developed distance and network-based approaches have called into question the morphology-based classification of the three traditional Caudovirales families (Podoviridae, Siphoviridae, and Myoviridae) and have suggested an evolutionary relationship between this order and the phage family Tectiviridae. In that context, the present work used clusters of viral domain orthologous groups (VDOGs) and k-mers to determine whether the current Caudovirales classification is evolutionarily reasonable and to explore the possibility of a common ancestry between Caudovirales and Tectiviridae. For this, we employed over 4000 Caudovirales and 15 Tectiviridae complete genomes obtained from the NCBI Assembly Database. These entries were dereplicated at the genome and protein level, yielding a set of representative proteomes. The proteomes were screened through a Hidden Markov Model search against a VDOG database to determine which proteomes harbored which VDOGs. A k-mer search was also conducted to establish which k-mers with lengths between 6 and 15 were abundant in the clades of interest. The representative features (k-mers or VDOGs) of the clades were determined, and dendrograms were constructed from them using a neighbor-joining approach.
All dendrograms based on k-mers generated an almost perfect distinction between the outgroups and the Caudovirales and Tectiviridae. In contrast, the VDOG-only dendrogram showed that most Caudovirales subfamilies are monophyletic, while none of the dendrograms showed monophyletic Caudovirales families. Overall, our results support the hypothesis that the classification of the three traditional Caudovirales families needs to be revised, suggest the existence of a common ancestry for Caudovirales and Tectiviridae, and benchmark the use of VDOGs and k-mers for phylogenetic analyses.

Keywords

Caudovirales, Tectiviridae, orthologous protein clusters, viral phylogenetics

Introduction

It is estimated that, for any given environment, the number of viral particles is up to 10 times that of prokaryotic cells (Koonin, Dolja, & Krupovic, 2015). More notable than their abundance, however, is the high diversity of viruses, which manifests itself in the plurality of structures, genome sizes, strategies of replication and expression, and virion morphologies (Koonin, Dolja, et al., 2015; Koonin, Krupovic, & Yutin, 2015). Viral evolutionary patterns are difficult to elucidate because viruses have high rates of mutation and infection (Hendrix, 2008). For instance, it is estimated that phages are responsible for up to 10^24 productive infections per second in marine ecosystems (Hendrix, 2008). Additionally, in phages, vertical transmission is the primary mode of genetic information transfer only among highly related viruses, whilst at greater evolutionary distances horizontal transmission predominates (Kristensen et al., 2013). In spite of this, significant advances have recently been made in the reconstruction of the evolutionary history of the main viral clades.
For the RNA viruses, a draft tree is available which connects the orders Mononegavirales, Tymovirales, Picornavirales, and the family Flaviviridae (Koonin, Dolja, et al., 2015; Koonin, Krupovic, et al., 2015). For the DNA viruses, there are two recognized relationships: that of the bacteria-infecting family Tectiviridae and its descendants, the Polintoviruses, which include the proposed eukaryote-infecting order Megavirales; and that of the Caudovirales and their putative descendants in eukaryotes, the Herpesvirales (Koonin, Dolja, et al., 2015; McGeoch, Davison, Dolan, Gatherer, & Sevilla-Reyes, 2008; Selvarajan Sigamani, Zhao, Kamau, Baines, & Tang, 2013).

The Caudovirales, or tailed phages, are the most abundant dsDNA viruses, infecting both Bacteria and Archaea (H.-W. Ackermann, 1998). Their non-enveloped virion is composed of a head, a protein shell which protects the DNA molecule, and a tail, a protein tube involved in DNA delivery to the host cytoplasm (King, Adams, Carstens, & Lefkowitz, 2011a). The virion (Figure 1a) can also harbor additional attachments, such as terminal fibers or base plates on its tail (H.-W. Ackermann, 1998; King et al., 2011a). Under its traditional classification, this group comprised three families, defined by their virion morphology (Figure 1b): Myoviridae, with long contractile tails; Siphoviridae, with long non-contractile tails; and Podoviridae, with short non-contractile tails (H.-W. Ackermann, 1998). Phages of the order can harbor as few as 27 and as many as over 600 genes, which are usually clustered in operons based on their functions (King et al., 2011a). However, due to the low number of fully annotated genomes, a general genome architecture for the order has not yet been defined (King et al., 2011a). Until now, evolutionary analyses of this group have suggested that they arose shortly after the origin of cellular life (H. W. Ackermann, 2003). Nevertheless, there are two as yet unanswered questions regarding this clade.
The first one deals with their evolutionary relationship with the aforementioned Tectiviridae and other related clades: its elucidation would lead to a first draft of a tree delineating the transition of the main groups of dsDNA phages to eukaryotic hosts. Nonetheless, at this point the only common characteristic identified, apart from the use of dsDNA (Koonin, Dolja, et al., 2015), is the shared presence of an icosahedral capsid in their non-enveloped virions (H. W. Ackermann, 2003). Morphologically, however, the Tectiviridae are tailless (Figure 2), producing instead a tail-like structure during injection into the host, and harbor spikes at the vertices of their capsid (King, Adams, Carstens, & Lefkowitz, 2011b).

The second unanswered question is related to the classification of the Caudovirales: over the past years the taxonomy of the tailed phages has changed dramatically, including the creation of subfamilies for the family Siphoviridae (Adriaenssens et al., 2017) and, under the latest and as yet unpublished ICTV 2017 release (available at: http://ictv.global/taxonomyReleases.asp), the creation of an additional family, the Ackermannviridae, whose members come from unclassified Caudovirales phages (Figure 3a). The Tectiviridae have also experienced changes, with the creation of a second genus for the family: Betatectivirus (Figure 3b).

To overcome the hurdles generated by high mutation rates, two approaches have been proposed in recent years for the construction of viral phylogenies. The first one calculates k-mer frequencies in complete genomes, which are then used to generate a distance matrix (Zhang, Jun, Leuze, Ussery, & Nookaew, 2017). Based on this matrix, a dendrogram can be created through a neighbor-joining approach (Zhang et al., 2017). In contrast, the second one starts with the generation of viral domain orthologous groups (or VDOGs), created using best reciprocal BLAST hits (Moreno-Gallego & Reyes, 2016).
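The first approach, from k-mer frequencies to a distance matrix, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not the procedure of Zhang et al. (2017) or of the present work: the function names are hypothetical, the sequences are toy strings rather than phage genomes, and the Euclidean distance is an arbitrary choice made here for simplicity. Reverse complements and ambiguous bases are not handled, and the neighbor-joining step that would turn the matrix into a dendrogram is left out.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_frequencies(seq, k):
    """Return the relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two sparse frequency profiles.

    Assumption: the distance measure is chosen for illustration only.
    """
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(genomes, k):
    """Pairwise distances between genomes, as a nested dict keyed by name."""
    profiles = {name: kmer_frequencies(seq, k) for name, seq in genomes.items()}
    names = sorted(profiles)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = euclidean(profiles[a], profiles[b])
    return dist


# Toy sequences (hypothetical; not real phage genomes).
toy = {
    "phage_A": "ATGCGATACGCTTGA",
    "phage_B": "ATGCGATACGCTTGC",
    "phage_C": "TTTTTTTTGGGGGGG",
}
matrix = distance_matrix(toy, k=3)
```

In this toy input, phage_A and phage_B differ by a single base, so their distance is smaller than the distance from either of them to phage_C. A symmetric matrix of this kind is the input that a neighbor-joining implementation would take to build the dendrogram.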
Once the VDOGs are defined, their presence or absence in each clade of interest can be used to determine representative (characteristic) clusters for a given group, which in turn support taxonomic analyses such as the definition of core genomes or the generation of distance-based dendrograms or phylogenetic trees (Moreno-Gallego & Reyes, 2016; Andrade-Martínez & Reyes, 2017).

Methods based on bipartite networks of homologous genes and representative genomes (Iranzo, Krupovic, & Koonin, 2016), and distance-based procedures which incorporate both homologous gene groups and synteny, in particular the GRAViTy pipeline (Aiewsakun, Adriaenssens, Lavigne, Kropinski, & Simmonds, 2018), have also been proposed with promising results. In fact, based on the findings from these studies, the morphology-based classification of the Caudovirales into the three traditional families, Siphoviridae, Myoviridae, and Podoviridae, has recently been called into question. It has been determined that members of these families do not constitute robust, monophyletic groups in network or distance-based analyses (Aiewsakun et al., 2018; Iranzo et al., 2016; Andrade-Martínez & Reyes, 2017). These techniques have also provided indirect evidence for a relationship between the Caudovirales and Tectiviridae: the Megavirales, members of the Polintoviruses group, seem to harbor ribonucleotide reductase and helicase genes similar to those of Caudovirales (Iranzo et al., 2016). Moreover, a dendrogram produced through GRAViTy analysis of all dsDNA phages clusters the Tectiviridae in the same branch as an offshoot of the Podoviridae, which is nonetheless located outside the main Caudovirales branch (Aiewsakun et al., 2018). Unfortunately, Zhang et al. (2017) do not report the position of the Tectiviridae in their k-mer dendrogram, so no conclusions