The Missing Tailed Phages: Prediction of Small Capsid Candidates
Total Page:16
File Type:pdf, Size:1020Kb
microorganisms Article The Missing Tailed Phages: Prediction of Small Capsid Candidates Antoni Luque 1,2,3,* , Sean Benler 4 , Diana Y. Lee 1,2, Colin Brown 1,5 and Simon White 6 1 Viral Information Institute, San Diego State University, San Diego, CA 92182, USA; [email protected] (D.Y.L.); [email protected] (C.B.) 2 Computational Science Research Center, San Diego State University, San Diego, CA 92182, USA 3 Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA 4 National Center for Biotechnology Information (NCBI), Bethesda, MD 20894, USA; [email protected] 5 Department of Physics, San Diego State University, San Diego, CA 92182, USA 6 Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA; [email protected] * Correspondence: [email protected] Received: 29 October 2020; Accepted: 5 December 2020; Published: 8 December 2020 Abstract: Tailed phages are the most abundant and diverse group of viruses on the planet. Yet, the smallest tailed phages display relatively complex capsids and large genomes compared to other viruses. The lack of tailed phages forming the common icosahedral capsid architectures T = 1 and T = 3 is puzzling. Here, we extracted geometrical features from high-resolution tailed phage capsid reconstructions and built a statistical model based on physical principles to predict the capsid diameter and genome length of the missing small-tailed phage capsids. We applied the model to 3348 isolated tailed phage genomes and 1496 gut metagenome-assembled tailed phage genomes. Four isolated tailed phages were predicted to form T = 3 icosahedral capsids, and twenty-one metagenome-assembled tailed phages were predicted to form T < 3 capsids. The smallest capsid predicted was a T = 4/3 1.33 architecture. No tailed phages were predicted to form the smallest ≈ icosahedral architecture, T = 1. We discuss the feasibility of the missing T = 1 tailed phage capsids and the implications of isolating and characterizing small-tailed phages for viral evolution and phage therapy. Keywords: bacteriophage; tailed phages; icosahedral capsids; capsid modeling; statistical learning; isolated genomes; metagenome-assembled genomes 1. Introduction Tailed phages are viruses that infect bacteria and are the most abundant biological entity on Earth [1]. They are responsible for the regulation of biogeochemical processes at a planetary scale [2,3], the control of microbial populations [4,5], and the mobility of genes across hosts and ecosystems [6,7]. The broad functionality of tailed phages is facilitated by their vast reservoir of genes, resulting in a large range of genome lengths, from 10 to 500 kilobase pairs (kbp) [8,9]. Tailed phages can accommodate these genomes because the proteins building their protective capsids display a high degree of plasticity, allowing for a broad range of different size capsids, from about 30 to 160 nm in diameter [10]. The majority of tailed phages, about 80–90%, form capsids with icosahedral symmetry [11], while the remaining 10–20% form elongated capsids with icosahedral caps [12,13]. The formation of icosahedral capsids is not unique to tailed phages—it is the most common viral capsid architecture in the virosphere [14]. However, despite the abundance and diversity of tailed phages, even the smallest tailed phages form far more complex capsids and store far larger genomes Microorganisms 2020, 8, 1944; doi:10.3390/microorganisms8121944 www.mdpi.com/journal/microorganisms Microorganisms 2020, 8, 1944 2 of 18 Microorganisms 2020, 8, x FOR PEER REVIEW 2 of 19 than most icosahedral viral families [15]. Furthermore, the major capsid proteins of tailed phages adoptthe theHK97-fold, HK97-fold, which which is also is found also found in the inbuilding the building blocks of blocks bacterial of bacterial and archeal and enzymatic archeal enzymatic cellular cellularnanocompartments nanocompartments called called encapsulins encapsulins [16,17]. [16 ,17Intriguingly,]. Intriguingly, encapsulins encapsulins can canform form the the smallest smallest icosahedralicosahedral protein protein shells shells that that are are absent absent among among tailed tailedphages phages (Figure(Figure1 1).). Figure 1. HK97-fold protein compartments. The left side of the figure focuses on encapsulins, nanocompartmentsFigure 1. HK97-fold responsible protein forcompartments. chemical storage The andleft biochemicalside of the reactionsfigure focuses in bacteria on encapsulins, and archaea. Thenanocompartments right side of the figure responsible focuses onfor viruses chemical and stor geneage transfer and biochemical agents. The reactions viruses belong in bacteria to the realmand of Duplodnaviriaarchaea. The right. Tailed side phages of the figure in the focuses phylum onUroviricota viruses andand gene class transferCauviricetes agents. areThe the viruses most belong diverse andto abundant the realm representatives of Duplodnaviria of. Tailed this group. phages in the phylum Uroviricota and class Cauviricetes are the most diverse and abundant representatives of this group. Icosahedral capsids are characterized by the triangulation number T, which determines the number of quasi-equivalentIcosahedral capsids proteins are (complexity) characterized and by the the total triangulation number of proteinsnumber T, forming which thedetermines capsid [18 the,19 ]. number of quasi-equivalent proteins (complexity) and the total number of proteins forming the As illustrated in Figure2, icosahedral capsids can organize their proteins in four di fferent lattices, capsid [18,19]. As illustrated in Figure 2, icosahedral capsids can organize their proteins in four accommodating different stoichiometries of major and minor capsid proteins [19]. The generalized different lattices, accommodating different stoichiometries of major and minor capsid proteins [19]. T-number is Ti(h,k) = αi T0(h,k), where h and k are the steps in the hexagonal sublattice joining two The generalized T-number is Ti(h,k) = αi T0(h,k), where h and k are the steps in the hexagonal consecutive vertices in the icosahedral capsid. T0 is the classic T-number associated with the hexagonal sublattice joining two consecutive vertices in the icosahedral capsid. T0 is the classic T-number lattice and is given by the equation T (h,k) = h2 + hk + k2. The subindex i takes the values h, t, s, and r, associated with the hexagonal lattice0 and is given by the equation T0(h,k) = h2 + hk + k2. The subindex associated,i takes the respectively, values h, t, with s, and the r, hexagonal, associated, trihexagonal, respectively, snubwith hexagonal,the hexagonal, and rhombitrihexagonaltrihexagonal, snub icosahedralhexagonal, lattices. and rhombitrihexagonal The number of major icosahedral capsid proteins lattices. in The a capsid number is 60T of 0major(h,k). capsid The hexagonal proteins latticein a casecapsid only containsis 60T0(h,k). major The capsid hexagonal proteins, lattice that case is, only Th(h,k) contains= T0(h,k), major with capsidαh =proteins,1. If the that size is, of T theh(h,k) major = capsidT0(h,k), protein with is αh conserved = 1. If the size across of the lattices, major the capsid other protei threen latticesis conserved must across include lattices, minor the capsid other proteins three occupyinglattices must the secondary include minor polygons capsid (triangles proteins and occupying squares). Thisthe secondary increasesthe polygons surface (triangles of the capsid and by a factorsquares).αt = This4/3 increases1.33 (trihexagonal), the surface ofα thes = capsid7/3 2.33by a (snubfactor hexagonal),αt = 4/3 ≈ 1.33 and (trihexagonal),αr = 4/3 + 2 α/ ps =3 7/32.49 ≈ ≈ ≈ ≈ (rhombitrihexagonal).2.33 (snub hexagonal), When and combiningαr = 4/3 + 2/ all√3 lattices, ≈ 2.49 (rhombitrihexagonal). the first four elements When of the combining generalized all T-numberlattices, arethe T = first1, 1.33, four 2.33,elements and 2.49,of the containing generalized 60 T-number major capsid are T proteins = 1, 1.33, each.2.33, and The 2.49, fifth containing element of 60 the major series is Tcapsid= 3 and proteins contains each. 180 The major fifth capsidelement proteins. of the series Tailed is T phages = 3 and have contains been 180 observed major capsid to form proteins. capsids adoptingTailed thephages hexagonal have been and observed trihexagonal to form lattices capsid [19s adopting,20], but nothe tailedhexagonal phages and have trihexagonal been observed lattices to MicroorganismsMicroorganisms2020 2020,,8 8,, 1944 x FOR PEER REVIEW 33 ofof 1819 [19,20], but no tailed phages have been observed to form T ≤ 3 capsids (Figure 1). The smallest form T 3 capsids (Figure1). The smallest characterized tailed phage structure corresponds to the characterized≤ tailed phage structure corresponds to the Bacillus phage phi29, which adopts an Bacillus phage phi29, which adopts an elongated structure with icosahedral T = 3 caps and a Q = 5 elongated structure with icosahedral T = 3 caps and a Q = 5 body [12,21,22], and Streptococcus phage body [12,21,22], and Streptococcus phage C1, which adopts a T = 4 icosahedral capsid [10,23]. C1, which adopts a T = 4 icosahedral capsid [10,23]. Figure 2. Icosahedral lattices for T(2,1) = 7 capsids. The label on the top displays the name of the lattice.Figure 2. The Icosahedral blue arrows lattices and for blue T(2,1) dots = display 7 capsids. the The h and label k on steps the intop the displays hexagonal the name lattice, of hthe= lattice.2 and kThe= 1 blue in this arrows case.