bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Exploring the remarkable diversity of Escherichia coli Phages in the Danish Wastewater Environment, Including 91 Novel Phage Species
Nikoline S. Olsen 1, Witold Kot 1,2,*, Laura M. F. Junco 2 and Lars H. Hansen 1,2,*
1 Department of Environmental Science, Aarhus University, Frederiksborgvej 399, Roskilde, Denmark;
2 Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, 1871 Frederiksberg C, Denmark; [email protected], [email protected]
* Correspondence: [email protected] Phone: +45 28 75 20 53, [email protected] Phone: +45 35 33 38 77
Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation AUFF Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018.
Competing interests: The authors declare no competing interests.
Abstract: Phages drive bacterial diversity, profoundly influencing diverse microbial communities, from microbiomes to the drivers of global biogeochemical cycling. The vast genomic diversity of phages is gradually being uncovered, as >8000 phage genomes have now been sequenced. Aiming to broaden our understanding of Escherichia coli (MG1655, K-12) phages, we screened 188 Danish wastewater samples (0.5 ml) and identified 136 phages, of which 104 are unique phage species and 91 represent novel species, including several novel lineages. These phages are estimated to represent roughly a third of the true diversity of Escherichia phages in Danish wastewater. The novel phages are remarkably diverse and represent four different families: Myoviridae, Siphoviridae, Podoviridae and Microviridae. They group into 14 distinct clusters and nine singletons without any substantial similarity to other phages in the dataset. Their genomes vary drastically in length, from merely 5 342 bp to 170 817 bp, with an impressive span of GC contents ranging from 35.3% to 60.0%. Hence, even for a model host bacterium, in the go-to source for phages, substantial diversity remains to be uncovered. These results expand and underline the range of Escherichia phage diversity and demonstrate how far we are from fully disclosing phage diversity and ecology.
Keywords: bacteriophage; wastewater; Escherichia coli; diversity; genomics; taxonomy; coliphage
1. Introduction
Phages are important ecological contributors: they renew organic matter supplies in nutrient cycles and drive bacterial diversity by enabling co-existence of competing bacteria by “Killing the winner” and by serving as genomic reservoirs and transport units [1,2]. Phage genomes are known to contain auxiliary metabolism genes (AMGs), toxins, virulence factors and even antibiotic resistance genes [3–7], and through lysogeny and transduction they can transfer metabolic traits to their hosts and even confer immunity against homologous phages [1]. Still, in spite of their ecological role, their potential as antimicrobials and the fact that they carry a multitude of unknown genes with great potential for biotechnological applications, phages are vastly understudied. Fewer than 9000 phage genomes have now been published, and though the number increases rapidly, we may have merely scratched the surface of the expected diversity [8]. It has been estimated that at least a billion bacterial species exist [9]; hence, only phages targeting a tiny fraction of potential hosts have been reported. Efforts to disclose the range and diversity of phages targeting a single host have revealed a stunning display of diversity. The most scrutinized phage host is Mycobacterium smegmatis, for which the Science Education Alliance Phage Hunters program has isolated more than 4700 phages and fully sequenced 680 phages, which represent 30 distinct phage clusters [10,11]. This endeavour has provided a unique insight into viral and host diversity, evolution and genetics [12–15]. No other phage host has been equally targeted, but Escherichia coli phages have been isolated in fairly high numbers. The International Committee on Taxonomy of Viruses (ICTV) has currently recognised 158 phage species originally isolated on E. coli [16], while 2732 Escherichia phage genomes have been deposited in GenBank [17]. As phages are expected to have an evolutionary potential to migrate across microbial populations, host species may not be an ideal indicator of relatedness, but it serves as an excellent starting point to explore phage diversity. Hierarchical classification of phages is complicated by the high degree of horizontal gene transfer; consequently, several classification systems have been proposed [18–20], and we may not yet have reached a point where it is reasonable to establish the criteria for a universal system [20]. Nonetheless, a system enabling a mutual understanding and exchange of knowledge is needed. Accordingly, we have in this study chosen to classify our novel phages according to the ICTV guidelines [21].
Here we aim to expand our understanding of Escherichia phage diversity. Earlier studies are few, have relied on more laborious and time-consuming methods, or are based on in silico analyses of already published phage genomes [8]. For instance, Korf et al. (2019) isolated 50 phages on 29 individual E. coli strains, while Jurczak-Kurek et al. (2016) isolated 60 phages on a single E. coli strain, both finding a broad diversity of Caudovirales and also representatives of novel phage lineages [22,23]. We hypothesized that a high-throughput screening of nearly 200 samples in time-series, using a single host, would expand the number of known phages by facilitating the interception of the prevailing phage(s) of the given day.
2. Materials and Methods
The screening for Escherichia phages was performed with the microplate-based high-throughput screening method as described in (not published), with the exception that lysates of wells giving rise to plaques were sequenced without further purification. In short, an overnight enrichment was performed in microplates with host culture, media and wastewater (0.5 ml per well), followed by filtration (0.45 µm), a purification step by re-inoculation, a second overnight incubation and filtration (0.45 µm), and then a spot-test (soft-agar overlay) to indicate positive wells. All procedures were performed under sterile conditions.
2.1.1 Samples, bacteria and media
Inlet wastewater samples (188) were collected (40-50 ml) in time-series (2-4 days spanning 1-3 weeks) from 48 Danish wastewater treatment facilities (rural and urban), geographically distributed on Zealand, Funen and in Jutland, during July and August 2017. The samples were centrifuged (9000 x g, 4 °C, 10 min) and the supernatant filtered (0.45 µm) before storage in aliquots (-20 °C) until screening. The host bacterium was E. coli (MG1655, K-12), and the medium was Lysogeny Broth (LB) amended with CaCl2 and MgCl2 (final concentration 50 mM). Collected phages were stored in SM-buffer [24] at 4 °C.
2.1.2 Sequencing and genomic characterisation
DNA extractions, clean-up (ZR-96 Clean and Concentrator kit, Zymo Research, Irvine, CA USA) and sequencing library preparation (Nextera® XT DNA kit, Illumina, San Diego, CA USA) were performed according to the manufacturers' protocols with minor modifications as described in Kot et al. (2014) [25]. The libraries were sequenced as paired-end reads on an Illumina NextSeq platform with the Mid Output Kit v2 (300 cycles). The obtained reads were trimmed and assembled in CLC Genomics Workbench version 10.1.1 (CLC BIO, Aarhus, Denmark). Overlapping reads were merged with the following settings: mismatch cost: 2, minimum score: 15, gap cost: 3 and maximum unaligned end mismatches: 0, and then assembled de novo. Additional control assemblies were constructed using SPAdes version 3.12.0 [26]. Phage genomes were defined as contigs with an average coverage > 20× and a sequence length ≥ 90% of that of the closest relative. Genes were predicted and annotated using a customized RASTtk version 2.0 [27] workflow with GeneMark [28], with manual curation and verification using BLASTP [29], HHpred [30] and Pfam version 32.0 [31], or annotated de novo using VIGA version 0.11.0 [32] based on DIAMOND searches (RefSeq Viral protein database) and HMMer searches (pVOG HMM database). All genomes were assessed for antibiotic resistance genes, bacterial virulence genes, type I, II, III and IV restriction modification (RM) genes and auxiliary metabolism genes (AMGs) using ResFinder 3.1 [33,34], VirulenceFinder 2.0 [35], Restriction-ModificationFinder 1.1 (REBASE) [36] and VIBRANT version 1.0.1 [37], respectively. The 104 unique phage genomes were aligned to viromes of BioProject PRJNA545408 in CLC Genomics Workbench and deposited in GenBank [17].
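The contig acceptance rule stated above (average coverage > 20× and length ≥ 90% of the closest relative) can be sketched as a simple filter. This is a minimal illustration only, not the authors' pipeline: the `Contig` record, its field names and the example values are hypothetical, and the coverage and relative-length figures are assumed to have been computed beforehand by the assembler and a BLAST search.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # All fields are hypothetical names chosen for this illustration.
    name: str                  # contig identifier
    length_bp: int             # assembled contig length
    mean_coverage: float       # average read depth across the contig
    closest_relative_bp: int   # genome length of the closest known relative


def is_phage_genome(contig: Contig,
                    min_coverage: float = 20.0,
                    min_length_fraction: float = 0.90) -> bool:
    """Apply the two thresholds from the text: coverage > 20x, length >= 90%."""
    deep_enough = contig.mean_coverage > min_coverage
    long_enough = (contig.length_bp
                   >= min_length_fraction * contig.closest_relative_bp)
    return deep_enough and long_enough


# Made-up example contigs, not data from the study.
contigs = [
    Contig("contig_a", 88_000, 153.0, 90_000),  # passes both thresholds
    Contig("contig_b", 40_000, 310.0, 90_000),  # too short
    Contig("contig_c", 89_500, 12.0, 90_000),   # coverage too low
]
accepted = [c.name for c in contigs if is_phage_genome(c)]
print(accepted)
```

The coverage comparison is strict and the length comparison is inclusive, mirroring the ">" and "≥" signs used in the text.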
2.1.3 Genetic analyses
Nucleotide (NT) and amino acid (AA) similarities were calculated using tools recommended by the ICTV [38], i.e. BLAST [29] for identification of the closest relative (BLASTn when possible, discontinuous megaBLAST (word size 16) for larger genomes) and Gegenees version 2.2.1 [39] for assessing phylogenetic distances of multiple genomes. For both NTs (BLASTn algorithm) and AAs (tBLASTx algorithm), a fragment size of 200 bp and a step size of 100 bp were applied. NT similarity was determined as percentage query cover multiplied by percentage NT identity. Novel phages are categorised according to ICTV taxonomy. The criterion of 95% DNA sequence similarity for demarcation of species was applied to identify novel species representatives and to determine uniqueness within the dataset. Evolutionary analyses for phylogenomic trees were conducted in MEGA7 version 2.1 (default settings) [40]. These were based on the large terminase subunit (Caudovirales), a gene commonly applied for phylogenetic analysis [41,42], and on the DNA replication gene (gpA) (Microviridae). The NT sequences were aligned by MUSCLE [43] and the evolutionary history inferred by the Maximum Likelihood method based on the Tamura-Nei model [44]. The trees with the highest log likelihood are shown. Pairwise whole genome comparisons were performed with Easyfig 2.2.2 [45] (BLASTn algorithm); these were curated by adding color-codes and identifiers in Inkscape version 0.92.2. The R package iNEXT [46,47] in RStudio version 1.1.456 [48] was used for rarefaction, species diversity (q = 0, datatype: incidence_raw), extrapolation hereof (estimateD) and estimation of sample coverage. The visualisation of genome sizes and GC contents was prepared in Excel version 16.31.
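The NT similarity measure and the 95% species criterion described above amount to a short calculation, sketched here in Python. The function names and example values are hypothetical, and treating exactly 95% as "same species" (an inclusive boundary) is an assumption of this sketch; the text states only that the 95% criterion was applied.

```python
def nt_similarity(query_cover_pct: float, identity_pct: float) -> float:
    """Percentage query cover multiplied by percentage NT identity, in percent."""
    return query_cover_pct * identity_pct / 100.0


def is_same_species(similarity_pct: float, threshold_pct: float = 95.0) -> bool:
    """Species demarcation at 95% DNA similarity (inclusive boundary assumed)."""
    return similarity_pct >= threshold_pct


# Made-up BLAST summary values (query cover %, identity %), not study data.
hits = {
    "phage_x_vs_closest_relative": (98.0, 96.0),
    "phage_y_vs_closest_relative": (100.0, 97.0),
}
for label, (cover, identity) in hits.items():
    sim = nt_similarity(cover, identity)
    verdict = "same species" if is_same_species(sim) else "novel species candidate"
    print(f"{label}: {sim:.2f}% -> {verdict}")
```

Note that a hit with high identity can still fall below the threshold when query cover is incomplete, as in the first made-up example, where 98% cover and 96% identity multiply to about 94.1%.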
3. Results and Discussion
3.2. Phylogenetics, taxonomy and species richness
The sequenced phages were analysed strictly in silico, focusing on their relatedness to known phages, their taxonomy and any distinctive characteristics. Based on the confirmed morphology of closely related phages, at least four different families are represented: Myoviridae (58%), Siphoviridae (25%), Podoviridae (7.4%) and even the single-stranded DNA (ssDNA) Microviridae (5.9%) (Table 1). Five (3.7%) of the phages are so distinct that they could not with certainty be assigned to any family. A similar distribution was found by Korf et al. (2019), i.e. 70% Myoviridae, 22% Siphoviridae and 8% Podoviridae, just as an analysis of the genomes of Caudovirales infecting Enterobacteriaceae by Grose & Casjens (2014) also identified more clusters belonging to the Myoviridae than the Siphoviridae, and fewest of the Podoviridae [8,22]. Jurczak-Kurek et al. (2016) found more siphoviruses than myoviruses, but also found the Podoviridae to be the least abundant [23]. Nonetheless, these distributions are just as likely to be caused by culture or isolation methods as they are to reflect true abundances. Based on DNA homology and phylogenetic analysis, the 104 unique phages identified in this study group into 14 distinct clusters and nine singletons, having a <1% Gegenees score between clusters and singletons, with the exceptions of clusters IV, V and Jahat, which have inter-Gegenees scores of up to 20%; clusters IX, X and XI, which have inter-Gegenees scores of 1-8%; and cluster XIII, Sortsne and Sortkaff, which have inter-Gegenees scores of 6-14% (Figure 1, Table 2). Excluding one phage in cluster I and two phages in cluster XIV, the intra-Gegenees score of all clusters is above 39% (Figure 1).
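To illustrate how a matrix of pairwise Gegenees scores separates into clusters and singletons, the sketch below groups genomes by single linkage with a score cut-off. It is not the authors' procedure, which combined DNA homology with phylogenetic analysis; the function name, the toy matrix and the use of 39% as a cut-off (borrowed from the intra-cluster scores reported above) are all assumptions made for illustration.

```python
from typing import Dict, List


def cluster_by_score(names: List[str],
                     scores: List[List[float]],
                     threshold: float) -> List[List[str]]:
    """Single-linkage grouping over a (possibly asymmetric) score matrix.

    Two genomes are linked when the scores in both directions reach the
    threshold; linked genomes end up in the same group.
    """
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if min(scores[i][j], scores[j][i]) >= threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


# Toy, made-up scores (%) for five hypothetical genomes, not study data.
names = ["A1", "A2", "B1", "B2", "S1"]
scores = [
    [100, 82, 0, 0, 0],
    [76, 100, 0, 0, 0],
    [0, 0, 100, 92, 8],
    [0, 0, 90, 100, 9],
    [0, 0, 8, 9, 100],
]
groups = cluster_by_score(names, scores, threshold=39.0)
clusters = [g for g in groups if len(g) > 1]
singletons = [g[0] for g in groups if len(g) == 1]
print(clusters, singletons)
```

In this toy matrix, S1 shares low-level similarity (8-9%) with the B genomes yet remains a singleton, which parallels the inter-cluster scores of up to 20% that the text reports between otherwise distinct groups.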
[Heat plot for: Comparison[ALL_090419], Alignment[analysis], Analysis[*default*]. Pairwise Gegenees similarity scores (%) among the 104 phage genomes, with rows and columns numbered 1 to 104 and ordered by cluster (cluster labels I to XI appear in this part of the plot). Rows are named by sample and closest relative, e.g. 1: MG_194_C2_Esch_VpaE1. The numeric matrix is not reproduced here.]
87 100 83 86 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 81: MG_036_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 85 100 86 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 XII 82: MG_044_C3_Esch_UFV13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 86 85 100 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83: MG_178_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 7 8 7 8 86 89 85 86 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84: MG_038_C11_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 85: MG_099_C1_Esch_Gluttony 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 72 49 66 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 86: MG_180_C4_Esch_Pride 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 71 100 48 66 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 87: MG_124_C1_Esch_Sloth 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48 48 100 46 48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 XIII 88: MG_043_C4_Esch_SECphi18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 65 66 47 100 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89: 
MG_046_C1_Esch_slur05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 67 49 79 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90: MG_144_C1_Halfdan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91: MG_038_Ent_St-1C6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 39 0 13 6 0 0 0 0 0 0 0 0 0 92: MG_180_C3_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42 100 0 12 7 0 0 0 0 0 0 0 0 0 93: MG_053_C26_Skarpretter 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 94: MG_165_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 11 0 100 14 0 0 0 0 0 0 0 0 0 XIV 95: MG_162_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 0 13 100 0 0 0 0 0 0 0 0 0 96: MG_126_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 22 13 13 15 15 14 15 17 97: MG_039_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 100 27 28 25 25 23 23 25 98: MG_130_C3_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 26 100 90 47 48 65 67 47 99: MG_144_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 28 93 100 48 48 69 68 47 100: MG_200_C27_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 46 46 100 85 41 42 88 101: MG_116_C10_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 48 47 86 100 43 43 84 102: MG_182_C43_Ent_J8+65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 23 66 67 43 43 100 93 44 103: MG_140_C1_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 23 67 66 43 44 93 100 44 104: MG_168_C20_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 24 46 46 88 83 43 44 100
Figure 1 Phylogenetic tree (maximum log likelihood: -1325.01; large terminase subunit (Caudovirales) or DNA replication protein gpA (Microviridae)); scale bar: substitutions per site. Morphology is indicated by symbols (Myoviridae: ☐, Siphoviridae: ▽, Podoviridae: ○, Microviridae: ◇). Phylogenomic nucleotide distances were calculated with Gegenees (BLASTn; fragment size: 200, step size: 100, threshold: 0%).
Table 1 List of 104 unique Escherichia phages identified in 94 Danish wastewater samples. Novelty (*) is based on the 95% demarcation of species. Similarity is sequence identity (%) times sequence coverage (%) to the closest relative (BLASTn). Taxonomy is based on similarity (BLASTn) to the closest relative and the ICTV Master Species list.
Escherichia Genome ORFs tRNAs GC Novel Sample Location Cluster Taxonomy Similarity Accession
Phage (bp) (n) (n) (%) (%) number
Tootiki 88257 128 22 39 * 3D Avedøre I Felixounavirus 90,2 MN850647
Mio 83431 121 18 39,1 * 13C Hadsten I Felixounavirus 89,7 MN850631
Allfine 86963 125 20 39 * 14A Hammel I Felixounavirus 91,2 MN850633
Bumzen 87360 126 20 39,1 * 14B, 14D, 15A Hammel, Hinnerup I Felixounavirus 92,5 MN850635
Dune 88511 129 20 39 * 19C Tørring I Felixounavirus 91,5 MN850636
Warpig 86106 127 17 39 * 20A Helsingør I Felixounavirus 93 MN850637
Radambza 86702 127 19 38,9 * 20D Helsingør I Felixounavirus 91,6 MN850639
Ekra 87282 128 20 38,9 * 21C Herning I Felixounavirus 92,9 MN850644
Humlepung 85311 119 19 39,1 * 12B, 25B, 37A Drøsbro, Gram, Bogense I Felixounavirus 92,1 MN850564
Finno 87554 129 20 38,9 * 31C Skovby I Felixounavirus 89,7 MN850619
Garuso 85798 130 20 38,9 * 29A, 33A Nustrup, Sommersted I Felixounavirus 90,9 MN850566
Momo 88168 130 20 39 * 38D Hofmansgave I Felixounavirus 90,7 MN850580
Heid 87590 126 20 39 * 8C, 24B, 24C, 34D, 37D, 38B Esbjerg Ø, Bevtoft, Vojens, Bogense, Hofmansgave I Felixounavirus 91,2 MN850577
Skuden 87263 131 20 38,9 * 41D Odense NV I Felixounavirus 91,1 MN850585
Pinkbiff 88814 129 20 39 * 46A Marselisborg I Felixounavirus 93,9 MN850603
Fjerdesal 87715 128 21 39 * 3A, 39A, 46C Avedøre, Hårslev, Marselisborg I Felixounavirus 90,6 MN850605
Andreotti 83391 117 20 39,2 * 47B Egå I Felixounavirus 91,9 MN850610
Nataliec 89137 134 20 39 * 47D, 48C Egå, Viby I Felixounavirus 90,3 MN850611
Adrianh 88226 128 19 38,9 * 42C, 48B Otterup, Viby I Felixounavirus 91,1 MN850614
Mistaenkt 86664 128 22 47,2 * 6A Kolding I Suspvirus 91,1 MN850587
Nimi 137039 213 5 43,7 * 11C Varde III Vequintavirus 93,3 MN850626
Navn 141707 224 4 43,6 * 21B Herning III Vequintavirus 91,1 MN850642
Nomine 137991 220 5 43,6 * 23A Lemvig III Vequintavirus 91,5 MN850649
Naswa 138583 222 5 43,6 * 45A Ålborg Ø III Vequintavirus 93,1 MN850595
naam 137129 215 5 43,7 31D Skovby III Vequintavirus 94,5 MN850630
Ime 137114 217 5 43,6 * 12C, 36B Drøsbro III Vequintavirus 93,1 MN850576
Magaca 135826 217 5 43,6 48A Viby III Vequintavirus 96 MN850612
Nom 136114 213 5 43,6 * 28D Jegerup III Vequintavirus 92,6 MN850646
Isim 138289 219 5 43,6 * 31B Skovby III Vequintavirus 93,8 MN850597
Nomo 137702 218 5 43,7 * 38D Hofmansgave III Vequintavirus 93,3 MN850578
Inoa 138710 220 5 43,6 * 44C Ålborg V III Vequintavirus 92 MN850593
Pangalan 136944 215 5 43,7 2A, 45D, 48D Grindsted, Egå, Viby III Vequintavirus 94,8 MN850627
Tuntematon 150473 279 11 39,1 * 7B, 7C Esbjerg V VI Myoviridae 89,6 MN850618
Anhysbys 149335 271 11 39,1 * 22C Hillerød VI Myoviridae 91,5 MN850648
Ukendt 150947 266 11 39 * 32D Skrydstrup VI Myoviridae 88,7 MN850565
Nepoznato 151514 265 10 38,9 * 19D, 35A, 48A, 48D Tørring, Årøsund, Viby VI Myoviridae 85,6 MN850571
Nieznany 144998 254 11 39,1 * 45C Ålborg Ø VI Myoviridae 88,9 MN850598
Muut 146307 243 13 37,4 * 35B Årøsund VII Myoviridae 92 MN850573
Alia 147009 246 13 37,5 * 13D Hadsten VII Myoviridae 93,1 MN850632
Outra 145482 246 13 37,4 * 22A Hillerød VII Myoviridae 93,8 MN850645
Inny 147483 247 13 37,4 * 45D Ålborg Ø VII Myoviridae 92,4 MN850601
Arall 145715 242 13 37,4 41B Odense NØ VII Myoviridae 94,6 MN850584
Kvi 163673 266 - 40,5 * 48C Viby VIII Krischvirus 94,2 MN850615
Kaaroe 163719 267 - 40,5 35D Årøsund VIII Krischvirus 94,7 MN850574
Dhabil 165644 266 3 39,5 * 1B Billund IX Dhakavirus 87,5 MN850621
Dhaeg 170817 278 3 39,4 * 47A, 47C Egå IX Dhakavirus 87,4 MN850609
Mogra 168724 263 2 37,7 * 25C Gram X Mosigvirus 91,1 MN850579
Mobillu 163063 255 2 37,7 1B Billund X Mosigvirus 94,5 MN850622
Moha 168676 267 2 37,6 26A Haderslev X Mosigvirus 94,8 MN850590
Moskry 169410 269 2 37,6 * 32C Skrydstrup X Mosigvirus 93,2 MN850651
Teqskov 165017 257 6 35,4 * 10D Skovlund XI Tequatrovirus 91,7 MN895437
Teqdroes 166833 269 10 35,4 * 12D Drøsbro XI Tequatrovirus 88,6 MN895438
Teqhad 167892 270 10 35,3 * 26B Haderslev XI Tequatrovirus 90,1 MN895434
Teqhal 168070 266 11 35,4 * 27A, 27D Halk XI Tequatrovirus 93,9 MN895435
Teqsoen 166468 268 10 35,5 * 43B Søndersø XI Tequatrovirus 91,7 MN895436
Flopper 52092 78 1 44,2 * 44D Ålborg V - Myoviridae 87 MN850594
Damhaus 51154 89 - 44,1 * 4B Damhusåen IV Hanrivervirus 85,8 MN850602
Herni 50971 89 - 44,1 * 21B, 30D Herning, Over Jerstal IV Hanrivervirus 87,6 MN850640
Grams 49530 83 - 44,1 * 25B Gram IV Hanrivervirus 87,1 MN850567
Aaroes 51662 92 - 44,1 * 35B Årøsund IV Hanrivervirus 83 MN850572
Aalborv 46660 79 - 43,9 * 44B Ålborg V IV Hanrivervirus 86,9 MN850591
Haarsle 48613 85 - 44 * 39B, 45C Hårslev, Ålborg Ø IV Hanrivervirus 87,1 MN850600
Egaa 51643 87 - 44,1 * 47A Egå IV Hanrivervirus 89,7 MN850607
Vojen 50709 86 - 44,1 * 34B Vojens IV Hanrivervirus 89,7 MN850569
Tiwna 51014 85 - 44,6 * 21B Herning V Tunavirinae 87,2 MN850643
Tonijn 51627 86 - 44,6 * 32B, 32C Skrydstrup V Tunavirinae 88,4 MN850641
Tonnikala 51277 86 - 44,8 * 48A Viby V Tunavirinae 86,4 MN850613
Atuna 50732 88 - 44,6 * 7B Esbjerg V V Tunavirinae 84,9 MN850620
Tunus 51111 87 - 44,8 * 20A Helsingør V Tunavirinae 93,7 MN850638
Orkinos 49798 81 - 44,6 * 30B, 30D Over Jerstal V Tunavirinae 91,3 MN850586
Ityhuna 50768 86 - 44,7 * 39D Hårslev V Tunavirinae 93,3 MN850582
Tonn 51012 87 - 44,5 * 8C, 45C Esbjerg Ø, Ålborg Ø V Tunavirinae 94 MN850596
Tinuso 50856 86 - 44,8 14A Hammel V Tunavirinae 97,3 MN850634
Tunzivis 50596 84 - 44,6 46B Marselisborg V Tunavirinae 94,5 MN850604
Tuinn 50505 86 - 44,7 46D Marselisborg V Tunavirinae 94,8 MN850606
Jahat 51101 87 - 45,7 * 35A Vojens - Tunavirinae 68,5 MK552105
Bob 45252 63 - 54,5 * 12C Drøsbro XII Dhillonvirus 88,6 MN850628
Mckay 44443 63 - 54,5 * 13B Hadsten XII Dhillonvirus 83,8 MN850629
Jat 44417 63 - 54,5 * 23C Lemvig XII Dhillonvirus 89,4 MN850650
Rolling 46017 64 - 54,2 * 29D Nustrup XII Dhillonvirus 80,2 MN850575
Welsh 45207 62 - 54,6 * 23A, 43D Lemvig, Søndersø XII Dhillonvirus 83,8 MN850589
Buks 40308 62 - 49,7 * 1A Billund - Jerseyvirus 91,3 MN850616
Skure 59474 92 - 44,6 * 23C Lemvig - Seuratvirus 90,4 MK672798
Halfdan 42858 57 - 53,7 * 34D Vojens - Siphoviridae 28,8 MH362766
Lilleen 5342 6 - 46,9 * 33A Sommersted II Gequatrovirus 93,8 MK629526
Lilleput 5490 6 - 47 * 12D Drøsbro II Gequatrovirus 93,4 MK629525
Lilleto 5492 6 - 46,8 * 43C, 43D, 44B Søndersø, Ålborg V II Gequatrovirus 92,7 MK629529
Lilledu 5483 6 - 47,2 * 25C Gram II Gequatrovirus 92,6 MK791318
Lillemer 5492 6 - 47,1 45C Ålborg Ø II Gequatrovirus 94,5
Lilleven 6090 9 - 44,4 * 11B Varde - Alphatrevirus 93,9 MK629527
Sortsyn 42116 61 - 59 * 11B Varde XIII Unclassified 92,3 MN850623
Sortregn 38200 53 - 59,3 43D Søndersø XIII Unclassified 97,3 MN850588
Skarpretter 42042 63 - 55,8 * 15A Hinnerup - Unclassified 37,9 MK105855
Sortkaff 42538 61 - 59,5 * 39B Hårslev - Unclassified 89,8 MN850581
Sortsne 41912 62 - 60 * 40A Odense NV - Unclassified 67,6 MK651787
Aldrigsur 42379 55 - 55,7 * 44B Ålborg V XIV Autographvirinae 71,9 MN850592
Altidsur 42197 53 - 55,7 * 33D Sommersted XIV Autographvirinae 71,8 MN850568
Forsur 42476 56 - 55,4 * 48D Viby XIV Autographvirinae 72 MN850617
Glasur 42507 56 - 55,4 * 40D Odense NV XIV Autographvirinae 72,3 MN850583
Lidtsur 42291 56 - 54,6 * 30B Over Jerstal XIV Autographvirinae 69 MK629528
Megetsur 42132 54 - 55,8 * 31B Skovby XIV Autographvirinae 73,1 MN850608
Mellemsur 40770 50 - 55,8 * 34D Vojens XIV Autographvirinae 76,4 MN850570
Smaasur 41110 50 - 55,4 * 11C Varde XIV Autographvirinae 93,3 MN850625
Usur 41906 51 - 55,4 * 27A, 27D Halk XIV Autographvirinae 73,3 MN850624
The high-throughput screening method favours easily culturable, plaque-forming lytic phages. Still, we identified 104 unique Escherichia phages, of
which only 16% were ≥95% similar (BLAST) to already published phages (Table 2). Phages were identified in wastewater samples from 43 of the 48
investigated treatment facilities. From the majority of positive samples (58), a single phage was sequenced; however, some lysates held
more than one phage. Twenty-five of the lysates held two phages, eight held three phages and one held as many as four. Of the 104
unique phages, 91 represent novel species (Table 1, 2). Of these, 51 differed by ≥10% from published phage genomes, and some have NT similarities as
low as 29% (Table 1).
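The lysate counts in the paragraph above can be tallied against the total number of sequenced phages. The short sketch below is only an arithmetic check; the dictionary name and helper function are illustrative and not part of the study's pipeline.

```python
# Number of lysates holding 1, 2, 3 or 4 phages, as reported in the text.
lysates_by_phage_count = {1: 58, 2: 25, 3: 8, 4: 1}


def total_phages(counts):
    """Sum phages over all lysates: (phages per lysate) x (number of lysates)."""
    return sum(n_phages * n_lysates for n_phages, n_lysates in counts.items())


# 58*1 + 25*2 + 8*3 + 1*4 = 136, the number of phages identified in the study.
print(total_phages(lysates_by_phage_count))
```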
These newly sequenced phages represent a substantial share of the divergent lytic Escherichia phages in Danish
wastewater, but are still far from disclosing their true diversity (Figure 2). An extrapolation of species
richness (q = 0) predicts a total of 292 distinct species (requiring a sample size of ∼900 phages). The relatively
small sample size in this study (n = 136) may subject the estimate to a large prediction bias. The sampling
method also introduces a bias by selecting for abundance and burst size, thereby potentially underestimating
diversity. Nonetheless, the results provide an indication of the minimal diversity of lytic Escherichia (MG1655,
K-12) phages in Danish wastewater, estimated to be, as a minimum, in the range of 160 to 420 unique phage
species (Figure 2b). The novelty and diversity of these wastewater phages are truly remarkable and verify our
hypothesis, as well as the efficiency of the high-throughput screening method for exploring diverse phages of a
single host (not published).
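The idea behind extrapolating species richness (q = 0) from a limited sample can be illustrated with a much simpler estimator. The sketch below computes the classic Chao1 lower bound from per-species abundance counts. It is not the procedure used to produce Figure 2, and the counts in `example_counts` are hypothetical, chosen only to show the calculation.

```python
from collections import Counter


def chao1(abundances):
    """Classic Chao1 lower-bound estimate of species richness.

    abundances: one positive integer per observed species, giving how many
    times that species was sampled.
    """
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)          # number of observed species
    freq = Counter(counts)
    f1 = freq[1]                 # species seen exactly once (singletons)
    f2 = freq[2]                 # species seen exactly twice (doubletons)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical abundance counts: 8 observed species, 4 singletons, 2 doubletons.
example_counts = [1, 1, 1, 1, 2, 2, 3, 5]
print(chao1(example_counts))  # 8 + 4*4 / (2*2) = 12.0
```

Many singletons relative to doubletons push the estimate well above the observed richness, which is the same qualitative behaviour as an extrapolated total far exceeding the number of species actually sampled.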
Figure 2 (a) Sample-size-based rarefaction and extrapolation curve (species diversity against number of sampling units; interpolated and extrapolated) with confidence intervals (0.95). (b) Sample completeness curve (sample coverage against number of sampling units; interpolated and extrapolated) with confidence intervals (0.95).
Table 2 Taxonomic distribution of phages identified in 94 Danish wastewater samples, based on similarity to the closest relative and the ICTV Master Species list. ¹ ≤95% similarity to other phages in the dataset. ² ≤95% similarity to other phages in the dataset and to published phage genomes.
Taxonomy* total unique¹ novel²
Caudovirales; Myoviridae; Ounavirinae; Felixounavirus 33 19 19
Caudovirales; Myoviridae; Ounavirinae; Suspvirus 1 1 1
Caudovirales; Myoviridae; Tevenvirinae; Krischvirus 2 2 1
Caudovirales; Myoviridae; Tevenvirinae; Dhakavirus 3 2 2
Caudovirales; Myoviridae; Tevenvirinae; Mosigvirus 4 4 2
Caudovirales; Myoviridae; Tevenvirinae; Tequatrovirus 6 5 5
Caudovirales; Myoviridae; Vequintavirinae; Vequintavirus 15 12 9
Caudovirales; Myoviridae 15 11 10
Caudovirales; Podoviridae; Autographivirinae 10 9 9
Caudovirales; Siphoviridae; Dhillonvirus 6 5 5
Caudovirales; Siphoviridae; Guernseyvirinae; Jerseyvirus 1 1 1
Caudovirales; Siphoviridae; Seuratvirus 1 1 1
Caudovirales; Siphoviridae; Tunavirinae; Hanrivervirus 10 8 8
Caudovirales; Siphoviridae; Tunavirinae 15 12 8
Caudovirales; Siphoviridae 1 1 1
Microviridae; Bullavirinae; Alphatrevirus 1 1 1
Microviridae; Bullavirinae; Gequatrovirus 7 5 4
Unclassified bacterial viruses 5 5 4
Total 136 104 91
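As a consistency check on Table 2, the per-taxon counts can be summed and compared with the Total row. The sketch below transcribes the counts from the table; the shortened dictionary keys (for example "Myoviridae (no genus)") are labels chosen here for illustration and are not taxonomic names from the source.

```python
# (total, unique, novel) per taxonomic group, transcribed from Table 2.
# Keys are shortened labels chosen for this sketch.
TABLE_2 = {
    "Felixounavirus": (33, 19, 19),
    "Suspvirus": (1, 1, 1),
    "Krischvirus": (2, 2, 1),
    "Dhakavirus": (3, 2, 2),
    "Mosigvirus": (4, 4, 2),
    "Tequatrovirus": (6, 5, 5),
    "Vequintavirus": (15, 12, 9),
    "Myoviridae (no genus)": (15, 11, 10),
    "Autographivirinae": (10, 9, 9),
    "Dhillonvirus": (6, 5, 5),
    "Jerseyvirus": (1, 1, 1),
    "Seuratvirus": (1, 1, 1),
    "Hanrivervirus": (10, 8, 8),
    "Tunavirinae (no genus)": (15, 12, 8),
    "Siphoviridae (no genus)": (1, 1, 1),
    "Alphatrevirus": (1, 1, 1),
    "Gequatrovirus": (7, 5, 4),
    "Unclassified bacterial viruses": (5, 5, 4),
}


def column_sums(table):
    """Sum each of the three columns (total, unique, novel) over all rows."""
    return tuple(sum(column) for column in zip(*table.values()))


totals = column_sums(TABLE_2)
print(totals)  # (136, 104, 91), matching the Total row

# Share of unique phages that represent novel species.
novel_share = totals[2] / totals[1]
print(f"{novel_share:.1%}")  # 87.5%
```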
3.2. Phage genome characteristics
The sequenced phage genomes range in size from 5 342 bp (the Microviridae lilleen) and, within the
Caudovirales, from 38 200 bp (the unclassified sortregn) to 170 817 bp (the Dhakavirus dhaeg) (Figure 3, Table
1). GC contents also vary greatly, from only 35.3% (Tequatrovirus teqhad) up to 60.0% (the unclassified
sortsne) (Figure 3, Table 1), diverging heavily from the host GC content of 50.79%. Phages often have a lower
GC content than their host [49], as observed for the majority (81%) of the wastewater phages (Table 1). A
lower GC content of phage genomes tends to correlate with an increase in genome size [50], as is also the case
for the Caudovirales in this study (Table 1). Rocha & Danchin (2002) hypothesized that phages, along with
plasmids and insertion sequences, can be considered intracellular pathogens and that, like host-dependent bacterial
pathogens and symbionts, they experience competition for the energetically expensive and less abundant GTP
and CTP and, as a consequence, develop genomes with comparatively higher AT contents [49]. However, the
differences in AT richness reported by Rocha & Danchin (2002) for dsDNA and isometric ssDNA phages were
merely 4.2% and 5% [49], while the dsDNA and isometric ssDNA phages in this study deviate from their
isolation host by having up to 31% higher and up to 19% lower AT content (Table 1). Accordingly, the
considerable differences in GC/AT contents may instead primarily be a reflection of past host relations [14].
Sequencing lysates, and not only plaquing phages, may have contributed to capturing such
a broad range of GC contents and overall diversity.
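The deviations in AT content quoted above can be reproduced from the GC extremes given in the text if they are read as relative differences against the host AT content. That reading is an assumption made for this sketch, since the text does not state how the percentages were derived; the function name is likewise illustrative.

```python
HOST_GC = 50.79  # % GC of the isolation host, as stated in the text


def relative_at_difference(phage_gc, host_gc=HOST_GC):
    """Relative difference (%) in AT content of a phage versus its host.

    Assumes AT% = 100 - GC% and expresses the difference relative to the
    host AT content (an assumed definition, not stated in the source).
    """
    phage_at = 100.0 - phage_gc
    host_at = 100.0 - host_gc
    return 100.0 * (phage_at - host_at) / host_at


# Lowest GC content in the dataset: Tequatrovirus teqhad, 35.3% GC.
print(round(relative_at_difference(35.3), 1))  # about 31% higher AT than the host

# Highest GC content in the dataset: the unclassified sortsne, 60.0% GC.
print(round(relative_at_difference(60.0), 1))  # about 19% lower AT than the host
```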
Figure 3 Bubble diagram of the 104 unique Escherichia phages displaying the distribution of genome size (kb, x-axis) and GC content (%, y-axis). Area of bubbles indicates number of phages. The yellow bubble is Podoviridae phages, green ones are Siphoviridae phage clusters, dark-blue ones are Myoviridae phage clusters with tRNAs, the light-blue bubble is Myoviridae phages without tRNAs, the red bubble is Microviridae phages and the grey one is unclassified phages.
The genome screening algorithms identified no sequences coding for homologs of known virulence or
antibiotic resistance genes. Though not a definitive exclusion, this indicates a reduced risk of their presence, a
preferable trait for phage therapy applications. Currently available tools for AMG screening of viromes did not
provide a comprehensive and exclusive assessment of the AMG pool in the dataset. The majority of genes
identified are not AMGs, but code for phage DNA modification pathways (Table S1). The function of some of
the suggested AMGs is unknown; these include a nicotinamide phosphoribosyltransferase (NAMPT) present
in cluster I (Felixounavirus and Suspvirus) and VII (unclassified Myoviridae), a 3-deoxy-7-phosphoheptulonate
(DAHP) synthase present in cluster X (Mosigvirus) and a complete dTDP-rhamnose biosynthesis pathway
present in cluster VI and VII (unclassified Myoviridae). The presence of a dTDP-rhamnose biosynthesis pathway
in the DNA metabolism region of phage genomes is peculiar. One possible explanation is that these phages
utilize rhamnose for glycosylation of hydroxymethylated nucleotides, in the same manner as the glucosyl-hmC
generated by T4 [51]. Dihydrofolate reductase, identified in six of the Myoviridae clusters, is involved in thymine
nucleotide biosynthesis, but is also a structural component in the tail baseplate of T-even phages [52]. Multiple
verified DNA modification genes were identified. Methyltransferases, some putative, were detected in all
Myoviridae of cluster III (Vequintavirus), VI (unclassified), VII (unclassified) and VIII (Krischvirus), in all Siphoviridae
of cluster IV (Hanrivervirus) and in the unclassified phages of cluster XIII. Indeed, cluster XIII and the singletons
skarpretter, sortsne and sortkaff code for both DNA N-6-adenine-methyltransferases (dam) and DNA cytosine
methyltransferases (dcm). Finally, two novel epigenetic DNA hypermodification pathways were identified.
The mosigviruses of cluster X code for arabinosylation of hmC [51], while the novel seuratvirus Skure codes
for the recently verified complex 7-deazaguanine DNA hypermodification system [53,54].
3.3. Forty-eight novel Myoviridae phage species
The Myoviridae genomes (56 unique, 49 novel) represent the greatest span in genome sizes in this study,
from the unclassified flopper of 52092 bp to the dhakavirus dhaeg of 170817 bp. All, except the krischviruses,
code for tRNAs (Figure 3, Table 1). The Myoviridae group into nine distinct clusters and four singletons,
representing at least three subfamilies, Tevenvirinae, Vequintavirinae and Ounavirinae, in addition to 11
unclassified Myoviridae (cluster VI & VII) (Figure 1, Table 2). The eleven novel Tevenvirinae distribute into four
distinct genera: two of the Krischvirus (cluster VIII, 164 kb, 40.5% GC, no tRNAs), two of the Dhakavirus (cluster
IX, 166-170 kb, 39.4-39.5% GC, 3 tRNAs), two of the Mosigvirus (cluster X, 169 kb, 37.6-37.7% GC, 2 tRNAs) and
five of the Tequatrovirus (cluster XI, 165-168 kb, 35.3-35.5% GC, 6-11 tRNAs), while the nine novel
Vequintavirinae are all vequintaviruses (cluster III, 136-142 kb, 43.6-43.7% GC, 5 tRNAs) closely related (91.1-93.8%,
BLAST) to classified species. The vequintaviruses were identified in samples from 12 treatment
facilities. Only reads from two samples of a dataset of human gut viromes, based on time series of faecal
samples from 10 healthy persons [55], mapped to the wastewater phages, and only to vequintaviruses. These
reads covered 43-54% and 13-26% of all the Vequintavirus genomes except pangalan and navn, respectively.
This suggests that the vequintaviruses are related to entero-phages. All but one of the Ounavirinae (cluster I,
82-89 kb, 38.9-39.2% GC, 17-22 tRNAs) are felixounaviruses (89.7-93.9%, BLAST) with an intra-Gegenees score
of 71-86% (Figure 1). Felixounavirus is a relatively large genus with 17 recognized species. In this study,
felixounaviruses were identified in samples from 23 of the 48 facilities, indicating that they are ubiquitous
in Danish wastewater, numerous and easily cultivated. The last ounavirus, mistaenkt (86.7 kb, 47.2% GC, 22
tRNAs), is a Suspvirus. The five cluster VI phages (144.9-151.5 kb, 39±0.1% GC, 10-11 tRNAs) belong to a
phylogenetically distinct clade, the recently proposed 'Phapecoctavirus', and have substantial similarity (86-90%,
BLAST) with the anticipated type species Escherichia phage phAPEC8 (JX561091) [22,56]. The cluster VII phages have
significantly (p = 0.038) smaller genomes (145.8-147.5 kb) with marginally, though not significantly (p = 0.786),
lower GC contents of 37.4-37.5%; all have 13 tRNAs (Table 1) and code for NAMPT, which is not present in cluster VI
(Table S1). As a group, the cluster VII phages are even more homogeneous than cluster VI and all are closely related (92-95%,
BLAST) to the same five unclassified Enterobacteriaceae phages, vB_Ecom_PHB05 (MF805809),
vB_vPM_PD06 (MH816848), ECGD1 (KU522583), phi92 (NC_023693) and vB_vPM_PD114 (MH675927)
[57,58], with which they represent a novel unclassified genus. The distinctive and novel singleton flopper
only shares NT similarity (36-87%, BLAST) with six other phages: the Escherichia phages ST32 (MF04458) [59]
and phiEcoM_GJ1 (EF460875) [60], the Erwinia phages Faunus (MH191398) [61] and vB_EamM-Y2
(NC_019504) [62], and the Pectobacterium phages PM1 (NC_023865) [63] and PP101 (KY087898). This group of
unclassified phages all have genomes of 52.1-56.6 kb with 43.6-44.9% GC and, while only flopper, phiEcoM_GJ1
and PM1 code for a single tRNA, all of them have exclusively unidirectional coding sequences (CDSs) and
code for RNA polymerases, characteristics of the Autographivirinae of the Podoviridae [64]. However, the
240 verified morphology of ST32, phiEcoM_GJ1, PM1 and PP101 (an icosahedral head, a neck and a contractile tail with
241 tail fibres) classifies them as myoviruses [59,60,62,63]. Based on NT similarity and genome synteny, these seven
242 phages belong to the same, not yet classified, peculiar lineage first described by Jamalludeen et al. (2008) [60].
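The unidirectional arrangement of coding sequences noted above can be checked directly from an annotation. The following is a minimal sketch, not the procedure used in this study: it assumes the CDS annotations have already been reduced to hypothetical (start, end, strand) tuples with strand given as +1 or -1, and the function names are illustrative only.

```python
from collections import Counter


def strand_summary(cds_features):
    """Count CDSs per strand.

    `cds_features` is an iterable of (start, end, strand) tuples,
    where strand is +1 or -1 (an assumed, simplified input format).
    """
    return Counter(strand for _start, _end, strand in cds_features)


def is_unidirectional(cds_features):
    """Return True if all CDSs are encoded on one and the same strand."""
    counts = strand_summary(cds_features)
    # Exactly one strand observed means every CDS points the same way.
    return len(counts) == 1


# Toy annotation with every CDS on the forward strand.
toy_genome = [(1, 300, 1), (350, 900, 1), (950, 2100, 1)]
print(is_unidirectional(toy_genome))
```

A genome with even a single CDS on the opposite strand would return False, so the check is strict; a tolerance for misannotated short ORFs would have to be added separately.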
243 3.4. Five novel Microviridae phage species
244 The singleton lilleven (6.1 kb, 44.4% GC, no tRNA) and the five (four novel) cluster II phages (5.3-5.5 kb,
245 46.9-47.2% GC, no tRNAs) are all Microviridae, a family of small ssDNA non-enveloped icosahedral phages
246 (Table 2). Lilleven is closely related to the Alphatremicrovirus Enterobacteria phage St1 (93.9%, BLAST), with which it shares
247 pronounced gene synteny and high AA similarity (89-90%) (Figure S2), and is a novel species
248 of the genus Alphatremicrovirus, subfamily Bullavirinae. The cluster II phages comprise four distinct species,
249 only differing by single nucleotide polymorphisms and in non-coding regions (Figure S2). They have high
250 intra-Gegenees scores (88-92%) and share genomic organisation and extensive NT similarity (92.6-94.5%,
251 BLAST) with the unclassified microvirus Escherichia phage SECphi17 (LT960607), primarily differing by
252 single nucleotide polymorphisms and in noncoding regions (Figure S2, Table 2). Cluster II are only related to
253 one recognised phage species, Escherichia virus ID52 (63%, BLAST), genus Gequatrovirus, subfamily Bullavirinae,
254 from which they predominantly differ in the region in and around the major spike protein (gpG), a distinctive
255 marker of the subfamily Bullavirinae involved in host attachment. The phylogenetic analysis separates cluster
256 II from the gequatroviruses, and both the Gegenees scores (<22%) and NT similarities (<65% BLAST) are
257 moderately low (Figure S2). Nonetheless, based on pronounced synteny in genome organisation
258 and conserved AA similarity (62-64%, Gegenees), the cluster II phages are still considered to be gequatroviruses (Figure S2).
259 The sequencing of the microviruses is peculiar, as library preparation with the Nextera® XT DNA kit
260 applies transposons targeting dsDNA. However, during microvirus infection the host polymerase converts
261 the viral ssDNA into an intermediate state of covalently closed dsDNA, which is then replicated by rolling-circle
262 replication by viral replication proteins whose genes are transcribed by the host RNA polymerase [65]. This intermediate state may
263 have enabled the library preparation. The presence of host DNA in the sequence results indicates an
264 insufficient initial DNase I treatment, which can be attributed to chemical inhibition or inactivation of the
265 enzyme by adhesion to the sides of wells. Hence, it is reasonable to assume that the extracted microvirus DNA
266 was captured as free dsDNA inside host cells during ongoing infections.
267 3.5. Twenty-five novel Siphoviridae phage species
268 In spite of similar genome sizes, the large group of Siphoviridae is the most diverse (28 unique, 24 novel)
269 in this study, with GC contents ranging from 43.9 to 54.6% (Figure 3, Table 2). The majority, clusters IV-V and
270 singleton jahat, are of the subfamily Tunavirinae, while the remaining are allocated into at least three divergent
271 genera, Dhillonvirus, Jerseyvirus and Seuratvirus.
272 Cluster IV (46.7-51.6 kb, 43.9-44.1% GC, no tRNAs), V (49.8-51.6 kb, 44.6-44.8% GC, no tRNAs) and jahat
273 (51.1 kb, 45.7% GC, no tRNAs) have low inter-Gegenees scores (7-20%) (Figure 1), yet they form a
274 monophyletic clade and are all closely related (85-94%, BLAST) to published Tunavirinae. The Cluster IV phages
275 are notable, as they represent a significant increase to the small genus Hanrivervirus comprising only the type
276 species Shigella virus pSf-1 (51.8 kb, 44% GC, no tRNAs) isolated from the Han River in Korea [66]. The common
277 ancestry of Cluster IV and pSf-1 is evident by comparable genome sizes, GC contents, genomic organization
278 and substantial NT (86-90%, BLAST) and AA (77-85%, Gegenees) similarities (Table 2, Figure S3). During their
279 differentiation, many deletions and insertions of small hypothetical genes have occurred; most notable is a
280 unique version of a putative tail-spike protein in seven of the cluster IV hanriverviruses, indicating divergent
281 host ranges (Figure S3). All the hanriverviruses code for (putative) dam and pSf-1 is resistant towards at least
282 six restriction endonucleases [66], suggesting they employ DNA methylation as a defence strategy.
283 The five novel dhillonviruses of Cluster XII (44.4-46.0 kb, 54.2-54.6% GC, no tRNAs) have substantial NT
284 similarities with a group of unclassified dhillonviruses (80-89%, BLAST) (Table 2) and the type species
285 Escherichia virus HK578 (77-80%, BLAST). As with the hanriverviruses, and as observed by Korf et al. (2019)
286 [22], their genomes mainly differ in minor hypothetical genes and in putative tail-tip proteins, indicating
287 divergent host ranges (Figure S4). Based on NT similarity and the presence of the canonical 7-deazaguanine
288 operon, the singleton skure (59.4 kb, 44.6% GC, no tRNAs) is a seuratvirus, while the singleton Buks (40.3 kb,
289 49.7% GC, no tRNAs) is assigned to the genus Jerseyvirus, subfamily Guernseyvirinae.
290 3.5.1 Escherichia phage halfdan, a novel lineage within the Siphoviridae
291 Interestingly, the singleton halfdan (42.8 kb, 53.7% GC, no tRNAs) has only minuscule similarity with
292 published phages (12-29%, BLAST). These comprise two Pseudomonas phages, vB_PaeS_SCUT-S3 (MK165657) and
293 Ab26 (HG962376) [67], both septimatreviruses; two Acinetobacter phages of the genus Lokivirus, IMEAB3 (KF811200)
294 and the type species Acinetobacter virus Loki [68]; and, to a lesser degree, the unclassified Achromobacter phage
295 phiAxp-1 (KP313532) [69]. They have a common genomic organization, yet their intra-Gegenees score is ≤1%
296 and NT similarity is negligible in roughly a third of halfdan’s 57 CDSs (Figure 4b, d). The phylogeny and AA
297 similarities (Gegenees) also indicate a distant relation, although grouping halfdan closer with the lokiviruses
298 (40-43%) than the septimatreviruses (33-34%) (Figure 4a, c). The genome of halfdan is mosaic, resembling the
299 lokiviruses in the structural region and the septimatreviruses in the replication region (Figure 4d).
300 Remarkably, halfdan has no NT similarity with known Escherichia phages, which could be an indication of E.
301 coli not being the natural host, and that halfdan may have transcended species barriers. However, host
302 range, morphology and other physical characteristics are beyond the scope of this study, and will be
303 determined in future lab-based studies. Notably, halfdan, Loki and Ab26 all code for a (putative) MazG,
304 although there is negligible sequence similarity (Figure 4d). Phage encoded MazG is hypothesized to be
305 involved in restoring protein synthesis in a starved cell, in order to keep the cell alive, and ensure optimal
306 replication conditions [70]. The phylogeny, whole genome alignments and low NT and AA similarities
307 suggest that halfdan is distinct from other known phages (Figure 4). Accordingly, as per the ICTV guidelines,
308 halfdan is the first sequenced phage of a novel Siphoviridae genus. Yet, halfdan is also, by its mosaicism, an
309 indicator of the genetic continuum of phages, which calls into question the validity of taxonomic interpretations.
310
[Figure 4, panels (a)-(d): phylogenetic tree, Gegenees heat plots (nucleotide and amino acid) and pairwise genome alignment for Acinetobacter phage IME_AB3, Acinetobacter phage vB_AbaS_Loki, Escherichia phage halfdan, Pseudomonas phage vB_PaeS-SCH_Ab26, Pseudomonas phage 73, Pseudomonas phage vB_PaeS_SCUT-S3 and Achromobacter phage phiAxp-1. Gene labels for halfdan in panel (d): MazG, Lysis, Term., Port., Tube, Tape m., Coat, Polym., Prim. Similarity colour bar: 64-100%.]
311 Figure 4 (a) Phylogenetic tree (Maximum log Likelihood: -7678.71, large terminase subunit), scalebar: substitutions per
312 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c)
313 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, step size: 100, threshold: 0%). (d)
314 Pairwise alignment of phage genomes (Easyfig, BLASTn), the colour bars between genomes indicate percent pairwise
315 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BLASTn). Gene annotations are provided for
316 phages identified in this study and deposited in GenBank.
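The fragment size, step size and threshold reported for the Gegenees comparisons describe a sliding-window scheme. The sketch below illustrates that scheme only; it is not Gegenees itself. The function names are hypothetical, the per-fragment scores are assumed to be already normalised to 0-1 by an external BLAST step, and the handling of the trailing partial fragment and of the threshold are assumptions of this sketch.

```python
def fragment_genome(sequence, fragment_size=200, step_size=100):
    """Split a genome into overlapping fragments.

    Fragments are `fragment_size` long and start every `step_size`
    positions. A trailing piece shorter than `fragment_size` is dropped
    (an assumption of this sketch).
    """
    if fragment_size <= 0 or step_size <= 0:
        raise ValueError("fragment_size and step_size must be positive")
    last_start = len(sequence) - fragment_size
    return [
        sequence[i:i + fragment_size]
        for i in range(0, last_start + 1, step_size)
    ]


def average_score(fragment_scores, threshold=0.0):
    """Average normalised fragment scores (0-1) as a percentage.

    Fragments scoring below `threshold` are excluded; with a threshold
    of 0, as in the figure captions, every fragment contributes.
    """
    kept = [score for score in fragment_scores if score >= threshold]
    return 100.0 * sum(kept) / len(kept) if kept else 0.0


# A 500-nt toy sequence yields fragments starting at 0, 100, 200 and 300.
fragments = fragment_genome("ACGT" * 125)
print(len(fragments), average_score([1.0, 0.5, 0.0]))
```

With a threshold of 0%, fragments without any hit still count towards the average, which is one reason why genomes that share only part of their content can receive very low overall scores.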
317 3.6. Nine novel Podoviridae phage species
318 The nine novel Podoviridae (cluster XIV, 40.7-42.5 kb, 55.4-55.7% GC, no tRNAs), all with the hallmarks of
319 the Autographivirinae, i.e. unidirectionally encoded genes and RNA polymerases [64], have conserved genome
320 organisation with the type species of the Phikmvvirus, Pseudomonas virus phiKMV (42.5 kb, 62.3% GC, no
321 tRNAs) [71], but almost no NT similarity (<1%, BLAST). Besides the unclassified Autographivirinae
322 Enterobacteria phage J8-65 (NC_025445) (40.9 kb, 55.7% GC, no tRNAs), with which Cluster XIV has
323 considerable NT similarity (69-93%, BLAST) and similar GC content, Cluster XIV only shares >5% nucleotide
324 similarity (40-42%, BLAST) with the Phikmvvirus Pantoea virus Limezero (43.0 kb, 55.4% GC, no tRNAs) [72]
325 (Figure 5, Table 2). The Phikmvvirus consists of four species, phiKMV, Limezero, Pantoea virus Limelight (44.5 kb,
326 54% GC, no tRNAs) [72] and Pseudomonas virus LKA1 (41.6 kb, 60.9% GC, no tRNAs) [73].
[Figure 5, panels (a)-(d): phylogenetic tree, Gegenees heat plots (nucleotide and amino acid) and pairwise genome alignment for the Escherichia phages lidtsur, smaasur, glasur, forsur, usur, megetsur, mellemsur, altidsur and aldrigsur, Enterobacteria phage J8-65, Pantoea phages LIMEzero and LIMElight, and Pseudomonas phages phiKMV and LKA1. Gene labels for lidtsur in panel (d): DNA pol., RNA pol., Port., Tail, Virion, Virion, Tailspike, Term. Similarity colour bar: 64-100%.]
327 Figure 5 (a) Phylogenetic tree (Maximum log Likelihood: -11728.26, large terminase subunit), scalebar: substitutions per
328 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c)
329 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, step size: 100, threshold: 0%). (d)
330 Pairwise alignment of phage genomes (Easyfig, BLASTn), the colour bars between genomes indicate percent pairwise
331 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BLASTn). Gene annotations are provided for
332 phages identified in this study and deposited in GenBank.
333 Cluster XIV and J8-65 form a diverse monophyletic clade, with a substantial amount of deletions and
334 insertions between them, subdividing into three sub-clusters with intra-Gegenees scores ≤28% (Figure 5a, d).
335 Phage lidtsur is singled out and also codes for a unique version of tailspike colanidase, smaasur resembles phage
336 J8-65, and the rest group together (Figure 5). Still, the three sub-clusters have largely conserved AA
337 sequences (66-72%, Gegenees) (Figure 5b, c, d). Interestingly, this also applies to Limezero, with which they
338 have a Gegenees NT score ≤1%, but an AA similarity of 45-48%, supporting the phylogenetic grouping of
339 Limezero and cluster XIV (Figure 5a, b, c). Based on phylogeny, limited NT and low AA similarities it is evident
340 that there is a very distant relation between cluster XIV (and J8-65) and the phikmvviruses (Figure 5). Hence,
341 cluster XIV (and J8-65) are not immediately, according to the ICTV guidelines, considered phikmvviruses.
342 However, the Phikmvvirus already includes phages with minuscule DNA homology infecting divergent hosts
343 [72,73]. The genus delimitation is based on overall genome architecture with the location of a single-subunit
344 RNA polymerase gene adjacent to the structural genes, and not in the early region as in T7-like phages [16].
345 Another characteristic feature of the phikmvviruses is the presence of direct terminal repeats, though not yet
346 verified in all members [72]. Read abundance in non-coding regions in genomic ends of cluster XIV genomes
347 suggests they also have direct terminal repeats of a few hundred bp. In conclusion, we consider the cluster XIV
348 phages (and J8-65) to be of the same lineage as the phikmvviruses, but anticipate that this genus may at
349 some point be divided into at least two independent genera, as we learn more about the nature of the genes that
350 distinguish cluster XIV from the other phikmvviruses at both the NT and AA level and account for more than 50% of
351 their genomes.
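The inference of direct terminal repeats from read abundance at the genome ends can be illustrated with a simple depth scan. This is a hedged sketch and not the analysis performed in this study: it assumes a hypothetical list of per-base read depths, and it assumes that a repeat present at both ends of the physical genome but represented once in the assembly shows roughly doubled depth. The 1.8-fold cutoff is an arbitrary illustrative value.

```python
from statistics import median


def elevated_end_length(coverage, fold=1.8, from_start=True):
    """Length of the run of high-depth positions at one genome end.

    `coverage` is a list of per-base read depths (assumed input).
    Positions are scanned inwards from the chosen end until depth
    drops below `fold` times the genome-wide median depth.
    """
    if not coverage:
        return 0
    cutoff = fold * median(coverage)
    positions = coverage if from_start else reversed(coverage)
    length = 0
    for depth in positions:
        if depth < cutoff:
            break
        length += 1
    return length


# Toy profile: 300 bp of doubled depth at each end of a 5.6 kb contig.
toy_depth = [200] * 300 + [100] * 5000 + [200] * 300
print(elevated_end_length(toy_depth), elevated_end_length(toy_depth, from_start=False))
```

Real depth profiles are noisy, particularly with transposon-based library preparation, so a smoothed profile and confirmation of the repeat boundaries by other means would be needed before drawing conclusions.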
352 3.7. Unclassified bacterial viruses represent two novel lineages
353 The four novel phages sortsyn of cluster XIII and the three singletons sortsne, sortkaff (41.9-42.5 kb, 59-
354 60% GC, no tRNAs) and skarpretter (42.0 kb, 55.8% GC, no tRNAs) all have small genomes and high GC
355 contents (Figure 3) and are suggested to represent two distinct novel lineages. They only share considerable
356 (≥6%, BLAST) NT similarity with four phages, Enterobacteria phage IME_EC2 (KF591601)(41.5 kb, 59.2% GC,
357 no tRNAs) isolated from hospital sewage [74], Salmonella phage lumpael (MK125141)(41.4 kb, 59.5% GC, no
358 tRNAs) isolated from wastewater of the same sample-set in a previous study, Klebsiella phage
359 vB_KpnS_IME279 (MF614100) (42.5 kb, 59.3% GC, no tRNAs) and Escherichia phage C130_2 (MH363708)(41.7
360 kb, 55.4% GC, no tRNAs) isolated from cheese [75].
[Figure 6, panels (a)-(d): phylogenetic tree, Gegenees heat plots (nucleotide and amino acid) and pairwise genome alignment for the Escherichia phages sortkaff, sortsne, sortsyn and skarpretter, Klebsiella phage vB_KpnS_IME279, Enterobacteria phage IME_EC2, Salmonella phage lumpael and Escherichia phage C130_2. Gene labels for skarpretter in panel (d): Holin, Methylation, hflc, Helica., Tailspike, Coat, Portal, Term., Lysis. Similarity colour bar: 64-100%.]
361 Figure 6 (a) Phylogenetic tree (Maximum log Likelihood: -8023.43, large terminase subunit), scalebar: substitutions per
362 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c)
363 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, step size: 100, threshold: 0%). (d)
364 Pairwise alignment of phage genomes (Easyfig, BLASTn), the colour bars between genomes indicate percent pairwise
365 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BLASTn). Gene annotations are provided for
366 phages identified in this study and deposited in GenBank.
368 Curiously, IME_EC2 is a confirmed (TEM) Podoviridae, with a short non-contractile tail [74], while C130_2
369 is a confirmed (TEM) Myoviridae with a long contractile tail [75]. Lumpael and vB_KpnS_IME279 are
370 uncharacterised, yet both are assigned as Podoviridae in GenBank [17]. Although sortsne, sortkaff, sortsyn,
371 vB_KpnS_IME279, IME_EC2 and lumpael form a monophyletic clade, the Gegenees score (5-15%) between
372 sub-clusters is surprisingly low (Figure 6a, b). Still, these six phages have comparable genome sizes (41.5-42.5
373 kb) and organisation, including similar relatively small-sized structural proteins, dam and dcm genes, equally
374 high GC contents (59.5±0.5%) and relatively high AA similarities (49-91%) (Figure 6a, c, Table 2). Hence, they
375 are clearly of the same lineage and likely to resemble IME_EC2 in having Podoviridae morphology. This group
376 is also clearly distinct from all other known phages (<5%, BLAST) and as such constitutes a novel genus, with
377 a delimitation to be determined by future physical characterisation. Skarpretter and C130_2 form a
378 monophyletic clade and both have slightly, though not significantly (p = 0.38), lower GC contents (55.6±0.2%),
379 indicating a common ancestry and host. The conserved genome organisation and clustering of skarpretter
380 with the other Podoviridae (Figures 1, 6d) suggest skarpretter also has Podoviridae morphology. Still, its closest
381 relative C130_2 is described as a Myoviridae [75], leaving the true morphology of skarpretter a conundrum
382 only resolved by TEM imaging. According to ICTV guidelines both skarpretter and C130_2 are representatives
383 of novel lineages, as they are both clearly distinct from all other known phages. Skarpretter and C130_2 have
384 very low NT (1% Gegenees, 38% BLAST) and AA similarities (43%) (Figure 6b, c). Indeed, skarpretter has ≤20%
385 NT similarity (BLAST) with all other published phages. In addition, skarpretter codes for a putative hflc gene
386 possibly inhibiting proteolysis of essential phage proteins, located in the replication region (Figure 6d). This
387 gene has, to our knowledge, never before been observed in phages, but occurs in almost all proteobacteria,
388 including E. coli MG1655 [76].
389 4. Conclusion
390 By screening Danish wastewater, we identified no fewer than 104 unique Escherichia MG1655 phages, but
391 predict the species richness to be at least in the range of 160-420, and even higher if including phages infecting
392 other Escherichia species, though it is expected to fluctuate drastically over time and both within and between
393 treatment facilities. Among the unique phages, 91 represent novel phage species of at least four different
394 families: Myoviridae, Siphoviridae, Podoviridae and Microviridae. The diversity of these phages is striking: they
395 vary greatly in genome size and have a very broad GC-content range - possibly prompted by former or current
396 alternate hosts. These findings add to our growing understanding of phage ecology and diversity, and through
397 classification of these many phages we come yet another step closer to a more refined taxonomic
398 understanding of phages. Furthermore, the numerous and diverse phages isolated in this study, all lytic to the
399 same single strain, serve as an excellent opportunity to learn important phage-host interactions in future
400 studies. These include, but are not limited to, lysogen-induced phage immunity, host range and anti-RE
401 systems. Finally, apart from substantial contributions to known genera, we came upon several unclassified
402 lineages, some completely novel, which in our opinion constitute novel phage genera. We consider sortregn,
403 sortkaff, sortsyn and sortsne together with lumpael, vB_KpnS_IME279 and IME_EC2 to constitute a novel
404 genus within the Podoviridae. The Myoviridae flopper joins an interesting group of not yet classified phages
405 with Myoviridae morphology and Autographivirinae characteristics, just as cluster VII together with five
406 unclassified phages represents a novel unclassified genus within the Myoviridae. Lastly, we consider the
407 Siphoviridae halfdan and the unclassified bacterial virus skarpretter to be the first sequenced representatives of
408 a novel genus each. In conclusion, this study shows that uncharted territory still remains for even
409 well-studied phage-host couples.
410 Supplementary Materials: Figure S1: Microviridae, Figure S2: Hanriverviruses, Figure S3: Dhillonviruses, Table 1
411 Auxiliary metabolism genes.
412 Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation AUFF
413 Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018.
414 Acknowledgments: This study would not have been possible without the much-appreciated contribution by the many members
415 of the Danish Water and Wastewater Association (DANVA), who kindly supplied us with time-series of wastewater
416 samples from their treatment facilities. A special thanks to the Billund and Grindsted treatment facilities of Billund Vand
417 & Energi, Lynetten, Avedøre and Damhusåen treatment facilities of BIOFOS, Kolding treatment facility of BlueKolding,
418 Esbjerg Øst, Esbjerg Vest, Ribe, Varde and Skovlund treatment facilities of DIN Forsyning, Drøsbro, Hadsten, Hammel,
419 Hinnerup and Voldum treatment facilities of Favrskov Forsyning, Herning treatment facility of Herning Vand, Hillerød
420 treatment facility of Hillerød Forsyning, Lemvig treatment facility of Lemvig Vand og Spildevand, Bevtoft, Gram,
421 Haderslev, Halk, Jegerup, Nustrup, Over Jerstal, Skovby, Skrydstrup, Sommersted, Vojens and Årøsund treatment
422 facilities of Provas, Marselisborg, Egå and Viby treatment facilities of Aarhus vand, Hedensted, Juelsminde and Tørring
423 treatment facilities of Hedensted Spildevand, Ejby Mølle, Bogense, Hofmansgave, Hårslev, Nordvest, Nordøst, Otterup
424 and Søndersø treatment facilities of VandCenterSyd, Øst and Vest treatment facilities of Aalborg Forsyning and finally
425 Helsingør treatment facility of Forsyning Helsingør.
426 Competing interests: The authors declare no competing interests.
427 References
428 1. Weinbauer, M.G.; Rassoulzadegan, F. Are viruses driving microbial diversification and diversity? Environ.
429 Microbiol. 2003, 6, 1–11.
430 2. Weitz, J.S.; Wilhelm, S.W. Ocean viruses and their effects on microbial communities and biogeochemical cycles.
431 F1000 Biol. Rep. 2012, 4, 17.
432 3. Crummett, L.T.; Puxty, R.J.; Weihe, C.; Marston, M.F.; Martiny, J.B.H. The genomic content and context of
433 auxiliary metabolic genes in marine cyanomyoviruses. Virology 2016, 499, 219–229.
434 4. Boyd, E.F. Bacteriophage-Encoded Bacterial Virulence Factors and Phage–Pathogenicity Island Interactions. Adv.
435 Virus Res. 2012, 82, 91–118.
436 5. Fortier, L.-C.; Sekulovic, O. Importance of prophages to evolution and virulence of bacterial pathogens.
437 https://doi.org/10.4161/viru.24498 2013.
438 6. Volkova, V. V; Lu, Z.; Besser, T.; Gröhn, Y.T. Modeling the infection dynamics of bacteriophages in enteric
439 Escherichia coli: estimating the contribution of transduction to antimicrobial gene spread. Appl. Environ.
440 Microbiol. 2014, 80, 4350–62.
441 7. Bearson, B.L.; Allen, H.K.; Brunelle, B.W.; Lee, I.S.; Casjens, S.R.; Stanton, T.B. The agricultural antibiotic
442 carbadox induces phage-mediated gene transfer in Salmonella. Front. Microbiol. 2014, 5, 52.
443 8. Grose, J.H.; Casjens, S.R. Understanding the enormous diversity of bacteriophages: the tailed phages that infect
444 the bacterial family Enterobacteriaceae. Virology 2014, 468–470, 421–443.
445 9. Dykhuizen, D. Species Numbers in Bacteria. Proc. Calif. Acad. Sci. 2005, 56, 62.
446 10. Hatfull, G.F.; Pedulla, M.L.; Jacobs-Sera, D.; Cichon, P.M.; Foley, A.; Ford, M.E.; Gonda, R.M.; Houtz, J.M.;
447 Hryckowian, A.J.; Kelchner, V.A.; et al. Exploring the Mycobacteriophage Metaproteome: Phage Genomics as an
448 Educational Platform. PLoS Genet. 2006, 2, e92.
449 11. Hatfull, G.F. Mycobacteriophages: Windows into Tuberculosis. PLoS Pathog. 2014, 10, e1003953.
450 12. Hatfull, G.F.; Jacobs-Sera, D.; Lawrence, J.G.; Pope, W.H.; Russell, D.A.; Ko, C.C.; Weber, R.J.; Patel, M.C.;
451 Germane, K.L.; Edgar, R.H.; et al. Comparative Genomic Analysis of 60 Mycobacteriophage Genomes: Genome
452 Clustering, Gene Acquisition, and Gene Size. J. Mol. Biol. 2010, 397, 119–143.
453 13. Dedrick, R.M.; Jacobs-Sera, D.; Bustamante, C.A.G.; Garlena, R.A.; Mavrich, T.N.; Pope, W.H.; Reyes, J.C.C.;
454 Russell, D.A.; Adair, T.; Alvey, R.; et al. Prophage-mediated defence against viral attack and viral counter-
455 defence. Nat. Microbiol. 2017, 2, 16251.
456 14. Jacobs-Sera, D.; Marinelli, L.J.; Bowman, C.; Broussard, G.W.; Guerrero Bustamante, C.; Boyle, M.M.; Petrova,
457 Z.O.; Dedrick, R.M.; Pope, W.H.; Science Education Alliance Phage Hunters Advancing Genomics and
458 Evolutionary Science (SEA-PHAGES) Program; et al. On the nature of
459 mycobacteriophage diversity and host preference. Virology 2012, 434, 187–201.
460 15. Hatfull, G.F. The Secret Lives of Mycobacteriophages. Adv. Virus Res. 2012, 82, 179–288.
461 16. Lefkowitz, E.J.; Dempsey, D.M.; Hendrickson, R.C.; Orton, R.J.; Siddell, S.G.; Smith, D.B. Virus taxonomy: the
462 database of the International Committee on Taxonomy of Viruses (ICTV). Nucleic Acids Res. 2018, 46, D708–D717.
463 17. Benson, D.A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Sayers, E.W. GenBank.
464 Nucleic Acids Res. 2017, 45, D37–D42.
465 18. Lawrence, J.G.; Hatfull, G.F.; Hendrix, R.W. Imbroglios of viral taxonomy: genetic exchange and failings of
466 phenetic approaches. J. Bacteriol. 2002, 184, 4891–905.
467 19. Aiewsakun, P.; Adriaenssens, E.M.; Lavigne, R.; Kropinski, A.M.; Simmonds, P. Evaluation of the genomic
468 diversity of viruses infecting bacteria, archaea and eukaryotes using a common bioinformatic platform: steps
469 towards a unified taxonomy. J. Gen. Virol. 2018, 99, 1331–1343.
470 20. Nelson, D. Phage taxonomy: we agree to disagree. J. Bacteriol. 2004, 186, 7029–31.
471 21. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9.
472 22. Korf; Meier-Kolthoff; Adriaenssens; Kropinski; Nimtz; Rohde; van Raaij; Wittmann Still Something to Discover:
473 Novel Insights into Escherichia coli Phage Diversity and Taxonomy. Viruses 2019, 11, 454.
474 23. Jurczak-Kurek, A.; Gąsior, T.; Nejman-Faleńczyk, B.; Bloch, S.; Dydecka, A.; Topka, G.; Necel, A.; Jakubowska-
475 Deredas, M.; Narajczyk, M.; Richert, M.; et al. Biodiversity of bacteriophages: morphological and biological
476 properties of a large group of phages isolated from urban sewage. Sci. Rep. 2016, 6, 34338.
477 24. Sambrook, J.F.; Russell, D.W.; Editors Molecular cloning: A laboratory manual, third edition.; 2nd ed.; Cold Spring
478 Harbor Laboratory, 2000; ISBN 0 87969 309 6.
479 25. Kot, W.; Vogensen, F.K.; Sørensen, S.J.; Hansen, L.H. DPS – A rapid method for genome sequencing of DNA-
480 containing bacteriophages directly from a single plaque. J. Virol. Methods 2014, 196, 152–156.
481 26. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.;
482 Pham, S.; Prjibelski, A.D.; et al. SPAdes: a new genome assembly algorithm and its applications to single-cell
483 sequencing. J. Comput. Biol. 2012, 19, 455–77.
484 27. Brettin, T.; Davis, J.J.; Disz, T.; Edwards, R.A.; Gerdes, S.; Olsen, G.J.; Olson, R.; Overbeek, R.; Parrello, B.; Pusch,
485 G.D.; et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom
486 annotation pipelines and annotating batches of genomes. Sci. Rep. 2015, 5, 8365.
487 28. Besemer, J.; Borodovsky, M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses.
488 Nucleic Acids Res. 2005, 33, W451–W454.
489 29. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990,
490 215, 403–410.
491 30. Söding, J.; Biegert, A.; Lupas, A.N. The HHpred interactive server for protein homology detection and structure
492 prediction. Nucleic Acids Res. 2005, 33, W244–W248.
493 31. Finn, R.D.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Mistry, J.; Mitchell, A.L.; Potter, S.C.; Punta, M.; Qureshi, M.;
494 Sangrador-Vegas, A.; et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids
495 Res. 2016, 44, D279–D285.
496 32. González-Tortuero, E.; Sutton, T.D.S.; Velayudhan, V.; Shkoporov, A.N.; Draper, L.A.; Stockdale, S.R.;
497 Ross, R.P.; Hill, C. VIGA: a sensitive, precise and automatic de novo VIral Genome Annotator. bioRxiv 2018,
498 277509.
499 33. Zankari, E.; Hasman, H.; Cosentino, S.; Vestergaard, M.; Rasmussen, S.; Lund, O.; Aarestrup, F.M.; Larsen, M.V.
500 Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 2012, 67, 2640–4.
501 34. Kleinheinz, K.A.; Joensen, K.G.; Larsen, M.V. Applying the ResFinder and VirulenceFinder web-services for easy
502 identification of acquired antibiotic resistance and E. coli virulence genes in bacteriophage and prophage
503 nucleotide sequences. Bacteriophage 2014, 4, e27943.
504 35. Joensen, K.G.; Scheutz, F.; Lund, O.; Hasman, H.; Kaas, R.S.; Nielsen, E.M.; Aarestrup, F.M. Real-Time Whole-
505 Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli.
506 J. Clin. Microbiol. 2014, 52, 1501–1510.
507 36. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE—a database for DNA restriction and modification:
508 enzymes, genes and genomes. Nucleic Acids Res. 2015, 43, D298–D299.
509 37. Kieft, K.; Zhou, Z.; Anantharaman, K. VIBRANT: Automated recovery, annotation and curation of microbial
510 viruses, and evaluation of virome function from genomic sequences. bioRxiv 2019, 855387.
511 38. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9.
512 39. Ågren, J.; Sundström, A.; Håfström, T.; Segerman, B. Gegenees: Fragmented Alignment of Multiple Genomes for
513 Determining Phylogenomic Distances and Genetic Signatures Unique for Specified Target Groups. PLoS One
514 2012, 7, e39107.
515 40. Kumar, S.; Stecher, G.; Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger
516 Datasets. Mol. Biol. Evol. 2016, 33, 1870–1874.
517 41. Lopes, A.; Tavares, P.; Petit, M.-A.; Guérois, R.; Zinn-Justin, S. Automated classification of tailed bacteriophages
518 according to their neck organization. BMC Genomics 2014, 15, 1027.
519 42. Mercanti, D.J.; Rousseau, G.M.; Capra, M.L.; Quiberoni, A.; Tremblay, D.M.; Labrie, S.J.; Moineau, S. Genomic
520 Diversity of Phages Infecting Probiotic Strains of Lactobacillus paracasei. Appl. Environ. Microbiol. 2016, 82, 95–
521 105.
522 43. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res.
523 2004, 32, 1792–7.
524 44. Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial
525 DNA in humans and chimpanzees. Mol. Biol. Evol. 1993, 10, 512–26.
526 45. Sullivan, M.J.; Petty, N.K.; Beatson, S.A. Easyfig: a genome comparison visualizer. Bioinformatics 2011, 27, 1009.
527 46. Hsieh, T.C.; Ma, K.H.; Chao, A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill
528 numbers). Methods Ecol. Evol. 2016, 7, 1451–1456.
529 47. Chao, A.; Gotelli, N.J.; Hsieh, T.C.; Sander, E.L.; Ma, K.H.; Colwell, R.K.; Ellison, A.M. Rarefaction and
530 extrapolation with Hill numbers: A framework for sampling and estimation in species diversity studies. Ecol.
531 Monogr. 2014, 84, 45–67.
532 48. RStudio Team RStudio: Integrated Development for R. 2016.
533 49. Rocha, E.P.C.; Danchin, A. Base composition bias might result from competition for metabolic resources. Trends
534 Genet. 2002, 18, 291–294.
535 50. Motlagh, A.M.; Bhattacharjee, A.S.; Coutinho, F.H.; Dutilh, B.E.; Casjens, S.R.; Goel, R.K. Insights of Phage-Host
536 Interaction in Hypersaline Ecosystem through Metagenomics Analyses. Front. Microbiol. 2017, 8, 352.
537 51. Thomas, J.A.; Orwenyo, J.; Wang, L.-X.; Black, L.W. The Odd "RB" Phage-Identification of
538 Arabinosylation as a New Epigenetic Modification of DNA in T4-Like Phage RB69. Viruses 2018, 10.
539 52. Purohit, S.; Bestwick, K.; Lasser, G.W.; Rogers, C.M.; Mathews, C.K. T4 Phage-coded Dihydrofolate Reductase.
540 J. Biol. Chem. 1981, 256, 9121–9125.
541 53. Hutinet, G.; Kot, W.; Cui, L.; Hillebrand, R.; Balamkundu, S.; Gnanakalai, S.; Neelakandan, R.; Carstens, A.B.;
542 Lui, C.F.; Tremblay, D.; et al. 7-Deazaguanine modifications protect phage DNA from host restriction systems.
543 Nature 2019, 1–12.
544 54. Carstens, A.B.; Kot, W.; Lametsch, R.; Neve, H.; Hansen, L.H. Characterisation of a novel enterobacteria phage,
545 CAjan, isolated from rat faeces. Arch. Virol. 2016, 161, 2219–2226.
546 55. Shkoporov, A.N.; Clooney, A.G.; Sutton, T.D.S.; Ryan, F.J.; Daly, K.M.; Nolan, J.A.; McDonnell, S.A.; Khokhlova,
547 E. V.; Draper, L.A.; Forde, A.; et al. The Human Gut Virome Is Highly Diverse, Stable, and Individual Specific.
548 Cell Host Microbe 2019, 26, 527-541.e5.
549 56. Tsonos, J.; Adriaenssens, E.M.; Klumpp, J.; Hernalsteens, J.-P.; Lavigne, R.; De Greve, H. Complete genome
550 sequence of the novel Escherichia coli phage phAPEC8. J. Virol. 2012, 86, 13117–8.
551 57. Schwarzer, D.; Buettner, F.F.R.; Browning, C.; Nazarov, S.; Rabsch, W.; Bethe, A.; Oberbeck, A.; Bowman, V.D.;
552 Stummeyer, K.; Mühlenhoff, M.; et al. A multivalent adsorption apparatus explains the broad host range of
553 phage phi92: a comprehensive genomic and structural analysis. J. Virol. 2012, 86, 10384–98.
554 58. Schwarzer, D.; Browning, C.; Stummeyer, K.; Oberbeck, A.; Mühlenhoff, M.; Gerardy-Schahn, R.; Leiman, P.G.
555 Structure and biochemical characterization of bacteriophage phi92 endosialidase. Virology 2015, 477, 133–143.
556 59. Liu, H.; Geagea, H.; Rousseau, G.M.; Labrie, S.J.; Tremblay, D.M.; Liu, X.; Moineau, S. Characterization of the
557 Escherichia coli Virulent Myophage ST32. Viruses 2018, 10.
558 60. Jamalludeen, N.; Kropinski, A.M.; Johnson, R.P.; Lingohr, E.; Harel, J.; Gyles, C.L. Complete genomic sequence
559 of bacteriophage phiEcoM-GJ1, a novel phage that has myovirus morphology and a podovirus-like RNA
560 polymerase. Appl. Environ. Microbiol. 2008, 74, 516–25.
561 61. Thompson, D.W.; Casjens, S.R.; Sharma, R.; Grose, J.H. Genomic comparison of 60 completely sequenced
562 bacteriophages that infect Erwinia and/or Pantoea bacteria. Virology 2019, 535, 59–73.
563 62. Born, Y.; Fieseler, L.; Marazzi, J.; Lurz, R.; Duffy, B.; Loessner, M.J. Novel Virulent and Broad-Host-Range
564 Erwinia amylovora Bacteriophages Reveal a High Degree of Mosaicism and a Relationship to Enterobacteriaceae
565 Phages. Appl. Environ. Microbiol. 2011, 77, 5945–5954.
566 63. Lim, J.-A.; Shin, H.; Lee, D.H.; Han, S.-W.; Lee, J.-H.; Ryu, S.; Heu, S. Complete genome sequence of the
567 Pectobacterium carotovorum subsp. carotovorum virulent bacteriophage PM1. Arch. Virol. 2014, 159, 2185–2187.
568 64. Lavigne, R.; Seto, D.; Mahadevan, P.; Ackermann, H.-W.; Kropinski, A.M. Unifying classical and molecular
569 taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res. Microbiol. 2008, 159, 406–
570 414.
571 65. Doore, S.M.; Fane, B.A. The microviridae: Diversity, assembly, and experimental evolution. Virology 2016, 491,
572 45–55.
573 66. Jun, J.W.; Kim, J.H.; Shin, S.P.; Han, J.E.; Chai, J.Y.; Park, S.C. Characterization and complete genome sequence of
574 the Shigella bacteriophage pSf-1. Res. Microbiol. 2013, 164, 979–986.
575 67. Essoh, C.; Latino, L.; Midoux, C.; Blouin, Y.; Loukou, G.; Nguetta, S.-P.A.; Lathro, S.; Cablanmian, A.; Kouassi,
576 A.K.; Vergnaud, G.; et al. Investigation of a Large Collection of Pseudomonas aeruginosa Bacteriophages
577 Collected from a Single Environmental Source in Abidjan, Côte d’Ivoire. PLoS One 2015, 10, e0130548.
578 68. Turner, D.; Wand, M.E.; Briers, Y.; Lavigne, R.; Sutton, J.M.; Reynolds, D.M. Characterisation and genome
579 sequence of the lytic Acinetobacter baumannii bacteriophage vB_AbaS_Loki. PLoS One 2017, 12, e0172303.
580 69. Ma, Y.; Li, E.; Qi, Z.; Li, H.; Wei, X.; Lin, W.; Zhao, R.; Jiang, A.; Yang, H.; Yin, Z.; et al. Isolation and molecular
581 characterisation of Achromobacter phage phiAxp-3, an N4-like bacteriophage. Sci. Rep. 2016, 6, 24776.
582 70. Bryan, M.J.; Burroughs, N.J.; Spence, E.M.; Clokie, M.R.J.; Mann, N.H.; Bryan, S.J. Evidence for the intense
583 exchange of MazG in marine cyanophages by horizontal gene transfer. PLoS One 2008, 3, 1–12.
584 71. Lavigne, R.; Burkal’tseva, M. V; Robben, J.; Sykilinda, N.N.; Kurochkina, L.P.; Grymonprez, B.; Jonckx, B.;
585 Krylov, V.N.; Mesyanzhinov, V. V; Volckaert, G. The genome of bacteriophage φKMV, a T7-like virus infecting
586 Pseudomonas aeruginosa. Virology 2003, 312, 49–59.
587 72. Adriaenssens, E.M.; Ceyssens, P.-J.; Dunon, V.; Ackermann, H.-W.; Van Vaerenbergh, J.; Maes, M.; De Proft, M.;
588 Lavigne, R. Bacteriophages LIMElight and LIMEzero of Pantoea agglomerans, belonging to the "phiKMV-
589 like viruses". Appl. Environ. Microbiol. 2011, 77, 3443–50.
590 73. Ceyssens, P.-J.; Lavigne, R.; Mattheus, W.; Chibeu, A.; Hertveldt, K.; Mast, J.; Robben, J.; Volckaert, G. Genomic
591 analysis of Pseudomonas aeruginosa phages LKD16 and LKA1: establishment of the phiKMV subgroup within
592 the T7 supergroup. J. Bacteriol. 2006, 188, 6924–31.
593 74. Hua, Y.; An, X.; Pei, G.; Li, S.; Wang, W.; Xu, X.; Fan, H.; Huang, Y.; Zhang, Z.; Mi, Z.; et al. Characterization of
594 the morphology and genome of an Escherichia coli podovirus. Arch. Virol. 2014, 159, 3249–3256.
595 75. Sváb, D.; Falgenhauer, L.; Rohde, M.; Chakraborty, T.; Tóth, I. Complete genome sequence of C130_2, a novel
596 myovirus infecting pathogenic Escherichia coli and Shigella strains. Arch. Virol. 2019, 164, 321–324.
597 76. Bandyopadhyay, K.; Parua, P.K.; Datta, A.B.; Parrack, P. Escherichia coli HflK and HflC can individually inhibit
598 the HflB (FtsH)-mediated proteolysis of λCII in vitro. Arch. Biochem. Biophys. 2010, 501, 239–243.
601 SUPPLEMENTARY FIGURES AND TABLES
602 Figure S1 Microviridae
[Figure: (a) phylogenetic tree, scale bar 0.050; (b) BLASTn heat plot; (c) tBLASTx heat plot; (d) pairwise genome alignment, colour scale 67% to 100%. The heat-plot values from panels (b) and (c) are tabulated below.]

No.  Sequence label                            Phage
 1   NC_001420_G4                              Coliphage G4
 2   DQ079877_ID52                             Coliphage ID52
 3   Escherichia_phage_SECphi17                Escherichia phage SECphi17
 4   MG_187_C21_Esch_SECphi17                  Escherichia phage Lillemer
 5   MG_179_C1_Esch_SECphi17                   Escherichia phage Lilleto
 6   MG_107_C2_Esch_SECphi17                   Escherichia phage Lilledu
 7   MG_044_C1_Esch_SECphi17                   Escherichia phage Lilleput
 8   MG_138_C36_Esch_SECphi17                  Escherichia phage Lilleen
 9   NC_001330_alpha3                          Enterobacteria phage alpha3
10   MG_038_C11_Ent_St-1                       Escherichia phage Lilleven
11   Enterobacteria_phage_St-1 (GQ149088_St1)  Enterobacteria phage St-1

(b) Heat plot, BLASTn (Comparison[blastn_micro_190619])
Organism    1    2    3    4    5    6    7    8    9   10   11
 1        100   62   20   19   19   21   17   21    1    1    1
 2         63  100   15   20   19   20   17   19    2    2    2
 3         22   17  100   74   72   73   76   72    1    2    2
 4         21   20   76  100   92   93   91   88    0    0    0
 5         21   19   74   92  100   92   88   86    1    2    1
 6         24   20   76   93   92  100   91   87    1    1    1
 7         18   17   76   90   88   90  100   90    1    1    1
 8         22   20   74   89   88   89   92  100    1    1    1
 9          2    2    1    0    1    1    1    1  100   63   66
10          2    2    1    0    1    1    1    1   67  100   83
11          1    2    1    0    1    1    1    1   66   84  100

(c) Heat plot, tBLASTx (Comparison[tblastx_micor_190619])
Organism    1    2    3    4    5    6    7    8    9   10   11
 1        100   88   62   63   62   62   62   63   42   42   42
 2         88  100   62   62   62   63   62   63   42   41   41
 3         62   62  100   89   88   90   90   86   42   42   42
 4         62   62   89  100   95   95   94   90   41   41   41
 5         62   62   89   95  100   94   93   90   42   42   41
 6         62   63   90   96   94  100   93   90   42   42   42
 7         62   62   90   94   93   93  100   92   42   42   41
 8         64   64   88   92   92   91   94  100   43   43   42
 9         39   39   39   39   38   39   39   39  100   86   87
10         39   39   39   39   39   39   39   39   87  100   90
11         39   39   39   39   38   39   39   39   87   89  100

(d) Genome order: G4, ID52, SECphi17, Lillemer, Lilleto, Lilledu, Lilleput, Lilleen, Alpha3, Lilleven, St-1. Genes annotated for Lillemer, Lilleto, Lilledu, Lilleput, Lilleen and Lilleven: DNA replication (gpA), gpC, gpD, Major coat (gpF), gpG, Minor spike (gpH).

603 Figure S4 (a) Phylogenetic tree (Maximum log Likelihood: -6065.3, DNA replication protein gpA), scale bar: substitutions per site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, step size: 100, threshold: 0%). (d) Pairwise alignment of phage genomes; the colour bars between genomes indicate percent pairwise similarity as illustrated by the colour bar in the upper right corner (Easyfig, BLASTn). Gene annotations are provided for phages identified in this study and deposited in GenBank.
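The Gegenees settings quoted in the caption (fragment size 200, step size 100, threshold 0%) describe a sliding-window scheme. The sketch below is only an illustration of that idea, not Gegenees's own code: the function names are hypothetical, the per-fragment scores are assumed to be already normalised to the range 0 to 1, and reading the threshold as a cut-off below which fragment scores are dropped from the average is an assumption.

```python
from typing import Iterator, List, Tuple


def fragment_genome(sequence: str, fragment_size: int = 200,
                    step_size: int = 100) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) for overlapping windows along a sequence."""
    if fragment_size <= 0 or step_size <= 0:
        raise ValueError("fragment_size and step_size must be positive")
    # Only full-length windows are emitted; a trailing partial window is dropped.
    for start in range(0, len(sequence) - fragment_size + 1, step_size):
        yield start, sequence[start:start + fragment_size]


def mean_fragment_score(scores: List[float], threshold: float = 0.0) -> float:
    """Average the normalised fragment scores that are at or above the threshold."""
    # Assumption: scores below the threshold are excluded from the mean.
    kept = [s for s in scores if s >= threshold]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    genome = "ACGT" * 125  # hypothetical 500 bp sequence
    for start, fragment in fragment_genome(genome):
        print(start, len(fragment))
```

With a step size of half the fragment size, consecutive windows overlap by 100 positions, and with a threshold of 0 every fragment contributes to the mean.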
610 Figure S2 Hanrivervirus
[Figure: tBLASTx heat plot (Comparison[hanriver_tblastx-psf1_020419]) and pairwise genome alignment, colour scale 68% to 100%. The heat-plot values are tabulated below; phage labels are given as printed in the figure.]

No.  Sequence label           Phage
1    MG_010_C12_Shi_pSf-1     Damhaus
2    MG_086_C1_Sal_pSf-1      Herni
3    MG_106_C6_Shi_pSf-1      Grams
4    MG_142_C100_Shi_pSf-1    Vojen
5    MG_146_C2_Shi_pSf-1      Aaroes
6    MG_182_C1_Shi_pSf-1      Aalborv
7    MG_187_C4_Shi_pSf-1      Haarsle
8    MG_193_C1_Shi_pSf-1      Egaa
9    Shigella phage pSf-1     Shigella phage pSf-1

Organism    1    2    3    4    5    6    7    8    9
1         100   84   85   85   88   79   81   85   83
2          84  100   86   86   85   82   80   90   85
3          87   88  100   86   88   81   82   87   84
4          85   86   84  100   84   81   82   89   85
5          88   84   85   83  100   79   80   84   81
6          85   88   85   87   86  100   83   86   84
7          85   84   83   84   84   80  100   86   84
8          85   89   84   87   84   79   82  100   85
9          82   84   81   84   81   77   80   85  100

613 Figure S2 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, step size: 100, threshold: 0%) and pairwise alignment of phage genomes (Easyfig, BLASTn). The colour bars between genomes indicate percent pairwise similarity as illustrated by the colour bar in the lower right corner. Genome order is the same in both analyses.
618 Figure S3 Dhillonvirus
[Figure: pairwise genome alignment, colour scale 70% to 100%.]
620 Figure S3 Pairwise alignment of phage genomes (Easyfig, BLASTn); the colour bars between genomes indicate percent pairwise similarity as illustrated by the colour bar in the lower right corner.
623 Table S1 Auxiliary metabolism genes
Gene | Protein | KO number | Function in bacteriophages | Count | Phages with this gene
rmlC | dTDP-4-dehydrorhamnose 3,5-epimerase | K01790 | dTDP-rhamnose biosynthesis | 10 | Cluster VI and VII
rmlA | glucose-1-phosphate thymidylyltransferase | K00973 | dTDP-rhamnose biosynthesis | 8 | Cluster VII, ukendt and tuntematon
rmlB | dTDP-glucose 4,6-dehydratase | K01710 | dTDP-rhamnose biosynthesis | 8 | Cluster VII, ukendt and tuntematon
rmlD | dTDP-4-dehydrorhamnose reductase | K00067 | dTDP-rhamnose biosynthesis | 10 | Cluster VI and VII
dcm | DNA (cytosine-5)-methyltransferase | K17398 | DNA modification | 5 | Cluster VII
NAMPT | nicotinamide phosphoribosyltransferase | K03462 | unknown | 25 | Felixounavirus, Suspvirus and Cluster VII
mec | [CysO sulfur-carrier protein]-S-L-cysteine hydrolase | K21140 | tail fiber protein | 20 | Hanrivervirus and unclassified Tunavirinae
folA | dihydrofolate reductase | K00287 | de novo synthesis of pyrimidines, thymidylic acid, and certain amino acids | 33 | Felixounavirus, Suspvirus, Krischvirus, Dhakavirus, Mosigvirus and Tequatrovirus
aroAG | 3-deoxy-7-phosphoheptulonate synthase | K01626 | unknown | 3 | Three of the four Mosigvirus
AUP1 | UDP-N-acetylglucosamine diphosphorylase | K23144 | DNA modification | 4 | Mosigvirus
KdsD | arabinose-5-phosphate isomerase | K06041 | DNA modification | 4 | Mosigvirus
dcm | DNA (cytosine-5)-methyltransferase | K00558 | DNA modification | 1 | Pangalan
queC | 7-cyano-7-deazaguanine synthase | K06920 | DNA modification | 1 | Skure
queE | 7-carboxy-7-deazaguanine synthase | K10026 | DNA modification | 1 | Skure
folE | GTP cyclohydrolase | K01495 | DNA modification | 1 | Skure
queD | 6-carboxy-5,6,7,8-tetrahydropterin synthase | K01737 | DNA modification | 1 | Skure