bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Exploring the remarkable diversity of

2 Phages in the Danish Wastewater Environment, Including

3 91 Novel Phage Species

4 Nikoline S. Olsen 1, Witold Kot 1,2* Laura M. F. Junco2 and Lars H. Hansen 1,2,*

5 1 Department of Environmental Science, Aarhus University, Frederiksborgvej 399, Roskilde, Denmark;

6 [email protected]

7 2 Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, 1871

8 Frederiksberg C, Denmark; [email protected], [email protected]

9 * Correspondence: [email protected] Phone: +45 28 75 20 53, [email protected] Phone: +45 35 33 38 77

10 11 Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation 12 AUFF Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018.

13 Competing interests: The authors declare no competing interests. bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2

14 Abstract: Phages drive bacterial diversity - profoundly influencing diverse microbial communities, from

15 microbiomes to the drivers of global biogeochemical cycling. The vast genomic diversity of phages is

16 gradually being uncovered as >8000 phage have now been sequenced. Aiming to broaden our

17 understanding of Escherichia coli (MG1655, K-12) phages, we screened 188 Danish wastewater samples (0.5

18 ml) and identified 136 phages of which 104 are unique phage species and 91 represent novel species,

19 including several novel lineages. These phages are estimated to represent roughly a third of the true diversity

20 of Escherichia phages in Danish wastewater. The novel phages are remarkably diverse and represent four

21 different families , , and . They group into 14 distinct clusters

22 and nine singletons without any substantial similarity to other phages in the dataset. Their genomes vary

23 drastically in length from merely 5 342 bp to 170 817 kb, with an impressive span of GC contents ranging

24 from 35.3% to 60.0%. Hence, even for a model host bacterium, in the go-to source for phages, substantial

25 diversity remains to be uncovered. These results expand and underlines the range of Escherichia phage

26 diversity and demonstrate how far we are from fully disclosing phage diversity and ecology.

27 Keywords: ; wastewater; Escherichia coli; diversity; genomics; taxonomy; coliphage

28

29 1. Introduction

30 Phages are important ecological contributors, they renew organic matter supplies in nutrient cycles and

31 drive bacterial diversity by enabling co-existence of competing bacteria by “Killing the winner” and by serving

32 as genomic reservoirs and transport units [1,2]. Phage genomes are known to entail auxiliary metabolism

33 (AMGs), toxins, virulence factors and even antibiotic resistance genes [3–7] and through lysogeny and

34 transduction they can transfer metabolic traits to their hosts and even confer immunity against homologous

35 phages [1]. Still, in spite of their ecological role, potential as antimicrobials and the fact that they carry a

36 multitude of unknown genes with great potential for biotechnological applications, phages are vastly

37 understudied. Less than 9000 phage genomes have now been published, and though the number increases

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3

38 rapidly, we may have merely scratched the surface of the expected diversity [8]. It has been estimated that at

39 least a billion bacterial species exist [9], hence only phages targeting a tiny fraction of potential hosts have been

40 reported. Efforts to disclose the range and diversity of phages targeting a single host, have revealed a stunning

41 display of diversity. The most scrutinized phage host is the smegmatis, for which the Science

42 Education Alliance Phage Hunters program has isolated more than 4700 phages and fully sequenced 680

43 phages which represent 30 distinct phage clusters [10,11]. This endeavour has provided a unique insight into

44 viral and host diversity, evolution and genetics [12–15]. No other phage host has been equally targeted, but

45 Escherichia coli phages have been isolated in fairly high numbers. The International Committee on Taxonomy

46 of (ICTV) has currently recognised 158 phage species originally isolated on E. coli [16], while 2732

47 Escherichia phage genomes have been deposited in GenBank [17]. As phages are expected to have an

48 evolutionary potential to migrate across microbial populations, host species may not be an ideal indicator of

49 relatedness, but it serves as an excellent starting point to explore phage diversity. Hierarchical classification

50 of phages is complicated by the high degree of horizontal transfer, consequently several classification

51 systems have been proposed [18–20], and we may not yet have reached a point where it is reasonable to

52 establish the criteria for a universal system [20]. Nonetheless, a system enabling a mutual understanding and

53 exchange of knowledge is needed. Accordingly, we have in this study chosen to classify our novel phages

54 according to the ICTV guidelines [21].

55 Here we aim to expand our understanding of Escherichia phage diversity. Earlier studies are few, have

56 relied on more laborious and time-consuming methods, or are based on in silico analyses of already published

57 phage genomes [8]. Accordingly, Korf et al., (2019) isolated 50 phages on 29 individual E. coli strains, while

58 Jurczak-Kurek et al., (2016) isolated 60 phages on a single E. coli strain, both finding a broad diversity of

59 and also representatives of novel phage lineages [22,23]. We hypothesized, that a high-throughput

60 screening of nearly 200 samples in time-series, using a single host, would expand the number of known phages

61 by facilitating the interception of the prevailing phage(s) of the given day.

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4

62 2. Materials and Methods

63 The screening for Escherichia phages was performed with the microplate based High-throughput screening

64 method as described in (not published). With the exception that lysates of wells giving rise to plaques were

65 sequenced without further purification. In short, an overnight enrichment was performed in microplates with

66 host culture, media and wastewater (0.5 ml per well), followed by filtration (0.45 µm), a purification step by

67 re-inoculation, a second overnight incubation and filtration (0.45µm), and then a spot-test (soft-agar overlay)

68 to indicate positive wells. All procedures were performed under sterile conditions.

69 2.1.1 Samples Bacteria and media

70 Inlet Wastewater samples (188) were collected (40-50 ml) in time-series (2-4 days spanning 1-3 weeks),

71 from 48 Danish wastewater treatment facilities (rural and urban) geographically distributed on Zealand,

72 Funen and in Jutland, during July and August 2017. The samples were centrifuged (9000 x g, 4 °C, 10 min) and

73 the supernatant filtered (0.45 µm) before storage in aliquots (-20°C) until screening. The host bacterium is E.

74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

75 50 mM). Collected phages were stored in SM-buffer [24] at 4°C.

76 2.1.2 Sequencing and genomic characterisation

77 DNA extractions, clean-up (ZR-96 Clean and Concentrator kit, Zymo Research, Irvine, CA USA) and

78 sequencing libraries (Nextera® XT DNA kit, Illumina, San Diego, CA USA) were performed according to

79 manufacturer’s protocol with minor modifications as described in Kot et al., (2014) [25]. The libraries were

80 sequenced as paired-end reads on an Illumina NextSeq platform with the Mid Output Kit v2 (300 cycles). The

81 obtained reads were trimmed and assembled in CLC Genomics Workbench version 10.1.1. (CLC BIO, Aarhus,

82 Denmark). Overlapping reads were merged with the following settings: mismatch cost: 2, minimum score: 15,

83 gap cost: 3 and maximum unaligned end mismatches: 0, and then assembled de novo. Additional control

84 assemblies were constructed using SPAdes version 3.12.0 [26]. Phage genomes were defined as contigs with

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

5

85 an average coverage > x 20 and a sequence length ≥ 90% of closest relative. Gene prediction and annotation

86 were performed using a customized RASTtk version 2.0 [27] workflow with GeneMark [28], with manual

87 curation and verification using BLASTP [29], HHpred [30] and Pfam version 32.0 [31], or de novo annotated

88 using VIGA version 0.11.0 [32] based on DIAMOND searches (RefSeq Viral protein database) and HMMer

89 searches (pVOG HMM database). All genomes were assessed for antibiotic resistance genes, bacterial

90 virulence genes, type I, II, III and IV restriction modification (RM) genes and auxiliary metabolism genes

91 (AMGs) using ResFinder 3.1 [33,34], VirulenceFinder 2.0 [35], Restriction-ModificationFinder 1.1 (REBASE)

92 [36] and VIBRANT version 1.0.1 [37], respectively. The 104 unique phage genomes were aligned to viromes of

93 BioProject PRJNA545408 in CLC Genomics Workbench and deposited in Genbank [17].

94 2.1.3 Genetic analyses

95 Nucleotide (NT) and amino acid (AA) similarities were calculated using tools recommended by the ICTV

96 [38], i.e. BLAST [29] for identification of closest relative (BLASTn when possible, discontinuous megaBLAST

97 (word size 16) for larger genomes) and Gegenees version 2.2.1 [39] for assessing phylogenetic distances of

98 multiple genomes, for both NTs (BLASTn algorithm) and AAs (tBLASTx algorithm) a fragment size of 200 bp

99 and step size 100 bp was applied. NT similarity was determined as percentage query cover multiplied by

100 percentage NT identity. Novel phages are categorised according to ICTV taxonomy. The criterion of 95% DNA

101 sequence similarity for demarcation of species was applied to identify novel species representatives and to

102 determine uniqueness within the dataset. Evolutionary analyses for phylogenomic trees were conducted in

103 MEGA7 version 2.1 (default settings) [40]. These were based on the large terminase subunit (Caudovirales), a

104 gene commonly applied for phylogenetic analysis [41,42] and on the DNA replication gene (gpA)

105 (Microviridae). The NT sequences were aligned by MUSCLE [43] and the evolutionary history inferred by the

106 Maximum Likelihood method based on the Tamura-Nei model [44]. The trees with the highest log likelihood are

107 shown. Pairwise whole comparisons were performed with Easyfig 2.2.2 [45] (BLASTn algorithm), these

108 were curated by adding color-codes and identifiers in Inkscape version 0.92.2. The R package iNEXT [46,47] in

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6

109 R studio version 1.1.456 [48] was used for rarefaction, species diversity (q = 0, datatype: incidence_raw),

110 extrapolation hereof (estimadeD) and estimation of sample coverage. The visualisation of genome sizes and

111 GC contents was prepared in Excel version 16.31.

112 3. Results and Discussion

113 3.2. Phylogenetics, taxonomy and species richness

114 The sequenced phages were analysed strictly in silico, focusing on their relatedness to known phages,

115 their taxonomy and any distinctive characteristics. Based on the confirmed morphology of closely related

116 phages, at least four different families are represented, Myoviridae (58%), Siphoviridae (25%), Podoviridae (7.4%)

117 and even the single stranded DNA (ssDNA) Microviridae (5.9%) (Table 1). Five (3.7%) of the phages are so

118 distinct that they could not with certainty be assigned to any family. A similar distribution was found by Korf

119 et al., (2019), i.e. 70% Myoviridae, 22% Siphoviridae and 8% Podoviridae, just as an analysis of the genomes of

120 Caudovirales infecting Enterobacteriaceae, by Grose & Casjens (2014), also identified more clusters belonging to

121 the Myoviridae, than the Siphoviridae and fewest of the Podoviridae [8,22]. Jurczak-Kurek et al., (2016) found more

122 siphoviruses than myovirus, but also found the Podoviridae to be the least abundant [23]. Nonetheless, these

123 distributions are just as likely to be caused by culture or isolation methods, as they are to reflect true

124 abundances. Based on DNA homology and phylogenetic analysis, the 104 unique phages identified in this

125 study group into 14 distinct clusters and nine singletons, having a <1% Gegenees score between clusters and

126 singletons, with the exceptions of cluster IV, V and Jahat which have inter-Gegenees scores of up to 20%, cluster

127 IX, X and XI which have inter-Gegenees scores of 1-8% and cluster XIII, Sortsne and Sortkaff which have inter-

128 Gegenees scores of 6-14% (Figure 1, Table 2). Excluding one phage in cluster I and two phages in cluster XIV,

129 the intra-Gegenees score of all clusters is above 39% (Figure 1).

130

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

131 Heat plot for: Comparison[ALL_090419], Alignment[analysis], Analysis[*default*] file:///Users/au574427/Desktop/All_coliphages/All_110419.html

Organism 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 1: MG_194_C2_Esch_VpaE1 100 82 79 78 82 82 81 80 82 80 83 80 77 74 78 79 76 79 83 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2: MG_196_C1_Esch_AYO145A 76 100 75 77 83 80 77 78 78 78 78 76 78 73 75 78 77 76 80 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3: MG_191_C1_Esch_AYO145A 75 76 100 77 78 77 84 75 80 78 77 83 76 72 77 75 79 77 77 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4: MG_189_C2_Esch_Alf5 73 77 76 100 78 76 76 75 77 77 77 77 78 71 75 76 76 75 77 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 I 5: MG_160_C2_Esch_AYO145A 78 83 77 79 100 83 80 84 79 82 81 81 80 73 76 83 79 78 83 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6: MG_106_C1_Esch_VpaE1 80 84 79 79 85 100 81 83 83 82 82 81 81 76 79 82 81 82 84 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7: MG_079_C1_Esch_VpaE1 76 78 84 76 80 78 100 80 80 79 80 81 78 75 78 79 79 78 78 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8: MG_053_C1_Esch_Alf5 77 80 76 77 85 81 81 100 78 79 83 82 77 73 76 84 77 76 82 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9: MG_198_C3_Esch_VpaE1 77 79 80 78 79 80 80 78 100 79 81 80 78 73 79 80 80 80 81 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10: MG_049_C2_Esch_AYO145A 76 80 78 79 83 80 80 78 80 100 78 79 77 74 77 79 79 76 80 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11: MG_087_C3_Esch_Alf5 79 79 78 78 82 80 81 82 82 78 100 83 79 73 78 84 79 79 83 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12: MG_156_C2_Esch_O157 76 78 84 79 82 78 82 81 81 79 82 100 78 75 78 82 82 79 81 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13: MG_172_C1_Esch_Alf5 74 80 77 79 81 80 79 77 78 77 78 79 100 75 81 78 79 79 79 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14: MG_047_C1_Sal_VSe11 74 78 76 75 78 77 80 76 78 77 77 79 79 100 80 76 80 76 77 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15: MG_131_C5_Esch_AYO145A 74 77 78 76 76 77 78 75 80 77 78 78 80 76 100 76 80 76 78 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16: MG_137_C3_Esch_Alf5 77 81 77 79 85 81 81 86 82 80 85 84 79 74 78 100 84 78 84 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17: MG_081_C1_Sal_Mushroom 74 79 80 79 81 81 81 79 82 80 80 83 80 77 81 83 100 80 81 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18: MG_084_C1_Esch_VpaE1 76 78 78 77 79 81 80 76 81 76 79 80 80 73 77 77 80 100 79 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 II 19: MG_008_C1_Sal_Mushroom 78 81 76 77 83 81 78 81 81 79 82 81 78 73 78 81 79 78 100 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20: MG_017_C2_Esch_SUSP2 21 20 20 19 20 20 20 19 19 21 20 21 19 19 20 20 20 20 21 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21: MG_179_C1_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 92 88 92 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22: MG_187_C21_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 100 91 93 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23: MG_044_C1_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 88 90 100 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24: MG_107_C2_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 93 91 100 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25: MG_138_C36_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 88 89 92 89 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26: MG_185_C4_Esch_Murica 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 90 84 84 88 83 86 90 91 87 90 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 27: MG_197_C2_Esch_slur12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 100 84 85 87 85 86 88 90 86 89 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 III 28: MG_132_C2_Esch_O157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 83 100 83 86 85 87 85 85 88 85 84 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 29: MG_086_C4_Esch_0157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83 83 82 100 82 80 81 79 80 84 80 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30: MG_097_C2_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 85 85 83 100 86 87 86 87 86 88 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31: MG_120_C1_esch_2JES-2013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 85 86 82 87 100 87 85 86 89 85 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32: MG_130_C1_Ent_FV3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 86 85 87 82 86 86 100 89 88 88 87 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 33: MG_150_C5_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 87 85 80 87 84 89 100 91 86 91 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 34: MG_160_C1_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 89 85 81 87 85 88 91 100 88 92 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35: MG_183_C17_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 87 85 87 84 86 87 87 86 87 100 87 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36: MG_003_R1_C1_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 88 85 82 89 85 88 91 92 88 100 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37: MG_039_C5_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 88 84 81 88 85 88 90 92 87 93 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 38: MG_184_C35_Esch_ST32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39: MG_099_C3_Esch_SLUR25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IV 40: MG_001_C1_Sal_wksl3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41: MG_106_C6_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 71 80 79 71 77 75 76 9 8 10 18 8 10 10 7 9 8 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42: MG_182_C1_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 74 100 76 76 74 80 79 78 10 9 10 18 9 10 10 11 11 11 10 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 43: MG_146_C2_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 77 68 100 82 70 73 72 74 8 7 8 16 7 9 8 7 9 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 44: MG_010_C12_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 76 69 83 100 73 74 76 76 6 7 7 15 7 8 9 8 9 9 9 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45: MG_187_C4_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 70 74 77 100 73 75 76 11 10 10 18 9 11 11 10 11 9 10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46: MG_086_C1_Sal_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75 74 74 74 70 100 78 84 7 8 9 17 8 9 9 7 10 9 9 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 47: MG_142_C100_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 74 73 74 77 72 78 100 80 9 8 9 17 8 9 9 8 10 8 9 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48: MG_193_C1_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 70 74 75 72 83 79 100 9 10 10 17 9 10 10 7 10 9 12 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 49: MG_145_C1_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 9 8 6 10 7 9 9 100 20 20 19 17 19 20 14 16 14 14 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 V 50: MG_081_C2_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 9 7 7 10 8 8 9 19 100 85 79 84 87 88 76 76 77 78 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51: MG_164_C9_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 9 8 7 10 9 9 10 20 85 100 88 89 89 90 74 76 75 76 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52: MG_187_C12_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 17 16 15 17 17 17 18 18 78 87 100 80 84 82 67 69 69 70 82 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 53: MG_126_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 9 7 7 9 8 8 9 17 86 90 82 100 91 93 76 77 76 78 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 54: MG_049_C5_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 9 9 8 10 9 9 10 19 87 89 84 89 100 91 73 76 76 77 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 55: MG_190_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 10 8 9 11 9 9 11 20 89 90 83 91 91 100 74 78 78 78 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 56: MG_022_C3_Esch_phAPEEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 10 7 8 10 7 8 7 14 77 74 68 75 73 74 100 84 79 83 74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 57: MG_086_C6_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 10 9 9 11 10 9 11 15 76 76 70 75 76 77 84 100 80 83 77 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 VI 58: MG_135_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 10 8 9 9 9 9 9 14 76 74 69 73 75 76 78 79 100 81 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 59: MG_197_C3_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 10 8 9 10 9 9 12 15 78 76 70 76 76 78 82 82 81 100 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 60: MG_192_C21_Esch_Swan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 8 7 7 9 8 8 9 20 88 90 83 91 92 91 74 77 77 79 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61: MG_022_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 84 82 85 78 4 5 4 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 62: MG_091_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 100 86 77 81 5 5 4 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 VII 63: MG_136_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 81 85 100 80 82 5 4 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64: MG_187_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 80 84 100 80 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 65: MG_145_C3_Esch_ESCO13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78 80 82 77 100 4 3 4 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 66: MG_170_C16_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 100 92 91 91 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 VIII 67: MG_188_C1_Ent_ECGD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 4 3 3 91 100 90 91 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 68: MG_146_C3_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 5 3 4 90 91 100 88 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IX 69: MG_048_C2_Ent_ECGD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 90 91 88 100 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 70: MG_089_C1_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 91 93 90 91 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 71: MG_199_C3_Esch_ECD7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X 72: MG_148_C1_Ent_RB49 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73: MG_002_C1_Ent_JS10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 93 2 2 1 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 74: MG_193_C3_Ent_JS10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 100 2 3 2 3 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75: MG_002_C2_Esch_O157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 100 88 88 89 8 8 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 XI 76: MG_109_C1_Esch_APCEc01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 85 100 87 87 8 8 8 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 77: MG_107_C1_Esch_PhAPEC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 85 87 100 87 8 7 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78: MG_135_C2_Ent_RB69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 86 87 86 100 8 8 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 79: MG_110_C1_Esch_UFV13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 9 8 8 100 87 84 86 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80: MG_116_C2_Esch_slur13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 7 8 7 8 87 100 83 86 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 81: MG_036_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 85 100 86 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 XII 82: MG_044_C3_Esch_UFV13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 86 85 100 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83: MG_178_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 7 8 7 8 86 89 85 86 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84: MG_038_C11_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 85: MG_099_C1_Esch_Gluttony 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 72 49 66 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 86: MG_180_C4_Esch_Pride 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 71 100 48 66 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 87: MG_124_C1_Esch_Sloth 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48 48 100 46 48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 XIII 88: MG_043_C4_Esch_SECphi18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 65 66 47 100 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89: MG_046_C1_Esch_slur05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 67 49 79 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90: MG_144_C1_Halfdan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91: MG_038_Ent_St-1C6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 39 0 13 6 0 0 0 0 0 0 0 0 0 92: MG_180_C3_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42 100 0 12 7 0 0 0 0 0 0 0 0 0 93: MG_053_C26_Skarpretter 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 94: MG_165_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 11 0 100 14 0 0 0 0 0 0 0 0 0 XIV 95: MG_162_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 0 13 100 0 0 0 0 0 0 0 0 0 96: MG_126_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 22 13 13 15 15 14 15 17 97: MG_039_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 100 27 28 25 25 23 23 25 98: MG_130_C3_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 26 100 90 47 48 65 67 47 99: MG_144_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 28 93 100 48 48 69 68 47 100: MG_200_C27_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 46 46 100 85 41 42 88 101: MG_116_C10_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 48 47 86 100 43 43 84 102: MG_182_C43_Ent_J8+65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 23 66 67 43 43 100 93 44 103: MG_140_C1_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 23 67 66 43 44 93 100 44 104: MG_168_C20_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 24 46 46 88 83 43 44 100

132 Figure 1 Phylogenetic tree (Maximum log Likelihood: -1325.01, large terminase subunit (Caudovirales) or DNA replication protein gpA (Microviridae)), scalebar: substitutions 133 per site, morphology is indicated by symbols (Myoviridae: ☐, Siphoviridae: ▽, Podoviridae: ○, Microviridae: ◇) and phylogenomic nucleotide distances (Gegenees, BLASTn: 134 fragment size: 200, step size: 100, threshold: 0%)

1 of 1 11/04/2019, 11.49 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2

135 Table 1 List of 104 unique Escherichia phages identified in 94 Danish wastewater samples. Novelty (*) is based on the 95% demarcation of species. Similarity is sequence 136 identity (%) times sequence coverage (%) to closest related (Blastn). Taxonomy is based on similarity (BLASTn) to closest related and the ICTV Master Species list.

Escherichia Genome ORFs tRNAs GC Novel Sample Location Cluster Taxonomy Similarity Accession

Phage (bp) (n) (n) (%) (%) number

Tootiki 88257 128 22 39 * 3D Avedøre I Felixounavirus 90,2 MN850647

Mio 83431 121 18 39,1 * 13C Hadsten I Felixounavirus 89,7 MN850631

Allfine 86963 125 20 39 * 14A Hammel I Felixounavirus 91,2 MN850633

Bumzen 87360 126 20 39,1 * 14B, 14D, 15A Hammel, Hinnerup I Felixounavirus 92,5 MN850635

Dune 88511 129 20 39 * 19C Tørring I Felixounavirus 91,5 MN850636

Warpig 86106 127 17 39 * 20A Helsingør I Felixounavirus 93 MN850637

Radambza 86702 127 19 38,9 * 20D Helsingør I Felixounavirus 91,6 MN850639

Ekra 87282 128 20 38,9 * 21C Herning I Felixounavirus 92,9 MN850644

Humlepung 85311 119 19 39,1 * 12B. 25B, 37A Drøsbro, Gram, Bogense I Felixounavirus 92,1 MN850564

Finno 87554 129 20 38,9 * 31C Skovby I Felixounavirus 89,7 MN850619

Garuso 85798 130 20 38,9 * 29A, 33A Nustrup, Sommersted I Felixounavirus 90,9 MN850566

Momo 88168 130 20 39 * 38D Hofmansgave I Felixounavirus 90,7 MN850580

Esbjerg Ø, Bevtoft, Vojens, Bogense, Heid 87590 126 20 39 * 8C, 24B, 24C, 34D, 37D, 38B I Felixounavirus 91,2 MN850577 Hofmansgave

Skuden 87263 131 20 38,9 * 41D Odense NV I Felixounavirus 91,1 MN850585

Pinkbiff 88814 129 20 39 * 46A Marselisborg I Felixounavirus 93,9 MN850603

Fjerdesal 87715 128 21 39 * 3A, 39A, 46C Avedøre, Hårslev, Marselisborg I Felixounavirus 90,6 MN850605

Andreotti 83391 117 20 39,2 * 47B Egå I Felixounavirus 91,9 MN850610

Nataliec 89137 134 20 39 * 47D, 48C Egå, Viby I Felixounavirus 90,3 MN850611

Adrianh 88226 128 19 38,9 * 42C, 48B Otterup, Viby I Felixounavirus 91,1 MN850614

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3

Escherichia Genome ORFs tRNAs GC Novel Sample Location Cluster Taxonomy Similarity Accession

Phage (bp) (n) (n) (%) (%) number

Mistaenkt 86664 128 22 47,2 * 6A Kolding I Suspvirus 91,1 MN850587

Nimi 137039 213 5 43,7 * 11C Varde III Vequintavirus 93,3 MN850626

Navn 141707 224 4 43,6 * 21B Herning III Vequintavirus 91,1 MN850642

Nomine 137991 220 5 43,6 * 23A Lemvig III Vequintavirus 91,5 MN850649

Naswa 138583 222 5 43,6 * 45A Ålborg Ø III Vequintavirus 93,1 MN850595

naam 137129 215 5 43,7 31D Skovby III Vequintavirus 94,5 MN850630

Ime 137114 217 5 43,6 * 12C, 36B Drøsbo III Vequintavirus 93,1 MN850576

Magaca 135826 217 5 43,6 48A Viby III Vequintavirus 96 MN850612

Nom 136114 213 5 43,6 * 28D Jegerup III Vequintavirus 92,6 MN850646

Isim 138289 219 5 43,6 * 31B Skovby III Vequintavirus 93,8 MN850597

Nomo 137702 218 5 43,7 * 38D Hofmansgave III Vequintavirus 93,3 MN850578

Inoa 138710 220 5 43,6 * 44C Ålborg V III Vequintavirus 92 MN850593

Pangalan 136944 215 5 43,7 2A, 45D, 48D Grindsted, Egå, Viby III Vequintavirus 94,8 MN850627

Tuntematon 150473 279 11 39,1 * 7B, 7C Esbjerg V VI Myoviridae 89,6 MN850618

Anhysbys 149335 271 11 39,1 * 22C Hillerød VI Myoviridae 91,5 MN850648

Ukendt 150947 266 11 39 * 32D Skrydstrup VI Myoviridae 88,7 MN850565

Nepoznato 151514 265 10 38,9 * 19D, 35A, 48A, 48D Tørring, Årøsund, Viby VI Myoviridae 85,6 MN850571

Nieznany 144998 254 11 39,1 * 45C Ålborg Ø VI Myoviridae 88,9 MN850598

Muut 146307 243 13 37,4 * 35B Årøsund VII Myoviridae 92 MN850573

Alia 147009 246 13 37,5 * 13D Hadsten VII Myoviridae 93,1 MN850632

Outra 145482 246 13 37,4 * 22A Hillerød VII Myoviridae 93,8 MN850645

Inny 147483 247 13 37,4 * 45D Ålborg Ø VII Myoviridae 92,4 MN850601

Arall 145715 242 13 37,4 41B Odense NØ VII Myoviridae 94,6 MN850584

Kvi 163673 266 - 40,5 * 48C Viby VIII Krischvirus 94,2 MN850615

Kaaroe 163719 267 - 40,5 35D Årøsund VIII Krischvirus 94,7 MN850574

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4

Escherichia Genome ORFs tRNAs GC Novel Sample Location Cluster Taxonomy Similarity Accession

Phage (bp) (n) (n) (%) (%) number

Dhabil 165644 266 3 39,5 * 1B Billund IX Dhakavirus 87,5 MN850621

Dhaeg 170817 278 3 39,4 * 47A, 47C Egå IX Dhakavirus 87,4 MN850609

Mogra 168724 263 2 37,7 * 25C Gram X Mosigvirus 91,1 MN850579

Mobillu 163063 255 2 37,7 1B Billund X Mosigvirus 94,5 MN850622

Moha 168676 267 2 37,6 26A Haderslev X Mosigvirus 94,8 MN850590

Moskry 169410 269 2 37,6 * 32C Skrydstrup X Mosigvirus 93,2 MN850651

Teqskov 165017 257 6 35,4 * 10D Skovlund XI Tequatrovirus 91,7 MN895437

Teqdroes 166833 269 10 35,4 * 12D Drøsbro XI Tequatrovirus 88,6 MN895438

Teqhad 167892 270 10 35,3 * 26B Haderslev XI Tequatrovirus 90,1 MN895434

Teqhal 168070 266 11 35,4 * 27A, 27D Halk XI Tequatrovirus 93,9 MN895435

Teqsoen 166468 268 10 35,5 * 43B Søndersø XI Tequatrovirus 91,7 MN895436

Flopper 52092 78 1 44,2 * 44D Ålborg V - Myoviridae 87 MN850594

Damhaus 51154 89 - 44,1 * 4B Damhusåen IV Hanrivervirus 85,8 MN850602

Herni 50971 89 - 44,1 * 21B, 30D Herning, Over Jerstal IV Hanrivervirus 87,6 MN850640

Grams 49530 83 - 44,1 * 25B Gram IV Hanrivervirus 87,1 MN850567

Aaroes 51662 92 - 44,1 * 35B Årøsund IV Hanrivervirus 83 MN850572

Aalborv 46660 79 - 43,9 * 44B Ålborg V IV Hanrivervirus 86,9 MN850591

Haarsle 48613 85 - 44 * 39B, 45C Hårslev, Ålborg Ø IV Hanrivervirus 87,1 MN850600

Egaa 51643 87 - 44,1 * 47A Egå IV Hanrivervirus 89,7 MN850607

Vojen 50709 86 - 44,1 * 34B Vojens IV Hanrivervirus 89,7 MN850569

Tiwna 51014 85 - 44,6 * 21B Herning V Tunavirinae 87,2 MN850643

Tonijn 51627 86 - 44,6 * 32B, 32C Skrydstrup V Tunavirinae 88,4 MN850641

Tonnikala 51277 86 - 44,8 * 48A Viby V Tunavirinae 86,4 MN850613

Atuna 50732 88 - 44,6 * 7B Esbjerg V V Tunavirinae 84,9 MN850620

Tunus 51111 87 - 44,8 * 20A Helsingør V Tunavirinae 93,7 MN850638

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

5

Escherichia Genome ORFs tRNAs GC Novel Sample Location Cluster Taxonomy Similarity Accession

Phage (bp) (n) (n) (%) (%) number

Orkinos 49798 81 - 44,6 * 30B, 30D Over Jerstal V Tunavirinae 91,3 MN850586

Ityhuna 50768 86 - 44,7 * 39D Hårslev V Tunavirinae 93,3 MN850582

Tonn 51012 87 - 44,5 * 8C, 45C Esbjerg Ø, Ålborg Ø V Tunavirinae 94 MN850596

Tinuso 50856 86 - 44,8 14A Hammel V Tunavirinae 97,3 MN850634

Tunzivis 50596 84 - 44,6 46B Marselisborg V Tunavirinae 94,5 MN850604

Tuinn 50505 86 - 44,7 46D Marselisborg V Tunavirinae 94,8 MN850606

Jahat 51101 87 - 45,7 * 35A Vojens - Tunavirinae 68,5 MK552105

Bob 45252 63 - 54,5 * 12C Drøsbro XII Dhillonvirus 88,6 MN850628

Mckay 44443 63 - 54,5 * 13B Hadsten XII Dhillonvirus 83,8 MN850629

Jat 44417 63 - 54,5 * 23C Lemvig XII Dhillonvirus 89,4 MN850650

Rolling 46017 64 - 54,2 * 29D Nustrup XII Dhillonvirus 80,2 MN850575

Welsh 45207 62 - 54,6 * 23A, 43D Lemvig, Søndersø XII Dhillonvirus 83,8 MN850589

Buks 40308 62 - 49,7 * 1A Billund - Jerseyvirus 91,3 MN850616

Skure 59474 92 - 44,6 * 23C Lemvig - Seuratvirus 90,4 MK672798

Halfdan 42858 57 - 53,7 * 34D Vojens - Siphoviridae 28,8 MH362766

Lilleen 5342 6 - 46,9 * 33A Sommersted II Gequatrovirus 93,8 MK629526

Lilleput 5490 6 - 47 * 12D Drøsbro II Gequatrovirus 93,4 MK629525

Lilleto 5492 6 - 46,8 * 43C, 43D, 44B Søndersø, Ålborg V II Gequatrovirus 92,7 MK629529

Lilledu 5483 6 - 47,2 * 25C Gram II Gequatrovirus 92,6 MK791318

Lillemer 5492 6 - 47,1 45C Ålborg Ø II Gequatrovirus 94,5

Lilleven 6090 9 - 44,4 * 11B Varde - Alphatrevirus 93,9 MK629527

Sortsyn 42116 61 - 59 * 11B Varde XIII Unclassified 92,3 MN850623

Sortregn 38200 53 - 59,3 43D Søndrsø XIII Unclassified 97,3 MN850588

Skarpretter 42042 63 - 55,8 * 15A Hinnerup - Unclassified 37,9 MK105855

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6

Escherichia Genome ORFs tRNAs GC Novel Sample Location Cluster Taxonomy Similarity Accession

Phage (bp) (n) (n) (%) (%) number

Sortkaff 42538 61 - 59,5 * 39B Hårslev - Unclassified 89,8 MN850581

Sortsne 41912 62 - 60 * 40A Odense NV - Unclassified 67,6 MK651787

Aldrigsur 42379 55 - 55,7 * 44B Ålborg V XIV Autographvirinae 71,9 MN850592

Altidsur 42197 53 - 55,7 * 33D Sommersted XIV Autographvirinae 71,8 MN850568

Forsur 42476 56 - 55,4 * 48D Viby XIV Autographvirinae 72 MN850617

Glasur 42507 56 - 55,4 * 40D Odense NV XIV Autographvirinae 72,3 MN850583

Lidtsur 42291 56 - 54,6 * 30B Over Jerstal XIV Autographvirinae 69 MK629528

Megetsur 42132 54 - 55,8 * 31B Skovby XIV Autographvirinae 73,1 MN850608

Mellemsur 40770 50 - 55,8 * 34D Vojens XIV Autographvirinae 76,4 MN850570

Smaasur 41110 50 - 55,4 * 11C Varde XIV Autographvirinae 93,3 MN850625

Usur 41906 51 - 55,4 * 27A, 27D Halk XIV Autographvirinae 73,3 MN850624

137 138

139 The high-throughput screening method favours easily culturable plaque-forming lytic phages. Still, we identified 104 unique Escherichia phages of

140 which only 16% were ≥95% similar (BLAST) to already published phages (Table 2). Phages were identified in wastewater samples from 43 of the 48

141 investigated treatment facilities. From the majority of positive samples (58) a single phage was sequenced, however in some samples the lysate held

142 more than one phage. Twenty-five of the lysates held two phages, eight lysates held three phages and one had as many as four phages. Of the 104

143 unique phages, 91 represent novel species (Table 1, 2). Of these, 51 differed by ≥10% from published phage genomes and some have NT similarities as

144 low as 29% (Table 2).

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

145 These newly sequenced phages represent a substantial quota of divergent lytic Escherichia phages in Danish

146 wastewater, but are still far from disclosing the true diversity hereof (Figure 2). An extrapolation of species

147 richness (q = 0) predicts a total of 292 distinct species (requiring a sample size of ∼900 phages). The relatively

148 small sample-size in this study (n = 136), may subject the estimation to a large prediction bias. The sampling-

149 method also introduces a bias by selecting for abundance and burst size, thereby potentially underestimating

150 diversity. Nonetheless, the results provide an indication of the minimal diversity of lytic Escherichia (MG1655,

151 K-12) phages in Danish wastewater, estimated to be as a minimum be in the range of 160 to 420 unique phage

152 species (Figure 2b). The novelty and diversity of these wastewater phages is truly remarkable and verifies our

153 hypothesis, as well as the efficiency of the High-throughput Screening Method for exploring diverse phages of a

154 single host (not published).

400 1.00

300 0.75

200 0.50 Species diversity Sample coverage

100 0.25

0 0.00

0 250 500 750 0 250 500 750 Number of sampling units Number of sampling units

interpolated extrapolated interpolated extrapolated

(a) (b) 155 Figure 2 (a) Sample-size-based rarefaction and extrapolation curve with confidence intervals (0.95). (b) Sample 156 completeness curve with confidence intervals (0.95).

157 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2

158 Table 2 Taxonomic distribution of phages identified in 94 Danish wastewater samples, 159 based on similarity to closest related and the ICTV Master Species list. 1. ≤95% similarity 160 to other phages in the dataset. 2. ≤95% similarity to other phages in the dataset and to 161 published phage genomes. Taxonomy* total unique1 novel2 Caudovirales; Myoviridae; Ounavirinae; Felixounavirus 33 19 19 Caudovirales; Myoviridae; Ounavirinae; Suspvirus 1 1 1 Caudovirales; Myoviridae; Tevenvirinae; Krischvirus 2 2 1 Caudovirales; Myoviridae; Tevenvirinae; Dhakavirus 3 2 2 Caudovirales; Myoviridae; Tevenvirinae; Mosigvirus 4 4 2 Caudovirales; Myoviridae; Tevenvirinae; Tequatrovirus 6 5 5 Caudovirales; Myoviridae; Vequintavirinae; Vequintavirus 15 12 9 Caudovirales; Myoviridae 15 11 10 Caudovirales; Podoviridae; Autographivirinae 10 9 9 Caudovirales; Siphoviridae; Dhillonvirus 6 5 5 Caudovirales; Siphoviridae; Guernseyvirinae; Jerseyvirus 1 1 1 Caudovirales; Siphoviridae; Seuratvirus 1 1 1 Caudovirales; Siphoviridae; Tunavirinae; Hanrivervirus 10 8 8 Caudovirales; Siphoviridae; Tunavirinae 15 12 8 Caudovirales; Siphoviridae 1 1 1 Microviridae; Bullavirinae; Alphatrevirus 1 1 1 Microviridae; Bullavirinae; Gequatrovirus 7 5 4 Unclassified bacterial viruses 5 5 4 Total 136 104 91 162

163 3.2. Phage genome characteristics

164 The sequenced phage genomes range in sizes from 5 342 bp of the Microviridae lilleen to a span in the

165 Caudovirales from 38 200 bp of the unclassified sortregn to 170 817 bp of the Dhakavirus dhaeg (Figure 3, Table

166 2). GC contents also vary greatly, from only 35.3% (Tequatrovirus teqhad) and up to 60.0% (the unclassified

167 sortsne) (Figure 3,Table 2), heavily diverging from the host GC content of 50.79%. Phages often have a lower

168 GC contents than their host [49], as observed for the majority (81%) of the wastewater phages (Table 2). A

169 lower GC content of phage genomes tends to correlate with an increase in genome size [50], as is also the case

170 for the Caudovirales in this study (Table 2). Rocha & Danchin (2002) hypothesized that phages, along with

171 plasmids and insertion sequences, can be considered intracellular pathogens, and like host dependent bacterial

172 pathogens and symbionts, they experience competition for the energetically expensive and less abundant GTP

173 and CTP and as a consequence develop genomes with comparatively higher AT contents [49]. However, the

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3

174 differences in AT richness reported by Rocha & Danchin (2002) for dsDNA and isometric ssDNA phages were

175 merely 4.2% and 5% [49], while the dsDNA and isometric ssDNA phages in this study deviate from their

176 isolation host by having up to 31% higher and up to 19% lower AT content (Table 2). Accordingly, the

177 considerable differences in GC/AT contents may instead primarily be a reflection of past host relations [14].

178 The sequencing of lysates and not only plaquing phages, may have contributed in enabling the capture of such

179 a broad GC-content and overall diversity.

65

60 Podoviridae Siphoviridae Myoviridae w. tRNAs Myoviridae no tRNAs Microviridae Unclassifed 55

50

GC content (%) content GC 45

40

35

30 -5 5 15 25 35 45 55 65 75 85 95 105 115 125 135 145 155 165 175 185 180 Genome size (kb) 181 Figure 3 Bubble-diagram of the 104 unique Escherichia phages displaying genome size- and GC-content distribution. Area 182 of bubbles indicates number of phages. The yellow bubble is Podoviridae phages, green ones are Siphoviridae phage clusters, 183 dark-blue ones are Myoviridae phage clusters with tRNAs, the light-blue bubble is Myovridae phages without tRNAs, the 184 red bubble is Microviridae phages and the grey one is unclassified phages. 185

186 The genome screening algorithms identified no sequences coding for homologs of known virulence or

187 antibiotic resistance genes. Though not a definitive exclusion, this interprets as a reduced risk of presence, a

188 preferable trait for application. Currently available tools for AMG screening of viromes did not

189 provide a comprehensive and exclusive assessment of the AMG pool in the dataset. The majority of genes

190 identified are not AMGs, but code for phage DNA modification pathways (Table S1). The function of some of

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4

191 the suggested AMGs is unknown, these include a nicotinamide phosphoribosyltransferase (NAMPT) present

192 in cluster I (Felixounavirus and Suspvirus) and VII (unclassified Myoviridae), a 3-deoxy-7-phosphoheptulonate

193 (DAHP) synthase present in cluster X (Mosigvirus) and a complete dTDP-rhamnose biosynthesis pathway

194 present in cluster VI and VII (unclassified Myoviridae). The presence of a dTDP-rhamnose biosynthesis pathway

195 in the DNA metabolism region of phage genomes is peculiar, one possible explanation is, that these phages

196 utilize rhamnose for glycosylation of hydroxy-methylated nucleotides in the same manner as the T4 generated

197 glucosyl-hmC [51]. Dihydrofolate reductase, identified in six of the Myoviridae clusters, is involved in thymine

198 nucleotide biosynthesis, but is also a structural component in the tail baseplate of T-even phages [52]. Multiple

199 verified DNA modification genes were identified. Methyltransferases, some putative, were detected in all

200 Myoviridae of cluster III (vequintavirus), VI (unclassified), VII (unclassified), VIII (Krischvirus), in all Siphoviridae

201 of cluster IV (Hanrivervirus) and in the unclassified phages of cluster XIII. Indeed, cluster XIII and the singletons

202 skarpretter, sortsne, and sortkaff code for both DNA N-6-adenine-methyltransferases (dam) and DNA cytosine

203 methyltransferases (dcm). Finally, two novel epigenetic DNA hypermodification pathways were identified.

204 The mosigviruses of cluster X code for arabinosylation of hmC [51], while the novel seuratvirus Skure codes

205 for the recently verified complex 7-deazaguanine DNA hypermodification system [53,54].

206 3.3. Forty-eight novel Myoviridae phages species

207 The Myoviridae genomes (56 unique, 49 novel) represents the greatest span in genome sizes in this study,

208 from the unclassified flopper of 52092 bp to the dhakavirus dhaeg of 170817 bp), all, except the krischviruses,

209 code for tRNAs (Figure 3, Table 2). The Myoviridae group into nine distinct clusters and four singletons,

210 representing at least three subfamilies; Tevenvirinae, Vequintavirinae and Ounavirinae in addition to 11

211 unclassified Myoviridae (cluster VI & VII) (Figure 1, Table 2). The eleven novel Tevenvirinae distributes into four

212 distinct genera, two of the Krischvirus (cluster VIII, 164kb, 40.5% GC, no tRNAs), two of the Dhakavirus (cluster

213 IX, 166-170 kb, 39.4-39.5% GC, 3 tRNAs), two of the Mosigvirus (cluster X, 169 kb, 37.6-37.7% GC, 2 tRNAs) and

214 five of the Tequatrovirus (cluster XI, 165-168 kb, 35.3-35.5% GC, 6-11 tRNAs), while the nine novel

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

5

215 Vequintavirinae are all vequintaviruses (cluster III, 136-142 kb, 43.6-43.7% GC, 5 tRNAs) closely related (91.1-

216 93.8%, BLAST) to classified species. The vequintaviruses were identified in samples from 12 treatment

217 facilities. Only reads from two samples of a dataset of human gut viromes based on timeseries of faecal

218 samples from 10 healthy persons [55] mapped to the wastewater phages, and only to vequintaviruses. These

219 reads covered 43-54% and 13-26% of all the Vequintavirus genomes except pangalan and navn, respectively.

220 This suggests that the vequintaviruses are related to entero-phages. All but one of the Ounavirinae (cluster I,

221 82-89 kb, 38.9-39.2% GC, 17-22 tRNAs) are felixounaviruses (89.7-93.9%, BLAST) with an intra-Gegenees score

222 of 71-86% (Figure 1). The Felixounavirus is a relatively large genus with 17 recognized species. In this study,

223 felixounaviruses were identified in samples from 23 of the 48 facilities, indicating that they are both ubiquitous

224 in Danish wastewater, numerous and easily cultivated. The last ounavirus, mistaenkt (86.7 kb, 47.2% GC, 22

225 tRNAs) is a Suspvirus. The five cluster VI phages (144.9-151.5 kb, 39±0.1% GC, 10-11 tRNAs) belong to a

226 phylogenetic distinct clade, the recently proposed ´Phapecoctavirus´, and have substantial similarity (86-90%,

227 BLAST) with the anticipated type species Escherichia phage phAPEC8 (JX561091) [22,56]. Cluster VII have

228 significantly (p = 0.038) larger genomes (145.8-147.5 kb) with marginally, though not significantly (p = 0.786),

229 lower GC contents of 37.4-37.5%, all have 13 tRNAs (Table 2) and code for NAMPT not present in cluster VI

230 (Table S1). As a group, cluster VII are even more homogeneous than cluster VI and all are closely related (92-

231 95%, BLAST) to the same five unclassified Enterobacteriaceae phages vB_Ecom_PHB05 (MF805809),

232 vB_vPM_PD06 (MH816848), ECGD1 (KU522583), phi92 (NC_023693) and vB_vPM_PD114 (MH675927)

233 [57,58], with whom they represent a novel unclassified genera. The distinctive and novel singleton flopper

234 only shares NT similarity (36-87%, BLAST) with six other phages, the Escherichia phages ST32 (MF04458) [59]

235 and phiEcoM_GJ1 (EF460875) [60], the Erwinia phages Faunus (MH191398) [61] and vB_EamM-Y2

236 (NC_019504) [62], and the Pectobacterium phages PM1 (NC_023865) [63] and PP101 (KY087898). This group of

237 unclassified phages all have genomes of 52.1-56.6 kb with 43.6-44.9% GC and while only flopper, phiEcoM_GJ1

238 and PM1 code for a single tRNA, all of them have exclusively unidirectional coding sequences (CDSs) and

239 code for RNA polymerases, characteristics of the Autographivirinae of the Podoviridae [64]. However, the

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6

240 verified morphology of ST32, phiEcoM_GJ1, PM1 and PP101, icosahedral head, neck and a contractile tail with

241 tail fibres, classifies them as myoviruses [59,60,62,63]. Based on NT similarity and genome synteny, these seven

242 phages belong to the same, not yet classified, peculiar lineage first described by Jamalludeen et al., (2008) [60].

243 3.4. Five novel Microviridae phages species

244 The singleton lilleven (6.1 kb, 44.4% GC, no tRNA) and the five (four novel) cluster II phages (5.3-5.5 kb,

245 46.9-47.2% GC, no tRNAs) are all Microviridae, a family of small ssDNA non-enveloped icosahedral phages

246 (Table 2). Lilleven is closely related to (93.9%, BLAST), have pronounced gene pattern synteny and high AA

247 similarities (89-90%), with the Alphatremicrovirus Enterobacteria phage St1 (Figure S2), and is a novel species

248 of the genus Alphatremicrovirus, subfamily Bullavirinae. The cluster II phages comprise four distinct species,

249 only differing by single nucleotide polymorphisms and in non-coding regions (Figure S2). They have high

250 intra-Gegenees score (88-92%) and share genomic organisation and extensive NT similarity (92.6-94.5-%,

251 BLAST) with the unclassified microvirus Escherichia phage SECphi17 (LT960607), primarily differing by

252 single nucleotide polymorphisms and in noncoding regions (Figure S2, Table 2). Cluster II are only related to

253 one recognised phage species Escherichia ID52 (63%, BLAST), genus Gequatrovirus, subfamily Bullavirinae,

254 with whom they predominantly differ in the region in and around the major spike protein (gpG), a distinctive

255 marker of the subfamily Bullavirinae involved in host attachment. The phylogenetic analysis separates cluster

256 II from the gequatroviruses, and both the Gegenees scores (<22%) and NT similarities (<65% BLAST) are

257 moderately low (Figure S2). Nonetheless, cluster II are still, based on pronounced genome organisation synteny

258 and a conserved AA similarity (62-64%, Gegenees) considered to be gequatroviruses (Figure S2).

259 The sequencing of the microviruses is peculiar, as library preparation with the Nextera® XT DNA kit

260 applies transposons targeting dsDNA. However, during microvirus infection the host polymerase converts

261 the viral ssDNA into an intermediate state of covalently closed dsDNA, which is then replicated in rolling

262 circle by viral replication proteins transcribed by the host RNA polymerase [65]. This intermediate state may

263 have enabled the library preparation. The presence of host DNA in the sequence results indicates an

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

7

264 insufficient initial DNase1 treatment, which can be attributed to chemical inhibition or inactivation of the

265 enzyme by adhesion to the sides of wells. Hence, it is reasonable to assume that the extracted microvirus DNA

266 was captured as free dsDNA inside host cells during ongoing infections.

267 3.5. Twenty-five novel Siphoviridae phage species

268 In spite of similar genome sizes, the large group of Siphoviridae, is the most diverse (28 unique, 24 novel)

269 in this study, with GC contents ranging from 43.9-54-6% (Figure 3, Table 2). The majority, clusters IV-V and

270 singleton jahat, are of the subfamily Tunavirinae, while the remaining are allocated into at least three divergent

271 genera, Dhillonvirus, Jerseyvirus and seuratvirus.

272 Cluster IV (46.7-51.6 kb, 43.9-44.1% GC, no tRNAs), V (49.8-51.6 kb, 44.6-44.8% GC, no tRNAs) and jahat

273 (51.1 kb, 45.7% GC, no tRNAs) have low inter-Gegenees scores (7-20%) (Figure 1), yet they form a

274 monophyletic clade and are all closely related (85-94%, BLAST) to published Tunavirinae. The Cluster IV phages

275 are notable, as they represent a significant increase to the small genus Hanrivervirus comprising only the type

276 species Shigella virus pSf-1 (51.8 kb, 44% GC, no tRNAs) isolated from the Han River in Korea [66]. The common

277 ancestry of Cluster IV and pSf-1 is evident by comparable genome sizes, GC contents, genomic organization

278 and substantial NT (86-90%, BLAST) and AA (77-85%, Gegenees) similarities (Table 2, Figure S3). During their

279 differentiation, many deletions and insertions of small hypothetical genes have occurred, most notable is a

280 unique version of a putative tail-spike protein in seven of the cluster IV hanriverviruses, indicating divergent

281 host ranges (Figure S3). All the hanriverviruses code for (putative) dam and Psf-1 is resistant towards at least

282 six restriction endonucleases [66], suggesting they employ DNA methylation as a defence strategy.

283 The five novel dhillonviruses of Cluster XII (44.4-46.0 kb, 54.2-54.6% GC, no tRNAs) have substantial NT

284 similarities with a group of unclassified dhillonviruses (80-89%, BLAST) (Table 2) and the type species

285 Escherichia virus HK578 (77-80%, BLAST). As with the hanriverviruses, and as observed by Korf et al., (2019)

286 [22], their genomes mainly differ in minor hypothetical genes and in putative tail-tip proteins, indicating

287 divergent host ranges (Figure S4). Based on NT similarity and the presence of the canonical 7-deazaguanine

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

8

288 operon the singleton skure (59.4 kb, 44.6% GC, no tRNAs) is a seuratvirus, while the singleton Buks (40.3 kb,

289 49.7% GC, no tRNAs) is assigned to the genus Jerseyvirus, subfamily Guernseyvirinae.

290 3.5.1 Escherichia phage halfdan, a novel lineage within the Siphoviridae

291 Interestingly, the singleton halfdan (42.8 kb, 53.7% GC, no tRNAs) has only miniscule similarity with

292 published phages (12-29%, BLAST). These entail two phages vB_PaeS_SCUT-S3 (MK165657) and

293 Ab26 (HG962376) [67] both Septimatreviruses, two Acinetobacter phages of the Lokivirus IMEAB3 (KF811200)

294 and type species Acinetobacter virus Loki [68], and to a lesser degree the unclassified Achromobacter phage

295 phiAxp-1 (KP313532) [69]. They have a common genomic organization, yet their intra-Gegenees score is ≤1%

296 and NT similarity is negligent in roughly a third of halfdan’s 57 CDSs (Figure 4b, d). The phylogeny and AA

297 similarities (Gegenees) also indicate a distant relation, although grouping halfdan closer with the lokiviruses

298 (40-43%) than the septimatreviruses (33-34%) (Figure 4a, c). The genome of halfdan is mosaic, resembling the

299 lokiviruses in the structural region and the septimatreviruses in the replication region (Figure 4d).

300 Remarkably, halfdan has no NT similarity with known Escherichia phages, which could be an indication of E.

301 coli not being the natural host, and that halfdan may have to transcended species barriers. However, host

302 range, morphology and other physical characteristics are beyond the scope of this study, and will be

303 determined in future lab-based studies. Notably, both halfdan, Loki and Ab26 code for a (putative) MazG,

304 although there is negligible sequence similarity (Figure 4d). Phage encoded MazG is hypothesized to be

305 involved in restoring protein synthesis in a starved cell, in order to keep the cell alive, and ensure optimal

306 replication conditions [70]. The phylogeny, whole genome alignments and low NT and AA similarities

307 suggests that halfdan is distinct from other known phages (Figure 4). Accordingly, as per the ICTV guidelines,

308 halfdan is the first phage sequenced of a novel Siphoviridae genus. Yet, halfdan is also, by its mosaicism, an

309 indicator of the genetic continuum of phages questioning the validity of taxonomic interpretations.

310

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Heat plot for: Comparison[Halfdan_and_friends_210319],Heat plot for:Alig... Comparison[hlafdan_tblastx_210319],file:///Users/au574427/Desktop/Coli_unclass/Halfdan/halfdan_n... Alignment[... file:///Users/au574427/Desktop/Coli_unclass/Halfdan/halfdan_2...9

Organism 1 2 Organism3 4 5 6 7 1 2 3 4 5 6 7 1: AcinetobacterAcinetobacter phage phage IME_AB3 IME AB3 1: Acinetobacter100 2 phage0 0 IME_AB30 0 0 100 62 40 26 26 26 23

2: AcinetobacterAcinetobacter phage phage vB_AbaS_Loki vB AbaS Loki 2: Acinetobacter2 100 phage0 0 vB_AbaS_Loki0 0 0 64 100 43 28 27 27 24

Escherichia3: Escherichia phage Halfdanphage halfdan 3: Escherichia0 0 phage100 halfdan1 0 0 0 40 42 100 33 33 33 27

4: PseudomonasPseudomonas phage phage vB_PaeS-SCH_Ab26 vB PaeS SCH Ab264: Pseudomonas0 0 phage0 100 vB_PaeS-SCH_Ab2683 69 0 26 27 33 100 89 77 25

5: PseudomonasPseudomonas phage phage 73 73 5: Pseudomonas0 0 phage0 84 73 100 68 0 26 27 34 89 100 76 24

6: PseudomonasPseudomonas phage phage vB_PaeS_SCUT-S3 vB paeS SCUT-S3 6: Pseudomonas0 0 phage0 69 vB_PaeS_SCUT-S369 100 0 26 27 33 77 77 100 24 7: Achromobacter phage phiAxp-1 7: Achromobacter0 0 0 phage0 phiAxp-10 0 100 23 23 26 24 24 24 100 0.050 Acinetobacter phage phiAxp (a) (b) (c) IME_AB3 100%

Loki 64% MazG Lysis Halfdan Term. Port. Tube Tape m. Coat Polym. Prim.

vB_PaeS_SCH_Ab26

73

vB_PaeS_SCUT-S3

phiAxp-1

(d)

311 Figure 4 (a) Phylogenetic tree (Maximum log Likelihood: -7678.71, large terminase subunit), scalebar: substitutions per 312 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 313 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 314 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 315 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 316 phages identified in this study and deposited in GenBank.

317 3.6. Nine novel Podoviridae phage species

318 The nine novel Podoviridae (cluster XIV, 40.7-42.5 kb, 55.4-55.7% GC, no tRNAs), all with the hallmarks of

319 the Autographvirinae i.e. unidirectionally encoded genes and RNA polymerases [64], have conserved genome

320 organisation with the type species of the Phikmvvirus, Pseudomonas virus phiKMV (42.5 kb, 62.3% GC, no 1 of 1 1 of 1 29/03/2019, 22.19 29/03/2019, 22.19

321 tRNAs) [71], but almost no NT similarity (<1%, BLAST). Besides the unclassified Autographvirinae

322 Enterobacteria phage J8-65 (NC_025445) (40.9 kb, 55.7% GC, no tRNAs), with which Cluster XIV has

323 considerable NT similarity (69-93%, BLAST) and similar GC content, Cluster XIV only share >5% nucleotide

324 similarity (40-42%, BLAST) with the Phikmvvirus Pantoea virus Limezero (43.0 kb, 55.4% GC, no tRNAs) [72]

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10

325 (Figure 5, Table 2). The Phikmvvirus consists of four species, phiKMV, Limezero, Pantoea virus Limelight (44.5 kb,

326 54% GC, no tRNAs) [72] and Pseudomonas virus LKA1 (41.6 kb, 60.9% GC, no tRNAs) [73]. Heat plot for: Comparison[podos_genuse_290319], Alignment[a... file:///Users/au574427/Desktop/Coli_Podos/podos_genus_29031...Heat plot for: Comparison[podos_tblastx_290319], Alignment[a... file:///Users/au574427/Desktop/Coli_Podos/podos_tblastx_2903...

Organism 1 2 3 4 5 6 7 8 9 10 Organism11 12 13 14 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Escherichia1: phageMG_126_C2_Ent_J8-65 lidtsur 100 22 22 17 15 15 13 13 1:15 MG_126_C2_Ent_J8-6514 1 0 0 0 100 70 69 68 68 68 66 66 66 66 47 19 19 20 0.10 Escherichia2: MG_039_C2_Ent_J8-65 phage smaasur 22 100 84 25 25 25 27 28 2:23 MG_039_C2_Ent_J8-6523 0 0 0 0 72 100 91 72 71 71 72 72 71 71 47 19 20 20 Enterobacteria3: Enterobacteria_phage_J8-65 phage J8-65 23 85 100 27 25 25 27 28 3:24 Enterobacteria_phage_J8-6524 1 0 0 0 70 91 100 72 72 72 73 72 72 72 47 19 20 20 4: MG_168_C20_Ent_J8-65Escerichia phage glasur 17 24 26 100 88 83 46 46 4:44 MG_168_C20_Ent_J8-6543 1 0 0 0 67 69 70 100 92 89 77 75 77 77 46 19 19 20 5: EscherichiaMG_200_C27_Ent_J8-65 phage forsur 16 24 25 88 100 85 46 46 5:42 MG_200_C27_Ent_J8-6541 1 0 0 0 67 69 70 92 100 90 77 77 77 77 46 19 19 20 6:Escerichia MG_116_C10_Ent_J8-65 phage usur 16 24 25 84 86 100 48 47 6:43 MG_116_C10_Ent_J8-6543 1 0 0 0 68 70 70 90 91 100 77 77 77 77 47 19 19 20 7:Escherichia MG_130_C3_Ent_J8-65 phage megetsur 14 26 26 47 47 48 100 90 7:67 MG_130_C3_Ent_J8-6565 0 0 0 0 66 71 71 78 78 77 100 92 84 84 47 19 19 20 8:Eschericia MG_144_C2_Ent_J8-65 phage mellemsur 14 28 28 47 48 48 93 100 8:68 MG_144_C2_Ent_J8-6569 1 0 0 0 68 73 73 78 79 79 95 100 86 86 48 19 19 20 9: MG_140_C1_Ent_J8-65Escherichia phage altidsur 15 23 23 44 43 44 67 66 1009: MG_140_C1_Ent_J8-6593 0 0 0 0 65 70 71 77 77 77 84 83 100 95 46 19 19 20 10: EscherichiaMG_182_C43_Ent_J8+65 phage aldrigsur 14 23 23 44 43 43 66 67 10:93 MG_182_C43_Ent_J8+65100 1 0 0 0 65 69 70 77 77 76 84 83 95 100 47 19 19 20 Pantoea11: phage Pantoea LIMEzero LIMEzero 1 0 1 1 1 1 0 0 11:0 Pantoea0 100 LIMEzero0 0 0 45 45 45 46 46 46 46 46 46 46 100 19 19 20 Pantoea12: phage LIMElight LIMElight 0 0 0 0 0 0 0 0 12:0 LIMElight0 0 100 0 0 19 19 19 19 19 19 19 19 19 19 19 100 19 20 13:Pseudomonas phiKMV phage phiKMV 0 0 0 0 0 0 0 0 13:0 phiKMV0 0 0 100 0 19 19 19 19 19 19 19 19 19 19 19 19 100 27 14: PseudomonasLKA1 phage LKA1- 0 0 0 0 0 0 0 0 14:0 LKA10 0 0 0 100 20 20 20 20 20 20 20 20 20 20 20 20 27 100 (a) (b) (c)

Lidtsur DNA pol. RNA pol. Port. Tail Virion Virion Tailspike Term. 100%

Smaasur 64%

J8-65

Glasur

Forsur

Usur

1 of 1 29/03/2019, 18.17 1 of 1 29/03/2019, 18.23 Megetsur

Mellemsur

Altidsur

Aldrigsur

LIMEzero

LIMElight

phiKMV

LKA1

(d)

327 Figure 5 (a) Phylogenetic tree (Maximum log Likelihood: -11728.26, large terminase subunit), scalebar: substitutions per 328 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 329 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 330 Pairwise alignment of phage genomes (Easyfig, BLASTn), the colour bars between genomes indicate percent pairwise 331 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 332 phages identified in this study and deposited in GenBank.

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

11

333 Cluster XIV and J8-65 form a diverse monophyletic clade, with a substantial amount of deletions and

334 insertions between them, subdividing into three sub-clusters with intra-Gegenees scores ≤28% (Figure 5a, d).

335 Phage lidtsur is singled-out and also codes for a unique version of tailspike colanidase, smaasur resembles J8-

336 65 phage and the rest group together (Figure 5). Still, the three sub-clusters have for a large part conserved AA

337 sequences (66-72%, Gegenees) (Figure 5b, c, d). Interestingly, this also applies to Limezero, with whom they

338 have a Gegenees NT score ≤1%, but an AA similarity of 45-48%, supporting the phylogenetic grouping of

339 Limezero and cluster XIV (Figure 5a, b, c). Based on phylogeny, limited NT and low AA similarities it is evident

340 that there is a very distant relation between cluster XIV (and J8-65) and the phikmvviruses (Figure 5). Hence,

341 cluster XIV (and J8-65) are not immediately, according to the ICTV guidelines, considered phikmvviruses.

342 However, the Phikmvvirus already includes phages with minuscule DNA homology infecting divergent hosts

343 [72,73]. The genus delimitation is based on overall genome architecture with the location of a single-subunit

344 RNA polymerase gene adjacent to the structural genes, and not in the early region as in T7-like phages [16].

345 Another characteristic feature of the phikmvvirus is the presence of direct terminal repeats, though not yet

346 verified in all members [72]. Read abundance in non-coding regions in genomic ends of cluster XIV genomes

347 suggests they also have direct terminal repeats of a few hundred bp. In conclusion, we consider the cluster XIV

348 phages (and J8-65) to be of the same lineage as the phikmvviruses, but contemplate that this genus may at

349 some point be divided into at least two independent genera, as we learn more of the nature of the genes which

350 distinguish cluster XIV from the other phikmvviruses on both NT and AA level and account for more 50% of

351 their genomes.

352 3.7. Unclassified bacterial viruses represent two novel lineages

353 The four novel phages sortsyn of cluster XIII and the three singletons sortsne, sortkaff (41.9-42.5 kb, 59-

354 60% GC, no tRNAs) and skarpretter (42.0 kb, 55.8% GC, no tRNAs) all have small genomes and high GC

355 contents (Figure 3) and are suggested to represent two distinct novel lineages. They only share considerable

356 (≥6%, BLAST) NT similarity with four phages, Enterobacteria phage IME_EC2 (KF591601)(41.5 kb, 59.2% GC,

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

12

357 no tRNAs) isolated from hospital sewage [74], Salmonella phage lumpael (MK125141)(41.4 kb, 59.5% GC, no

358 tRNAs) isolated from wastewater of the same sample-set in a previous study, Klebsiella phage

359 vB_KpnS_IME279 (MF614100) (42.5 kb, 59.3% GC, no tRNAs) and Escherichia phage C130_2 (MH363708)(41.7

360 kb, 55.4%Heat GC, plot for: no Comparison[Sortsne_and_the_gang_200319], tRNAs) isolated fromHeat cheese plot Ali... for: Comparison[skarp_and_freinds_tblastx], [75]file:///Users/au574427/Desktop/Coli_unclass/Skarpretter/skarp_.... Alignmen... file:///Users/au574427/Desktop/Coli_unclass/Skarpretter/skarp_t...

Organism 1 2 3 4Organism5 6 7 8 1 2 3 4 5 6 7 8 Escherichia1: Sortkaff_ny phage start Sortkaff 100 1:86 Sortkaff_ny13 6 start6 6 0 0 100 91 60 50 50 50 33 32 0.050 Klebsiella2: MF614100_IME279 phage vB KpnS IME279 86 1002: MF614100_IME27914 5 5 6 0 0 91 100 60 49 50 49 33 32 3: EscherichiaEscherichia phage phage SortsneSortsne 13 3:14 Escherichia100 13 phage15 Sortsne11 0 0 61 61 100 54 55 54 33 31 4: Sortsyn_newEscherichia start phage sortsyn 7 4:5 Sortsyn_new13 100 start87 42 0 0 51 50 54 100 91 67 33 33 5: KF591601_IME_EC2Enterobacteria phage IME EC2 6 5:5 KF591601_IME_EC215 88 100 40 0 0 51 50 56 92 100 67 33 33 6: MK125141_LumpaelSalmonella phage Lumpael 7 6:7 MK125141_Lumpael12 42 40 100 0 0 51 51 55 68 68 100 33 30 7: MK105855_Skarpretter_newEscherichia phage Skarpretter start 0 7:0 MK105855_Skarpretter_new0 0 0 0 100 start1 33 33 32 33 33 32 100 43

8: MH363708_C130_2Escherichia phage C130 2 0 8:0 MH363708_C130_20 0 0 0 1 100 32 32 31 33 33 30 43 100

(a) (b) (c)

Sortkaff 100%

vB_KonS_IME279 64%

Sortsne

Sortsyn

IME_EC2

Lumpael

Holin Methylation Skarpretter hflc Helica. Tailspike Coat Portal Term. Lysis

C130_2

(d)

361 Figure 6 (a) Phylogenetic tree (Maximum log Likelihood: -8023.43, large terminase subunit), scalebar: substitutions per 362 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 363 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 364 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 365 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 366 phages identified in this study and deposited in GenBank.

367 1 of 1 1 of 1 30/03/2019, 07.37 30/03/2019, 07.38

368 Curiously, IME_EC2 is a confirmed (TEM) Podoviridae, with a short non-contractile tail [74], while C130_2

369 is a confirmed (TEM) Myoviridae with a long contractile tail [75]. Lumpael and vB_KpnS_IME279 are

370 uncharacterised, yet both are assigned as Podoviridae in GenBank [17]. Although sortsne, sortkaff, sortsyn,

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

13

371 vB_KpnS_IME279, IME_EC2 and lumpael form a monophyletic clade, the Gegenees score (5-15%) between

372 sub-clusters is surprisingly low (Figure 6a, b). Still, these six phages have comparable genome sizes (41.5-42.5

373 kb) and organisation, including similar relatively small-sized structural proteins, dam and dcm genes, equally

374 high GC contents (59.5±0.5%) and relatively high AA similarities (49-91%) (Figure 6a, c, Table 2). Hence, they

375 are clearly of the same lineage and likely to resemble IME_EC2 in having Podoviridae morphology. This group

376 is also clearly distinct from all other known phages (<5%, BLAST) and as such constitute a novel genus, with

377 a delimitation to be determined by future physical characterisation. Skarpretter and C130_2 form a

378 monophyletic clade and both have slightly, though not significantly (P = 0.38), lower GC contents (55.6±0.2%),

379 indicating a common ancestry and host. The conserved genome organisation and clustering of skarpretter

380 with the other Podoviridae (Figure 1, 6d), suggests skarpretter also has Podoviridae morphology. Still, its closest

381 relative C130_2 is described as a Myoviridae [75], leaving the true morphology of skarpretter a conundrum

382 only resolved by TEM imaging. According to ICTV guidelines both skarpretter and C130_2 are representatives

383 of novel linages, as they are both clearly distinct from all other known phages. Skarpretter and C130_2 have

384 very low NT (1% Gegenees, 38% BLAST) and AA similarities (43%) (Figure 6b, c). Indeed, skarpretter has ≤20%

385 NT similarity (BLAST) with all other published phages. In addition, skarpretter codes for a putative hflc gene

386 possibly inhibiting proteolysis of essential phage proteins, located in the replication region (Figure 6d). This

387 gene, has to our knowledge never before been observed in phages, but occurs in almost all proteobacteria,

388 including E. coli MG1655 [76].

389 4. Conclusion

390 By screening Danish wastewater, we identified no less than 104 unique Escherichia MG1655 phages, but

391 predict the species richness to be at least in the range of 160-420, and even higher if including phages infecting

392 other Escherichia species, though it is expected to fluctuate drastically over time and both within and between

393 treatment facilities. Among the unique phages 91 represent novel phage species of at least four different

394 families Myoviridae, Siphoviridae, Podoviridae and Microviridae. The diversity of these phages is striking, they

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14

395 vary greatly in genome size and have a very broad GC-content range - possibly prompted by former or current

396 alternate hosts. These findings add to our growing understanding of phage ecology and diversity, and through

397 classification of these many phages we come yet another step closer to a more refined taxonomic

398 understanding of phages. Furthermore, the numerous and diverse phages isolated in this study, all lytic to the

399 same single strain, serve as an excellent opportunity to learn important phage-host interactions in future

400 studies. These include, but are not limited to lysogen induced phage immunity, host-range and anti-RE

401 systems. Finally, apart from substantial contributions to known genera, we came upon several unclassified

402 lineages, some completely novel, which in our opinion constitute novel phage genera. We consider sortregn,

403 sortkaff, sortsyn and sortsne together with lumpael, vB_kpnS_IME278 and IME_EC2 to constitute a novel

404 genus within the Podoviridae. The Myoviridae flopper joins an interesting group of not yet classified phages

405 with Myoviridae morphology and Autographvirinae characteristics, just as cluster VII together with five

406 unclassified phages represent a novel unclassified genus within the Myoviridae. Lastly, we consider the

407 Siphoviridae halfdan and the unclassified bacterial virus skarpretter to be the first sequenced representatives of

408 each their novel genus. In conclusion, this study shows that uncharted territory still remains for even well-

409 studied phage-host couples.

410 Supplementary Materials: Figure S1: Microviridae, Figure S2: Hanriverviruses, Figure S3: Dhillonviruses, Table 1

411 Auxiliary metabolism genes.

412 Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation AUFF

413 Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018

414 Acknowledgments: This study had not been possible without the much-appreciated contribution by the many members

415 of the Danish Water and Wastewater Association (DANVA), who kindly supplied us with time-series of wastewater

416 samples from their treatment facilities. A special thanks to the Billund and Grindsted treatment facilities of Billund Vand

417 & Energi, Lynetten, Avedøre and Damhusåen treatment facilities of BIOFOS, Kolding treatment facility of BlueKolding,

418 Esbjerg Øst, Esbjerg Vest, Ribe, Varde og Skovlund treatment facilities of DIN Forsyning, Drøsbro, Hadsten, Hammel,

419 Hinnerup and Voldum treatment facilities of Favrskov Forsyning, Haerning treatment facility of Herning Vand, Hillerød

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

15

420 treatment facility of Hillerød Forsyning,Lemvig treatment facility of Lemvig Vand og Spildevand, Bevtoft, Gram,

421 Haderslev, Halk, Jegerup, Nustrup, Over Jerstal, Skovby, Skrydstrup, Sommersted, Vojens and Årøsund treatment

422 facilities of Provas, Marselisborg, Egå and Viby treatment facilities of Aarhus vand, Hedensted, Juelsminde and Tørring

423 treatment facilities of Hedensted Spildevand, Ejby Mølle, Bogense, Hofmansgave, Hårslev, Nordvest, Nordøst, Otterup

424 and Søndersø treatment facilities of VandCenterSyd, Øst and Vest treatment facilities of Aalborg Forsyning and finally

425 Helsingør treatment facility of Forsyning Helsingør.

426 Competing interests: The authors declare no competing interests.

427 References

428 1. Weinbauer, M.G.; Rassoulzadegan, F. Are viruses driving microbial diversification and diversity? Environ.

429 Microbiol. 2003, 6, 1–11.

430 2. Weitz, J.S.; Wilhelm, S.W. Ocean viruses and their effects on microbial communities and biogeochemical cycles.

431 F1000 Biol. Rep. 2012, 4, 17.

432 3. Crummett, L.T.; Puxty, R.J.; Weihe, C.; Marston, M.F.; Martiny, J.B.H. The genomic content and context of

433 auxiliary metabolic genes in marine cyanomyoviruses. Virology 2016, 499, 219–229.

434 4. Boyd, E.F. Bacteriophage-Encoded Bacterial Virulence Factors and Phage–Pathogenicity Island Interactions. Adv.

435 Virus Res. 2012, 82, 91–118.

436 5. Fortier, L.-C.; Sekulovic, O. Importance of prophages to evolution and virulence of bacterial pathogens.

437 https://doi.org/10.4161/viru.24498 2013.

438 6. Volkova, V. V; Lu, Z.; Besser, T.; Gröhn, Y.T. Modeling the infection dynamics of in enteric

439 Escherichia coli: estimating the contribution of transduction to antimicrobial gene spread. Appl. Environ.

440 Microbiol. 2014, 80, 4350–62.

441 7. Bearson, B.L.; Allen, H.K.; Brunelle, B.W.; Lee, I.S.; Casjens, S.R.; Stanton, T.B. The agricultural antibiotic

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

16

442 carbadox induces phage-mediated gene transfer in Salmonella. Front. Microbiol. 2014, 5, 52.

443 8. Grose, J.H.; Casjens, S.R. Understanding the enormous diversity of bacteriophages: the tailed phages that infect

444 the bacterial family Enterobacteriaceae. Virology 2014, 468–470, 421–443.

445 9. Dykhuizen, D. Species Numbers in Bacteria. Proc. Calif. Acad. Sci. 2005, 56, 62.

446 10. Hatfull, G.F.; Pedulla, M.L.; Jacobs-Sera, D.; Cichon, P.M.; Foley, A.; Ford, M.E.; Gonda, R.M.; Houtz, J.M.;

447 Hryckowian, A.J.; Kelchner, V.A.; et al. Exploring the Mycobacteriophage Metaproteome: Phage Genomics as an

448 Educational Platform. PLoS Genet. 2006, 2, e92.

449 11. Hatfull, G.F. Mycobacteriophages: Windows into Tuberculosis. PLoS Pathog. 2014, 10, e1003953.

450 12. Hatfull, G.F.; Jacobs-Sera, D.; Lawrence, J.G.; Pope, W.H.; Russell, D.A.; Ko, C.C.; Weber, R.J.; Patel, M.C.;

451 Germane, K.L.; Edgar, R.H.; et al. Comparative Genomic Analysis of 60 Mycobacteriophage Genomes: Genome

452 Clustering, Gene Acquisition, and Gene Size. J. Mol. Biol. 2010, 397, 119–143.

453 13. Dedrick, R.M.; Jacobs-Sera, D.; Bustamante, C.A.G.; Garlena, R.A.; Mavrich, T.N.; Pope, W.H.; Reyes, J.C.C.;

454 Russell, D.A.; Adair, T.; Alvey, R.; et al. Prophage-mediated defence against viral attack and viral counter-

455 defence. Nat. Microbiol. 2017, 2, 16251.

456 14. Jacobs-Sera, D.; Marinelli, L.J.; Bowman, C.; Broussard, G.W.; Guerrero Bustamante, C.; Boyle, M.M.; Petrova,

457 Z.O.; Dedrick, R.M.; Pope, W.H.; Science Education Alliance Phage Hunters Advancing Genomics And

458 Evolutionary Science Sea-Phages Program, S.E.A.P.H.A.G. and E.S. (SEA-P.; et al. On the nature of

459 mycobacteriophage diversity and host preference. Virology 2012, 434, 187–201.

460 15. Hatfull, G.F. The Secret Lives of Mycobacteriophages. Adv. Virus Res. 2012, 82, 179–288.

461 16. Lefkowitz, E.J.; Dempsey, D.M.; Hendrickson, R.C.; Orton, R.J.; Siddell, S.G.; Smith, D.B. Virus taxonomy: the

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

17

462 database of the International Committee on Taxonomy of Viruses (ICTV). Nucleic Acids Res. 2018, 46, D708–D717.

463 17. Benson, D.A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Sayers, E.W. GenBank.

464 Nucleic Acids Res. 2017, 45, D37–D42.

465 18. Lawrence, J.G.; Hatfull, G.F.; Hendrix, R.W. Imbroglios of viral taxonomy: genetic exchange and failings of

466 phenetic approaches. J. Bacteriol. 2002, 184, 4891–905.

467 19. Aiewsakun, P.; Adriaenssens, E.M.; Lavigne, R.; Kropinski, A.M.; Simmonds, P. Evaluation of the genomic

468 diversity of viruses infecting bacteria, archaea and eukaryotes using a common bioinformatic platform: steps

469 towards a unified taxonomy. J. Gen. Virol. 2018, 99, 1331–1343.

470 20. Nelson, D. Phage taxonomy: we agree to disagree. J. Bacteriol. 2004, 186, 7029–31.

471 21. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9.

472 22. Korf; Meier-Kolthoff; Adriaenssens; Kropinski; Nimtz; Rohde; van Raaij; Wittmann Still Something to Discover:

473 Novel Insights into Escherichia coli Phage Diversity and Taxonomy. Viruses 2019, 11, 454.

474 23. Jurczak-Kurek, A.; Gąsior, T.; Nejman-Faleńczyk, B.; Bloch, S.; Dydecka, A.; Topka, G.; Necel, A.; Jakubowska-

475 Deredas, M.; Narajczyk, M.; Richert, M.; et al. Biodiversity of bacteriophages: morphological and biological

476 properties of a large group of phages isolated from urban sewage. Sci. Rep. 2016, 6, 34338.

477 24. Sambrook, J.F.; Russell, D.W.; Editors Molecular cloning: A laboratory manual, third edition.; 2nd ed.; Cold Spring

478 Harbor Laboratory, 2000; ISBN 0 87969 309 6.

479 25. Kot, W.; Vogensen, F.K.; Sørensen, S.J.; Hansen, L.H. DPS – A rapid method for genome sequencing of DNA-

480 containing bacteriophages directly from a single plaque. J. Virol. Methods 2014, 196, 152–156.

481 26. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.;

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

18

482 Pham, S.; Prjibelski, A.D.; et al. SPAdes: a new genome assembly algorithm and its applications to single-cell

483 sequencing. J. Comput. Biol. 2012, 19, 455–77.

484 27. Brettin, T.; Davis, J.J.; Disz, T.; Edwards, R.A.; Gerdes, S.; Olsen, G.J.; Olson, R.; Overbeek, R.; Parrello, B.; Pusch,

485 G.D.; et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom

486 annotation pipelines and annotating batches of genomes. Sci. Rep. 2015, 5, 8365.

487 28. Besemer, J.; Borodovsky, M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses.

488 Nucleic Acids Res. 2005, 33, W451–W454.

489 29. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990,

490 215, 403–410.

491 30. Soding, J.; Biegert, A.; Lupas, A.N. The HHpred interactive server for protein homology detection and structure

492 prediction. Nucleic Acids Res. 2005, 33, W244–W248.

493 31. Finn, R.D.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Mistry, J.; Mitchell, A.L.; Potter, S.C.; Punta, M.; Qureshi, M.;

494 Sangrador-Vegas, A.; et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids

495 Res. 2016, 44, D279–D285.

496 32. Enrique González-Tortuero1, Thomas David Sean Sutton1, Vimalkumar Velayudhan1, 4 Andrey Nikolaevich

497 Shkoporov1, Lorraine Anne Draper1, Stephen Robert Stockdale1, 2, Reynolds 5 Paul Ross1, 3, Colin Hill1, 3, 6

498 VIGA: a sensitive, precise and automatic de novo VIral Genome Annotator. bioRxiv 2018, 277509.

499 33. Zankari, E.; Hasman, H.; Cosentino, S.; Vestergaard, M.; Rasmussen, S.; Lund, O.; Aarestrup, F.M.; Larsen, M.V.

500 Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 2012, 67, 2640–4.

501 34. Kleinheinz, K.A.; Joensen, K.G.; Larsen, M.V. Applying the ResFinder and VirulenceFinder web-services for easy

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

19

502 identification of acquired antibiotic resistance and E. coli virulence genes in bacteriophage and prophage

503 nucleotide sequences. Bacteriophage 2014, 4, e27943.

504 35. Joensen, K.G.; Scheutz, F.; Lund, O.; Hasman, H.; Kaas, R.S.; Nielsen, E.M.; Aarestrup, F.M. Real-Time Whole-

505 Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli.

506 J. Clin. Microbiol. 2014, 52, 1501–1510.

507 36. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE—a database for DNA restriction and modification:

508 enzymes, genes and genomes. Nucleic Acids Res. 2015, 43, D298–D299.

509 37. Kieft, K.; Zhou, Z.; Anantharaman, K. VIBRANT: Automated recovery, annotation and curation of microbial

510 viruses, and evaluation of virome function from genomic sequences. bioRxiv 2019, 855387.

511 38. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9.

512 39. Ågren, J.; Sundström, A.; Håfström, T.; Segerman, B. Gegenees: Fragmented Alignment of Multiple Genomes for

513 Determining Phylogenomic Distances and Genetic Signatures Unique for Specified Target Groups. PLoS One

514 2012, 7, e39107.

515 40. Kumar, S.; Stecher, G.; Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger

516 Datasets. Mol. Biol. Evol. 2016, 33, 1870–1874.

517 41. Lopes, A.; Tavares, P.; Petit, M.-A.; Guérois, R.; Zinn-Justin, S. Automated classification of tailed bacteriophages

518 according to their neck organization. BMC Genomics 2014, 15, 1027.

519 42. Mercanti, D.J.; Rousseau, G.M.; Capra, M.L.; Quiberoni, A.; Tremblay, D.M.; Labrie, S.J.; Moineau, S. Genomic

520 Diversity of Phages Infecting Probiotic Strains of Lactobacillus paracasei. Appl. Environ. Microbiol. 2016, 82, 95–

521 105.

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

20

522 43. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res.

523 2004, 32, 1792–7.

524 44. Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial

525 DNA in humans and chimpanzees. Mol. Biol. Evol. 1993, 10, 512–26.

526 45. Sullivan, M.J.; Petty, N.K.; Beatson, S.A. Easyfig: a genome comparison visualizer. Bioinformatics 2011, 27, 1009.

527 46. Hsieh, T.C.; Ma, K.H.; Chao, A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill

528 numbers). Methods Ecol. Evol. 2016, 7, 1451–1456.

529 47. Chao, A.; Gotelli, N.J.; Hsieh, T.C.; Sander, E.L.; Ma, K.H.; Colwell, R.K.; Ellison, A.M. Rarefaction and

530 extrapolation with Hill numbers: A framework for sampling and estimation in species diversity studies. Ecol.

531 Monogr. 2014, 84, 45–67.

532 48. RStudio Team RStudio: Integrated Development for R. 2016.

533 49. Rocha, E.P.C.; Danchin, A. Base composition bias might result from competition for metabolic resources. Trends

534 Genet. 2002, 18, 291–294.

535 50. Motlagh, A.M.; Bhattacharjee, A.S.; Coutinho, F.H.; Dutilh, B.E.; Casjens, S.R.; Goel, R.K. Insights of Phage-Host

536 Interaction in Hypersaline Ecosystem through Metagenomics Analyses. Front. Microbiol. 2017, 8, 352.

537 51. Thomas, J.A.; Orwenyo, J.; Wang, L.-X.; Black, L.W. The Odd "RB" Phage-Identification of

538 Arabinosylation as a New Epigenetic Modification of DNA in T4-Like Phage RB69. Viruses 2018, 10.

539 52. Purohit, S.; Bestwick, K.; Lasser, G.W.; Rogers, C.M.; Mathews, C.K. T4 Phage-coded Dihydrofolate Reductase.

540 1981, 256, 9121–9125.

541 53. Hutinet, G.; Kot, W.; Cui, L.; Hillebrand, R.; Balamkundu, S.; Gnanakalai, S.; Neelakandan, R.; Carstens, A.B.;

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

21

542 Lui, C.F.; Tremblay, D.; et al. 7-Deazaguanine modifications protect phage DNA from host restriction systems.

543 Nature 2019, 1–12.

544 54. Carstens, A.B.; Kot, W.; Lametsch, R.; Neve, H.; Hansen, L.H. Characterisation of a novel enterobacteria phage,

545 CAjan, isolated from rat faeces. Arch. Virol. 2016, 161, 2219–2226.

546 55. Shkoporov, A.N.; Clooney, A.G.; Sutton, T.D.S.; Ryan, F.J.; Daly, K.M.; Nolan, J.A.; McDonnell, S.A.; Khokhlova,

547 E. V.; Draper, L.A.; Forde, A.; et al. The Human Gut Virome Is Highly Diverse, Stable, and Individual Specific.

548 Cell Host Microbe 2019, 26, 527-541.e5.

549 56. Tsonos, J.; Adriaenssens, E.M.; Klumpp, J.; Hernalsteens, J.-P.; Lavigne, R.; De Greve, H. Complete genome

550 sequence of the novel Escherichia coli phage phAPEC8. J. Virol. 2012, 86, 13117–8.

551 57. Schwarzer, D.; Buettner, F.F.R.; Browning, C.; Nazarov, S.; Rabsch, W.; Bethe, A.; Oberbeck, A.; Bowman, V.D.;

552 Stummeyer, K.; Mühlenhoff, M.; et al. A multivalent adsorption apparatus explains the broad host range of

553 phage phi92: a comprehensive genomic and structural analysis. J. Virol. 2012, 86, 10384–98.

554 58. Schwarzer, D.; Browning, C.; Stummeyer, K.; Oberbeck, A.; Mühlenhoff, M.; Gerardy-Schahn, R.; Leiman, P.G.

555 Structure and biochemical characterization of bacteriophage phi92 endosialidase. Virology 2015, 477, 133–143.

556 59. Liu, H.; Geagea, H.; Rousseau, G.M.; Labrie, S.J.; Tremblay, D.M.; Liu, X.; Moineau, S. Characterization of the

557 Escherichia coli Virulent Myophage ST32. Viruses 2018, 10.

558 60. Jamalludeen, N.; Kropinski, A.M.; Johnson, R.P.; Lingohr, E.; Harel, J.; Gyles, C.L. Complete genomic sequence

559 of bacteriophage phiEcoM-GJ1, a novel phage that has myovirus morphology and a podovirus-like RNA

560 polymerase. Appl. Environ. Microbiol. 2008, 74, 516–25.

561 61. Thompson, D.W.; Casjens, S.R.; Sharma, R.; Grose, J.H. Genomic comparison of 60 completely sequenced

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

22

562 bacteriophages that infect Erwinia and/or Pantoea bacteria. Virology 2019, 535, 59–73.

563 62. Born, Y.; Fieseler, L.; Marazzi, J.; Lurz, R.; Duffy, B.; Loessner, M.J. Novel Virulent and Broad-Host-Range

564 Erwinia amylovora Bacteriophages Reveal a High Degree of Mosaicism and a Relationship to Enterobacteriaceae

565 Phages. Appl. Environ. Microbiol. 2011, 77, 5945–5954.

566 63. Lim, J.-A.; Shin, H.; Lee, D.H.; Han, S.-W.; Lee, J.-H.; Ryu, S.; Heu, S. Complete genome sequence of the

567 Pectobacterium carotovorum subsp. carotovorum virulent bacteriophage PM1. Arch. Virol. 2014, 159, 2185–2187.

568 64. Lavigne, R.; Seto, D.; Mahadevan, P.; Ackermann, H.-W.; Kropinski, A.M. Unifying classical and molecular

569 taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res. Microbiol. 2008, 159, 406–

570 414.

571 65. Doore, S.M.; Fane, B.A. The microviridae: Diversity, assembly, and experimental evolution. Virology 2016, 491,

572 45–55.

573 66. Jun, J.W.; Kim, J.H.; Shin, S.P.; Han, J.E.; Chai, J.Y.; Park, S.C. Characterization and complete genome sequence of

574 the Shigella bacteriophage pSf-1. Res. Microbiol. 2013, 164, 979–986.

575 67. Essoh, C.; Latino, L.; Midoux, C.; Blouin, Y.; Loukou, G.; Nguetta, S.-P.A.; Lathro, S.; Cablanmian, A.; Kouassi,

576 A.K.; Vergnaud, G.; et al. Investigation of a Large Collection of Pseudomonas aeruginosa Bacteriophages

577 Collected from a Single Environmental Source in Abidjan, Côte d’Ivoire. PLoS One 2015, 10, e0130548.

578 68. Turner, D.; Wand, M.E.; Briers, Y.; Lavigne, R.; Sutton, J.M.; Reynolds, D.M. Characterisation and genome

579 sequence of the lytic Acinetobacter baumannii bacteriophage vB_AbaS_Loki. PLoS One 2017, 12, e0172303.

580 69. Ma, Y.; Li, E.; Qi, Z.; Li, H.; Wei, X.; Lin, W.; Zhao, R.; Jiang, A.; Yang, H.; Yin, Z.; et al. Isolation and molecular

581 characterisation of Achromobacter phage phiAxp-3, an N4-like bacteriophage. Sci. Rep. 2016, 6, 24776.

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

23

582 70. Bryan, M.J.; Burroughs, N.J.; Spence, E.M.; Clokie, M.R.J.; Mann, N.H.; Bryan, S.J. Evidence for the intense

583 exchange of MazG in marine cyanophages by . PLoS One 2008, 3, 1–12.

584 71. Lavigne, R.; Burkal’tseva, M. V; Robben, J.; Sykilinda, N.N.; Kurochkina, L.P.; Grymonprez, B.; Jonckx, B.;

585 Krylov, V.N.; Mesyanzhinov, V. V; Volckaert, G. The genome of bacteriophage φKMV, a T7-like virus infecting

586 Pseudomonas aeruginosa. Virology 2003, 312, 49–59.

587 72. Adriaenssens, E.M.; Ceyssens, P.-J.; Dunon, V.; Ackermann, H.-W.; Van Vaerenbergh, J.; Maes, M.; De Proft, M.;

588 Lavigne, R. Bacteriophages LIMElight and LIMEzero of Pantoea agglomerans, belonging to the "phiKMV-

589 like viruses". Appl. Environ. Microbiol. 2011, 77, 3443–50.

590 73. Ceyssens, P.-J.; Lavigne, R.; Mattheus, W.; Chibeu, A.; Hertveldt, K.; Mast, J.; Robben, J.; Volckaert, G. Genomic

591 analysis of Pseudomonas aeruginosa phages LKD16 and LKA1: establishment of the phiKMV subgroup within

592 the T7 supergroup. J. Bacteriol. 2006, 188, 6924–31.

593 74. Hua, Y.; An, X.; Pei, G.; Li, S.; Wang, W.; Xu, X.; Fan, H.; Huang, Y.; Zhang, Z.; Mi, Z.; et al. Characterization of

594 the morphology and genome of an Escherichia coli podovirus. Arch. Virol. 2014, 159, 3249–3256.

595 75. Sváb, D.; Falgenhauer, L.; Rohde, M.; Chakraborty, T.; Tóth, I. Complete genome sequence of C130_2, a novel

596 myovirus infecting pathogenic Escherichia coli and Shigella strains. Arch. Virol. 2019, 164, 321–324.

597 76. Bandyopadhyay, K.; Parua, P.K.; Datta, A.B.; Parrack, P. Escherichia coli HflK and HflC can individually inhibit

598 the HflB (FtsH)-mediated proteolysis of λCII in vitro. Arch. Biochem. Biophys. 2010, 501, 239–243.

599

600

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

24

601 SUPPLEMENTARY FIGURES AND TABLES

602 Figure S1 MicroviridHeatae plot for: Comparison[blastn_micro_190619], Alignment[analysis], Analysis[*default*]Heat plot for: Comparison[tblastx_micor_190619], Alignment[analysis], Analysis[*default*]file:///Users/au574427/Desktop/blastn_micro_190619.html file:///Users/au574427/Desktop/tblastx_micro-190619.html

Organism 1 2 3 4 5 6 7 Organism8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 Coliphage1: NC_001420_G4 G4 100 62 20 19 191: NC_001420_G421 17 21 1 1 1 100 88 62 63 62 62 62 63 42 42 42 Coliphage2: DQ079877_ID52 ID52 63 100 15 20 192: DQ079877_ID5220 17 19 2 2 2 88 100 62 62 62 63 62 63 42 41 41 Escherichia3: Escherichia_phage_SECphi17 phage SECphi17 22 17 100 74 723: Escherichia_phage_SECphi1773 76 72 1 2 2 62 62 100 89 88 90 90 86 42 42 42 Escherichia4: MG_187_C21_Esch_SECphi17 phage Lillemer 21 20 76 100 924: MG_187_C21_Esch_SECphi1793 91 88 0 0 0 62 62 89 100 95 95 94 90 41 41 41 Escherichia5: MG_179_C1_Esch_SECphi17 phage Lilleto 21 19 74 92 1005: MG_179_C1_Esch_SECphi1792 88 86 1 2 1 62 62 89 95 100 94 93 90 42 42 41 Escherichia6: MG_107_C2_Esch_SECphi17 phage Lilledu 24 20 76 93 926: 100MG_107_C2_Esch_SECphi1791 87 1 1 1 62 63 90 96 94 100 93 90 42 42 42 escherichia7: MG_044_C1_Esch_SECphi17 phage Lilleput 18 17 76 90 887: MG_044_C1_Esch_SECphi1790 100 90 1 1 1 62 62 90 94 93 93 100 92 42 42 41 Escherichia8: MG_138_C36_Esch_SECphi17 phage Lilleen 22 20 74 89 888: MG_138_C36_Esch_SECphi1789 92 100 1 1 1 64 64 88 92 92 91 94 100 43 43 42 Enterobacteria9: NC_001330_alpha3 phage alpha3 2 2 1 0 1 9: NC_001330_alpha31 1 1 100 63 66 39 39 39 39 38 39 39 39 100 86 87 Escherichia10: MG_038_C11_Ent_St-1 phage Lilleven 2 2 1 0 1 10: 1MG_038_C11_Ent_St-11 1 67 100 83 39 39 39 39 39 39 39 39 87 100 90 0.050 Enterobacteria11: Enterobacteria_phage_St-1 phage St-1 1 2 1 0 1 11: 1GQ149088_St11 1 66 84 100 39 39 39 39 38 39 39 39 87 89 100 (a) (b) (c) G4 100% ID52

SECphi17 67%

Lillemer DNA replication (gpA) gpC gpD Major coat (gpF) gpG Minor spike (gpH)

Lilleto DNA replication (gpA) gpC gpD Major coat (gpF) gpG Minor spike (gpH)

Lilledu DNA replication (gpA) gpC gpD Major coat (gpF) gpG Minor spike (gpH)

Lilleput DNA replication (gpA) gpC gpD Major coat (gpF) gpG Minor spike (gpH)

Lilleen DNA replication (gpA) gpC gpD Major coat (gpF) gpG Minor spike (gpH)

Alpha3

Lilleven DNA replication (gpA) gpC gpD Major coat (gpF) gpG Minor spike (gpH)

St-1

(d) 603 Figure S4 (a) Phylogenetic tree (Maximum log Likelihood: -6065.3, DNA replication protein gpA), scalebar: substitutions 604 per site (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 605 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 606 Pairwise alignment of phage genomes, the colour bars between genomes indicate percent pairwise similarity as illustrated 607 by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for phages identified in this 608 study and deposited in GenBank. 609

1 of 1 20/06/2019, 14.04 1 of 1 20/06/2019, 14.04

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

25

Heat plot610 for: Comparison[hanriver_tblastx-psf1_020419],Figure S2 Hanriverv Alig...irus file:///Users/au574427/Desktop/Coli_siphos/Hanriver/hanriver_p...

Organism 1 2 3 4 5 6 7 8 9 1: MG_010_C12_Shi_pSf-11 Damhaus 100 84 85 85 88 79 81 85 83 2: MG_086_C1_Sal_pSf-12 Herni 84 100 86 86 85 82 80 90 85 3: MG_106_C6_Shi_pSf-13 Grams 87 88 100 86 88 81 82 87 84 4: MG_142_C100_Shi_pSf-14 Vojen 85 86 84 100 84 81 82 89 85 5: MG_146_C2_Shi_pSf-15 Aaroes 88 84 85 83 100 79 80 84 81 6: MG_182_C1_Shi_pSf-16 Aalborv 85 88 85 87 86 100 83 86 84 7: MG_187_C4_Shi_pSf-17 Haarsle 85 84 83 84 84 80 100 86 84 8: MG_193_C1_Shi_pSf-18 Egaa 85 89 84 87 84 79 82 100 85 9: Shigella phage pSf-1 82 84 81 84 81 77 80 85 100 9 Shigella 611 phage pSf-1

100%

612 68%

613 Figure S2 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). 1 of 1 02/04/2019, 13.43 614 and Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 615 similarity as illustrated by the colour bar in the lower right corner (Easyfig, BlASTn). Genome order is the same in both 616 analyses.

617

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

26

618 Figure S3 Dhillonvirus

100%

619 70%

620 Figure S3 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent 621 pairwise similarity as illustrated by the colour bar in the lower right corner (Easyfig, BlASTn).

622

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.19.911818; this version posted January 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

623 Table S1 Auxiliary metabolism genes

624

Gene Protein KO number Function in bacteriophages Count phages with this gene

rmlC dTDP-4-dehydrorhamnose 3,5-epimerase K01790 dtdp rhamnose biosynthesis 10 Cluster VI and VII

rmlA glucose-1-phosphate thymidylyltransferase K00973 dtdp rhamnose biosynthesis 8 Cluster VII, ukendt and tuntematon

rmlB dTDP-glucose 4,6-dehydratase K01710 dtdp rhamnose biosynthesis 8 Cluster VII, ukendt and tuntematon

rmlD dTDP-4-dehydrorhamnose reductase K00067 dtdp rhamnose biosynthesis 10 Cluster VI and VII

dcm DNA (cytosine-5)-methyltransferase K17398 DNA modification 5 Cluster VII

NAMPT nicotinamide phosphoribosyltransferase K03462 unknown 25 Felixounavirus, Suspvirus and Cluster VII

CysO sulfur-carrier protein]-S-L-cysteine mec K21140 tail fiber protein 20 Hanrivervirus and unclassifed Tunavirinae hydrolase

de novo synthesis of pyrimidines, Felixounavirus, Suspvirus, Krischvirus, Dhakavirus, Mosigvirus folA dihydrofolate reductase K00287 33 thymidylic acid, and certain amino acids and Tequatrovirus

aroAG 3-deoxy-7-phosphoheptulonate synthase K01626 unknown 3 Three of the four Mosigvirus

AUP1 UDP-N-acetylglucosamine diphosphorylase K23144 DNA modification 4 Mosigvirus

KdsD arabinose-5-phosphate isomerase K06041 DNA modification 4 Mosigvirus

dcm DNA (cytosine-5)-methyltransferase K00558 DNA modification 1 Pangalan

queC 7-cyano-7-deazaguanine synthase K06920 DNA modification 1 Skure

queE 7-carboxy-7-deazaguanine synthase K10026 DNA modification 1 Skure

folE GTP cyclohydrolase K01495 DNA modification 1 Skure

queD 6-carboxy-5,6,7,8-tetrahydropterin synthase K01737 DNA modification 1 Skure

625