Supplementary Information

Figure S1-S6 Tables S1-5 Additional file 1: Detailed description of ETEC reference a L1_E925_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 L1 E925

E1739_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000

L1 E1739

b

500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 L2 E1649

L2_E1649_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000

L2 E70

E70_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 L2 E66

E66_whole_genome c

500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 5500000

L3 E36

L3_E36_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 ETEC strain 00-3279

d Strain_00-3279_CP024293-CP024294-CP024295-CP024296_whole_genome500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 L3 E2980

L3_E2980_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000

E2264

E2264_CP023349_-CP023350-CP023351-CP023352_whole_genome e

500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 L4 E1441

L4_E1441_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000

L4 E1752

E1752_whole_genome f

500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000

L5 E1779

L5_E1779_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 L5 E1635

g E1635_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000

L6 E562

L6_E562_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 ETEC strain 504239

h Strain_504239_CP025859-CP025860-CP025861_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000

L7 E1373

L7_E1373_whole_genome 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 5000000 ETEC strain 2014-EL 1345-2

Strain_2014EL-1345-2_CP024223-CP024227-CP024226-CP024225-CP024224_whole_genome

Figure S1: Whole genome comparisons using progressiveMAUVE between each ETEC reference strain with one or two different ETEC strains that belong to the same lineage. a) ETEC reference strain E925 compared to another long-read sequenced ETEC strain (E1739) that belong to the same lineage (L1). b) ETEC reference strain E1649, lineage 2, compared to two additional L2 ETEC strains (E70 and E66). c) ETEC reference strain E36, lineage 3, compared to ETEC strain 00-3279 (CP024293-96). d) ETEC reference strain E2980, lineage 3, compared to E2264 (CP023349-52). e) ETEC reference strain E1441 compared to E1752, both part of lineage L4. f) ETEC reference strain E1779 compared to E1635 part of lineage 5. g) ETEC reference strain E562 from lineage 6 compared to the ETEC strain 504239 (CP025859-61). h) ETEC reference strain E1373 part of lineage 7 compared to the ETEC strain 2014-EL-1345-2 (CP024223-27).

L1_E925

L2_E1649

L3_E36

L3_E2980

L4_E1441

L5_E1779

L6_E562

L7_E1373 500 kb

Figure S2: Comparison of the chromosomes of each ETEC reference genome using progressiveMAUVE. Circularized chromosomes extracted from the reference ETEC strains of lineage 1 through 7 (two reference strains belong to L3, one is CFA/I and the other CS7). The chromosomes are to a large extent colinear. Visualized using genoplotR.

a

Plasmid features Phage features b

Figure S3: Comparison between the two identified ETEC phage-plasmids using tblastx. a) pAvM_E1649_9 is a P1-like phage here compared to Enterobacteria phage P1 (Escherichia P1; NC_005856.1) and pEC2_5 (E. coli strain EC2_5; CP041960.1). b) pAvM_E1373_29, a phage-plasmid similar to the pHCM2 (Salmonella typhi strain CT18; AL513384.1) and SSU5 (Salmo- nella phage; JQ965645.1) in Salmonella typhi. Tblastx comparison were made using BRIGS with the thresholds indicated to the right of each plasmid comparison. Annotations are from the respective ETEC phage-plasmids which were used as references for the blast.

Figure S3: Comparison between the two identified ETEC phage-plasmids using tblastx. a) pAvM_E1649_9 is a P1-like emrE_1.match emrK.match emrR.match emrY.match eptA.match evgA.match evgS.match gadW.match gadX.match kdpE.match marA.match marR.match mdfA.match mdtA.match mdtB.match mdtC.match mdtE.match mdtF.match mdtG.match mdtH.match mdtM.match mdtN.match mdtO.match mdtP.match mgrB.match mipA.match msbA.match nfsA.match porin_OmpC.match rrsB+.match sul1.match sul2.match tet_-.match tet_D_.match ugd.match APH_3____Ib.match APH_6__Id.match AcrE.match AcrF.match AcrS.match CRP.match Escherichia_coli_23S.match Escherichia_coli_EF_Tu.match Escherichia_coli_GlpT.match Escherichia_coli_LamB.match Escherichia_coli_UhpA_1.match H_NS.match PmrF.match TEM_-.match TolC.match YojI.match aadA-.match acrA_2.match acrB.match acrD.match ampC_2.match bacA.match baeR.match baeS.match cpxA.match dfrA15.match emrA.match emrB.match

6649_2#14.report.tsvE562

6732_2#16.report.tsvE1373

6753_7#5.report.tsvE36

9475_4#69.report.tsvE1441

7067_5#14.report.tsvE1649

6732_2#1.report.tsvE925

6732_2#23.report.tsvE1779

7114_1#12.report.tsvE2980

Figure S4. Presence and absence of antibiotic resistance genes. emrAB.match emrE.match emrKY.match mdtG.match mdtH.match mdtM.match mdtNOP.match msbA.match OmpC.match GlpT.match AcrE.match AcrF.match AcrS.match AcrAB.match AcrD.match TolC.match marA.match gadW.match mdtA.match mdtBC.match mdtEF.match

6649_2#14.report.tsvE562

6732_2#16.report.tsvE1373

6753_7#5.report.tsvE36

9475_4#69.report.tsvE1441

7067_5#14.report.tsvE1649

6732_2#1.report.tsvE925

6732_2#23.report.tsvE1779

7114_1#12.report.tsvE2980

Figure S5. Presence and absence of efflux systems and porins.

Figure S6: BLASTn comparison between the mosaic cointegrate pCoo (AY536429.1) and the concatenated plasmids pAvM_E925_6 and 7. The blue dashed line indicates which part of the pCoo matches with which ETEC reference plasmid. An insertion and duplication have been introduced in pAvM_E925_6 as indicated by the dashed lines. The gene cooB is missing from pCoo but is present in pAvM_E925_6.

Table S1: Number of CDSs identified in the complete genomes of the reference genomes using the Hybrid assembler (short and long-reads) compared to Velvet assembler (short-reads only). L1 E925 L2 E1649 L3 E36 L3 E2980 L4 E1441 L5 E1779 L6 E562 L7 E1373 Hybrid (raw) 5,206 5,048 5,731 5,270 5,074 5,476 5,376 5,070 Hybrid (after 4,935 5,014 5,318 4,886 4.823 5,112 5,085 4,802 manual curation) Illumina 4,906 4,781 4,955 5,047 4,865 5,179 5,097 4,988

Table S2: General characteristics of the chromosome of the ETEC reference strains Characteristic L1 E925 L2 E1649 L3 E36 L3 E2980 L4 E1441 L5 L6 E562 L7 E1373 E1779a Length (bp) 4,858,376 4,721,269 5,151,162 4,992,286 4,869,798 4,982,341 4,918,084 4,926,854 GC content 50.7 50.8 50.7 50.7 50.8 50.8 50.9 50.4 (%) Total no of 4,521 4,409 4,816 4,924 4,560 4,649 4,568 4,760 CDSs Accession numberb a Chromosome not circularized b GCA_000000000 (submitted to EMBL, accessions pending)

Table S3: Genomic antibiotic resistance profile, including plasmid and chromosomal genes as well as efflux systems and porins and metal resistance.

Strain Lineage Antibiotic resistance genes Porins and efflux pumps associated with antibiotic resistance* Metal resistance

E925 L1 ampC*, ampDE* acrRAB-tolC, acrEF-S, crp, emrRAB, emrE, mdf(A), gadW, mdtAB, mdtEF, mdtG, mdtH, mdtM, mdtNOP, msbA, ompC E1649 L2 ampC*, ampDE* acrRAB-tolC, acrEF-S, crp, emrRAB, emrE, gadW, mdf(A), mdtAB, mdtEF, mdtG, mdtH, mdtM, mdtNOP, msbA, ompC E36 L3 ampC*, ampDE*, tet(B) acrRAB-tolC, acrEF-S, crp, emrRAB, emrE, gadW, glpT, mdf(A)-like, mdtAB, mdtEF, mdtG, mdtH,mdtM, mdtNOP, msbA, ompC E2980 L3 ampC*, ampDE*, blaTEM-1b, strA (aph(3’’)-Ib- acrRAB-tolC, acrEF-S, crp, emrRAB, glpT, mdf(A), mdtAB, mdtEF , mdtG, like), strB (aph(6)-Id), sul2 mdtH, mdtM, mdtNOP, msbA, ompC

E1441 L4 aadA1-like, ampC*, ampDE*, dfrA15, sul1 (Class I acrRAB-tolC, acrEF-S, crp, emrRAB, emrE, gadW, mdf(A)-like, mdtAB, mdtEF, mer operon (merRTPCADE) (Tn21), integron), tet(A) (Tn1721) mdtG, mdtH, mdtM, mdtNOP, msbA, ompC DqacE (part of the Class I integron)

E1779 L5 ampC*, ampDE* acrRAB-tolC, crp, emrRAB, glpT, mdf(A)-like, mdtAB, mdtEF, mdtG, mdtM, mdtNOP, msbA E562 L6 ampC*, ampDE*, blaTEM-1b, tet(A) acrRAB-tolC, acrEF-S, crp, emrRAB, mdf(A)-like, mdtEF, mdtG, mdtH, mer operon (merRTPCADE) (Tn21) mdtNOP, msbA, ompC E1373 L7 ampC*, ampDE*, tet(B) acrRAB-tolC, acrEF-S, crp, emrRAB, emrE, glpT, mdf(A)-like, mdtAB, mdtEF, mdtG, mdtH, mdtM, mdtNOP, msbA * Located on the chromosome

Table S4: Phenotypic antibiotic profile of the ETEC reference strains. Strain Lineage MIC mg/L (R/I/S)# Class Penicillin Cephalosporin Tetracyclines Fluoroquinolones Macrolides Sulphonamide Misc agents

Antibiotic Amp Amc* Oxa Caz Cro Dox* Tet* Na* Nor Azm* Ery* Sxt Cm NIT E925 L1 4 (S) 4 (S) >128 0.5 (S) 0.5 (S) 2(S) 8 (I) 32 (R) 0.5 (S) 16 (S) 16 < 0.25 (S) 8 (S) 32 (S)

E1649 L2 8 (S) 8 (S) >128 0.5 (S) 2 (I) 2 (S) 64 (R) 16 (S) 2 (S) 16 (S) 64 0.25 (S) 16 (I) 32 (S) E 36 L3 16 (I) 16 (I) >128 0.5 (S) 2 (I) 64 (R) 2 (S) 16 (S) 1 (S) 16 (S) 64 S) 8 (S) 16 (S) E2980 L3 >128 (R) 16(I) >128 0.5 (S) 2 (I) 2 (S) 16 (R) < 0.25 (S) 2 (S) 64 (R) 128 0.25 (S) 16 (I) 32 (S)

E1441 L4 4 (S) 4 (S) >128 0.5 (S) 2 (I) 8 (I) 128 (R) 8 (S) 1 (S) 16 (S) 64 >128 (R) >128 (R) 16 (S)

E1779 L5 16 (I) 16 (I) >128 0.5 (S) 2 (I) 2 (S) 2 (S) 16 (S) 2 (S) 16 (S) 16 < 0.25 (S) 16 (I) 16 (S)

E562 L6 >128 (R) 16(I) >128 0.25 (S) 1 (S) 8 (I) 128 (R) 8 (S) 0.5 (S) 16 (S) 32 < 0.25 (S) 8 (S) 32 (S)

E1373 L7 8 (S) 8 (I) >128 0.5 (S) 2 (I) 64 (R) >128 (R) 8 (S) 2 (S) 16 (S) 64 < 0.25 (S) 16 (I) 32 (S)

Amp = Ampicillin, Amc = Amoxicillin-clavulanic acid, Oxa = Oxacillin, Caz = Ceftazidime, Cro = Ceftriaxone, Dox = Doxucycline, Tet = Tetracyclin, Na = Naldixic acid, Nor = Norfloxacin, Azm = Azithromycin, Ery = Erythromycin, Cm = Chloramphenicol, NIT = Nitrofurantoin Sxt = sulphamethoxazole-trimethoprim R = resistant, S = sensitive, NA = not applicable *Clinical MIC breakpoints were not determined by EUCAST (The European Committee on Antimicrobial Susceptibility Testing. Breakpoint tables for interpretation of MICs and zone diameters. 2016., Version 9. http://www.eucast.org. [Accessed Febrary 2017] #R/I/S patterns from CLSI (SOURCE: http://em100.edaptivedocs.net/GetDoc.aspx?doc=CLSI%20M100%20ED30:2020&sbssok=CLSI%20M100%20ED30:2020%20TABLE%202A&format=HTML#CLSI%20M100%20ED30:202 0%20TABLE%202A) . Table S5: Identified prophages in the chromosome and plasmids in each of the reference ETEC strains. Region of prophages and potential cargo genes are indicated.

Strain Lineage Chromosome/ Prophage Source Locationa Size (kB) GC% No. of Cargo genesb plasmid (Pph) proteins E925 L1 CS1+CS3+CS21 Chromosome E925_Pph_1 Entero_P88_ 1980800-2010357 29.6 50.1 39 flagella biosynthesis regulator NC_026014(25) (E925_01987)

E925_Pph_2 Entero_lambda_ 2829055-2849450 23.4 50.4 24 ompX and sohB (probable (Cryptic) NC_001416(6) Protease) E925_Pph_3 Entero_lambda_ 3229100-3240718 11.62 44.1 15 Outer membrane lipoprotein, blc (cryptic) NC_001416(4)

E925_Pph_4 Shigel_SfIV_ 3724871-3749140 24.3 46.7 32 gadW transcriptional regulator of NC_022749(12) the AraC family (helix-turn-helix motif). tRNA-Thr insertion site pAvM_E925_7 E925_Pph_5 Stx2_c_1717_ 23316-45489 22.1 48.7 22 eatA_3, eatA_4, eatA_5 (eatA NC_011357(3) disrupted in three ORFs), stbAB and psiA E1649 L2 CS2+CS3+CS21 Chromosome E1649_Pph_1 Escher_TL_2011b 1298543-1335201 36.6 49.1 45 istB-like ATP binding protein Cryptic phage NC_019445(29) similar to DNA replication protein, dnaA. E1649_Pph_2 Entero_P88_ 1502430-1542242 29.1 52.4 46 Putative Acetyltransferase and NC_026014(33) arabinose transporter E1649_Pph_3 Entero_P88_ 2177200-2195400 17.7 49.9 23 No cargo genes (cryptic NC_026014(15) phage) E1649_Pph_4 Entero_P4_ 4048900-4077099 21.05 48.4 26 nanC- Porin gene (N- NC_001609(9) acetylneuraminate epimerase precursor and putative inner membrane protein). tRNA-Leu insertion site pAvM_E1649_ E1649_Pph_6 Salmon_SJ46_ 4916371-5038771 122.4 47 138 eatA. This is the P1 like phage- 9 NC_031129(88) plasmid E36 L3 CFA/I + CS21 Chromosome E36_Pph_1 Entero_P4_ 47658-59450 11.6 48 15 Retron-type Reverse transcriptase (cryptic NC_001609(9) and DNA primase phage) E36_Pph_2 Entero_P4_ Two adjacent 46.4 49.6 61 Putative bacteriocin adjacent to the NC_001609(9) and phage: 1220220- intA_2 gene (E36_01826). Insertion Salmon_Fels_2 1231640 and site is in the region of ssrA NC_010463(37) 1231677-1266590 E36_Pph_3 Entero_P88_ 1285100-1328400 43.3 52.7 56 stbA and stbB (plasmid stability NC_026014(41) genes) and tfp pilus assembly gene, P2 family of pilA E36_Pph_4 Entero_mEp460_ 2138535-2199525 48.0 50.3 63 terB (tellurite reistance), and ompX. NC_019716(23 Insert site is tRNA-Gly E36_Pph_5 Entero_mEp460_ 2499123-2535609 36.5 50.9 40 No cargo genes NC_019716(23) (probable cryptic phage)

E36_Pph_6 Burkho_BcepMu_ 2738731-2795496 56.7 52.4 68 Potassium transporter, trkG and NC_005882(32) parB partition protein (with nuclease N-terminal domain) E36_Pph_7 Entero_HK629_ 3024129-3076300 36.5 51.3 56 Tellurite resistance protein and NC_019711(21) ncRNA, dicF and dicC genes present are of interest but may not be considered as cargo). E36_Pph_8 Entero_lambda_ 3532592-3576100 43.5 51.6 71 Outer membrane protein Blc. Group NC_001416(27) II intron reverse transcriptase (ltrA_14) with adjacent group II intron uvrB, sohB_3, dicC (E36_04216) E36_Pph_9 Entero_P4_ 44922786-4519904 27.6 49.8 14 No cargo genes NC_001609(12)

pAvM_E36_12 E36_Pph_10 Entero_BP_4795_ 182814-195964 13.1 52.9 23 No cargo genes (cryptic NC_004813(3) phage) E36_Pph_11 Entero_BP_4795_ 267737-281095 13.3 51.2 21 sopB and estA2 (cryptic NC_004813(3) phage) Prophage* E36_Pph_12 Salmon_Fels_2_ 700-18544 17.8 55.4 29 No cargo genes Contig 2 (Partial - no NC_010463(24) int gene present. Probable cryptic) Prophage* E36_Pph_13 Entero_lambda_ 1-16565 16.5 54.2 22 No cargo genes Contig 998 Cryptic phage) NC_001416(11 E2980 L3 CS7 Chromosome E2980_Pph_1 Salmon_SEN34_ 1314690-1354910 40.22 49.7 50 No cargo genes NC_028699(24) Chromosome E2980_Pph_2 Shigel_Sf6_ 1568700-1611651 42.9 46.4 63 wzy (E2980_01970) gene is an NC_005344(14) oligosaccharide repeat unit polymerase adjacent to the integrase (oac gene in sf6 phage at the same location next to int).tRNA-Arg insertion site Chromosome E2980_Pph_3 Entero_P88_ 2070150-2109664 39.5 51.6 55 Putative plasmid stability proteins NC_026014(43) and repA (E2980_02444, E2980_02445 and E2980_02446) and flagella biosynthesis regulator (as found in the P88 phage of E925 above) near int gene. tRNA-Leu insertion. This pahge is related to P1 phage so it can probably exist as a Phage-Plasmid. Chromosome E2980_Pph_4 Entero_lambda_ 2447775-2466400 18.6 51.8 17 No cargo genes (Cryptic NC_001416(11) phage) Chromosome E2980_Pph_5 Entero_lambda_ 2918700-2962650 43.95 51.6 60 tciAB (tellurite resistance), ompX NC_001416(22) Chromosome E2980_Pph_6 Entero_lambda_ 3588788-3636557 47.8 49.6 58 No cargo genes. tRNA insertion site NC_001416(27) adjacent to int gene, ompX gene Prophage* E2980_Pph_7 Salmon_SEN34_ 8730-21920 13.1 49.4 25 No cargo genes Contig 5 NC_028699(13) E1441 L4 CS6 Chromosome E1441_Pph_1 Salmon_HK620_ 1473781-1517255 43.5 46.5 57 tRNA-Arg insertion site. CatB- NC_002730(17) related O-acetyltransferase adjacent to int Chromosome E1441_Pph_2 Entero_mEp460_ 2663330-2710354 47.0 50.6 65 Anti-adapter gene iraM (operates NC_019716(11) via rpoS) Chromosome E1441_Pph_3 Entero_lambda_ 3194858-3250061 55.2 50.4 66 Phospholipid binding protein, ybhB; NC_001416(26) Inner membrane protein, yeaI; slyB lipoprotein precursor, sohB; outer membrane protein, ompD and slyB and kilR Chromosome E1441_Pph_4 Entero_mEp460_ 3756217-3834601 78.38 52 101 CysO- .Phage adjacent to tRNA- NC_019716(37) Thr E1179 L5 CS5+CS6 Chromosome E1779_Pph_1 Salmon_Fels_2_ 36.9 49.1 45 Cargo genes are present but all the NC_010463(36) 1195952-1232915 hypothetical. ssrA integration site. Chromosome E1779_Pph_2 Entero_P2_ 24.3 49.8 32 No cargo genes identified NC_001895(21) 1783098-1807397

Chromosome E1779_Pph_3 Entero_lambda_ 2368430-2397000 28.57 55.4 37 sohB-probable protease NC_001416(22)

Chromosome E1779_Pph_4 Entero_lambda_ 3157352-3201305 43.9 51.6 56 iraM (anti-adaptor protein blocking NC_001416(18) RpoS degradation)

Chromosome E1779_Pph_5 Entero_cdtI_NC_00 2859100-2906257 47.16 51.6 65 Some potential cargo genes but all 9514(22) hypothetical Chromosome E1779_Pph_6 Entero_lambda 3288450-3335837 47.4 51.5 65 Putative ompD porin protein, NC_001416(29) E1779_03897 encodes ZapA, a cell division protein Chromosome E1779_Pph_7 Shigel_SfII 3839659-3878224 38.5 50.8 55 tRNA-Thr insertion site. Region NC_021857(35) with number of regulators: HTH- type transcriptional regulator gadW- like (acid resistance).lexA repressor, perC activator, gntR family regulators, yfdR family of dicC DNA binding transciprtional regulator, lexA/HTH-type transcriptional regulator; yfbR- 5’ deoxynucleotidase Chromosome E1779_Pph_8 Entero_fiAA91_ss_ 4718700-4750300 31.6 51.7 42 No cargo genes NC_022750(31)

pAvm_E1779_1 Phage islet Stx2_c_1717_ 49007-62586 13.5 51.3 23 estA3/4, traI, traX, finO 9 NC_011357(3) E562 L6 CFA/I+CS21 Chromosome E562_Pph_1 Entero_lambda_ 2695572-2745700 49.4 51.5 62 ftsZ inhibitor gene, kilR. Though NC_001416(22) may be of phage origin.

Chromosome E562_Pph_2 Entero_fiAA91_ss 4666803-4698527 31.7 52 42 No cargo genes NC_022750(31)

E1373 L7 CS6 Chromosome E1373_Pph_1 Entero_mEp460 2848300-2890275 41.98 51.5 22 ompX (resistance to NC_019716(23) complement killing), ,

Tellurite resistance gene, tehB and a S-Adenosyl-l-methionine (SAM)-dependent Chromosome E1373_Pph_2 Shigel_SfIV 3856500-3897232 40.73 48.7 59 tRNA-Thr insertion site. kilA-like NC_022749(34) domain, lexA, perC and gntR Similar to the regulator. A almost identical region content of is present in the Shigella phage E1779_Pph_7 related to SflV (accession number (Shigel_SfII_NC_0 KC814930.1) 21857(35)) pAvM_E1373_ E1373_Pph_3 Salmon_SSU5_ 1-109318 109.1 46.4 130 terB, mind (cell division inhibitor), 29 NC_018843(91) uvrB, parB-like (2 genes, This is the Phage- ORFs:04807 and 04808), hlyB, Flp Plasmid related to pilus assembly (ORF: 04842), grxA pHCM2 and SSU5 (glutaredoxin), cobT and cobS (aerobic cobaltochelatase subunits), dnaE, polA (DNA polymerase I), recA (recombiase A), repA (FIB), nrdA and nrdB (ribonucleoside- diphoasphatase), a lot of hypo proteins. aThe location within the chromosome, plasmid or contig individually. bThe locus_tags will be updated when sequences have been accepted in GenBank. Additional file 1 Detailed description of ETEC reference plasmids

ETEC strain E925 (L1) The reference strain E925 expresses the CFs CS1, CS3 and CS21 along with the enterotoxins LT and STh and belong to lineage L1 [1]. Members of L1 have the same virulence profile. Genomic analysis revealed 4 plasmids in total, pAvM_E925_4-7. Interestingly, two or possibly three of the identified plasmids (pAvM_E925_4, 6 and 7) may be a co-integrated plasmid where individual replicons are present (8). Several plasmids were assigned to the same replicon type, IncFII (pAvM_E925_4, pAvM_E925_5 and pAvM_E925_7). The ETEC plasmid pCoo is a co-integrate with two replicons, IncFII and IncI1. A blastn comparison of the concatenated pAvM_E925_6 and pAvM_E925_7 show that these most likely form a co-integrate plasmid similar to pCoo (Figure S6). pAvM_E925_4 The pAvM_E925_4 is predicted as an IncFII (FII-11) plasmid with a predicted oriT site. The prediction is supported by its size (116,803b) and because of an intact conjugation machinery (tra genes) which is similar to R100 (the quintessential IncFII plasmid) [2]. Blastn analysis revealed that the plasmid is similar to a number of plasmids of sequenced O6:H16 ETEC isolates including the CS3+ plasmid in the reference strain E1649 (Figure 1a). Specifically, plasmid unnamed2 of PacBio sequenced ETEC isolate F5656C1 isolated by CDC in US [3], a plasmid of ETEC isolate FMU073332 isolated in Mexico 1987 (CP017848.1) and plasmid pETEC_80 of type strain E24377A.

Plasmid features: A partial tra region is present, but the plasmid may be mobilizable with the help from other plasmids. The plasmid contains genes involved in plasmid stability (psiAB, stbAB). Two TA systems are present in this plasmid: ccdAB system, where ccdA encoding the toxin and ccdB which encodes the antitoxin needed for survival of the cell, and the hok-sok plasmid antisense RNA-regulated system. Virulence: The pAvM_E925_4 harbours the gene cluster encoding CS3 (cst genes) as well as eltAB (LTh variant LT1) and STh (estA3/4). The CS3 locus consist of 8 CDS, cstA-H. The cstA encodes a chaperone and cstG and cstH encodes two major subunits. The outer membrane usher is encoded by cstB which also contain CDSs for four additional proteins: cstC-F. The C- terminal of cstB is missing compared to the reference [4] and is found in the second open reading frame. A nucleotide comparison between the cstB in E925 and cstB of additional CS3 positive strains (CS1+CS3+/-CS21, CS2+CS3+/-CS21) [1] as well as cstB in X16944.1, to the available reference sequence encoding cstC (63 kD protein found within cstB) show that an extra G is inserted at position 2432 in cstB of all analysed sequences. This generates a frame- shift mutation resulting in a premature stop codon and the C-terminal to be found in the second reading frame instead of the first. In addition to the classical ETEC virulence genes this plasmid also contain the genes encoding for the putative virulence factor EtpBAC [5–7]. The gene encoding the ABC transcriptional regulator Rns is located on this plasmid and required for expression of CS1 or CS2 colonization factors and for adhesion [8]. pAvM_E925_5 Inc group: This is a multi-replicon plasmid with both IncFII (FII-111) and IncFIB (FIB-45). Blastn analysis showed that pAvM_E925_5 is most similar to a plasmid (CP024258.1) from the ETEC strain F5505-C1 [3] with 80% query coverage and 98.9% sequence identity as well as the p557 of the CS1+CS3+CS21 positive reference strain E1392/75 (FN822746) (12) with 68% coverage with 99.9% sequence identity. The p557 does not contain the conjugal machinery which pAvM_E925_5 does. The conjugal transfer genes were found to be 99% identical to the corresponding genes in the multi-drug resistant pHK17a-like plasmid present in an E. coli strain isolated from a patient with a bloodstream infection, specifically a ST95 E. coli, which is a multi-drug resistant plasmid (JF779678.1). Plasmid features: Harbours an intact tra region as a well as a predicted oriT. Genes involved in plasmid stability are present (psiAB, parB and sopAB). Virulence: The locus (lngX1, R, S, T, X2, A-J, P) encoding the colonisation factor CS21, one of the most prevalent CFs, is present in this plasmid. pAvM_E925_6 The plasmid was assigned to IncI1 and may be part of a co-integrated plasmid, most likely together with pAvM_E925_7 which is assigned to IncFII (FII-15) (Figure S6). Blastn comparison showed highest homology to p746 from ETEC strain E1392/75 (FN822748.1) with 96% query coverage and 99.9% nucleotide identity and share sequence homology to pCoo (CR942285) from the ETEC strain JEF100 showed an 80% coverage at a >99% identity. In pAvM_E925_6 there is a duplicated region a region encompassing a shufflon-specific DNA recombinase (rci), traE-G, traM, nikB and nikA have been duplicated and upon this duplication the traE was interrupted due to a deletion. The same region is present in p746 of 1392/75 but not present in the pCoo plasmid. The duplication has resulted in the presence of two oriT regions (Figure 1b). Nucleotide alignment show that large regions are shared across pAvM_E925_6 and pCoo, however the pAvM_E925_6 lacks part of the tra locus (blastn comparison with the IncI1 R64 plasmid). Furthermore, the pAvM_E925_6 plasmid contains an intact pil loci encoding for the thin pilus and is 99% identical to the pil loci in R64. Plasmid features: Genes involved in plasmid stability were identified; stbAB as well as the vapBC toxin-antitoxin system. The plasmid harbours the iib gene encoding the Colicin 1b immunity protein, however it lacks the additional structural cib gene. Virulence: The coo loci (cooABCD), which encodes the proteins needed for CS1, is located on this plasmid. The putative virulence factor cexE and the aat operon is also present. pAvM_E925_7 This plasmid was assigned to IncFII (FII-15) and share part of the CS1 plasmid pCoo (Figure S6). Plasmid features: The plasmid only carries five tra genes (traM, traA, traL, traE and traK). This may be an indication that this plasmid is part of a co-integrated plasmid together with pAvM_E925_6 (IncI1) as described above (see pAvM_E025_6). The following plasmid stability genes are present in the plasmid: ccdAB, psiA, repAB and stbAB. Phage features: A complete prophage (E925_Pph_5: Stx2_c_1717_NC_011357) was identified in this plasmid and contains eatA_3-5 as cargo genes as well as plasmid stability genes stbAB and psiA along with multiple IS elements. Virulence: Two CDSs were identified as eatA_1 and eatA_2, sequence alignment showed that this a disrupted eatA gene. Downstream of eatA_1 and eatA_2 are three additional CDSs predicted as eatA (eatA_3, eatA_4, eatA_5), also disrupted by premature stop codons due to two different single base deletions. The multiple eatA CDSs are due to a duplication event, including ygjH (cyclic di-GMP phosphodiesterase), eatA CDSs, ISEc63 and deltaIS1X2.

ETEC strain E1649 (L2) The reference strain E1649 expresses CS2, CS3 and CS21. The colonisation factor CS2 is located on the chromosome, whereas CS3 and CS21 are plasmid-borne. Two plasmids were fully circularized: pAvM_E1649_8 and pAvM_E1649_10. A phage-like plasmid was identified but not fully circularised, pAvM_E1649_9. A cryptic plasmid (pAvM_E1649_11) was found but the replicon could not be determined. Four prophages were identified in the chromosome, one of them with the fim operon as cargo (Table S5). pAvM_E1649_8 The plasmid is a classical IncFII (FII-15) replicon, however it does not contain a complete tra region but an oriT was identified. The plasmid is similar to the other CS3+ ETEC reference plasmid pAvM_E925_4 (Figure 1A), although an extra region containing eatA is present in pAvM_E1649_8. Furthermore, a region has been duplicated, where in the first region one of the genes encoding MsbB, involved in lipid A biosynthesis, has multiple mutations resulting in premature stop codons. The second region contains the intact msbB gene. An msbB gene is also located in the chromosome and the plasmid encodes a homologous msbB gene. The same as been reported in E. coli O157:H7 [9]. Other genes within the duplicated region are the ccdAB (toxin-antitoxin system; plasmid stability), a gene encoding a polysaccharide deacetylase and three hypothetical proteins. The plasmid is similar to pAvM_E925_4, and a 137,665 bp plasmid pEcoFMU07332d (, CP024277.1) isolated from an ETEC strain collected in Mexico 1987 (87% coverage, 99.88% identity). Several other plasmids from O6:H16 strains share similarity with this plasmid (Figure 1a). Plasmid features: A partial tra region is present as well as the plasmid stability genes psiAB and stbAB. The plasmid contains the hok-sok plasmid antisense RNA-regulated system as well as the ccdAB system involved in plasmid stability. Virulence: The plasmid harbours the CS3 locus, cstA-H and the rns regulator. Together with the CF, the plasmid also harbours the eltAB genes, encoding the LT1 variant (LTh), as well as the sta3/4 (STh). The etpBAC locus is also located on this plasmid. However, the etpA is truncated but may still be functional. Another putative virulence factor is eatA which is also present. The plasmid contains the gene for cexE located upstream of the aat operon. pAvM_E1649_9 This is a P1 phage-like plasmid assigned to IncY that contains the complete prophage E1649_Pph_6 (PHAGE_Salmon_SJ46_NC_031129). pAvM_E1649_9 encodes a full repertoire of genes related to the P1-like phage-plasmid including not only replication protein, repA and the partition genes parA/parB, but also structural genes for the phage tail, major capsid protein, portal protein, a P1-like toxin/anti-toxin system and various tail fibre proteins. The tciA gene encoding a tellurite resistance protein is present, however lacking the full tciABC locus which present in the P1 phage. Furthermore dam, a methylase, is located upstream of tciA. (Figure 2a and Figure S3a). The plasmid is similar to other E. coli plasmids; however, these have been isolated from chicken, human blood and urine and no apparent virulence genes could be identified in the plasmids. This suggests that this plasmid has been introduced from a diverse E. coli from a different source. Plasmid features: The plasmid stability genes sopAB are present. Phage features: A full complement of genes highly similar to P1 bacteriophage. pAvM_E1649_10 The pAvM_E1649_10 plasmid contains two replicons: IncFII (FII-6) and IncFIB (FIB-45) and share sequence homology with pETEC_74 of the ETEC strain E24337A (CP000799.1) (70% query coverage; 99.9% sequence identity) which is a one-replicon plasmid. pAvM_E1649_10 is also highly related to an unknown plasmid from the ETEC strain 2011EL-1370-2 (CP022913.1) which is a two-replicon plasmid (FII and FIB) like pAvM_E1649_10. Plasmid features: A partial tra region, plasmid stability genes psiAB as well as the plasmid antisense RNA-regulated system hok-sok are present. Virulence: The lng locus (lngX1, R, S, T, X2, A-J, P), encoding the CF CS21, is located on this plasmid. pAvM_E1649_11 A small cryptic plasmid of 8,834 bp harbouring regions similar to a small plasmid in Klebsiella variicola strain WCHKP19 plasmid p4_020019 (CP028552). Contains hypothetical genes and two RNA one modulator proteins Rop. No replicons could be identified.

ETEC strain E36 (L3) The strain E36 is one of the two ETEC strains expressing CFA/I. The phylogenetic tree of the ETEC population presented by von Mentzer et al., revealed that there are at least three lineages (L3, L6 and L15) comprising ETEC strains expressing CFA/I [1]. A recent report identified the same lineages but also additional ETEC lineages encompassing CFA/I ST-only strains [10]. In the ETEC strain E36 two plasmids were identified, the largest identified amongst these strains (pAvM_E36_12) categorized as a virulence plasmid, and the smaller plasmid (pAvM_E36_13) categorised as an antibiotic resistance plasmid. pAvM_E36_12 The plasmid is the largest plasmid identified among the eight ETEC reference genomes, a total of 381,858 bp. The size strongly indicates that this a conjugative plasmid as well as the presence of two replicons; IncFII (FII-51) and IncFIB (FIB-10) and potentially a third replicon FIC (FIC-5, novel allele). Due to the size of the plasmid and its content it may be a hybrid plasmid. The plasmids present in the O78 ETEC strain sequenced by Smith et al. [3] show similarity across pAvM_E36_12 (Figure 1c). Plasmid features: This plasmid contains three tra regions, some which are complete and others only partial, and multiple copies of several stability genes are present: psiAB (6 copies), repAB (3 copies), sopAB (2 copies) and stbAB (4 copies). The plasmid also contains the hok and mok genes, but lack sok, which is a short ssRNA antitoxin. Virulence: The pAvM_E36_12 harbours the CFA/I gene cluster cfaA, B, C, D and the CS21 gene cluster lngX1, R, S, T, X2, A-J, P. The gene encoding STh, estA2 is also, eatA and etpBAC. pAvM_E36_13 By PlasmidFinder this plasmid was assigned to the less common IncB/O/K/Z plasmid type, and further comparisons revealed that the repA gene was 860/873 bp identical to the Z replicon reference (M93064). It is related to a plasmid present in several Shigella sonnei genomes as well as a plasmid present in a number of E. coli strains that were isolated from faeces (including from that of healthy adults, pCERC10, MF156268) or blood from patients in a hospital. Plasmid features: The plasmid contains a full transfer region and an oriT was identified. Several plasmid stability genes are present: psiAB, relE and stbAB. In addition, the plasmid also has a pil locus encoding for a thin pilus. The plasmid harbours the ssRNA sok and a putative hok gene part of the Hok/Gef protein family. AMR: This plasmid carries a complete Tn10 containing the tet(B) tetracycline resistance genes. IS10L of Tn10 is interrupted by ISSbo1.

ETEC strain E2980 (L3) The ETEC strain E2980 belong to the same lineage, L3, as the CFA/I-expressing strain E36. Strain E2980 express CS7 and LT, specifically LT1. In total three plasmids are present, pAvM_E2980_14-16. Two virulence plasmids and one antibiotic resistance plasmid. pAvM_E2980_14 The virulence plasmid is assigned to IncI1, however one of the marker genes for this Inc group is not present (sogS) and the identified marker gene (trbA) is not 100% identical to the reference indicating that this may be a new ST allele. This plasmid also contains a pil locus with >90 % amino acid similarity with the pil operon present in the reference plasmid R64 (AB027308.1). The plasmid is near identical (93-97% coverage and >99% identity) three plasmids all belonging to O144 ETEC strains; strain 90-9276 plasmid unnamed2 (CP024298.1) and strain 90-9280 plasmid unnamed1 (CP024241.1), both collected in Bangladesh 1988, and strain E2264 plasmid unnamed1 (CP023350.1) collected in Bangladesh 18 years later in 2006 (Figure 1d). Plasmid features: A partial tra region and several plasmid stability genes, parA, relE and stbAB, are present as well as the ssRNA sok is present but the plasmids lack the hok and mok gene. Virulence: This is a virulence plasmid containing the CS7 locus (csvA, B, C, E, F, D) as well as the eltAB encoding LT, specifically the variant LT1. The CS7 regulator, CsvR, is also present in this plasmid. In a recent paper it was proposed that the expression of the putative virulence factor cexE is translocated with the help from the aat locus because the cexE gene was located directly downstream of the aat operon [11]. However, in this plasmid the aat locus is present roughly 17,500 bases upstream of cexE adjacent to the regulator csvR (Figure 1d). pAvm_E2980_15 This is an IncFII (FII-17, novel allele) resistance plasmid of with a classical IncFII copA replication region including an extensive conjugation machinery. The plasmid lacks an oriT but the relaxases traI is present. The oriT may have been lost in the circularisation process as the traI relaxases is positioned just at the end of the plasmid. The plasmid is closely related to another plasmid of the above mentioned CS7+ ETEC strain E2264 plasmid unnamed2 (CP023351.1) with 92% query coverage and 99.9% sequence identity. The plasmid is also related to the O25 strain F5505C1 plasmid unnamed1 (81% query coverage and 99.1% identity) which belong to a different ETEC lineage. Similar plasmids are present in other E. coli strains. Plasmid features: A complete tra region and several plasmid stability genes, psiAB, relE and stbAB, are present. The ssRNA sok is present sok and a putative hok gene part of the Hok/Gef protein family. AMR: This plasmid harbours a 6 kb resistance region, containing a fragment of Tn5393 containing the strA and strB resistance genes, sul2 in a fragment of GIsul2 and blaTEM derived from Tn2. The IR of Tn5393 bounds one end of the region, while IS26 is at the other. pAvM_E2980_16 This is a quite small plasmid (48,305bp) in comparison to other plasmids presented here. It was not possible to determine the incompatibility group for the plasmid (pMLST and PlasmidFinder predictions were inconclusive), but is similar to Inc1 and hence called Inc1-like. It is highly similar to a plasmid (CP023352.1) part of the CS7+ ETEC strain E2264 with 94% query coverage and 99.9% sequence identity which has a similar size. Plasmid features: No oriT or relaxases was found. The plasmid stability stbAB and vapBC genes were present in this plasmid. The plasmid also harbours the TA-system vapBC. Virulence: This plasmid contains genes encoding two putative ETEC virulence factors, eatA and etpBAC.

ETEC strain E1441 (L4) E1441 express LT, CS6 and CS21 and is similar to ETEC isolates ATCC 43886/E2539C1 and 2014EL-1346-6 [20]. These isolates were collected in the 70-ties [12] and 2014 (from a CDC collection), respectively, and assigned as O25:H16 which is the O group determined for E1441 in silico. pAvM_E1441_17 This is a multi-replicon virulence and resistance plasmid, IncFII (FII-35, novel allele) and IncFIB (FIB-35, novel allele). The plasmid is similar (78% coverage, >99% identity) to a plasmid (CP024258.1) from the ETEC strain F5505-C1 collected 1998 (location unknown). Plasmid features: A 29.3 kb segment of the transfer region has been inverted between two inversely oriented copies of ISCro1b. Despite all of the transfer genes being intact, the functionality of this locus is unknown. A predicted oriT was identified. Several plasmid stability genes are present: stbAB, sopAB as well as the TA-systems pemI/K. The ssRNA sok is present and a putative hok gene part of the Hok/Gef protein family. Furthermore, the srnAC plasmid antisense RNA-regulated system [13] is also present. Virulence: The gene cluster (DlngX1, lngR, S, T, X2, A-J, P) encoding the colonisation factor CS21. Notably, several SNPs as well as a 9bp insertion has occurred in the first 31 bases of DlngX1 (in comparison with lng locus in pAvM_E925_5). The function of lngX1 is not known, hence whether the mutations and insertion affect the expression of CS21 is difficult to discern. AMR: This plasmid harbours a complex resistance region composed of fragments of transposons. The inverted repeat of a partial Tn1000 bounds one end. A class 1 integron is located in a partial Tn21 segment and carries the cassette array dhfr15 and aadA1 as well as sul1 in the 5’-conserved segment. Following IS186B and a fragment of Tn1721 carrying the tet(A) tetracycline resistance module, is the mercury resistance module of Tn21, bounding the other end of the region [14,15]. pAvM_E1441_18 This is an IncFII (FII-106) plasmid that carries several ETEC virulence and putative virulence factors. The same plasmid is present in two O25 ETEC strains, F5504-C1 and ATCC 43886. The plasmids (CP024259.1 and CP024255.1) share >99% identity with 100% coverage (Figure 1e). Plasmid features: Stability genes present are: parA, psiAB and stbAB as well as a tra region. The ssRNA srnC is present but the srnB gene was interrupted by an insertion element (interrupted ISEc12). Another ssRNA, sok, was identified. Virulence: The plasmid harbours the gene cluster encoding the CF CS6 (cssA, B, C, D) and eltAB (LTh). An interrupted eatA is present with multiple non-synonymous SNPs and gaps most likely resulting in a non-functional gene. Another putative virulence factor, cexE, is located upstream of the aat locus (aatPABCD) (Figure 1e).

ETEC strain E1779 (L5) The CS5+CS6 ETEC strain E1779 belong to lineage 5 (L5) [1] and is LT+STh positive. CS5 and CS6 positive strains are commonly isolated globally and has been reported to be the most common virulence profile in Bangladesh and India [16,17]. pAvM_E1779_19 This is an IncFII (FII-11) virulence plasmid. It is 100% identical to multiple ETEC plasmids with the same virulence profile. The 142 kb plasmid has high similarity and similar plasmid size to several plasmids isolated from ETEC in India, Mali and Gambia as part of the GEMS study and subsequently sequenced [56] including e.g. p504237_142 (CP025863.1), p204446_146 (CP025911.1), and p120899_146 (CP025917.1) as well as the 165 kb plasmid F5176C6 plasmid unnamed1 isolated by CDC in 1997 and the plasmid F5176C6 plasmid unnamed1 isolated in Bangladesh in 2009 [18]. This indicates a high level of conservation over time (Figure 1f). Plasmid features: Several tra genes are present, but is not complete. Most likely the transfer genes of other plasmid within the same cell is used for its mobility. The following plasmid stability genes: psiAB, repAB, and stbAB are present as well as the TA-systems ccdAB and vapBC. A ssRNA sok is located upstream of a putative gene related to the Hok/Gef family. Virulence: CS5 (csfABCDEFG) and CS6 (cssABCD) and estA3/4 (STh). CsfR (often annotated as csvR, like the regulator of CS7), a transcription factor known to regulate CS5, is located on the same plasmid but non-adjacent to the csf locus. A disrupted eatA (split into two CDSs) which is most likely non-functional (Figure 1f) pAvM_E1779_20 The 88,759 bp long plasmid is an IncFII (FII-106) and contain the genes for the ETEC toxin LT. It is 100% identical to unnamed plasmid2 (CP023348.1) from ETEC isolate E2256 isolated in Bangladesh 2006 and strain 204446 plasmid p204446_92 (CP025912.1) collected in Mali 2010. In addition, high similarity to plasmids; p103605_83 (CP025922.1), p120899_76 (CP025918.1), and p204576_83 (CP025909.1), isolated as part of the GEMS study in Gambia 2010, 2012 and in Mali 2010, respectively, was evident. Finally, high similarity to F5176C6 plasmid unnamed2 (CP024669.1) isolated from an O167:H5 ETEC collected by CDC in 1997 was also found confirming earlier findings that LT STh CS5+CS6 isolates can be both O167 and O115 but they share nearly identical plasmids. Plasmid features: An incomplete tra regions and the stability genes psiAB and stbAB are present. The plasmid antisense RNA-regulated system hok-sok is present, however the ssRNA sok is located in the same orientation as the hok and mok gene. Virulence: The gene eltAB encoding one of the ETEC enterotoxins, LTh, is located on the plasmid. pAvM_E1779_21 This an IncFIIY plasmid that contain plasmid-related stability genes as well as hypothetical genes. No virulence or antibiotic resistance genes are present. It is similar (99% coverage and >99% identity) to plasmid p503440_68 (CP025885.1) of the CS6+CS21 strain 503440 [10]. pAvM_E1779_22 This an IncFII (FII-44) plasmid containing a complete transfer region highly similar to the region of R1 [2]. The plasmid matches regions of the plasmid (CP044404.1) identified in the O102 E. coli strain NMBU-W10C18 collected from surface water outside of Norway in 2018 and a multi-drug resistant plasmid (CP021536.1) from the E. coli strain AR_0119.

ETEC strain E562 (L6) Strain E562 belong to lineage 6 (L6) [1] which encompass CFA/I+STh strains with the novel ON3 genotype [19]. In total five plasmids were identified, three plasmids carrying ETEC virulence and putative virulence genes and one resistance plasmid. pAvM_E562_23 The Inc group analysis was inconclusive but subtyping the plasmid identified a potentially new Inc group related to IncI1 ST288 (for details see Additional file 4). The plasmid is most likely originated from Salmonella strains as several plasmids from different serotypes share ~67% coverage and >96% identity with pAvM_E562_23. Plasmid features: The plasmid contains a full tra region as well as the P pili conjugation machinery (pilI-V). However, this P-pilis is different from the one found in p26 as it contains pilJ and pilK and show no sequence homology. Blastn analysis showed that the pil operon is near identical to the pil operon found on plasmids in Salmonella enterica subsp. enterica. In fact, both the tra and pil loci are indicated to have originated from Salmonella plasmids. Stability genes parB, psiAB and stbAB are present. The plasmid also harbours the ssRNA sok and a putative hok gene part of the Hok/Gef protein family. Virulence: The genes encoding for the putative ETEC virulence genes EatA and the EtpBAC is present. The genes are located close to each other and there are full and partial IS in the flanking sequence, but there is no strong evidence for how these genes were acquired. pAvM_E562_24 The plasmid harbours two replicons: IncFII (FII-73, novel allele) and IncFIB (FIB-42). The plasmid is similar (94% coverage; >99% identity) to plasmid p504239_155 in the O128 ETEC strain 504239 collected in India 2010 [10], however this plasmid is almost twice the size (155,850bp) compare to pAvM_E562_24 with its 86,655bp. Plasmid features: A partial transfer region is present and the stability genes present are psiAB and stbAB. The plasmid also harbours the vapBC genes, a toxin-antitoxin system involved in plasmid maintenance and the ssRNA sok and a putative hok gene part of the Hok/Gef protein family. Virulence: The plasmid contains the lng locus which encodes CS21. pAvM_E562_25 It belongs to the IncFII (FII-84) replicon and carries multiple virulence genes. It is near identical (100% coverage; >99% identity) to plasmid p504239_101 in the CFA/I STh O128 ETEC strain 504239 part of lineage L6 [20,21] (Figure 1g). Similarity across additional plasmids from CFA/I positive O128 plasmids is also found, however these ETEC strains belonging to a different ETEC lineage (L15) [1,10]. Plasmid features: Only traM is present, in opposite direction to the predicted oriT. The plasmid may be mobilizable with the help of another plasmid. Since the circularisation of the plasmid was not complete there may be genes missing that could be involved in conjugation. Several stability genes present are: parB, psiAB (truncated psiA), relE/B (TA-system), repAB. The plasmid also harbours the ssRNA sok and a putative hok gene part of the Hok/Gef protein family. Virulence: CFA/I locus cfaABCE and estA2 (STh), which are clustered together. The cfaD gene is interrupted by a premature stop codon and is identical to cfaD’ (M55661.1) from a previous study [22]. Whether cfaD’ is functional or not remains unanswered. Although, the AraC-regulator Rns is also located within the plasmid which might be involved in the regulation of the CFA/I fimbriae. Furthermore, the putative virulence factor cexE is located downstream of the aat locus, however the aatD is located non-adjacent (12kb downstream of cexE) to the rest of the aat locus (aatPABC) (Figure 1g). pAvM_E562_26 This plasmid belongs to the less prevalent IncB/O/K/Z group. More detailed comparisons revealed that it’s repA gene is 1075/1077 bp identical to that of the K reference plasmid (M93063). Blastn analysis of the pAvM_E672_26 reveals that the same plasmid is found several additional species. It is similar (92% coverage; >98% identity) to p15648-2 from the Escherichia coli strain NCCP15648. The NCCP15648 strain was isolated from a patient with diarrhoea and has Shiga-like toxin genes and eaeA (intimin) which is involved in adherence. The same type of plasmid is also found in Shigella flexneri 2a strain 1508 (pSF150802: CP030917.1) and Salmonella enterica subspecies enterica strain H-185(pR805a: MK088173.1). Plasmid features: The plasmid contains a complete tra region and is most likely conjugative and may aid in the mobilization of other plasmids. The genes psiAB involved in plasmid stability are present as well as the plasmid antisense RNA-regulated system pndAC [86]. Virulence: A pil operon is present, although missing pilJ and pilK, and it is highly similar to the pil operons found in the plasmids mentioned above (99.43-99.52% identity). This pil operon was first described in a Shiga-toxigenic Escherichia coli and shown to be functional but did not appear to be involved in adherence to human cell lines [23]. pAvM_E562_27 This is a resistance plasmid belonging to IncFII (FII-35, novel allele). A plasmid (CP023351.1) in the ETEC strain E2264 is the most closely related plasmid (71% coverage; >99% identity). Plasmid features: The tra region present is similar (81% coverage; >96% identity) to the transfer region of R1 [2]. However, traD has been interrupted by IS2. The stability genes parB, psiAB and stbAB are present. The plasmid also carries the hok-sok plasmid antisense RNA- regulated system is also present. AMR: Several antibiotic and heavy metal resistance genes are clustered together in a region bounded at one end by IS1R and IS26 at the other. A complete Tn2 containing blaTEM-1b is inserted in IS1R. A partial Tn1721 containing the tetA(A) and tetR(A) tetracycline resistance genes and the mercury resistance locus from Tn21 are also present in this region.

ETEC strain E1373 (L7) The ETEC strain E1373 belong to lineage L7 (L7) encompassing CS6+STp ETEC strains with ST182 [1]. CS6+STp strains with ST-type 182 have previously been identified in the feces of travelers in Guatemala and Mexico [24]. The ETEC strain E1373 also has the same O-serotype (O169) as strains known to have caused multiple outbreaks [25–27]. pAvM_E1373_28 This is a multi-replicon plasmid, IncFII (FII84, novel allele) and IncFIB (FIB-56, novel allele). Multiple plasmids from CS6+ ETEC strains are near identical (99% coverage; >99% identity) to pAvM_E173_28. For example, the plasmid pEntYN10 (AP014654.2) of the ETEC strain O169:H41 YN1 isolated in 1991 from a patient with diarrhoea in Japan (Figure 1h). The pAvM_E1373_28 has previously been submitted under a different name (pCss_E1373, LN) but was at the time not manually annotated, except for the virulence genes. Plasmid features: The plasmid does not have a transfer region but may be mobilizable with the help of other plasmids. The stability gene parB and psiAB are present. Two copies of the ssRNA sok is present. Virulence: The plasmid pAvM_E1373_28 harbours the cssABCD operon encoding CS6 as well as the gene (estA1) encoding for STp. Interestingly, this plasmid harbours two additional CF-like loci, one CS8-like and a F4-like locus. The similar plasmid, pEntYN10 also harbours the CS6, CS8-like and F4-like locus (Figure 1h). pAvM_E1373_29 This is phage-like IncFIB plasmid shows nucleotide sequence homology to the E. coli plasmids AnCo1 (KY515224.1) and AnCo2 (KY515225.1) as well as the Salmonella typhi phage-like plasmid pHCM2 (AL513384.1). However, the roi (Roi is involved in the lytic growth phase [28]) from pHCM2 has been replaced by terB, a tellurite resistance gene. Plasmid features: The plasmid stability genes psiAB, however the additional plasmid stability genes stbAB, parM, pemK and pemI are missing [29]. Phage features: This plasmid does not confer any obvious phenotype upon the bacteria and due to its content should be considered a phage-like plasmid. The plasmid does not encode virulence- or antibiotic resistance determinants, however it contains a full complement of bacteriophage genes and shows a remarkably high synteny to not only Phage-Plasmid pHCM2 but also phage SSU5. The genes shared with pHCM2 and SSU5 encode a cluster of genes similar to lambda-like phage tail fibre proteins, major capsid, collar and other important structural proteins. Furthermore, the Phage-Plasmid encodes a wide variety of putative genes potentially involved in DNA metabolism and replication in bacteria and bacteriophage. Some notable genes are similar to rnhA, dhfR, thyA and nrdAB, which are involved in DNA metabolism and replication. Other genes identified in pAvM_E1373_29 contribute to a DNA synthesis complex with similarities to those encoded in the T4 bacteriophage genome as well as a ppGpp synthase/hydrolase that is known to be involved in increased tolerance to antibiotics [30,31].

References

1. von Mentzer A, Connor TR, Wieler LH, Semmler T, Iguchi A, Thomson NR, et al. Identification of enterotoxigenic Escherichia coli (ETEC) clades with long-term global distribution. Nature genetics. 2014;46:1321–6.

2. Cox KEL, Schildbach JF. Sequence of the R1 plasmid and comparison to F and R100. Plasmid. 2017;91:53–60.

3. Smith P, Lindsey RL, Rowe LA, Batra D, Stripling D, Garcia-Toledo L, et al. High- Quality Whole-Genome Sequences for 21 Enterotoxigenic Escherichia coli Strains Generated with PacBio Sequencing. Genome announcements. 2018;6:6167.

4. Jalajakumari MB, Thomas CJ, Halter R, Manning PA. Genes for biosynthesis and assembly of CS3 pili of CFA/II enterotoxigenic Escherichia coli: novel regulation of pilus production by bypassing an amber codon. Molecular microbiology. 1989;3:1685–95.

5. Roy K, Hilliard GM, Hamilton DJ, Luo J, Ostmann MM, Fleckenstein JM. Enterotoxigenic Escherichia coli EtpA mediates adhesion between flagella and host cells. Nature. 2009;457:594–8.

6. Roy K, Hamilton D, Allen KP, Randolph MP, Fleckenstein JM. The EtpA exoprotein of enterotoxigenic Escherichia coli promotes intestinal colonization and is a protective antigen in an experimental model of murine infection. Infection and immunity. 2008;76:2106–12.

7. Roy K, Hamilton D, Ostmann MM, Fleckenstein JM. Vaccination with EtpA glycoprotein or flagellin protects against colonization with enterotoxigenic Escherichia coli in a murine model. Vaccine. 2009;27:4601–8.

8. Caron J, Coffield LM, Scott JR. A plasmid-encoded regulatory gene, rns, required for expression of the CS1 and CS2 adhesins of enterotoxigenic Escherichia coli. Proceedings of the National Academy of Sciences. 1989;86:963–7.

9. Kim S-H, Jia W, Bishop RE, Gyles C. An msbB homologue carried in plasmid pO157 encodes an acyltransferase involved in lipid A biosynthesis in Escherichia coli O157:H7. Infection and immunity. 2004;72:1174–80.

10. Hazen TH, Nagaraj S, Sen S, Permala-Booth J, Canto FD, Vidal R, et al. Genome and Functional Characterization of Colonization Factor Antigen I- and CS6-Encoding Heat- Stable Enterotoxin-Only Enterotoxigenic Escherichia coli Reveals Lineage and Geographic Variation. mSystems. 2019;4:209.

11. Nicklasson M, Sjöling Å, von Mentzer A, Qadri F, Svennerholm A-M. Expression of colonization factor CS5 of enterotoxigenic Escherichia coli (ETEC) is enhanced in vivo and by the bile component Na glycocholate hydrate. PLoS One. 2012;7:e35827.

12. Wachsmuth K, Wells J, Shipley P, Ryder R. Heat-labile enterotoxin production in isolates from a shipboard outbreak of human diarrheal illness. Infect Immun. 1979;24:793–7. 13. Nielsen AK, Thorsted P, Thisted T, Wagner EGH, Gerdes K. The rifampicin-inducible genes srn6 from F and pnd from R483 are regulated by antisense RNAs and mediate plasmid maintenance by kiiling of plasmid-free segregants. Mol Microbiol. 1991;5:1961–73.

14. Nascimento AMA, Chartone-Souza E. Operon mer: bacterial resistance to mercury and potential for bioremediation of contaminated environments. Genetics and molecular research : GMR. 2003;2:92–101.

15. Wireman J, Liebert CA, Smith T, Summers AO. Association of mercury resistance with antibiotic resistance in the gram-negative fecal bacteria of primates. Applied and environmental microbiology. 1997;63:4494–503.

16. Begum YA, Baby NI, Faruque ASG, Jahan N, Cravioto A, Svennerholm A-M, et al. Shift in phenotypic characteristics of enterotoxigenic Escherichia coli (ETEC) isolated from diarrheal patients in Bangladesh. PLoS neglected tropical diseases. 2014;8:e3031.

17. Bhakat D, Debnath A, Naik R, Chowdhury G, Deb AK, Mukhopadhyay AK, et al. Identification of common virulence factors present in enterotoxigenic Escherichia coli isolated from diarrhoeal patients in Kolkata, India. J Appl Microbiol. 2018;126:255–65.

18. Wajima T, Sabui S, Fukumoto M, Kano S, Ramamurthy T, Chatterjee NS, et al. Enterotoxigenic Escherichia coli CS6 gene products and their roles in CS6 structural protein assembly and cellular adherence. Microbial pathogenesis. 2011;51:243–9.

19. Iguchi A, von Mentzer A, Kikuchi T, Thomson NR. An untypeable enterotoxigenic Escherichia coli represents one of the dominant types causing human disease. Microbial genomics. 2017;3.

20. Khalil IA, Troeger C, Blacker BF, Rao PC, Brown A, Atherly DE, et al. Morbidity and mortality due to shigella and enterotoxigenic Escherichia coli diarrhoea: the Global Burden of Disease Study 1990-2016. The Lancet Infectious diseases. 2018;18:1229–40.

21. Nada RA, Shaheen HI, Khalil SB, Mansour A, El-Sayed N, Touni I, et al. Discovery and phylogenetic analysis of novel members of class b enterotoxigenic Escherichia coli adhesive fimbriae. Journal of clinical microbiology. 2011;49:1403–10.

22. Jordi BJ, Willshaw GA, Zeijst BA van der, Gaastra W. The complete nucleotide sequence of region 1 of the CFA/I fimbrial operon of human enterotoxigenic Escherichia coli. DNA sequence. 1992;2:257–63.

23. Srimanote P, Paton AW, Paton JC. Characterization of a Novel Type IV Pilus Locus Encoded on the Large Plasmid of Locus of Enterocyte Effacement-Negative Shiga-Toxigenic Escherichia coli Strains That Are Virulent for Humans. Infection and immunity. 2002;70:3094–100.

24. Nicklasson M, Klena J, Rodas C, Bourgeois AL, Torres O, Svennerholm A-M, et al. Enterotoxigenic Escherichia coli multilocus sequence types in Guatemala and Mexico. Emerging infectious diseases. 2010;16:143–6. 25. Nishikawa Y, Helander A, OGASAWARA J, MOYER NP, HANAOKA M, HASE A, et al. Epidemiology and properties of heat-stable enterotoxin-producing Escherichia coliserotype O169[ratio ]H41. Epidemiology and infection. 1998;121:31–42.

26. Pan J-C, Ye R, Meng D-M, Zhang W, Wang H-Q, Liu K-Z. Molecular characteristics of class 1 and class 2 integrons and their relationships to antibiotic resistance in clinical isolates of Shigella sonnei and Shigella flexneri. The Journal of antimicrobial chemotherapy. 2006;58:288–96.

27. El-Gendy AM, Mansour A, Shaheen HI, Monteville MR, Armstrong AW, El-Sayed N, et al. Genotypic characterization of Egypt enterotoxigenic Escherichia coli isolates expressing coli surface antigen 6. J Infect Dev Ctries. 2013;7:090–100.

28. Clerget M, Boccard F. Phage HK022 Roi protein inhibits phage lytic growth in Escherichia coli integration host factor mutants. J Bacteriol. 1996;178:4077–83.

29. Tobias J, von Mentzer A, Frykberg PL, Aslett M, Page AJ, Sjöling A, et al. Stability of the Encoding Plasmids and Surface Expression of CS6 Differs in Enterotoxigenic Escherichia coli (ETEC) Encoding Different Heat-Stable (ST) Enterotoxins (STh and STp). PLoS One. 2016;11:e0152899.

30. Syal K, Flentie K, Bhardwaj N, Maiti K, Jayaraman N, Stallings CL, et al. Synthetic (p)ppGpp Analogue Is an Inhibitor of Stringent Response in Mycobacteria. Antimicrob Agents Ch. 2017;61:e00443-17.

31. Wexselblatt E, Katzhendler J, Saleem-Batcha R, Hansen G, Hilgenfeld R, Glaser G, et al. ppGpp analogues inhibit synthetase activity of Rel proteins from Gram-negative and Gram- positive bacteria. Bioorgan Med Chem. 2010;18:4485–97.