bioRxiv preprint doi: https://doi.org/10.1101/2020.03.19.998526; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Widespread transfer of mobile antibiotic resistance genes within individual gut microbiomes revealed through bacterial Hi-C

Alyssa Kent1*, Albert Vill1*, Qiaojuan Shi1, Michael J. Satlin2, Ilana Lauren Brito1

1 Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY
2 Department of Internal Medicine, Weill Cornell Medicine, New York, NY

* Denotes equal contribution

Abstract

The gut microbiome harbors a ‘silent reservoir’ of antibiotic resistance (AR) genes that is thought to contribute to the emergence of multidrug-resistant pathogens through the process of horizontal gene transfer (HGT). To counteract the spread of AR genes, it is paramount to know which organisms harbor mobile AR genes and with which organisms they engage in HGT. Despite methods to characterize the bulk presence1, abundance2 and function3 of AR genes in the gut, technological limitations of short-read sequencing have precluded linking bacterial taxa to specific mobile genetic elements (MGEs) and their concomitant AR genes. Here, we apply and evaluate a high-throughput, culture-independent method for surveilling the bacterial carriage of MGEs, based on bacterial Hi-C protocols. We compare two healthy individuals with a cohort of seven neutropenic patients undergoing hematopoietic stem cell transplantation, who receive multiple courses of antibiotics throughout their prolonged hospitalizations, and are thus acutely vulnerable to the threat of multidrug-resistant infections4. We find that the networks of HGT are surprisingly distinct between individuals, yet AR and mobile genes are more dispersed across taxa within the neutropenic patients than the healthy subjects. Our data further suggest that HGT is occurring throughout the course of treatment in the microbiomes of neutropenic patients and within the guts of healthy individuals over a similar timeframe.
Whereas most efforts to understand the spread of AR genes have focused on pathogenic species, our findings shed light on the role of the human gut microbiome in this process.

The acquisition of antibiotic resistance (AR) genes has rendered important pathogens, such as multidrug-resistant (MDR) Enterobacteriaceae and Pseudomonas aeruginosa, nearly or fully unresponsive to antibiotics. It is widely accepted that these so-called ‘superbugs’ acquire AR genes through the process of horizontal gene transfer (HGT) with members of the human microbiome with which they come into contact5. The emergence of these MDR pathogens threatens our ability to perform life-saving interventions, such as curative hematopoietic cell transplants for patients with hematologic malignancies6. Furthermore, antibiotic use, required for vital prophylaxis in these patients, has been proposed as a trigger for HGT. Although tools are available to identify AR genes within the gut microbiome, and characterize their function7, abundance8,9 and host associations10, no studies have attempted to monitor the bacterial host associations of AR genes and mobile elements during relatively short periods, such as during these patients’ hospitalizations.

To determine the bacterial hosts of mobile AR genes, we utilized a high-throughput chromatin conformation capture (Hi-C) method aimed at sampling long-range interactions within single bacterial genomes11,12,13. Briefly, while cells are still intact, DNA within individual cells is crosslinked by formaldehyde. Cells are then lysed and the DNA is cut with restriction enzymes, biotinylated, and subjected to dilute ligation to promote intra-molecular linkages between crosslinked DNA. Crosslinking is reversed and then ligated DNA molecules are pulled down and made into DNA libraries for sequencing.
As is, this protocol has been used to improve metagenomic assemblies of bacterial genomes14 and has identified a handful of strong plasmid- and phage-bacterial host associations15,16,17,18, suggesting that this technique could be applied to link mobile genes with specific taxa more broadly and to observe the process of HGT over time.

Here, we develop a modified version of current Hi-C protocols and analytical pipelines (Extended Data Figure 1) in conjunction with metagenomic shotgun sequencing to surveil the bacterial taxa harboring specific mobile AR genes in the gut microbiomes of two healthy individuals and seven patients undergoing hematopoietic stem cell transplantation. These patients have prolonged hospitalizations during their transplant (21 ± 4 days) and often receive multiple courses of antibiotic therapy, increasing the likelihood of an MDR infection. As a result of their condition and treatment, these patients face mortality rates of 40-70% when bacteremic with carbapenem-resistant Enterobacteriaceae (CRE) or carbapenem-resistant Pseudomonas aeruginosa19, and therefore represent a salient population for surveillance and one in which MDR pathogens may emerge and/or amplify under antibiotic selection. Gut microbiome samples for patients and healthy subjects were collected over a 2-3-week period, which, for the neutropenic patients, started upon admission for transplant and continued during their hospitalization until neutrophil engraftment (Figure 1A, Extended Data Tables 1 and 2).

We introduced a number of improvements to current bacterial Hi-C protocols. We changed sample storage and optimized restriction enzymes to improve the congruence between the composition of metagenomic and Hi-C sequencing libraries (Extended Data Figure 2).
We also integrated Nextera XT sequencing library preparation directly into the Hi-C experimental protocol, streamlining operations and decreasing sample preparation time. Importantly, within diverse bacterial communities such as the gut microbiome, MGEs may be highly promiscuous and recombinogenic, complicating both assembly20 and linkage analyses21. Therefore, we implemented a computational workflow to assemble genomes, separating large integrated MGEs onto their own contigs, and allowing them to associate with genomes via binning or Hi-C connections. In a mock community of three organisms, each harboring an identifiable plasmid, we are able to confidently link each plasmid to its nascent genome (Extended Data Figure 3).

Our Hi-C experimental and computational approach results in robust linkages between contigs in human microbiome samples. Hi-C read pairs linking non-mobile contigs with contradictory taxonomic annotations are rarely observed (3.4% at the genus level) and likely represent homologous sequence matches, highlighting the purity of our Hi-C libraries (Figure 1B). Hi-C read pairs linking two contigs are preferentially recruited to contigs that are longer and more abundant, but to a lesser degree than expected, reducing potential bias in our dataset toward highly abundant organisms (Extended Data Figure 4). We binned contigs using several tools (Maxbin22, MetaBat and Concoct), and applied a binning aggregation strategy, DAS Tool23, to obtain a set of draft genomic assemblies. As misassembly can resemble HGT, we removed assemblies with greater than 10% contamination, as determined by CheckM, resulting in taxonomically coherent assemblies (Extended Data Figure 5), albeit with a greater number of unbinned contigs (24.6% of the total) (Extended Data Table 3). We then apply conservative criteria to link mobile and mobile AR-containing contigs with the genomic draft assemblies, considering an MGE part of a genome assembly only if it is directly linked to it by at least two uniquely mapped Hi-C read pairs. As MGEs are known to recombine, this mitigates the potential for falsely linking contigs that merely share common mobile genes. However, this also potentially reduces our overall detection ability, especially for larger MGEs, since mobile contigs are often fragmented in metagenomic assemblies24. Nevertheless, we restricted our analysis to those AR-organism and MGE-organism linkages derived from high-confidence read mappings.

Hi-C significantly improved our ability to detect mobile gene-bacterial host linkages beyond standard metagenomic assembly alone. Hi-C confirms many of the AR gene-taxa and mobile gene-taxa associations observed in the metagenomic assemblies (30.49% ± 11.49% of the AR genes; 30.1% ± 9.52% of the mobile genes), but importantly adds on average 31.81% ± 16.28% AR gene associations and 36.64% ± 11.56% mobile gene associations to those observed by metagenome assembly alone (Figure 1C).
Furthermore, whereas metagenomic assembly methods can generally link a single mobile gene cluster to one or two organisms, our Hi-C method was able to identify up to 15 bacterial hosts harboring the same AR or mobile gene, requiring two or more Hi-C linkages within a single individual (mean = 3.53 ± 5.69 bacterial hosts per AR gene, Figure 1D; mean = 6.85 ± 10.88 bacterial hosts per gene, Extended Data Figure 6). A larger percentage of AR and mobile genes overall (8.1% ± 5.2% vs. 0.9% ± 0.9% for AR genes and 6.1% ± 4.8% vs. 1.9% ± 1.7% for mobile genes) can be assigned to multiple taxonomies. These results were consistent with more stringent thresholds for Hi-C associations (Extended Data Figure 7).

Our data increase mobile and AR gene-taxa assignments above those observed using publicly available reference genomes, while focusing on those immediately relevant to the individual patient. We first investigated phage-host associations identified through Hi-C and compared them with those in NCBI, as many phage are host-specific25. Indeed, 43.5% of the phage genes with Hi-C genus-level assignments recapitulate known interactions (Figure 1E). However, broader genus-level associations are obtained for 64.2% of the unique phage genes in our database, reflecting apparent selection biases within our reference databases and the promiscuity of certain phage26,27,28. A greater percentage of AR genes with Hi-C genus-level taxonomic assignments, 82.8%, were evident in reference genomes. Yet, Hi-C expands genus-level assignments for 37.6% of the AR genes. Despite having a limited number of reads linking each mobile or AR gene to a particular taxon, our annotations are supported by the fact that Hi-C reads preferentially map near these genes on the overall contig (Extended Data Figure 8).

We next sought to determine the extent to which we could capture associations using Hi-C.
First, we performed a modified rarefaction analysis to determine whether the number of AR gene-taxa associations and mobile gene-taxa associations saturated with increased sequencing depth of our Hi-C libraries. Most of our samples saturated within our target sequencing depth (roughly 15 million paired reads), and sequencing samples to roughly four-fold this amount did not significantly increase the number of gene-taxa associations (Extended Data Figure 9). The number of contigs that recruited Hi-C reads (on average 18.3% ± 10.9%) was not dependent on sequencing depth, yet 88.2% ± 9.5% of our genome bins recruited two or more Hi-C reads, which amounts to 90.7% ± 9.0% of the taxa recruiting reads. This breadth is supported by the congruence of Hi-C libraries and metagenomic libraries (Extended Data Figure 2). We suspect that the variation in recruitment of Hi-C reads across the genome reflects recurrent structural patterning of DNA29, differences in DNA-binding proteins available for cross-linking, and/or the distribution of restriction enzyme cut sites30. We next measured our ability to detect the same mobile genes across timepoints. If we consider only the AR gene-taxa associations we observe at least once, and we conservatively assume that we should continue to observe the gene-taxa association, i.e. that the lack of repeated observation was due to the stochastic sampling process of Hi-C rather than HGT, we repeatedly detect an average of 66% of all possible associations where both the organism and AR genes were detectable in the draft assemblies but were not linked through Hi-C (Extended Data Figure 10).

Overall, within each person’s microbiome, mobile genes, including AR genes and HGT machinery genes, were distributed across a wide range of taxa (Extended Data Figures 11 and 12). Fewer than 10% of unique mobile genes and 19% of unique AR genes (clustered at 99% identity) were found across multiple patients, a finding consistent with previous surveys of MGEs across individuals31, indicating limited interpersonal or nosocomial transmission. Furthermore, for these MGEs found across patients, few of their host associations were conserved. We speculate that HGT may result in their dispersal within individuals’ gut microbiomes and that selection may affect MGE-taxa associations at the level of individuals32. Despite heavy administration of antibiotics, the abundances of AR genes, even those conferring resistance to administered antibiotics, did not correspond with patient-specific therapeutic courses, a finding consistent with other patient timecourses of mobile AR genes33, and possibly reflective of the low plasmid-based resistance to levofloxacin or combination antibiotic therapies (Extended Data Figure 13).
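As an illustration of the cross-patient comparison above, the following minimal Python sketch computes the fraction of unique gene clusters seen in more than one patient. It assumes genes have already been clustered (for example at 99% identity) and that each patient is summarized as a set of cluster identifiers. The function name and the cluster labels are hypothetical, the patient labels are used only as dictionary keys, and this is not the pipeline used in this study.

```python
from collections import Counter


def fraction_shared_clusters(clusters_by_patient):
    """Fraction of unique gene clusters observed in more than one patient.

    clusters_by_patient: dict mapping a patient label to an iterable of
    gene-cluster identifiers (hypothetical input format).
    """
    patients_per_cluster = Counter()
    for clusters in clusters_by_patient.values():
        # set() ensures a cluster is counted at most once per patient
        patients_per_cluster.update(set(clusters))
    if not patients_per_cluster:
        return 0.0
    shared = sum(1 for n in patients_per_cluster.values() if n > 1)
    return shared / len(patients_per_cluster)


# Toy input: four unique clusters, one of which occurs in two patients.
example = {
    "B314": {"cluster_1", "cluster_2"},
    "B316": {"cluster_2", "cluster_3"},
    "B335": {"cluster_4"},
}
print(fraction_shared_clusters(example))  # prints 0.25
```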
When comparing the networks of HGT within each individual’s gut microbiome, we expected to observe a strong preference for gene exchange between more closely related organisms, as previously observed when comparing exchange networks using reference genomes34. Whether HGT occurs more frequently in an individual’s gut is an essential question for understanding the development and maintenance of the reservoir of AR genes in the gut microbiome, yet it has been difficult to answer for technical reasons. Using Hi-C, we find that the spread of AR genes and other mobile genes is significantly higher within an individual’s gut microbiome than between different individuals’ gut microbiomes (Figure 2A, Extended Data Figure 14). Beyond closely related pairs of organisms, there was considerable variation in the networks of shared AR and mobile genes across individuals (Extended Data Figure 15). Despite this, we find that those microbiomes similar in composition shared more of the same connections among the organisms present in both microbiomes (Figure 2B), most notably between the two healthy individuals.

Given their clinical importance, we focused on the gene-sharing networks of Proteobacteria, and more specifically, Enterobacteriaceae. Within all patients, gene exchange was most frequent within members of the same phylum (Figure 2C,D). In neutropenic patients, Proteobacteria shared genes outside their phylum most often with . The main transfer partners with Enterobacteriaceae were different across patients, but notably included both opportunistic pathogens (i.e. Veillonella parvula and Enterococcus faecium), commensals that may flourish post-antibiotic use (i.e. sp.35), and even those organisms that have been considered as probiotic (i.e. Faecalibacterium prausnitzii36 and Roseburia intestinalis37).
Antibiotic treatment38 and inflammation39 are putative triggers for HGT, through the production of reactive oxygen species and DNA damage. We hypothesized that mucositis caused by cytotoxic chemotherapy, along with the selective pressures imposed by antibiotics and inflammation, would create conditions amenable to HGT in these neutropenic patients. We noticed that the average density of connections (the percentage of all possible connections that are actually observed) between taxa and AR or mobile genes is greater in the neutropenic patients than in the healthy individuals (Figure 3A). Several patients, B316, B320, B335 and B370, experienced increases in the proportion of overall gene-taxa connections, referred to as network density, during their timecourses. This was unrelated to the abundance of Enterobacteriaceae in the samples, which have been proposed as mediators of HGT40, the total abundance of AR genes, or the number of Hi-C reads (Extended Data Figure 16). Rather, we found that the only correlate was the number of taxa in a sample: as patients’ microbiomes became less diverse, the gene-taxa network density increased (Figure 3B). We hypothesize that this occurs either because an undefined selective pressure acts to preserve more connected organisms, or because, once selection has occurred, organisms in less diverse populations have increased contact rates and therefore greater opportunity for transfer.

Next, we more closely examined those timecourses with putative HGT events for which we had the highest confidence. To distinguish between the migration of new bacterial strains and HGT, we only considered HGT between strains present at the start of the timecourse. Potential donor strains were required to have Hi-C-verified connections with specific mobile or AR genes in the first patient sample. Individuals’ initial Hi-C samples were sequenced 3-4-fold deeper than the remainder of their timecourses to ensure that gene-taxa associations were adequately sampled (Extended Data Figure 9) and that putative recipient strains did not harbor those specific genes of interest at the start. We enforced this by requiring a complete absence of gene-recipient taxa connections inferred by Hi-C or metagenomic assembly, including connections with taxa that could only be annotated at higher taxonomic levels. Finally, we considered HGT as occurring between these donor strains and recipient strains with Hi-C-verified associations with the transferred genes in later timepoints. Providing additional support, 12.2% of the putative transfer events (19 of 155) were supported by Hi-C across multiple timepoints and 32.9% (51 of 155) were supported by Hi-C links across multiple contigs in both the donor and recipient genomes. Most of the transfers (60%) were between members of the same phylum.
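The filtering logic described in this paragraph can be sketched in Python as follows. This is a simplified illustration and not the code used in this study: it assumes gene-taxon associations are available as sets of (gene, taxon) tuples per timepoint, the function and argument names are hypothetical, and it omits the additional exclusion of connections to taxa annotated only at higher taxonomic levels.

```python
from collections import defaultdict


def putative_transfers(first_hic, first_assembly, later_hic, taxa_at_start):
    """List candidate HGT events as (gene, donor, recipient, timepoint).

    first_hic:      set of (gene, taxon) links verified by Hi-C in the first sample
    first_assembly: set of (gene, taxon) links seen by assembly in the first sample
    later_hic:      dict mapping a later timepoint to a set of Hi-C-verified links
    taxa_at_start:  set of taxa present in the first sample
    """
    # Any link already present at the start disqualifies that taxon as a recipient.
    links_at_start = first_hic | first_assembly

    # Donors need a Hi-C-verified link to the gene in the first sample.
    donors = defaultdict(set)
    for gene, taxon in first_hic:
        if taxon in taxa_at_start:
            donors[gene].add(taxon)

    events = set()
    for timepoint, links in later_hic.items():
        for gene, recipient in links:
            if gene not in donors:
                continue  # no verified donor at the start
            if recipient not in taxa_at_start:
                continue  # could be a newly arrived strain, not HGT
            if (gene, recipient) in links_at_start:
                continue  # recipient already carried the gene
            for donor in donors[gene]:
                events.add((gene, donor, recipient, timepoint))
    return sorted(events)
```

In this sketch, a recipient that first appears after the initial sample is discarded rather than counted, mirroring the requirement that both donor and recipient strains be present at the start of the timecourse.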
Ultimately, evidence of HGT was found in all individuals in our study (Figure 3C, Extended Data Table 6).

Within these relatively short timecourses, we observed the expansion of the gut commensal reservoir of resistance genes. Although we did not observe the transfer of AR genes conferring resistance specifically to the antibiotics used in this cohort, namely levofloxacin, piperacillin-tazobactam, or trimethoprim-sulfamethoxazole, we did observe transfer of multi-drug resistance cassettes with beta-lactam- and fluoroquinolone-resistance genes, covering two of the corresponding antibiotic classes. Notably, within a few days post-transplant, we see transfer of a plasmid encoding mdtEF, a multidrug efflux pump conferring resistance to fluoroquinolones, and their transcriptional regulators, CRP and gadW, from an Escherichia coli strain in patient B331 to a strain most similar to Bacteroides sp. A1C1. Despite their ubiquitous antibiotic prophylaxis, only a minority (19.4%) of transfer events involved annotated AR genes in the neutropenic patients.

Additionally, we observe the emergence of novel AR genes in enteric pathogens, originating either from gut commensals or other enteric pathogens, including Enterobacteriaceae. Enterobacteriaceae species are among the most common causes of infection and sepsis in these patients, and Enterobacteriaceae from the gut have been shown previously to harbor excessive numbers of AR genes41 and serve to promote HGT of AR genes42. We see the exchange of AR gene-containing plasmids between members of the Enterobacteriaceae, namely Klebsiella pneumoniae and Citrobacter braakii in patient B335, and between E. coli and Klebsiella species in B314, and one instance of K. pneumoniae in patient B335 acquiring DNA harboring a plasmid-based efflux pump from a commensal, Blautia hansenii.
We also note the overall transfer of mobile elements between these pathogenic species and other opportunistic pathogens, such as Streptococcus parasanguinis, S. salivarius, and E. faecium, exposing the potential for HGT to alter the AR profiles of these bacteria over short periods of time.

These examples highlight the dynamic nature of HGT within the gut ecosystem, especially in the context of gut inflammation, immune dysregulation and antibiotic use. Nevertheless, our method has several limitations. First, we can only assign bacterial hosts for those MGEs and host genomes that we are able to assemble and annotate. Although 95.9% ± 2.8% of our metagenomic reads contribute to assembled contigs and 80.4% ± 10.4% (median 82.4%) of Hi-C reads align to our assemblies, we were only able to annotate 47.8% of our draft assemblies at either the genus or species level. Second, the assembly of MGEs can be confounded by their high rates of recombination and transfer, which create multiple genomic arrangements and, thus, redundancy within and across genomes. To mitigate the potential for false positive interactions, we examined only those mobile gene-containing contigs with multiple Hi-C reads directly linking them to taxonomically annotated genome assemblies. We cannot, however, rule out the possibility that our sensitivity is actually higher, and that our inability to detect linkages at specific timepoints reflects true strain-level variation within the microbiome, or undetected real-time mobilization of genetic elements. Third, for those HGT events that we observed, we cannot always confirm the transfer of an entire contig and its associated genes. This issue affects several observed HGT events involving plasmids comprising prophage and transposable elements. This is mitigated by the requirement for more Hi-C read linkages and the overall proximity between Hi-C read linkages and the inferred transferred genes (Extended Data Figure 7). Future studies should leverage long reads in hybrid assemblers to better capture co-occurring AR genes and large MGEs43. We expect to overcome these limitations with additional technical improvements to the bacterial Hi-C protocol.

Here, we observe extensive transfer of mobile and AR genes within a single individual’s gut microbiome across distant phylogenetic backgrounds and over relatively short timespans. The transfer networks within each individual’s gut microbiome are unique and are likely explained by personal ecological niches that govern local contact rates between organisms. Few of the total AR gene-taxon associations are observed across individuals, which may suggest limited dispersal rates and/or strong selective pressures that prevail within each individual’s gut.
Although the molecular dynamics of HGT in the gut microbiome are not well understood, our data from healthy subjects point to a basal level of transmission, even in the absence of inflammation or antibiotic use. Many of the observed transfers appeared transient, which may be due to the limited detection of our method, or to the neutral or deleterious nature of most HGT events44,45.

The ramifications of HGT in this neutropenic patient population are acute. Our results show increased pathogen load and elevated gene-taxa network densities in neutropenic patients as compared with healthy individuals, suggesting an increased risk of emergence of MDR pathogens in this at-risk patient population. Determining how to translate these findings into the prevention of the emergence of MDR pathogens is paramount. This technology highlights the potential for screening the burden of AR genes and the carriage of enteric pathogens to guide empirical antibiotic therapy. These findings also expose the limitations of taxa-specific therapies to remove AR genes from the gut microbiome46,47,48, in favor of mechanisms to limit HGT more generally. Overall, these results emphasize a view of the population-wide dissemination of AR genes that includes diverse members of the gut microbiome.

Figures

Figure 1. Hi-C can be used to track mobile genetic elements.
(A) Neutropenic hematopoietic stem cell transplant (B) recipients’ and healthy (H) individuals’ timecourses included in the study are depicted, with periods of neutropenia (gray) and antibiotic use (green). Black lines indicate time points for which metagenomic and Hi-C libraries were constructed. Red lines indicate gastrointestinal colonization with MDR enteric pathogens.
(B) Each Hi-C read pair that maps to two non-mobile contigs is plotted according to the taxonomic assignment of each read. Color depicts the number of reads linking contigs according to .
(C) The percent of the total taxa-mobile gene (left) and taxa-AR gene (right) associations observed from metagenomic assembly that are supported by two or more Hi-C links (brown) is plotted, along with the percent additional interactions gained by using Hi-C (red).
(D) Stacked barplots showing the number of species-level taxa to which each AR gene (clustered at 99% identity) is assigned within each patient, and across patients. Only those genes assigned to 2 or more taxa are shown. We either used metagenomic assemblies alone to assign taxonomies (left) or combined them with Hi-C libraries, considering those taxa-gene assignments with evidence from at least two Hi-C reads. The numbers above each stacked barplot represent the total number of AR genes with 2 or more taxonomic associations.
(E) Horizontal stacked bar plots show the percentage of unique phage genes (defined as 95% similar) (above) or AR genes (below) in the metagenomic assemblies and the origins of their taxonomic associations, identified either by BLAST to NCBI’s NT (for phage) or PATRIC’s reference database (for AR genes) or through Hi-C linkages.

Figure 2. Networks of bacterial HGT are unique to each individual’s gut microbiome.
(A) HGT rates (per 100 comparisons) of AR genes between organisms within each individual versus between individuals, according to those that share the same species, genus, family, order, class, and phylum are plotted for comparison. Significance was measured with Mann-Whitney U-tests (two-sided; *, p<0.05; **, p<0.01; ***, p<0.005; ****, p<0.001; *****, p<0.0005). Boxplot represents the interquartile range, where ends of the whiskers represent ±1.5*interquartile range and the median value is indicated.
(B) For each pair of individuals, the Jaccard distance of their composite microbiome compositions is plotted against the average Jaccard distance of the HGT network connections of mobile genes exchanged between organisms present in both individuals. Points are colored according to the health status of the donors being compared.
(C) Network plots showing bacterial AR gene exchange according to phyla within healthy (left) and neutropenic (right) individuals’ microbiomes. n refers to the number of people included in the plot.
(D) Network plots showing bacterial mobile gene exchange according to phyla within healthy (left) and neutropenic (right) individuals’ microbiomes. n refers to the number of people included in the plot.

Figure 3. HGT results in greater network density and occurs on short timescales in some members of our neutropenic cohort compared to healthy individuals.
(A) Boxplots showing the gene-taxa network linkage densities, or the proportion of total possible gene-taxa links that are observed to be linked, for each individual’s samples. A dotted line is shown at the maximum network density observed in the healthy samples.
(B) Individual patient samples are plotted according to the alpha diversity, assessed using Metaphlan, and their gene-taxa network density.
An ANOVA showed that gene-taxa network density was related to alpha diversity (F(1,39), p = 3.3 × 10^-5) and health status (F(1,6), p = 0.01501).

(C) All observed HGT events across different genera are plotted for each individual. Each genus is colored according to its phylum.
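The gene-taxa network density plotted in Figure 3 is described as the proportion of total possible gene-taxa links that are observed. A minimal Python sketch of that quantity follows, under the assumption that the number of possible links is the number of genes multiplied by the number of taxa in a sample (a bipartite network). The function name and optional arguments are hypothetical, and this is not the authors' implementation.

```python
def network_density(links, n_genes=None, n_taxa=None):
    """Proportion of possible gene-taxon links that are observed.

    links:   iterable of (gene, taxon) pairs observed in one sample
    n_genes: total genes considered; defaults to the genes seen in `links`
    n_taxa:  total taxa considered; defaults to the taxa seen in `links`
    """
    observed = set(links)  # de-duplicate repeated observations
    if n_genes is None:
        n_genes = len({gene for gene, _ in observed})
    if n_taxa is None:
        n_taxa = len({taxon for _, taxon in observed})
    possible = n_genes * n_taxa  # assumption: any gene could link to any taxon
    return len(observed) / possible if possible else 0.0


# Three observed links among two genes and two taxa: 3 of 4 possible.
sample_links = [("gene_1", "taxon_A"), ("gene_1", "taxon_B"), ("gene_2", "taxon_A")]
print(network_density(sample_links))  # prints 0.75
```

Passing `n_genes` and `n_taxa` explicitly lets genes or taxa with no observed links still count toward the denominator, which lowers the density relative to the default.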

Extended Data Figures

Extended Data Figure 1. Experimental and computational pipeline.
Our pipeline for assigning mobile and AR genes to bacterial taxa utilizes data from metagenomic (black) and Hi-C (red) sequencing libraries. In brief, metagenomic samples are assembled using standard approaches. The resulting contigs (circles) are binned into draft assemblies and quality filtered. Bins are taxonomically annotated at the lowest level with >50% of base pairs assigned to a taxon using a weighted Kraken approach. Contigs containing AR or mobile genes are associated with metagenomic assemblies by residency or by Hi-C linkages requiring at least 2 read pairs linking the contig with the metagenomic assemblies. Associations of mobile or AR genes with specific taxa are made by clustering the genes of interest at 99% identity and counting each unique taxon once.

Extended Data Figure 2. Congruence between metagenomic and Hi-C sample composition.
(A) Class-level compositions of individuals’ gut metagenomes and Hi-C libraries as determined by MetaPhlAn2. The human microbiome sample from Press et al.49 was included in our analysis.
(B) Dendrogram of metagenome and Hi-C library compositions. Sample compositions (class-level) were hierarchically clustered according to their Bray-Curtis distances.
(C) Compositional differences between metagenomic and Hi-C libraries in samples processed according to the restriction enzyme(s) used.

Extended Data Figure 3. Hi-C performed on a mock community of three organisms each harboring an organism-specific plasmid.
Number of Hi-C reads linking genomic regions to themselves (left), to their plasmids (middle) and within each plasmid (right). Blue linkages are correct, whereas brown hues are incorrect associations.
Note that there is a region of homology between the plasmid backbones of the pKJK5 and RP4 plasmids carried by Pseudomonas putida and Escherichia coli, respectively. Nevertheless, none of the incorrect host-plasmid linkages would have surpassed our threshold for assigning gene-taxa associations.

Extended Data Figure 4. Recruitment of Hi-C read pairs linking separate contigs according to the length and abundance of those contigs.
(A) A histogram showing the distribution of abundances (RPKM) of contigs that recruit (purple) or do not recruit (orange) Hi-C contig-connecting read pairs.
(B) A histogram showing the distribution of lengths (bp) of contigs that recruit (purple) or do not recruit (orange) Hi-C contig-connecting read pairs.

Extended Data Figure 5. Genome bins are high-quality in terms of completeness and contamination.
(A) Completeness of each assembled genome bin from each patient’s samples, as scored by CheckM. Boxplots show the 25th percentile, median and 75th percentile.
(B) Contamination of each assembled genome bin from each patient’s samples, as scored by CheckM. Boxplots show the 25th percentile, median and 75th percentile.
(C) Length of each assembled genome bin within each patient’s samples. Boxplots show the 25th percentile, median and 75th percentile.

Extended Data Figure 6. Hi-C associates mobile genes with multiple taxa.
Stacked barplots showing the number of species-level taxa to which each mobile gene (clustered at 99% identity) is assigned within each patient, and across patients. Only those genes assigned to 2 or more taxa are shown. We either used metagenomic assemblies alone to assign taxonomies (left) or combined them with Hi-C libraries, considering those taxa-gene assignments with evidence from at least two Hi-C read pairs. The numbers above each stacked barplot represent the total number of mobile genes with 2 or more taxonomic associations.

Extended Data Figure 7. Trends in gene-taxa associations are consistent with more stringent cutoffs of Hi-C read linkages.
(A) The percent of the total taxa-mobile gene (left) and taxa-AR gene (right) associations observed from metagenomic assembly that are supported by five or more Hi-C links (brown) is plotted, along with the percent additional interactions gained by using Hi-C (red).
(B) Stacked barplots showing the number of species-level taxa to which each AR gene (clustered at 99% identity) is assigned within each patient, and across patients, as determined by 5 or more Hi-C links. Only those genes assigned to 2 or more taxa are shown. We either used metagenomic assemblies alone to assign taxonomies (left) or combined them with Hi-C libraries, considering those taxa-gene assignments with evidence from at least two Hi-C reads. The numbers above each stacked barplot represent the total number of AR genes with 2 or more taxonomic associations.
(C) Stacked barplots showing the number of species-level taxa to which each mobile gene (clustered at 99% identity) is assigned within each patient, and across patients, as determined by 5 or more Hi-C links. Only those genes assigned to 2 or more taxa are shown. We either used metagenomic assemblies alone to assign taxonomies (left) or combined them with Hi-C libraries, considering those taxa-gene assignments with evidence from at least two Hi-C reads. The numbers above each stacked barplot represent the total number of mobile genes with 2 or more taxonomic associations.

Extended Data Figure 8. Hi-C read pairs are situated nearby mobile or AR genes on contigs.
(A) As illustrated, Hi-C read pairs may map anywhere on a contig containing a mobile or AR gene. The boundaries of an MGE may be elusive and MGEs may integrate into contigs that have incomplete annotations.
We assessed the linear distance (bp) between where Hi-C read pairs aligned and the positions of mobile or AR genes used for taxon-gene associations on the contigs to ensure that Hi-C read pairs were mapping at distances relevant for their assignments. In the example, both Hi-C reads A and B align to the same taxonomically annotated contig, yet read A maps at a minimum distance that is closer to the mobile gene, and therefore more confidently links the mobile gene with the contig.
(B) For each taxon-mobile gene connection, we plot the minimum distance between a Hi-C read pair and the start/end of the mobile gene on that contig, according to contig length. The inset shows distances of less than 200,000 bp broken down more finely.
(C) The same analysis as (B) but for AR genes.

Extended Data Figure 9. Accumulation of unique gene-species connections with increasing sequencing depth.
(A) The number of unique mobile gene-species connections within each sample after subsampling reads from each Hi-C dataset. The first Hi-C sample from each timecourse (noted with an asterisk) was sequenced significantly more deeply than the rest of the timecourse so that we could better assess whether there were any new gene-species connections that arose in any subsequent samples.
(B) The number of unique AR gene-species connections within each sample after subsampling reads from each Hi-C dataset.

Extended Data Figure 10. Repeat detection of gene-taxa associations within the Hi-C libraries.
(A) Examples of the detection of AR genes, bacterial hosts and their linkage during patients’ timecourses. The last scenario depicts an instance where a mobile or AR gene is linked with a specific taxon by Hi-C during at least one time point, and is detected in the metagenomic data at other time points without being linked by Hi-C.
Although this may be explained by changes in strain-level composition or gene loss, we assessed the repeatability of detecting associations, assuming that these genes are truly linked in any instance when the mobile or AR gene and the bacterial taxon are both present.

(B) A bar chart showing the extent to which we repeatedly detect specific gene-taxa associations observed within each patient’s microbiome. Assuming that we should observe associations present in one timepoint in all timepoints (i.e. that there is no HGT), we define the true positives (TPs) as the number of unique mobile gene-bacterial taxon connections observed; and the false negatives (FNs) as the total number of instances where both the bacterial taxon and the mobile gene are detected in the metagenomic assemblies. Repeat detection is calculated as TPs/(TPs+FNs), with the caveat that a portion of the mobile gene-taxon linkages that did not depend on Hi-C sequencing read pairs are included here.
(C) A bar chart showing the amount of repeat detection of AR gene-taxon connections observed within each patient’s microbiome. This was calculated as described in (B), with the same caveat applied.

Extended Data Figure 11. AR gene-taxa linkages are not common across individuals.
A heatmap of taxa-specific assignments, colored by class, for AR genes that are present in three or more patients.

Extended Data Figure 12. Mobile gene-taxa linkages are not common across individuals.
A heatmap of taxa-specific assignments, colored by class, for mobile genes that are present in three or more patients.

Extended Data Figure 13. Antibiotic resistance gene abundances do not correspond to patients’ antibiotic regimens.
For each patient-timecourse (columns), AR gene abundances (RPKM) are plotted for each gene according to the antibiotic to which it confers resistance. The antibiotics administered to each patient over their timecourse are denoted.

Extended Data Figure 14. Transfer of AR genes between organisms is higher within individuals than across individuals.
HGT rates (per 100 comparisons) of AR genes between organisms within each individual versus between individuals, grouped according to those that share the same genus, family, order, class and phylum, are plotted for comparison. Significance was measured with Mann-Whitney U-tests (two-sided; *, p<0.05; **, p<0.01; ***, p<0.005; ****, p<0.001; *****, p<0.0005).

Extended Data Figure 15. Networks of bacterial HGT of AR genes are unique to each individual’s gut microbiome.
(A) Circle relationship plots showing networks of bacterial mobile gene exchange in the gut microbiomes across individuals (top left) and within each individual. Taxa present in each individual’s microbiome are depicted by black circles. The thickness of the lines corresponds to the number of unique mobile genes associating the two taxa.
(B) Circle relationship plots showing networks of bacterial AR gene exchange in the gut microbiomes across individuals (top left) and within each individual. Taxa present in each individual’s microbiome are depicted by black circles. The thickness of the lines corresponds to the number of unique AR genes associating the two taxa.

Extended Data Figure 16. Gene-taxa linkage density is related to the number of taxa, but not to the relative abundance of Enterobacteriaceae, number of Hi-C reads or abundance of AR genes.
Each patient’s timecourse is shown according to their gene-taxa linkage density, as defined by both Hi-C and metagenomic assembly; the number of taxa in that sample’s metagenome (calculated by MetaPhlAn); the number of total Hi-C reads; and the abundance of AR genes (RPKM). Individuals were ordered according to the trend of their gene-taxa linkage density over their timecourse.

Extended Data Tables

Extended Data Table 1. Patients and clinical data
Patient data, including type of stem cell transplantation, dates of admission, pre-transplant conditioning (radiation, chemotherapy, or mAb therapy), neutropenia, and antibiotic treatment are provided, relative to the date of hematopoietic cell transplantation.

Extended Data Table 2. Sample quality and information
For each patient-time point sample, quality information is provided about the metagenomic and Hi-C libraries. Metrics included are the read depth and assembly metrics of the metagenomes, the number of Hi-C reads, the restriction enzymes used in the Hi-C data, and the percent of inter-contig chimeras.

Extended Data Table 3. Completeness and contamination of genomic clusters.
We clustered contigs using Maxbin, MetaBat, Concoct, and DAS Tool. We ran CheckM on all of the clusters and report completeness, heterogeneity, and contamination. We applied a bp-weighted taxonomic annotation of Kraken contig assignments for each bin and annotated the lowest taxonomic levels that attained >50% of bin length.

Extended Data Table 4. Antibiotic resistance mechanisms
AR genes were identified using HMMER against the Resfams database, and using CARD’s Resistance Gene Identifier against the CARD database. Due to slight variations in AR gene classification, we combined annotations according to the mechanism specified in this look-up table. We used a slightly finer resolution categorization for the subset of clinically relevant genes.

Extended Data Table 5. Mobile PFAMs included in this study.
A table of the PFAM IDs that were used, according to search terms described in the Methods, and how they were annotated according to MGE (i.e. plasmid, phage, transposon, mixed).

Extended Data Table 6. Putative HGT events during individuals’ timecourses.
For each patient, we list the observed HGT events that meet the inclusion criteria outlined in the Methods. We include the full taxonomies of the donor and recipient taxa, the gene IDs and gene names involved in the transfer, whether specific genes were found to be transferred across multiple timepoints, whether there was any evidence of transfer between two taxa across multiple timepoints, and whether multiple contigs were associated with transfer between two taxa at a single timepoint.

Methods

Sample collection
Fresh stool was collected from individuals in accordance with IACUC protocol for Weill Cornell Medical College (#1504016114) and Cornell University (#1609006586). Neutropenic patients were all admitted to the Bone Marrow Transplantation Unit at New York Presbyterian Hospital/Weill Cornell Medicine between December 2016 and July 2017. Healthy samples were collected similarly in 2019. Approximately 0.25 g replicates of each time-point were either frozen ‘as is’ (for metagenomic sequencing) or homogenized in phosphate-buffered saline (PBS) + 20% glycerol before freezing (used for Hi-C sequencing).

Metagenomic sequencing
Frozen stool was thawed on ice and DNA was extracted using the PowerSoil DNA Isolation Kit (Qiagen) with additional Proteinase K treatment and freeze/thaw cycles recommended by the manufacturer for difficult-to-lyse cells. Extractions were further purified using 1.8 volumes of Agencourt AMPure XP bead solution (Beckman Coulter). DNA was diluted to 0.2 ng/μL in nuclease-free water and processed for sequencing using the Nextera XT DNA Library Prep Kit (Illumina).

Proximity Ligation
Stool stored in PBS + 20% glycerol was thawed on ice for 15 minutes and homogenized in 5 mL PBS containing 4% v/v formaldehyde. Samples were crosslinked at room temperature with continuous inversion for 30 minutes, then incubated on ice for 30 minutes. Unreacted formaldehyde was quenched by adding glycine to a final concentration of 0.15 M and incubating for 10 minutes on ice. Crosslinked cell mixtures were pelleted (10,000 × g, 4 °C, 5 min), the supernatant was removed, and pellets were flash-frozen on dry ice/ethanol and stored at -80 °C.

Frozen crosslinked stool cell pellets were thawed on ice then resuspended in 450 μL TES (10 mM Tris, 1 mM EDTA, 100 mM NaCl, pH 7.5) and transferred to 2 mL screw-cap tubes.
50 μL freshly prepared lysozyme solution (20 mg/mL in TES, Amresco lyophilized powder, 23500 U/mg) was added to each resuspended pellet and incubated at room temperature for 15 minutes with continuous inversion. Sodium dodecyl sulfate (SDS) was added to a final concentration of 0.5% w/v and samples were incubated at room temperature for 10 minutes with continuous inversion. Samples were pelleted and the volume was reduced to 400 μL. 50 μL 10X Lysis Buffer (100 mM Tris pH 7.5, 100 mM NaCl, 1% IGEPAL CA-630 v/v) was added to each sample, followed by 50 μL freshly prepared 10X protease inhibitor (Roche cOmplete mini EDTA-free tablets). Cells were resuspended by pipetting and incubated on ice for 15 minutes. Manual lysis of cells was carried out by adding 400 μL 0.5 mm sterile glass beads to each tube and vortexing at maximum speed for 30 seconds, followed by 30 seconds incubation on ice. Vortexing and ice incubation were repeated for 10 cycles. Bead-beaten samples were allowed to settle upright on ice for 15 minutes, then the liquid supernatant (~250 μL) was transferred to a new 1.5 mL tube. Sample volume was equilibrated to 500 μL with cold 2X NEBuffer 1.1 and incubated at 50 °C for 10 minutes. After incubation, 30 μL 10% Triton X-100 v/v was added to each tube and mixed by inversion. Cross-linked DNA fragments were digested overnight with 50 U Sau3AI. Digested DNA complexes were pelleted (20,000 × g, 4 °C, 5 min), gently washed with cold 1X NEBuffer 2, and resuspended in 200 μL NEBuffer 2.

Digested DNA was heated to 50 °C for 5 minutes to melt paired sticky ends, then put into a 200 μL Klenow fragment (exo-, NEB) fill-in reaction containing 36 μM biotin-14-dCTP (Thermo Fisher) and equimolar amounts of dATP, dTTP, and dGTP. Reactions were carried out for 2 hours at room temperature and the polymerase was quenched by adding EDTA to a final concentration of 10 mM.
The full volume of each fill-in reaction was put into a dilute blunt-end ligation reaction (640 U T4 DNA Ligase, NEB) and allowed to incubate overnight at 15 °C. Protein and crosslink digestion was carried out by adding 50 μL freshly prepared 20 mg/mL Proteinase K (VWR, freeze-dried powder suspended in 10 mM Tris, 1 mM MgCl2, 50% glycerol, pH 7.5) and incubating at 65 °C for 6 hours. This digestion was repeated once. Protein was removed by phenol:chloroform extraction and ligated DNA was precipitated from the aqueous fraction with one volume 5 M ammonium acetate and 4 volumes cold absolute ethanol. Clean DNA was quantified, and at least 1 μg but no more than 5 μg DNA was put into an end-resection reaction (5 U T4 DNA Polymerase, NEB) to remove biotin from unligated ends. Exonuclease activity of the polymerase was quenched with 5 mM EDTA and free biotinylated nucleotides were removed via 1.8X AMPure XP bead cleanup. Biotinylated DNA was immobilized on M280 streptavidin beads using the Invitrogen kilobaseBINDER Kit. Bead-bound DNA was quantified and prepared for sequencing using Illumina’s Nextera XT kit. Multiplexed libraries were size-selected with AMPure XP beads, quantified, and pooled for sequencing on an Illumina NextSeq 2x150 paired-end platform.

Mock Community Methods
Bacillus subtilis containing pDR244, Pseudomonas putida containing pKJK5, and Escherichia coli containing RP4 were cultured in LB under antibiotic selection to maintain plasmids (spectinomycin, tetracycline, and kanamycin, respectively). Overnight cultures were washed with PBS, resuspended in PBS + 20% glycerol v/v, and frozen as aliquots, with one aliquot of each retained for titer determination on selective agar media. To create the mock community, 5×10^8 colony forming units from each frozen stock were thawed and combined, and immediately carried through formaldehyde crosslinking as described for stool. Mock community Hi-C sequences were mapped with HiC-Pro against reference genomes and plasmids using default settings. Valid pairs, i.e.
those that map to different restriction fragments, were grouped according to whether they connected genome to genome, genome to plasmid, or plasmid to plasmid, and coded according to the expected plasmid-host relationship.

Quality Filtering and Assembly
Metagenomic and Hi-C sequences were quality filtered using Prinseq50 v0.20.2 to dereplicate, Bmtagger51 (v2/21/14) to remove human reads, and Trimmomatic52 v0.36 to remove adapters and quality filter reads (using settings: LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:50). Metagenomic reads were assembled using SPAdes53 v3.13.2 with the ‘--meta’ setting and a minimum contig size of 1,000 bp. Genes on these contigs were called using Prodigal54 v2.6.3. PhageFinder55 v2.1 was used to identify large prophage regions, which were excised from the first to the last phage gene called and considered separate contigs, unless the surrounding regions were less than 1,000 bp, in which case they were also included in the excised phage.

Metagenomic Binning
Contigs were binned using several tools (Maxbin56, MetaBat57, and Concoct58), culminating with a metagenomic binning aggregation strategy, DAS Tool59. We assessed genome contamination using CheckM and removed bins with contamination >10%, resulting in quality metagenomic bins, although in many cases partial bins. To prevent overcalling of partial bins, downstream analyses aggregate at the taxon level rather than individually calling unique bins.

Taxonomic Identification
Kraken was applied to each metagenomic bin to annotate each contig individually. We then assigned each bin the lowest taxonomic level at which more than 50% of the bin was assigned by Kraken, with contigs weighted by length (bp). Contigs assigned Eukaryotic taxonomies were removed from further analysis.
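The length-weighted bin annotation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name, the input layout (a list of contig lengths with per-rank labels) and the rank list are assumptions made for the example, and the per-contig labels are assumed to have been parsed from Kraken output beforehand.

```python
from collections import defaultdict

# Ranks ordered from most to least specific (assumed ordering for this sketch).
RANKS = ["species", "genus", "family", "order", "class", "phylum"]


def assign_bin_taxonomy(contigs, threshold=0.5):
    """Return (rank, taxon) for the lowest rank at which a single taxon
    accounts for more than `threshold` of the bin's total length (bp).

    `contigs` is a list of (length_bp, lineage) tuples; `lineage` maps rank
    names to taxon names and may omit ranks the classifier left unresolved.
    Returns None if no rank passes the threshold.
    """
    total_bp = sum(length for length, _ in contigs)
    if total_bp == 0:
        return None
    for rank in RANKS:
        bp_per_taxon = defaultdict(int)
        for length, lineage in contigs:
            taxon = lineage.get(rank)
            if taxon is not None:
                bp_per_taxon[taxon] += length  # weight each contig by its length
        if not bp_per_taxon:
            continue
        best_taxon, best_bp = max(bp_per_taxon.items(), key=lambda kv: kv[1])
        if best_bp / total_bp > threshold:
            return rank, best_taxon
    return None


# Hypothetical bin: no species exceeds 50% of the length, but one genus does.
example_bin = [
    (60_000, {"species": "Escherichia coli", "genus": "Escherichia"}),
    (30_000, {"species": "Escherichia albertii", "genus": "Escherichia"}),
    (40_000, {"genus": "Escherichia"}),
    (20_000, {"genus": "Klebsiella"}),
]
print(assign_bin_taxonomy(example_bin))
```

In this made-up bin the most abundant species covers only 40% of the 150 kb total, so the annotation falls back to the genus level.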
Antibiotic Resistance Genes
All contigs were annotated with CARD’s (Comprehensive Antibiotic Resistance Database) Resistance Gene Identifier (RGI)60 3.2.1 against the CARD61 database and with HMMER62 against the Resfams63 database with the gathering cutoff. AR genes were clustered using CD-HIT-EST64 (identity: 0.99; word size: 8; length difference cutoff: 0.9) after they were sorted by length. Antibiotic resistance mechanisms are defined in Extended Data Table 4. We focused on AR genes that are commonly harbored by the most problematic MDR bacteria65 and that confer resistance to antibiotics that are most frequently relied on in neutropenic patients:

Drug                Resistance determinant
Oxacillin66         mecA, mecC
Penicillin67,68,69  pbp2b, pbp2x
Ampicillin70        blaTEM, blaSHV
Cephalosporin71,72  blaTEM, blaSHV, blaCTX-M, blaCMY, blaMIR, blaMOX, blaLAT, blaFOX, blaDHA, blaACT, blaCFE
Carbapenem73        blaKPC, blaNDM, blaVIM, blaIMP, blaOXA-48, blaOXA-23, oprD
Fluoroquinolones74  gyrA, gyrB, parC, parE, qnrA, qnrS (Note: we considered any qnr gene.)
Aminoglycosides75   aac(3’), aac(6’), aad

Genes in bold above represent those genes that were identified in our cohort’s microbiomes.

Mobile Genetic Element annotation
All contigs and excised prophage contigs were assessed for the presence of mobile genes using several programs. Contigs were mapped using BLASTN to the PlasmidFinder76 database (best hit, minimum 80% identity and 60% coverage), NCBI’s genomic plasmids (downloaded 05/10/2017; best hit, minimum 1,000 bp, minimum 80% identity), and IMMEdb77 (best hit, minimum 1,000 bp and 80% identity). Contigs were also identified as plasmids using PlasFlow78 with a threshold of 0.95. Genes were mapped using BLASTP to the ACLAME79 database v0.4 (best hit, minimum 80% identity and 60% coverage) and the PHASTER80 prophage/virus database (v8/3/17) (best hit, minimum 80% identity and 60% coverage). Genes were mapped using HMMER81 v3.1b2 to Pfam82, and known plasmids, phage, and transposons were identified83,84. A search of common mobile gene terms against Pfam descriptions was carried out.
Terms included for transposon: transpos, insertion element, is element, IS[0-9]; phage: phage, tail protein, tegument, capsid, relaxase, tail fibre, tail assembly, tail sheath, tail tube; plasmid: conjug, Trb, type IV, Tra[A-Z], mob, Vir[A-Z][0-9], t4ss, resolvase, plasmid; other: integrase. All Pfam IDs and descriptions are listed in Extended Data Table 5. Contigs were also annotated for insertion sequence (IS) elements using ISEScan85 v1.5.4. Contigs with taxonomies assigned to ‘Virus’ were considered phage. Contigs with any mobile annotation were annotated as MGEs.

Sequence Mapping
Paired-end metagenomic sequences were mapped to the metagenomic contigs using BWA-MEM86 v0.7.13, requiring primary alignments only, and filtered at 90% identity. Paired-end metagenomic sequences were also mapped separately to the AR and mobile gene clusters and filtered at 99% identity. Contig and individual AR and mobile gene RPKM values were calculated using mapped metagenomic reads (total reads mapped to the contigs with >80% contig coverage, divided by the contig length in kilobases and by the total read count in that sample in millions). Hi-C reads were mapped with HiC-Pro using default parameters, which internally uses Bowtie2. HiC-Pro requires valid pairs to map to different restriction fragments and allows only unique mapping of reads.

Cleanliness comparisons
Reads mapping between two different contigs were included in the analysis if neither contig carried a mobile gene. Taxonomic associations were determined from residency in a metagenomic bin annotated to at least the taxonomic level of interest.
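As a worked illustration of the RPKM calculation described under Sequence Mapping, the sketch below normalizes a mapped read count by contig length and sequencing depth. The function names and arguments are hypothetical, and the way the >80% coverage requirement is applied (a contig below the cutoff is assigned an abundance of zero) is one reading of the text, not a confirmed detail of the authors' implementation.

```python
def rpkm(mapped_reads, length_bp, total_reads):
    """Reads per kilobase of sequence per million reads in the sample."""
    if length_bp <= 0 or total_reads <= 0:
        raise ValueError("length_bp and total_reads must be positive")
    kilobases = length_bp / 1_000
    millions_of_reads = total_reads / 1_000_000
    return mapped_reads / kilobases / millions_of_reads


def contig_rpkm(mapped_reads, length_bp, covered_bp, total_reads, min_coverage=0.8):
    """RPKM for a contig, counted only when more than `min_coverage` of its
    length is covered by mapped reads (assumed interpretation of the cutoff)."""
    if covered_bp / length_bp <= min_coverage:
        return 0.0
    return rpkm(mapped_reads, length_bp, total_reads)


# Hypothetical contig: 500 reads on a 2,000 bp contig in a sample of 10 million reads.
print(contig_rpkm(mapped_reads=500, length_bp=2_000, covered_bp=1_900, total_reads=10_000_000))
```

With these made-up numbers the contig is 95% covered and the result is 500 / 2 kb / 10 million reads = 25 RPKM.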
Mobile and AR gene associations
All mobile and AR gene-containing contigs, including excised phage, were associated with taxa if they were linked to a Hi-C clustered genomic contig with at least two Hi-C read pairs or if they were clustered into an annotated genomic bin. Hi-C linkages between MGEs and their genomic bins are more robust if Hi-C reads map more closely (i.e. at a smaller linear distance (bp)) to the genes that are annotated as mobile. To assess this, we calculated the genetic distance between the mobile or AR gene and the nearest Hi-C read linking any contig with a particular taxonomy. Often this resulted in multiple linkages between the mobile contig and taxonomic contigs clustered with the same taxa. We therefore assessed the strongest data linking the two, the minimum genetic distance, considering the other reads as further support for this gene-taxon assignment.

Comparison of HGT between individual taxa
First, we compared HGT observed between species (as shown in Figure 2B,C and Extended Data Figure 8), defined above, through Hi-C read pair linkages. To create an HGT network, we examined the number of unique (defined as 99% sequence identity) AR or mobile genes linked to genomic bins for each particular taxon. Consequently, we could identify taxa-taxa connections based on these identified gene sharing events.

We assessed the rate of HGT per 100 species-species comparisons at different taxonomic levels within and between patients, as a comparison with Smillie et al. (2011)87. For comparisons between species, we compared each species within a single genus to one another. For every other taxonomic level, we compared species that differed according to that taxonomic level (i.e. for comparisons between families, species of one family, e.g. the Enterobacteriaceae, were compared exclusively with species in other taxonomic families). When comparing two species, we considered an HGT event to have occurred when the two taxa shared at least one gene of interest (AR or mobile gene) at >99% identity. We compared HGT within each patient or performed pairwise comparisons between the 9 individuals.
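The rate of HGT per 100 species-species comparisons can be illustrated with a short sketch. It assumes that gene-taxon associations have already been reduced to a set of gene-cluster IDs (clustered at 99% identity) per species; the function name, data layout, species and gene IDs below are all hypothetical. The pairing rule follows the text literally (same genus for species-level comparisons, differing taxon at the rank of interest otherwise), and any further restriction the authors may have applied is not modeled.

```python
from itertools import combinations


def hgt_rate_per_100(genes_by_species, lineages, rank):
    """HGT events per 100 species-species comparisons at `rank`.

    genes_by_species: species -> set of gene-cluster IDs linked to that species.
    lineages: species -> dict mapping rank names to taxon names.
    A pair counts as one HGT event if the two species share at least one gene.
    Returns None when no pair qualifies for comparison.
    """
    comparisons = 0
    events = 0
    for a, b in combinations(sorted(genes_by_species), 2):
        if rank == "species":
            # Species-level comparisons: distinct species within the same genus.
            comparable = lineages[a]["genus"] == lineages[b]["genus"]
        else:
            # Other ranks: species that differ at that rank.
            comparable = lineages[a][rank] != lineages[b][rank]
        if not comparable:
            continue
        comparisons += 1
        if genes_by_species[a] & genes_by_species[b]:
            events += 1
    if comparisons == 0:
        return None
    return 100.0 * events / comparisons


# Made-up associations for four species in one individual.
lineages = {
    "Escherichia coli": {"genus": "Escherichia", "family": "Enterobacteriaceae"},
    "Klebsiella pneumoniae": {"genus": "Klebsiella", "family": "Enterobacteriaceae"},
    "Bacteroides fragilis": {"genus": "Bacteroides", "family": "Bacteroidaceae"},
    "Bacteroides ovatus": {"genus": "Bacteroides", "family": "Bacteroidaceae"},
}
genes_by_species = {
    "Escherichia coli": {"arg_1", "mge_7"},
    "Klebsiella pneumoniae": {"arg_1"},
    "Bacteroides fragilis": {"mge_7", "mge_9"},
    "Bacteroides ovatus": {"mge_9"},
}
for rank in ("species", "genus", "family"):
    print(rank, hgt_rate_per_100(genes_by_species, lineages, rank))
```

The per-individual (or per-pair-of-individuals) rates produced this way at each taxonomic level are the values that would then be compared within versus between patients.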
For each taxonomic level, we compared within vs. between patients using a Mann-Whitney U-test.

AR gene and phage machinery gene host specificity
Genes of interest associated with taxa through Hi-C alone (i.e. not including taxa originally assigned to a contig that contained that AR gene or phage machinery gene) were compared to taxonomies identified by comparison using BLASTN (e-value < 1e-100) to the PATRIC88 genome database (downloaded October 1, 2018) for the AR genes, or to the NCBI nt database (downloaded June 4, 2018) for the phage machinery genes, and placed into one of several categories defined in Figure 1. This was assessed at different taxonomic levels.

Network density
Network density was calculated by dividing the number of observed connections between mobile or AR genes and binned organisms in each sample by the theoretical maximum number of connections (number of AR genes or mobile genes multiplied by number of distinct organisms). The total number of species in a population was identified using MetaPhlAn.

Measuring novel HGT during individuals’ timecourses
Within an individual’s timecourse, we identified novel HGT events by comparing the first timepoint to subsequent timepoints and requiring that the gene-taxa connection met several criteria. HGT events were only considered between donor organisms strongly linked with a mobile or mobile AR gene at the start of the timecourse and recipient organisms present, albeit unlinked to the mobile or mobile AR gene, at the start of the timecourse. We sequenced the initial timepoint 3-4 times more deeply than the remainder of the timecourse to be able to distinguish between migration of strains and HGT. Mobile or AR gene-containing contigs were required to be linked via at least 2 Hi-C reads (mean = 28.6) to genome assemblies that were taxonomically annotated at the level of genus or species.
We required an absence of association between the mobile or AR gene of interest and potential recipient taxa. In other words, one Hi-C read was sufficient to disqualify a putative HGT event, as was any taxonomic marker on that contig associating it with a congruent recipient taxon, or any association with a genome assembly with any congruent higher-order taxonomy. We required recipient taxa to have at least 2 Hi-C reads (mean = 14.8) associating each mobile or AR gene-containing contig with the recipient genome assembly. We tallied the number of HGT events that were supported by more than one timepoint, both strictly by the same genes and also by the same taxa, as well as those that were supported across multiple contigs.

Data Availability
Metagenomic and Hi-C sequences, filtered for quality and human reads, will be uploaded to NCBI’s Short Read Archive. Code relies heavily on published packages listed above. Custom code is available upon request.
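The core of the inclusion criteria given under "Measuring novel HGT during individuals’ timecourses" can be sketched as a simple filter. This is a simplified illustration under stated assumptions, not the authors' custom code: the data layout, function name, gene and taxon labels are hypothetical, and the additional exclusions described in the Methods (taxonomic markers on the contig, congruent higher-order taxonomy, genus- or species-level annotation of assemblies) are not modeled.

```python
MIN_READS = 2  # minimum Hi-C reads for a donor or recipient linkage


def putative_hgt_events(links_by_timepoint, taxa_by_timepoint):
    """Return sorted (gene, donor, recipient, timepoint_index) tuples.

    links_by_timepoint: time-ordered list of dicts, (gene, taxon) -> Hi-C read count.
    taxa_by_timepoint: time-ordered list of sets of taxa detected in each sample.
    """
    first_links = links_by_timepoint[0]
    first_taxa = taxa_by_timepoint[0]

    # Donors: taxa linked to the gene at the first timepoint by >= MIN_READS reads.
    donors_by_gene = {}
    for (gene, taxon), reads in first_links.items():
        if reads >= MIN_READS:
            donors_by_gene.setdefault(gene, set()).add(taxon)

    events = set()
    for t, links in enumerate(links_by_timepoint[1:], start=1):
        for (gene, taxon), reads in links.items():
            if reads < MIN_READS:
                continue  # recipient linkage too weak
            if gene not in donors_by_gene:
                continue  # no linked donor at the start of the timecourse
            if taxon not in first_taxa:
                continue  # recipient absent at the start (possible strain migration)
            if first_links.get((gene, taxon), 0) > 0:
                continue  # even one Hi-C read at the start disqualifies the event
            for donor in donors_by_gene[gene]:
                events.add((gene, donor, taxon, t))
    return sorted(events)


# Made-up two-timepoint example.
links = [
    {("gene_1", "taxon_A"): 12, ("gene_1", "taxon_B"): 1, ("gene_2", "taxon_D"): 5},
    {("gene_1", "taxon_A"): 9, ("gene_1", "taxon_B"): 3,
     ("gene_1", "taxon_C"): 4, ("gene_2", "taxon_E"): 6},
]
taxa = [
    {"taxon_A", "taxon_B", "taxon_C", "taxon_D"},
    {"taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_E"},
]
print(putative_hgt_events(links, taxa))
```

In this example only the transfer of gene_1 from taxon_A to taxon_C is retained: taxon_B already had one Hi-C read linking it to gene_1 at the first timepoint, and taxon_E was not detected at the first timepoint.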

Acknowledgements
This study was funded by the Centers for Disease Control (OADS BAA 2016-N-17812) and by the National Science Foundation (Awards #1661338 and #1650122). M.J.S. is funded by the NIAID (K23 AI114994). I.L.B. is funded by the NIH (1DP2HL141007-01) and is a Sloan Foundation Research Fellow, a Packard Fellow in Science and Engineering, and a Pew Foundation Biomedical Scholar. Albert Vill is a Cornell University Centers for Vertebrate Genomics 2019 Distinguished Scholar.

References

1. Li, J. et al. Antibiotic Treatment Drives the Diversification of the Human Gut Resistome. Genomics Proteomics Bioinformatics 17, 39–51 (2019).
2. Kaminski, J. et al. High-Specificity Targeted Functional Profiling in Microbial Communities with ShortBRED. PLoS Comput. Biol. 11, e1004557 (2015).
3. Sommer, M. O. A., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131 (2009).
4. Satlin, M. J. & Walsh, T. J. Multidrug-resistant Enterobacteriaceae, Pseudomonas aeruginosa, and vancomycin-resistant Enterococcus: Three major threats to hematopoietic stem cell transplant recipients. Transpl Infect Dis 19, (2017).
5. Huddleston, J. R. Horizontal gene transfer in the human gastrointestinal tract: potential spread of antibiotic resistance genes. Infect Drug Resist 7, 167–176 (2014).
6. Satlin, M. J. & Walsh, T. J. Multidrug-resistant Enterobacteriaceae, Pseudomonas aeruginosa, and vancomycin-resistant Enterococcus: Three major threats to hematopoietic stem cell transplant recipients. Transpl Infect Dis 19, (2017).
7. Sommer, M. O. A., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131 (2009).
8. Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J 9, 207–216 (2015).
9. Kaminski, J. et al. High-Specificity Targeted Functional Profiling in Microbial Communities with ShortBRED. PLoS Comput. Biol. 11, e1004557 (2015).
10. Yaffe, E. & Relman, D. A. Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation. Nat Microbiol (2019) doi:10.1038/s41564-019-0625-0.
11. Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. Elife 3, e03318 (2014).
12. Burton, J. N., Liachko, I., Dunham, M. J. & Shendure, J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 (Bethesda) 4, 1339–1346 (2014).
13. Beitel, C. W. et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2, e415 (2014).
14. Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. Elife 3, e03318 (2014).
15. Stewart, R. D. et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nature Comm 9, 870 (2018).
16. Bickhart, D. M. et al. Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation. Genome Biology 20, 153 (2019).
17. Stalder, T., Press, M. O., Sullivan, S., Liachko, I. & Top, E. M. Linking the resistome and plasmidome to the microbiome. The ISME Journal 13, 2437–2446 (2019).


18. Marbouty, M., Baudry, L., Cournac, A. & Koszul, R. Scaffolding bacterial genomes and probing host-virus interactions in gut microbiome by proximity ligation (chromosome capture) assay. Sci Adv 3, e1602105 (2017).
19. Satlin, M. J. & Walsh, T. J. Multidrug-resistant Enterobacteriaceae, Pseudomonas aeruginosa, and vancomycin-resistant Enterococcus: Three major threats to hematopoietic stem cell transplant recipients. Transpl Infect Dis 19, (2017).
20. Pop, M. Genome assembly reborn: recent computational challenges. Brief Bioinform 10, 354–366 (2009).
21. Krawczyk, P. S., Lipinski, L. & Dziembowski, A. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Res. 46, e35 (2018).
22. Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
23. Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol 3, 836–843 (2018).
24. Pop, M. Genome assembly reborn: recent computational challenges. Brief Bioinform 10, 354–366 (2009).
25. Ross, A., Ward, S. & Hyman, P. More Is Better: Selecting for Broad Host Range Bacteriophages. Front. Microbiol. 7, (2016).
26. Yu, J., Lim, J.-A., Kwak, S.-J., Park, J.-H. & Chang, H.-J. Comparative genomic analysis of novel bacteriophages infecting Vibrio parahaemolyticus isolated from western and southern coastal areas of Korea. Arch. Virol. 163, 1337–1343 (2018).
27. Doulatov, S. et al. Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements. Nature 431, 476–481 (2004).
28. Ross, A., Ward, S. & Hyman, P. More Is Better: Selecting for Broad Host Range Bacteriophages. Front Microbiol 7, 1352 (2016).
29. Crémazy, F. G. et al. Determination of the 3D genome organization of bacteria using Hi-C. Methods Mol Biol 1837, 3–18 (2018).
30. Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. Elife 3, e03318 (2014).
31. Brito, I. L. et al. Mobile genes in the human microbiome are structured from global to individual scales. Nature 535, 435–439 (2016).
32. Brito, I. L. et al. Mobile genes in the human microbiome are structured from global to individual scales. Nature 535, 435–439 (2016).
33. Yassour, M. et al. Natural history of the infant gut microbiome and impact of antibiotic treatments on strain-level diversity and stability. Sci Transl Med 8, 343ra81 (2016).
34. Smillie, C. S. et al. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480, 241–244 (2011).
35. Kaakoush, N. O. Insights into the Role of Erysipelotrichaceae in the Human Host. Front Cell Infect Microbiol 5, 84 (2015).
36. Rossi, O. et al. Faecalibacterium prausnitzii A2-165 has a high capacity to induce IL-10 in human and murine dendritic cells and modulates T cell responses. Scientific Reports 6, 18507 (2016).
37. Zhu, C. et al. Roseburia intestinalis inhibits interleukin-17 excretion and promotes regulatory T cells differentiation in colitis. Mol Med Rep 17, 7567–7574 (2018).
38. Modi, S. R., Lee, H. H., Spina, C. S. & Collins, J. J. Antibiotic treatment expands the resistance reservoir and ecological network of the phage metagenome. Nature 499, 219–222 (2013).
39. Diard, M. et al. Inflammation boosts bacteriophage transfer between Salmonella spp. Science 355, 1211–1215 (2017).
40. Bakkeren, E. et al. Salmonella persisters promote the spread of antibiotic resistance plasmids in the gut. Nature 573, 276–280 (2019).


41. Sommer, M. O. A., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131 (2009).
42. Bakkeren, E. et al. Salmonella persisters promote the spread of antibiotic resistance plasmids in the gut. Nature 573, 276–280 (2019).
43. Bertrand, D. et al. Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat. Biotechnol. 37, 937–944 (2019).
44. Knöppel, A., Lind, P. A., Lustig, U., Näsvall, J. & Andersson, D. I. Minor fitness costs in an experimental model of horizontal gene transfer in bacteria. Mol. Biol. Evol. 31, 1220–1227 (2014).
45. McCarthy, A. J. et al. Extensive horizontal gene transfer during Staphylococcus aureus co-colonization in vivo. Genome Biol Evol 6, 2697–2708 (2014).
46. Citorik, R. J., Mimee, M. & Lu, T. K. Sequence-specific antimicrobials using efficiently delivered RNA-guided nucleases. Nature Biotechnology 32, 1141–1145 (2014).
47. Vercoe, R. B. et al. Cytotoxic chromosomal targeting by CRISPR/Cas systems can reshape bacterial genomes and expel or remodel pathogenicity islands. PLoS Genet. 9, e1003454 (2013).
48. Yosef, I., Manor, M., Kiro, R. & Qimron, U. Temperate and lytic bacteriophages programmed to sensitize and kill antibiotic-resistant bacteria. Proc Natl Acad Sci U S A 112, 7267–7272 (2015).
49. Hi-C deconvolution of a human gut microbiome yields high-quality draft genomes and reveals plasmid-genome interactions. bioRxiv doi:10.1101/1987713v1 (2017).

Extended Data References

50. Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).
51. Rotmistrovsky, K. & Agarwala, R. BMTagger: Best Match Tagger for removing human reads from metagenomics datasets. Unpublished (2011).
52. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
53. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
54. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
55. Fouts, D. E. Phage_Finder: Automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 34, 5839–5851 (2006).
56. Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
57. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
58. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat Methods 11, 1144–1146 (2014).
59. Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol 3, 836–843 (2018).
60. Jia, B. et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res 45, D566–D573 (2017).
61. McArthur, A. G. et al. The comprehensive antibiotic resistance database. Antimicrob. Agents Chemother. 57, 3348–3357 (2013).
62. Mistry, J., Finn, R. D., Eddy, S. R., Bateman, A. & Punta, M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 41, e121 (2013).
63. Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J 9, 207–216 (2015).


64. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
65. Centers for Disease Control. Antibiotic Resistance Threats in the United States. Atlanta, GA (2013).
66. Lakhundi, S. & Zhang, K. Methicillin-Resistant Staphylococcus aureus: Molecular Characterization, Evolution, and Epidemiology. Clin Microbiol Rev 31, e00020-18 (2018).
67. Magill, S. S. et al. Prevalence of antimicrobial use in US acute care hospitals, May-September 2011. JAMA 312, 1438–1446 (2014).
68. Dowson, C. G. et al. Penicillin-resistant viridans streptococci have obtained altered penicillin-binding protein genes from penicillin-resistant strains of Streptococcus pneumoniae. Proc Natl Acad Sci U S A 87, 5858–5862 (1990).
69. van der Linden, M. et al. Insight into the Diversity of Penicillin-Binding Protein 2x Alleles and Mutations in Viridans Streptococci. Antimicrob Agents Chemother 61, e02646-16 (2017).
70. Paterson, D. L. & Bonomo, R. A. Extended-Spectrum β-Lactamases: a Clinical Update. Clinical Microbiology Reviews 18, 657–686 (2005).
71. Paterson, D. L. & Bonomo, R. A. Extended-spectrum beta-lactamases: a clinical update. Clin Microbiol Rev 18, 657–686 (2005).
72. Strahilevitz, J., Jacoby, G. A., Hooper, D. C. & Robicsek, A. Plasmid-mediated quinolone resistance: a multifaceted threat. Clin Microbiol Rev 22, 664–689 (2009).
73. Queenan, A. M. & Bush, K. Carbapenemases: the Versatile β-Lactamases. Clinical Microbiology Reviews 20, 440–458 (2007).
74. Hooper, D. C. & Jacoby, G. A. Topoisomerase Inhibitors: Fluoroquinolone Mechanisms of Action and Resistance. Cold Spring Harb Perspect Med 6, a025320 (2016).
75. Doi, Y., Wachino, J. I. & Arakawa, Y. Aminoglycoside Resistance: The Emergence of Acquired 16S Ribosomal RNA Methyltransferases. Infect Dis Clin North Am 30, 523–537 (2016).
76. Carattoli, A. et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob. Agents Chemother. 58, 3895–3903 (2014).
77. Jiang, X., Hall, A. B., Xavier, R. J. & Alm, E. J. Comprehensive analysis of mobile genetic elements in the gut microbiome reveals phylum-level niche-adaptive gene pools. bioRxiv doi:10.1101/214213 (2017).
78. Krawczyk, P. S., Lipinski, L. & Dziembowski, A. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Res. 46, e35 (2018).
79. Leplae, R., Lima-Mendez, G. & Toussaint, A. ACLAME: a CLAssification of Mobile genetic Elements, update 2010. Nucleic Acids Res. 38, D57–61 (2010).
80. Arndt, D. et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 44, W16–21 (2016).
81. Mistry, J., Finn, R. D., Eddy, S. R., Bateman, A. & Punta, M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 41, e121 (2013).
82. Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–285 (2016).
83. Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nature Methods 14, 1063–1071 (2017).
84. Schlüter, A., Krause, L., Szczepanowski, R., Goesmann, A. & Pühler, A. Genetic diversity and composition of a plasmid metagenome from a wastewater treatment plant. Journal of Biotechnology 136, 65–76 (2008).
85. Xie, Z. & Tang, H. ISEScan: automated identification of insertion sequence elements in prokaryotic genomes. Bioinformatics 33, 3340–3347 (2017).
86. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25, 1754–1760 (2009).


87. Smillie, C. S. et al. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480, 241–244 (2011).
88. Wattam, A. R. et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 42, D581–591 (2014).

Figure 1

[Figure graphic not recoverable from the extracted text. Panels A to E. Surviving labels:
Individuals: B314, B316, B320, B331, B335, B357, B370, H3, H8.
Timecourse: days post-transplant; MG and Hi-C sequencing timepoints; period of neutropenia; courses of levofloxacin, piperacillin-tazobactam and trimethoprim-sulfamethoxazole; detection of ceftriaxone-resistant and ciprofloxacin-resistant Enterobacteriaceae.
Hi-C read pairs: Hi-C read 1 versus Hi-C read 2 by phylum (Euryarchaeota, Actinobacteria, Bacteroidetes, TM7, Firmicutes, Proteobacteria, Verrucomicrobia); scale: number of reads.
Gene-taxa associations (mobile genetic elements; antibiotic resistance genes): percent of gene-taxa associations confirmed by Hi-C and percent of novel gene-taxa associations found by Hi-C, per individual. Legend: novel link found by Hi-C; Hi-C-confirmed taxa-gene link; link found only through assembly and binning.
# of antibiotic resistance genes in ≥ 2 taxa, per individual, for assembly alone versus assembly & Hi-C; scale: number of taxa (2 to 15).
Host specificity: # of phage machinery genes and # of AR genes by taxonomic level (species, genus, family, order, class, phylum, kingdom), divided into the following categories:
Multiple database taxa−multiple Hi-C taxa−Overlapping
Multiple database taxa−multiple Hi-C taxa−Same overlap
Multiple database taxa−one Hi-C taxon−Overlapping
One database taxon−multiple Hi-C taxa−Overlapping
One database taxon−one Hi-C taxon−Overlapping
Multiple database taxa−multiple Hi-C taxa−Non-overlapping
Multiple database taxa−one Hi-C taxon−Non-overlapping
One database taxon−multiple Hi-C taxa−Non-overlapping
One database taxon−one Hi-C taxon−Non-overlapping
No database taxa−multiple Hi-C taxa
No database taxa−one Hi-C taxon
Unannotated database or Hi-C taxa at this level
An asterisk denotes categories where Hi-C identified additional taxa; the asterisk placement cannot be recovered reliably from the extracted text.]

Figure 2

[Figure graphic not recoverable from the extracted text. Panels A to D. Surviving labels:
A: Antibiotic resistance genes. HGT rate (per 100 comparisons), within patient versus between patients, across taxonomic levels (between species, genera, families, orders, classes, phyla); significance asterisks not recoverable.
B: Jaccard distance of mobile gene network connections among overlapping species between individuals' microbiome communities, plotted against Jaccard distance between two individuals' microbiome communities; R = 0.80041 (p-value exponent lost in extraction); pairs marked Healthy-Healthy, Healthy-Neutropenic, Neutropenic-Neutropenic.
C: Antibiotic resistance genes; D: Mobile genes. Networks for healthy (n=2) and neutropenic (n=7) individuals, with phyla legend: Bacteroides (Bac), Actinobacteria (Act), Euryarchaeota (Eur), Firmicutes (Fir), Verrucomicrobia (Ver), Synergistetes (Syn), Proteobacteria (Pro), Fusobacteria (Fuso), Saccharibacteria (Sac).]

Figure 3

[Figure graphic not recoverable from the extracted text. Panels A to C. Surviving labels:
A: Gene-taxa network density versus alpha diversity, per individual (B314, B316, B320, B331, B335, B357, B370, H3, H8).
B: Gene-taxa network density per individual.
C: Per-individual networks of donor taxa and recipient taxa, labeled at genus or family level (for example Bacteroides, Faecalibacterium, Klebsiella, Enterococcus, Erysipelotrichaceae) and colored by phylum/class (Bacteroides, Actinobacteria, Proteobacteria, Enterobacteriaceae, Firmicutes). Individual donor-recipient pairs cannot be reassigned from the extracted text.]

Extended Data Figure 1

[Workflow diagram not recoverable from the extracted text. Surviving step labels:
Sample collection
Metagenomic sequencing & Hi-C sequencing
QC & de novo assembly
Identification of open reading frames & annotation (antibiotic resistance genes)
Excising of MGEs from contigs (large integrated mobile elements are separated from contigs)
Binning of scaffolds and aggregation of bins into draft genome assemblies
Unbinning of low-quality genome assemblies (unbinned contigs)
Alignment of Hi-C reads to contigs (Hi-C read pairs)
Association of MGEs to unique bacterial taxa]

Extended Data Figure 2

[Figure graphic not recoverable from the extracted text. Panels A to C. Surviving labels:
A: Class relative abundance in metagenomic (MG) versus Hi-C libraries at each timepoint for B314 (t=1 to 4), B316 (t=1 to 7), B320 (t=1 to 4), B331 (t=1 to 4), B335 (t=1 to 3), B357 (t=1 to 6), B370 (t=1 to 5), H3 (t=1 to 5) and H8 (t=1 to 5), plus the Press et al. samples (MluCI & Sau3AI). Class legend: Actinobacteria, Alphaproteobacteria, Bacteroidia, Betaproteobacteria, Clostridia, Deinococci, Deltaproteobacteria, Epsilonproteobacteria, Exobasidiomycetes, Flavobacteriia, Fusobacteriia, Gammaproteobacteria, Methanobacteria, Negativicutes, Saccharibacteria, Saccharomycetes, Synergistia, Verrucomicrobiae, Viruses, (unspecified).
B: Relative abundance difference (Hi-C − metagenome) by phylum: Viruses, Firmicutes, Ascomycota, Fusobacteria, Synergistetes, Thermatogae, Bacteroidetes, Acidobacteria, Euryarchaeota, Actinobacteria, Proteobacteria, Saccharibacteria, Verrucomicrobia, Deinococcus/Thermus.
C: Dendrogram (height 0.0 to 1.0) of metagenomic and Hi-C samples, annotated by patient/sample and by restriction enzymes: CDC (Sau3AI), Healthy (Sau3AI), Press et al. (MluCI/Sau3AI).]

Extended Data Figure 3

[Figure graphic not recoverable from the extracted text. Surviving labels: number of Hi-C reads for genome-genome, genome-plasmid and plasmid-plasmid read pairs, classed as correct (self) or incorrect (interspecies), for E. coli, B. fragilis, P. putida and P. aeruginosa and pairs among them.]

Extended Data Figure 4

[Figure graphic not recoverable from the extracted text. Panels A and B. Surviving labels: number of contigs versus abundance (RPKM) and number of contigs versus length (bp), for contigs with and without Hi-C read pairs spanning two contigs; median values marked.]

Extended Data Figure 5

[Figure graphic not recoverable from the extracted text. Surviving labels: bin length (Mb), contamination (%) and completeness (%) per individual (B314, B316, B320, B331, B335, B357, B370, H3, H8).]

Extended Data Figure 6

[Figure graphic not recoverable from the extracted text. Surviving labels: # of mobile genes in ≥ 2 taxa per individual (B314, B316, B320, B331, B335, B357, B370, H3, H8), for assembly alone versus assembly & Hi-C; scale: number of taxa (2 to 14). Bar values cannot be reassigned to individuals from the extracted text.]

Extended Data Figure 7

[Figure graphic not recoverable from the extracted text (rotated page). Panels A to C. Surviving labels: mobile genetic elements and antibiotic resistance genes; percent of gene-taxa associations confirmed by Hi-C; percent of novel gene-taxa associations found by Hi-C; # of mobile genes in ≥ 2 taxa and # of antibiotic resistance genes in ≥ 2 taxa, for assembly alone versus assembly & Hi-C, per individual (B314, B316, B320, B331, B335, B357, B370, H3, H8); number of taxa. Legend: link found only through assembly and binning; Hi-C-confirmed taxa-gene link; novel link found by Hi-C.]

Extended Data Figure 8

[Figure graphic not recoverable from the extracted text. Panels A to C. Surviving labels:
A: Schematic of the distance between a mobile gene and taxonomically assigned scaffolds A and B (distance for A, distance for B).
B: Antibiotic resistance genes; C: Mobile genes. Number of gene-taxonomy combinations versus minimum distance (bp), grouped by contig size (bp); the size bins listed are 1,000-10,000, 10,000-50,000, 50,000-100,000, 100,000-500,000 and 500,000-1,000,000 in one panel and 1,000-10,000, 10,000-50,000, 50,000-1,000,000, 1,000,000-5,000,000 and 5,000,000-10,000,000 in the other.]

Extended Data Figure 9

[Figure graphic not recoverable from the extracted text. Panels A (Mobile genes) and B (Antibiotic resistance genes). Surviving labels: unique gene-species connections versus reads (millions), per individual (B314, B316, B320, B331, B335, B357, B370, H3, H8); an asterisk marks timepoint 1.]

Extended Data Figure 10

[Figure graphic not recoverable from the extracted text. Panels A to C. Surviving labels:
A: Schematic of plasmid-host scenarios over time:
Stable plasmid-host association, detected with Hi-C
Loss of plasmid, or strain-level variation, detected with Hi-C
Gain of plasmid, or strain-level variation, detected with Hi-C
Abundance changes, and plasmid association with other organisms, detected with Hi-C
Strain-level variation, association of plasmid with other organism, or association undetected with Hi-C
B: Mobile genes; C: Antibiotic resistance genes. Repeat detection of gene-taxa associations across timepoints (0.00 to 1.00) per patient: B314 (n=4), B316 (n=7), B320 (n=4), B331 (n=4), B335 (n=3), B357 (n=6), B370 (n=5), H3 (n=5), H8 (n=5).]

Extended Data Figure 11

Axes: Bacterial class; Antibiotic resistance genes.
Legend, bacterial class: Actinobacteria, Coriobacteria, Bacteroidia, Unknown, Bacilli, Clostridia, Erysipelotrichia, Negativicutes, Fusobacteriia, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria, Synergistia, Verrucomicrobiae.
Legend, antibiotic resistance categories: Efflux, Glycopeptide Resistance, Gene Modulating Resistance, Tetracycline Resistance, Phosphoethanolamine Transferase, Quinolone Resistance, Beta-Lactamase, Nucleotidyltransferase, Acetyltransferase, rRNA Methyltransferase.
Legend, number of individuals: None, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Extended Data Figure 12

Axes: Bacterial class; Mobile element machinery genes.
Legend, bacterial class: Methanobacteria, Actinobacteria, Coriobacteria, Bacteroidia, Flavobacteria, Unknown, Bacilli, Clostridia, Erysipelotrichia, Negativicutes, Fusobacteriia, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria, Synergistia, Verrucomicrobiae.
Legend, mobile element machinery gene: Plasmid, Phage, Transposon, Multiple.
Legend, number of individuals: None, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Extended Data Figure 13

Columns (one per patient): B314, B316, B320, B331, B335, B357, B370.
Rows (one per antibiotic): Levofloxacin, Piperacillin, Sulfamethoxazole, Trimethoprim.
x-axis: Patient timepoints. y-axis: Gene abundances (RPKM).
Legend, antibiotic use: Levofloxacin, Piperacillin-Tazobactam, Trimethoprim-Sulfamethoxazole.

Extended Data Figure 14

Panel title: Mobile genes.
x-axis: Taxonomic Levels (Between Species, Between Genera, Between Families, Between Orders, Between Classes, Between Phyla). y-axis: HGT rate of mobile genes (per 100 comparisons).
Legend: Within Patient, Between Patients. Asterisks (* and **) mark individual comparisons in the plot.

Extended Data Figure 15

Panel A, Mobile genes (scale maximum per individual): B314, 2122; B316, 3414; B320, 1856; B331, 1816; B335, 1084; B357, 1160; B370, 534; H3, 1296; H8, 1042.
Panel B, Antibiotic resistance genes (scale maximum per individual): B314, 132; B316, 42; B320, 46; B331, 28; B335, 78; B357, 30; B370, 24; H3, 10; H8, 22. A scale minimum of 1 is also indicated.
Legend, phyla: Bacteroides, Actinobacteria, Verrucomicrobia, Gammaproteobacteria, Enterobacteriaceae, Fusobacteria, Firmicutes, Euryarchaeota, Synergistetes, Saccharibacteria.

Extended Data Figure 16

Column groups: Loss of gene-taxa linkage density; Gain of gene-taxa linkage density.
Columns (one per individual, in plotted order): B314, B331, B357, H3, H8, B320, B370, B335, B316.
Rows (y-axes): Gene-taxa linkage density; Number of taxa; Relative Enterobacteriaceae abundance (%); # of Hi-C reads; Total AR gene abundance (RPKM).
x-axis: Timepoints per individual timecourse (B314, 1-4; B331, 1-4; B357, 1-6; H3, 1-5; H8, 1-5; B320, 1-4; B370, 1-5; B335, 1-3; B316, 1-7).