Supplemental Materials

Supplemental Materials Contents Page Experimental Methods 17 1 Study design and sample collection 17 2 High molecular weight DNA extraction 17 3 Linked-read library preparation and sequencing 18 4 Postprocessing and read cloud assignment 18 Computational Methods and Analysis 19 5 Metagenomic pipeline 19 5.1 Quantifying relative abundance at the species level................... 19 5.2 Quantifying gene content for each species........................ 19 5.3 Identifying SNVs within species.............................. 20 5.3.1 Quantifying genetic diversity over time...................... 20 5.3.2 Synonymous and nonsynonymous variants.................... 20 5.3.3 Allele prevalence across hosts........................... 21 5.4 Tracking read clouds associated with SNVs and genes................. 21 5.5 Quantifying frequency trajectories of SNVs....................... 22 5.5.1 Detecting SNV differences over time....................... 22 5.5.2 Private marker SNVs............................... 24 6 Antibiotic resistance gene profiling 24 7 Inferring genetic linkage from shared read clouds 25 7.1 Empirical estimates of read cloud impurity....................... 25 7.2 Empirical estimates of long-range linkage within read clouds.............. 26 7.3 Statistical null model of read cloud sharing....................... 27 7.4 Inferring linkage between pairs of SNVs......................... 29 7.4.1 Quantifying linkage at the SNV level....................... 29 7.4.2 Quantifying linkage at the allelic level...................... 30 7.5 Inferring linkage between SNVs and species backbones................. 31 8 Clustering SNVs into haplotypes 32 9 Distinguishing between genetic drift and natural selection in allele frequency time series 35 1 Supplemental Data Files 40 List of Figures S1 Read cloud statistics for each sample...........................3 S2 Microbiota density over time...............................4 S3 Species abundance dynamics before, during, and after antibiotics...........5 S4 Relative abundance of antibiotic resistance genes as a function of time........6 S5 Reference genome coverage as a function of species relative abundance........6 S6 Analogous version of Fig. 3 for more example species (1/3)..............7 S7 Analogous version of Fig. 3 for more example species (2/3)..............8 S8 Analogous version of Fig. 3 for more example species (3/3)..............9 S9 Read clouds link putative within-species sweeps to the correct genomic backbone.. 10 S10 Read clouds reveal haplotype structure in B. vulgatus ................. 11 S11 Estimated fragment length distribution from HMW DNA extraction protocol.... 12 S12 Linkage disequilibrium between SNVs that fail the 4 haplotype test.......... 13 S13 Peak-to-trough ratio and relative abundance for select species............. 14 S14 Schematic illustration of private marker SNV sharing over time............ 15 S15 Co-inheritance of A. finegoldii SNV differences across a larger cohort......... 16 S16 Calibrated null model of read cloud sharing as a function of read cloud coverage.. 29 S17 Putative B. vulgatus strains detected by the StrainFinder algorithm......... 33 S18 Distribution of frequency trajectory distances between pairs of SNVs in B. vulgatus 35 List of Tables S1 List of samples and associated metadata......................... 39 S2 DNA yield and purity from different extraction methods described in Supplemental Materials, Section2.................................... 39 S3 Evidence for natural selection in SNV frequency trajectories in Fig. 3......... 39 2 Day 1 (84 Gbp) Day 5 (85 Gbp) Day 11 (8 Gbp) Day 26 (87 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 36 (90 Gbp) Day 38 (18 Gbp) Day 47 (181 Gbp) Day 50 (15 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 68 (87 Gbp) Day 70 (32 Gbp) Day 73 (166 Gbp) Day 75 (26 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 78 (22 Gbp) Day 81 (84 Gbp) Day 93 (18 Gbp) Day 111 (45 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 119 (40 Gbp) Day 130 (22 Gbp) Day 154 (165 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 Read pairs per barcode Read pairs per barcode Read pairs per barcode Figure S1: Read cloud statistics for each sample. Analogous versions of Fig. 1E for each of the 19 samples. 3 Baseline Disease Antibiotics Intermediate Final Bacteroides vulgatus 80 Bacteroides uniformis 60 Bacteroides coprocola 40 Bacteroides cellulosilyticus (ng/ul) 20 Bacteroides eggerthii DNA conc 0 Bacteroides faecis Bacteroides caccae Bacteroides massiliensis Alistipes sp Alistipes onderdonkii Alistipes finegoldii Parabacteroides distasonis Paraprevotella clara Butyrivibrio crossotus Coprococcus comes Coprococcus sp Eubacterium rectale Eubacterium eligens Eubacterium siraeum Phascolarctobacterium sp Absolute species abundance (a.u.) Other 0 20 40 60 80 100 120 140 160 Day Figure S2: Microbiota density over time. Top: Microbiota density estimated by microbial DNA quantification (concentration of extracted DNA per ∼200mg feces) for a subset of the study timepoints. Vertical lines indicate ±20% error bars, as estimated from the precision of our sample mass measurements. The shaded region denotes the antibiotic treatment period. These data show that overall microbiota density was not significantly reduced at the end of antibiotic treatment. Bottom: Absolute species abundance dynamics obtained by scaling the relative abundance measurements in Fig. 1C by the microbiota density measurements in the top panel. 4 100 10 1 Bacteroidaceae 10 2 Rikenellaceae Eubacteriaceae 10 3 Lachnospiraceae (baseline units) ABX Abundance Prevotellaceae 4 10 Acidaminococcaceae Porphyromonadaceae 5 10 Ruminococcaceae Oscillospiraceae 0 10 Sutterellaceae Burkholderiales 1 10 Guyana Clostridiales 2 10 Erysipelotrichaceae Oxalobacteraceae 10 3 Pseudomonadaceae Clostridiaceae (baseline units) Final Abundance 10 4 Bifidobacteriaceae Synergistaceae 10 5 10 4 10 3 10 2 10 1 100 Avg Baseline Abundance Figure S3: Species abundance dynamics before, during, and after antibiotics. Absolute abundances of individual species at the antibiotic (top) and final (bottom) timepoint as a function of the average absolute abundance during the two baseline timepoints in Supplemental Fig. S2. Absolute abundances are estimated by multiplying relative abundance estimates by the microbial density measurements in Supplemental Fig. S2, and scaling by the average microbial density at the two baseline timepoints. Symbols denote individual species and are colored by taxonomic family. Open symbols denote species that fall below the detection threshold (≈ 2×10−5). These data show that only a minority of species experienced dramatic reductions in absolute abundance by the end of antibiotic treatment; several of these species recovered in abundance by the end of the study. 5 tetracycline 500 beta-lactam (tet-linked) beta-lactam bacitracin 400 300 200 100 Reads (per million) mapped to ABX genes 0 0 20 40 60 80 100 120 140 160 Day Figure S4: Relative abundance of antibiotic resistance genes as a function of time. Lines show fraction of reads mapping to different classes of antibiotic resistance genes in the ARGs-OAP [1] database (Methods). The shaded region indicates the antibiotic treatment period. 103 t Full lane samples , 1/3 lane samples e g a r 102 e v o c e d i w - 1 e 10 m o n e G 100 10 3 10 2 10 1 100 Species relative abundance Figure S5: Reference genome coverage as a function of species relative abundance. Symbols denote the typical number of unique read clouds per site, Dt, for each reference genome in each sample as a function of the estimated relative abundance of the species in that sample. Blue points indicate samples that were sequenced to a high target coverage (Supplemental Table S1). 6 A Alistipes finegoldii D Eubacterium eligens 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 B Alistipes sp E Phascolarctobacterium sp 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 C Alistipes onderdonkii F Bacteroides coprocola 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 -2yr 0 20 40 60 80 100 120 140 -2yr 0 20 40 60 80 100 120 140 Time (days) Time (days) Figure S6: Analogous version of Fig. 3 for more example species (Part 1/3). The -2yr timepoint is estimated using data from a previous study that sampled the same individual [3]. As in Fig. 3, the shaded region indicates the antibiotic treatment period. 7 A Bacteroides faecis D Bacteroides stercoris 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 B Bacteroides eggerthii E Bacteroides finegoldii 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 C Bacteroides caccae F Bacteroides massiliensis 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 -2yr 0 20 40 60 80 100 120 140 -2yr 0 20 40 60 80 100 120 140 Time (days) Time (days) Figure S7: Analogous version of Fig. 3 for more example species (Part 2/3). 8 A Eubacterium siraeum D Butyrivibrio crossotus 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 B Eubacterium rectale E Paraprevotella clara 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 C Barnesiella intestinihominis F Alistipes senegalensis 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 -2yr 0 20 40 60 80 100 120 140 -2yr 0 20 40 60 80 100 120 140 Time (days) Time (days) Figure S8: Analogous version of Fig. 3 for more example species (Part 3/3). 9 SNVs tested ( 1000) Confirmed linked to correct core genome Different core genome 103 with most of theobserve negatives an clustering average in positive justalleles a confirmation had few rate only species. of acore minority of genes shared in read thespecies clouds correct from was species confirmed the if, correct (blue).clouds for species Linkage both (red). across to alleles, all Across a the species, timepointsalternate different majority we alleles (and species of of the was at these shared inferred least≥ read SNVs, if 5 clouds we one shared were retained or derived the readof both from read five clouds cloud genes in sharing with total).

Supplemental Materials

Direct Solutions of the Wright-Fisher Model

Joint Bayesian Estimation of Mutation Location and Age Using Linkage Disequilibrium

A Maximum Likelihood Approach

The Evolutionary History of the CCR5-Δ32 HIV-Resistance Mutation

Population Genetics of Rare Variants and Complex Diseases Authors

Joint Nonparametric Coalescent Inference of Mutation Spectrum

Direct Solutions of the Wright-Fisher Model

The Linked Selection Signature of Rapid Adaptation in Temporal Genomic Data

Convergent Adaptation of Human Lactase Persistence in Africa and Europe

Estimating Time to the Common Ancestor for a Beneficial Allele

The Geographic Spread of the CCR5 D32 HIV-Resistance Allele

Linkage Disequilibrium — Understanding the Evolutionary Past and Mapping the Medical Future