RESOURCE

Early life dynamics of the human gut virome and bacterial microbiome in infants

Efrem S Lim, Yanjiao Zhou, Guoyan Zhao, Irma K Bauer, Lindsay Droit, I Malick Ndao, Barbara B Warner, Phillip I Tarr, David Wang & Lori R Holtz

Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, USA. Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, Missouri, USA. Department of Pediatrics, Washington University School of Medicine, St. Louis, Missouri, USA. Present address: The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA, and Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA. Correspondence should be addressed to L.R.H. ([email protected]) or D.W. ([email protected]).

Received 22 May; accepted 20 August; published online 14 September 2015; doi:10.1038/nm.395

nature medicine, advance online publication. © 2015 Nature America, Inc. All rights reserved.

The early years of life are important for immune development and influence health in adulthood. Although it has been established that the gut bacterial microbiome is rapidly acquired after birth, less is known about the viral microbiome (or ‘virome’), consisting of bacteriophages and eukaryotic RNA and DNA viruses, during the first years of life. Here, we characterized the gut virome and bacterial microbiome in a longitudinal cohort of healthy infant twins. The virome and bacterial microbiome were more similar between co-twins than between unrelated infants. From birth to 2 years of age, the eukaryotic virome and the bacterial microbiome expanded, but this was accompanied by a contraction of and shift in the bacteriophage virome composition. The bacteriophage-bacteria relationship begins from birth with a high predator–low prey dynamic, consistent with the Lotka-Volterra prey model. Thus, in contrast to the stable microbiome observed in adults, the infant microbiome is highly dynamic and associated with early life changes in the composition of bacteria, viruses and bacteriophages with age.

The intestinal microbiome includes bacteria, eukaryotic viruses, bacterial viruses (bacteriophages), fungi and archaea. It has been established that some of these microorganisms interact with the immune system and influence their host’s health. Alterations in the intestinal bacterial microbiome have been implicated in a wide range of human diseases including cirrhosis, diabetes and inflammatory bowel disease. Most therapeutic strategies targeting the microbiome, such as probiotics, prebiotics and fecal microbial transplantation, aim to modulate the bacterial microbial community. The bacterial microbiome is established soon after birth, and its composition changes over the next several years toward a stereotypical ‘adult-like’ bacterial community structure. This process can be influenced by multiple interacting factors such as nutrition, delivery route, antibiotic use and geographical setting. Studies of twins demonstrate that infants share a more similar bacterial microbiome with their co-twin than with unrelated individuals.

Much less is known about the viral microbiome (virome), a diverse community consisting of DNA and RNA eukaryotic viruses and bacteriophages. Emerging evidence indicates that the virome plays a role in human health. The burden of anellovirus (a eukaryotic DNA virus) is directly correlated with the degree of host immunosuppression and with organ transplant outcome and is an indicator of pediatric febrile illness. Pathogenic simian immunodeficiency virus and AIDS are associated with expansion of the enteric virome, which includes many eukaryotic RNA viruses. Additionally, chronic virus infection can confer increased resistance against pathogenic challenges, indicating that the virome may provide beneficial effects to the host.

The intestinal microbiota also contains diverse bacteriophages, which in healthy adults consist mostly of members of the order Caudovirales and family Microviridae. These bacteriophages typically maintain a stable community over time. Shifts in the enteric bacteriophage community composition have been associated with Crohn’s disease and ulcerative colitis. However, unlike in environmental ecosystems, where changes in population dynamics of bacteriophage-bacteria interactions are known to follow a Lotka-Volterra “predator-prey” model, the predator-prey relationship between bacteriophages and bacteria has yet to be observed in the human intestinal microbiome. Metagenomic studies of the healthy infant gut virome are limited to one study of a single infant in which the DNA virome was analyzed at a single time point using modest-depth Sanger sequencing. Targeted PCR and RT-PCR studies have determined that some eukaryotic viruses, such as anelloviruses, can be frequently found in stools of healthy infants. Although metagenomic analyses of the gut virome of children with diseases such as diarrhea and acute flaccid paralysis have been described, to date, there has been no longitudinal analysis of the virome of a cohort of healthy infants.

Given that the bacterial microbiome is established during early infancy and is likely to affect long-term health, we examined the changes in the eukaryotic viruses and bacteriophages that accompany human development. To elucidate the degree of interindividual and intraindividual variability in the virome, we sequenced stools of a healthy monozygotic twin pair and three healthy dizygotic twin pairs. In this study, we defined ‘healthy’ infants as those having no apparent underlying genetic or chronic disorders, although, naturally, infants had episodes of acute illness (Supplementary Fig. 1a).

To define the virome composition and its evolution with increasing age, we compared the intestinal virome from stool samples prospectively collected at six time points from birth to 2 years of age. Additionally, we sequenced the bacterial 16S ribosomal RNA genes of the same stools to generate an integrated view of the developing human intestinal virome and bacterial microbiome. Our results provide an in-depth timeline reconstruction of the kinetics of the infant intestinal virome and suggest the existence of predator-prey bacteriophage-bacteria relationship dynamics that occurs naturally during healthy human infant development.

Figure 1 Study design and metagenomic analysis of the infant gut virome. (a) Sequencing strategy to characterize the microbiome of 8 healthy infants (4 twin pairs). Workflow labels in the panel: filtration, total nucleic acid extraction, multiple displacement amplification (MDA), sequence-independent DNA and RNA amplification (SIA), bead beating disruption, 16S PCR amplification, next-generation sequencing. (b) Heatmap of reads assigned to virus families show that the profile is influenced by the sequencing method. Comparison of representative specimens is shown: fecal specimen from infant A2 at 24 months (A2-24), infant B1 at 6 months (B1-6) and 18 months (B1-18), infant B2 at 6 months (B2-6) and 18 months (B2-18), infant C1 at 6 months (C1-6) and infant C2 at 6 months (C2-6). SIA, sequence independent DNA and RNA amplification; MDA, multiple displacement amplification. (c) Presence-absence heatmap shows the viruses identified by subject (infants A1, A2, B1, B2, C1, C2, D1 and D2) and time point (0, 3, 6, 12, 18 and 24 months) during the first 2 years of life.

RESULTS

Virome of infants during early development

We performed metagenomic sequencing of fecal specimens from eight healthy infants (four twin pairs) residing in the greater metropolitan area of St. Louis, Missouri, USA (Fig. 1a). Samples analyzed in this study were collected longitudinally from day of life 1–4 (defined as month 0) and at 3, 6, 12, 18 and 24 months of age so as to define the intestinal microbiome of infants during early development. To comprehensively detect both DNA and RNA viruses, total nucleic acid was extracted from stool specimens and subjected to two complementary amplification methods: multiple displacement amplification (MDA) and sequence-independent DNA and RNA amplification (SIA) (Fig. 1a).

MDA is commonly used in virome studies because of the high processivity of the phi29 polymerase used in this approach. However, MDA cannot detect RNA viruses, and its use also leads to preferential amplification of small circular DNA viruses. To complement MDA, therefore, we used the SIA approach, which is capable of detecting RNA viruses as well as DNA viruses, although its sensitivity for DNA virus detection is generally lower than that of MDA (Fig. 1b). The libraries were pooled and sequenced on the Illumina MiSeq platform. On average, we obtained 549,301 ± 207,521 (mean ± s.d.) reads per MDA sample library and 551,592 ± 229,210 reads per SIA sample library (Supplementary Fig. 1b). Sequencing reads were adaptor trimmed, quality filtered, taxonomically assigned and randomly subsampled to 200,000 reads per sample library for the analyses (details in Online Methods). To define the global virome composition, we merged the profiles of virus families identified by MDA and SIA into a presence-absence heatmap (Fig. 1c). Consistent with results from other PCR-based studies, we identified eukaryotic RNA viruses such as caliciviruses and picornaviruses in the infant fecal specimens. Additionally, bacteriophage families were more frequently detected than eukaryotic DNA and eukaryotic RNA viruses. Only one archaeal virus family (Lipothrixviridae) was identified.

Figure 2 Analysis of virome beta diversity. Agglomerative hierarchical clustering of Bray-Curtis dissimilarity of virome communities (eukaryotic viruses and bacteriophages; genera) at the indicated ages.
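The read-processing steps described above (random subsampling of each library to a fixed depth, then merging the MDA and SIA detections into one presence-absence profile) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names, the fixed seed, the `min_reads` threshold and the decision to leave libraries smaller than the target depth unchanged are all assumptions made for the example.

```python
import random


def subsample_reads(reads, depth=200_000, seed=0):
    """Randomly subsample `reads` to `depth` without replacement.

    Assumption: libraries with no more than `depth` reads are returned as-is.
    """
    reads = list(reads)
    if len(reads) <= depth:
        return reads
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return rng.sample(reads, depth)


def families_detected(assignments, min_reads=1):
    """Collapse per-read family assignments (None = unassigned) into a set.

    `min_reads` is a hypothetical detection threshold, not a value from the paper.
    """
    counts = {}
    for family in assignments:
        if family is not None:
            counts[family] = counts.get(family, 0) + 1
    return {family for family, n in counts.items() if n >= min_reads}


def merge_presence(mda_families, sia_families):
    """A family counts as present if either amplification method detected it."""
    return set(mda_families) | set(sia_families)
```

For one specimen, `merge_presence(families_detected(mda_calls), families_detected(sia_calls))` would give the set of families marked present in a presence-absence heatmap row, where `mda_calls` and `sia_calls` are hypothetical lists of per-read family labels.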

To compare virome biodiversity between the individuals, we measured beta diversity at the virus genus level using the unweighted Bray-Curtis distance. Principal-coordinate analysis of the virome (eukaryotic viruses and bacteriophages) suggested that age and co-twin relatedness contributed to the variation in virome biodiversity of these infants (Supplementary Fig. 2a,b). Agglomerative hierarchical clustering supported the hypothesis that the virome community composition was typically more similar between co-twins than between unrelated infants after controlling for age (Fig. 2).

The eukaryotic virome is acquired after birth

We next focused on the assembly of the eukaryotic virome. Eukaryotic viral population richness was low in the earliest-in-life specimens and increased thereafter (Fig. 3a, Wilcoxon test P < 0.05), suggesting that the eukaryotic virome is primarily established through environmental exposures. Among the eukaryotic RNA viruses identified, the most commonly detected viral genera were enterovirus, parechovirus, tobamovirus and sapovirus (Fig. 3b). The relative sparsity of eukaryotic viruses precluded accurate assessment of ecological parameters (including diversity measurements and rarefaction).

Figure 3 Alterations in the eukaryotic RNA and DNA viruses with age and evidence of shared viromes between co-twins. (a) Richness (number of observed taxa) of eukaryotic DNA and RNA virus genera at indicated ages (n = 8 infants). Linear regression, R² value and 95% confidence intervals are shown. (b) Number of specimens harboring indicated eukaryotic RNA virus genera. (c) Maximum-likelihood phylogeny of parechovirus sequences. Bootstrap values are indicated on the branches. (d) Number of specimens harboring indicated eukaryotic DNA virus genera. (e) Phylogenetic relationships of 61 anellovirus contigs and 12 reference strains inferred from the ORF1 amino acid alignment, generated by the maximum-likelihood method. Genera are highlighted in indicated colors. (f) Presence-absence heatmap of sequencing reads mapped to the anellovirus contigs. Contigs are colored by their phylogenetic genus assignment from e. (g) Richness of anellovirus species at indicated ages (n = 8 infants). Statistical significance was assessed by Wilcoxon test (paired, nonparametric); *P = 0.01–0.05, **P < 0.01. (h) Comparison of the proportion of shared anellovirus taxa (genome contigs) acquired during the first 2 years of life between co-twins (n = 4 co-twin comparisons) and unrelated infants (n = 24 comparison between unrelated infants). Statistical significance was assessed by Student’s t-test; *P < 0.05.
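The co-twin versus unrelated-infant comparison of Bray-Curtis distances can be illustrated with a short Python sketch. It assumes that "unweighted" means the distance is computed on presence-absence data (in which case Bray-Curtis reduces to the Sørensen dissimilarity); that reading, the function names and the `twin_of` mapping are assumptions for illustration and are not taken from the paper's methods.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two profiles (dict: taxon -> count)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    # Two empty profiles are treated as identical (an arbitrary convention).
    return numerator / denominator if denominator else 0.0


def to_presence(profile):
    """Convert counts to presence-absence (1 for every taxon with count > 0)."""
    return {taxon: 1 for taxon, count in profile.items() if count > 0}


def split_pairwise(profiles, twin_of):
    """Return (within-twin distances, between-unrelated distances).

    `profiles` maps infant ID -> genus counts at one age;
    `twin_of` maps infant ID -> co-twin ID (hypothetical input format).
    """
    within, between = [], []
    for x, y in combinations(sorted(profiles), 2):
        d = bray_curtis(to_presence(profiles[x]), to_presence(profiles[y]))
        (within if twin_of.get(x) == y else between).append(d)
    return within, between
```

With eight infants in four twin pairs there are 28 unordered pairs at each age: 4 co-twin pairs and 24 pairs of unrelated infants, which matches the comparison counts given in the figure legends.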

To determine whether co-twins harbored the same virus strains at common time points, we analyzed sequences of parechoviruses, one of the most prevalent RNA viruses detected in our study, by assembling the reads and mapping the resulting contigs to the human parechovirus genome (Supplementary Fig. 3a). Phylogenetic analyses demonstrated that the infants within a twin pair shared nearly identical strains of parechovirus (>99.9% nucleotide identity), but different twin pairs harbored distinct strains (Fig. 3c). Consistent with this finding, strain-identical infection in co-twins was also observed for enterovirus (>99.6% nucleotide identity), the other prevalent eukaryotic RNA virus (Supplementary Fig. 3b). Although these findings indicate that infants within a twin pair are frequently infected with the same virus, we also observed instances in which the virus was detected in only one twin but not in the other. For example, human parechovirus reads were detected in infant A1 (3 months) but were not detected in co-twin A2. To determine whether the observed discordance might arise from limitations of the sequencing sensitivity or from differences in viral load, we screened all the samples with a quantitative RT-PCR (qRT-PCR) assay and measured the number of human parechovirus viral copies. All five sequencing-positive samples were also positive by the qRT-PCR assay (Supplementary Fig. 3c). The three additional ‘sequencing-negative’ samples that were positive for human parechovirus by qRT-PCR had very low viral loads (18–730 viral copies/15 mg stool) (Supplementary Fig. 3c). Finally, RT-PCR and amplicon sequencing verified that ‘discordant’ infant A2 harbored the same human parechovirus isolate as its co-twin A1 (Fig. 3c). Thus, the qRT-PCR results independently validated the presence of the viruses identified by deep sequencing and further confirm that co-twins shared similar viromes.

Figure 4 Decrease in bacteriophage richness and diversity with age coincides with a shift in bacteriophage composition. (a) Richness of bacteriophage species at indicated ages (n = 8 infants). Linear regression, R² value and 95% confidence intervals are shown. (b) Rarefaction curves show the acquisition of bacteriophage species richness (500 permutations). Curves from samples at the same age are indicated in colors. (c) Alpha diversity (Shannon index) of bacteriophage species is plotted (n = 8 infants). Linear regression, R² value and 95% confidence intervals are shown. (d) Bray-Curtis distance of the bacteriophage virome at the genus level within twin pairs (white) (n = 4 co-twin comparisons) compared to unrelated infants (shaded) (n = 24 comparison between unrelated infants). Statistical significance was assessed by Student’s t-test; *P = 0.01–0.05, **P < 0.01. (e) Relative abundance of bacteriophage families. (f) Plot shows the relationship of the Microviridae family abundance compared to Caudovirales order abundance (n = 48 sampling time points). Linear regression and 95% confidence intervals are shown, and the Spearman correlation coefficient (r) is indicated.

Figure 5 Bacterial community expansion with age. (a) Richness (number of observed bacterial OTUs) (n = 8 infants). Linear regression, R² value and 95% confidence intervals are shown. (b) Bacterial alpha diversity (Faith’s phylogenetic diversity) (n = 8 infants). Linear regression, R² value and 95% confidence intervals. (c) Relative abundance of bacterial families based on 16S rRNA gene sequences. (d) UniFrac distance of the bacterial community compared within twin pairs (white) (n = 4 co-twin comparisons) and between unrelated infants (shaded) (n = 24 comparison between unrelated infants). Statistical significance was assessed by Student’s t-test; **P < 0.01.
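The two alpha-diversity summaries named in the figure legends, richness (number of observed taxa) and the Shannon index, are simple to compute from a table of per-taxon read counts. The sketch below is illustrative only: the function names are made up for this example, and the use of the natural logarithm is an assumption, since the text does not state the log base.

```python
import math


def richness(counts):
    """Number of taxa observed at least once (dict: taxon -> read count)."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts.

    Assumption: natural log; other bases only rescale H.
    """
    total = sum(c for c in counts.values() if c > 0)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h
```

A community dominated by one taxon gives a Shannon index near 0, while k equally abundant taxa give ln(k), so a fall in both richness and Shannon index with age corresponds to fewer observed taxa and a less even community.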

© 2015 Nature America, Inc. All rights reserved. Microviridae ( age of months of abundance relative increased an toward composition community the in shift marked a was there Microviridae ( The most abundant were bacteriophages from the ( infants unrelated between than co-twins between more similar) was virome the is, (that lower was virome bacteriophage the ( age pling bias ( sam to attributable be to unlikely was richness in decrease the that suggesting age, with decreased indeed accumulation bacteri species ophage of rate the that demonstrated curves rarefaction Richness age at with ( 0 months and decreased specimens life in earliest the in greatest was richness bacteriophage MDA used have that studies virome bacteriophage other of that to findings comparable our be could that so data MDA-generated the on bacteri analyses subsequent ophage the the focus In to chose phages. we RNA phage, RNA any of yield absence not did data SIA-generated the of analysis in samples, all were detected DNA Although bacteriophages with age Bacteriophage community contracts and shifts in composition infection. recurrent of source stable a or persistence up ( to apart 12 months were collected that same anelloviruses could be detected from stools from the same infant ( infants unrelated did than anelloviruses of proportion higher a shared Moreover, co-twins age. of months 12 at species anellovirus 47 least at harbored (C1) infant ( detected earlier than 3 months of age, but soon increased significantly ( contigs virus concordance with PCR assays designed to detect three anello specific contig ( each of prevalence the determine to contigs reference virus genomes’. ‘reference as serve We then mapped reads sequencing from each to specimen the anello functionally to identity) nucleotide we detected, curated a set of contigs unique (shared anellovirus <95% reads anellovirus novel of number large the of Because abundance. 
or prevalence anellovirus in by changes reflected may be antibodies) maternal of waning instance, (for immunity infant in changes that with changes in associated host immune status and and highly divergent from ( previously anelloviruses described detected ( co-twins that confirm viromes. similar shared further and sequencing deep by identified months. 24 0 to from progression ( coefficient correlation Spearman the and regression, linear indicates Line points). ( richness bacterial and richness bacteriophage ( months. 0–24 from progression age indicates spectrum Color shown. is coefficient correlation Spearman ( ( 6 Figure nature medicine nature ( bias method of artifact an not was it that n a Siphoviridae Fig. 3 Fig. P ) Correlation between bacteriophage diversity and bacterial diversity diversity bacterial and diversity bacteriophage between ) Correlation = 48 sampling time points). Line indicates linear regression, and the the and regression, linear indicates Line points). time sampling = 48 < 0.05), peaking at 6–12 months of age ( age of months 6–12 at peaking 0.05), < Anelloviruses were the most prevalent eukaryotic DNA virus family Supplementary Fig. 3d Supplementary Fig. 4 Fig. f

and and Inverse relationships between bacteriophages and bacteria. bacteria. and bacteriophages between relationships Inverse Fig. Fig. 3 c Fig. Fig. 4 , Wilcoxon test test Wilcoxon , Supplementary Fig. 3e Fig. Supplementary family, consistent with other studies other with consistent family, abundance was also seen in the SIA data, indicating indicating data, SIA the in seen also was abundance , , Inoviridae upeetr Fg 3f Fig. Supplementary d Fig. 4 Fig. b ). ). Almost all wereanelloviruses previously unknown ). Likewise, bacteriophage diversity decreased with decreased diversity bacteriophage ). Likewise,

advance online publication online advance r 17 ) is indicated. Color spectrum indicates age age indicates spectrum Color indicated. ) is e , and and 26 , , , Myoviridae 2 9 P ). ). As anellovirus load has previously been Fig. 3 Fig. . In contrast to the eukaryotic virome, virome, eukaryotic the to contrast In . < 0.01). The interpersonal variation in in variation interpersonal The 0.01). < Supplementary Fig. 4a Fig. Supplementary Microviridae h b ) Correlation plot between between plot ) Correlation ). Further, in some instances, the the instances, some in Further, ). Fig. 4 Fig. ). This approach yielded 98.6% 98.6% yielded approach This ). and and ). Anelloviruses were rarely rarely were Anelloviruses ). a Podoviridae Fig. 3 Fig. Supplementary Fig. 4b Fig. Supplementary n , , Wilcoxon test = 48 sampling time time sampling = 48 Fig. 3 Fig. bacteriophages by 24 24 by bacteriophages 20– f 2 ), suggesting either either ), suggesting Caudovirales 2 , we hypothesized , we hypothesized 26 g

, ). Notably, one one Notably, ). ). This shift in in shift This ). 29 families) and and families)

, 3 3 . However, . P Fig. 4 Fig. < 0.01). Fig. Fig. 3 order

d ). ). ). ). e - - - - ­

An increase in Microviridae species richness was also observed (Supplementary Fig. 4c), indicating that this expansion was not driven by a particular species. The relative abundance of Microviridae was inversely correlated with that of Caudovirales (Fig. 4; P < 0.001). crAssphage, a globally ubiquitous bacteriophage (ref. 41), was detected in only one specimen (infant A2, 24 month; 50,355 reads), suggesting that crAssphage is not acquired in early life. Thus, early infant development was marked by a contraction of bacteriophage richness and diversity, accompanied by a shift toward a predominantly Microviridae bacteriophage community.

Bacterial microbiome changes in infants
The ecological signatures of a healthy intestinal bacterial microbiota during early infancy have been characterized by increasing richness and diversity as bacterial populations mature into a more stable 'adult-like' population by 2–3 years of age (refs. 8–10). Given the dramatic changes we observed in the bacteriophage virome, we sought to understand whether these changes correlated with changes in the bacterial microbiome. Therefore, we performed bacterial 16S rRNA gene sequencing to generate an average of 67,569 reads per sample (s.d. ± 39,862 reads). Quality-filtered sequence reads were clustered into operational taxonomic units (OTUs) at a 97% identity threshold. Principal-coordinate analyses of unweighted UniFrac distance matrices indicated that variation in the bacterial community was associated with age (Supplementary Fig. 5a), as in other studies (refs. 8–10), and bacterial richness (Supplementary Fig. 5c) and diversity (Supplementary Fig. 5d; P < 0.001, Wilcoxon test) also increased with age. Overall, we identified increasingly abundant Bacteroidia (Bacteroidetes) and Clostridia (Firmicutes) at 12, 18 and 24 months. This was preceded by the predominance of, or increases in, Bacilli (Firmicutes), Actinobacteria (Actinobacteria) and Gammaproteobacteria (Proteobacteria) at 0, 3 and 6 months (Fig. 5). The interindividual variation was less between co-twins than between unrelated infants (Wilcoxon test). Hence, the bacterial microbiota of infants in this study was consistent with the expected trajectory of changes previously observed.

Predator-prey–like bacteriophage and bacteria relationships
Although bacteriophage-bacteria relationships in oceans display predator-prey dynamics (refs. 30,31), populations are relatively stable in the adult intestine (refs. 17,26). However, this relationship has yet to be defined in infants. In our cohort, bacteriophage diversity was inversely correlated with bacterial diversity (Fig. 6a). By examining temporal trends, we found that the microbiome shifted from a high bacteriophage–low bacterial diversity community at 0 months toward a low bacteriophage–high bacterial diversity community by 24 months of age (Fig. 6a). Consistent with this finding, bacteriophage and bacterial richness were inversely correlated in an age-dependent manner (Fig. 6b).

[Figure 6: (a) bacteriophage diversity versus bacterial diversity, r = −0.631, P < 0.0001; (b) bacteriophage richness versus bacterial richness (OTUs), r = −0.700, P < 0.0001. Samples are labeled by age: 0, 3, 6, 12, 18 and 24 months.]
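The inverse relationship between bacteriophage and bacterial richness reported in this section is a rank correlation. The sketch below shows one way such a Spearman coefficient could be computed; it is an illustration only, not the authors' analysis (the article reports using GraphPad Prism). The function names `ranks` and `spearman` are hypothetical, and the richness values are invented placeholders, not data from this study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder values only (NOT study data): one sample per time point.
phage_richness = [140, 120, 90, 60, 45, 30]
bacterial_richness = [50, 120, 260, 400, 520, 700]
rho = spearman(phage_richness, bacterial_richness)
```

A negative `rho` indicates that samples with higher bacteriophage richness tend to have lower bacterial richness. Using ranks rather than raw values makes the measure insensitive to the very different scales of the two richness counts.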

Further reflecting these ecological trends, correlations between specific genera of bacteriophages and bacteria, calculated using a linear mixed model, were dominated by negative correlations (Supplementary Fig. 6). Thus, the infant virome and bacterial microbiome evolves in a dynamic trajectory during the early years of life.

DISCUSSION
Interactions among the intestinal microbiota influence host physiology, development and immunity. Assembly of the bacterial microbiome in the infant likely ordains the adult intestinal microbiome that will have long-term implications for aspects of host phenotype such as obesity, inflammatory bowel disease and food allergies. However, much less is known about the way the virome develops in the infant, its impact on the bacterial microbiome or its role in human health. Here, we longitudinally defined the complete virome and bacterial microbial communities of 8 infants (4 co-twin pairs) and uncover milestones of healthy infant virome development. The twin study design enabled us to determine that the infant microbiome (virome and bacterial community) is more similar between co-twins than between unrelated infants. This result contrasts with a study of adult co-twins in which the DNA virome was unique to each individual and twins were not more similar to each other than were unrelated individuals (ref. 17). One possible interpretation is that although 'twin-ness' matters during infancy, this may reflect the fact that environmental exposures are the primary drivers of virome composition, as the infant twins generally still share a common environment. This was evidenced by the detection in co-twins of near-identical strains of eukaryotic viruses when sampled at the same time points.

Anelloviruses have been proposed to serve as biomarkers of functional immunocompetence because changes in the anellovirus load in serum have been associated with immunosuppression levels in transplant recipients (refs. 20,21). Additionally, anelloviruses have been frequently detected by PCR in the serum of infants (ref. 42). We observed an expansion of anellovirus richness in the gut at 6 and 12 months of age; most of these anelloviruses were highly divergent from known anelloviruses, underscoring the value of using unbiased deep-sequencing approaches to systematically define the virome, as they may be difficult to detect using PCR assays. We speculate that this expansion could be the result of a lowered immune state, as it coincides with the nadir of human IgG as maternal antibodies wane.

To date, no evidence indicates that "kill-the-winner" dynamics (refs. 43,44), whereby a peak in the bacterial (prey) population precedes the increase in the bacteriophages (predator), which subsequently decreases bacterial (prey) populations, occur in the human intestinal microbiota. Although this is the most commonly recognized aspect of the classical Lotka-Volterra "predator-prey" model, the model also describes the reciprocal relationship whereby limited prey diversity controls predator abundance (that is, predator peaks precede prey peaks, also referred to as a reversed predator-prey cycle) (ref. 32). Our study (Fig. 6) suggests that bacteriophage-bacteria interactions in early infant development begin with the latter dynamics of the Lotka-Volterra model (Fig. 7). We posit that bacteriophage diversity is high at birth (Fig. 4, 0 month), but the bacteriophage population is unsustainable because of low bacterial colonization density (Fig. 5, 0 month). This leads to a contraction in the bacteriophage virome (Fig. 4), thereby relieving the predatory pressure on the bacterial community, allowing it to colonize and establish in the gut (Fig. 5). In turn, this drives a shift in the bacteriophage composition (including increased Microviridae abundance) that has been selected for in the newly established bacterial community. In a 2.5-year longitudinal study of a single adult, Microviridae were the predominant bacteriophage taxon present (ref. 26), raising a question about how and when the dynamic microbial state during infancy transitions into the stable community that has been reported in adults (refs. 9,17,26). Additionally, although our study is unable to address the source of the early bacteriophage diversity, our data raise the possibility of vertical or prepartum transmission (the median day of first sampling was day 2.6 of life).

It has been well described that diet, antibiotics and mode of delivery influence the bacterial microbiome (refs. 8,11–15,28). Although bacteriophage contraction might be a generalizable phenotype associated with healthy infant development, it is possible that the specific founder bacteriophage composition (identities) at birth might differ on the basis of external factors (such as delivery, diet (ref. 28) and geography (ref. 9)). Although our cohort included individuals who differed in terms of mode of delivery, breastfeeding status and zygosity, the small cohort size precluded assessment of the role of these factors on the microbiome. Nonetheless, our data will serve as a reference for healthy infant microbiota development. Regardless, our current study provides detailed insight into the dynamic interactions between viruses and bacteria in the gut during early development.

METHODS
Methods and any associated references are available in the online version of the paper.

Accession codes. Sequence data have been deposited to the NCBI Sequence Read Archive under BioProject accession number SRP058399.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS
We thank the infants' families and physicians for their participation in, and cooperation with, the study. This work was supported in part by the Children's Discovery Institute (MD-FR-2013-292) and the US National Institutes of Health (5P30 DK052574 (Biobank, Digestive Diseases Research Core Centers) to P.I.T. and UH3AI083265 to P.I.T. and B.B.W.). P.I.T. and B.B.W. received funding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development and from the Foundation for the National Institutes of Health (made possible by support from the Gerber Foundation). D.W. holds an Investigator in the Pathogenesis of Infectious Disease award from the Burroughs Wellcome Fund. E.S.L. is an Eli & Edythe Broad Fellow of the Life Sciences Research Foundation.

AUTHOR CONTRIBUTIONS
E.S.L., L.R.H. and D.W. conceived and designed the experiments. L.D., I.K.B. and E.S.L. prepared samples for sequencing. E.S.L. and I.K.B. performed PCR experiments. E.S.L., G.Z. and Y.Z. processed and analyzed the sequencing data. P.I.T., B.B.W. and I.M.N. recruited the study participants and managed the metadata. E.S.L., L.R.H., D.W. and P.I.T. wrote and edited the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Virgin, H.W. The virome in mammalian physiology and disease. Cell 157, 142–150 (2014).
2. Norman, J.M., Handley, S.A. & Virgin, H.W. Kingdom-agnostic metagenomics and the importance of complete characterization of enteric microbial communities. Gastroenterology 146, 1459–1469 (2014).
3. Cho, I. & Blaser, M.J. The human microbiome: at the interface of health and disease. Nat. Rev. Genet. 13, 260–270 (2012).
4. Qin, N. et al. Alterations of the human gut microbiome in liver cirrhosis. Nature 513, 59–64 (2014).

5. Littman, D.R. & Pamer, E.G. Role of the commensal microbiota in normal and pathogenic host immune responses. Cell Host Microbe 10, 311–323 (2011).
6. Borody, T.J. & Khoruts, A. Fecal microbiota transplantation and emerging applications. Nat. Rev. Gastroenterol. Hepatol. 9, 88–96 (2012).
7. Gritz, E.C. & Bhandari, V. The human neonatal gut microbiome: a brief review. Front. Pediatr. 3, 17 (2015).
8. Koenig, J.E. et al. Succession of microbial consortia in the developing infant gut microbiome. Proc. Natl. Acad. Sci. USA 108 (suppl. 1), 4578–4585 (2011).
9. Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–227 (2012).
10. Subramanian, S. et al. Persistent gut microbiota immaturity in malnourished Bangladeshi children. Nature 510, 417–421 (2014).
11. Bäckhed, F. et al. Dynamics and stabilization of the human gut microbiome during the first year of life. Cell Host Microbe 17, 690–703 (2015).
12. Dominguez-Bello, M.G. et al. Delivery mode shapes the acquisition and structure of the initial microbiota across multiple body habitats in newborns. Proc. Natl. Acad. Sci. USA 107, 11971–11975 (2010).
13. La Rosa, P.S. et al. Patterned progression of bacterial populations in the premature infant gut. Proc. Natl. Acad. Sci. USA 111, 12522–12527 (2014).
14. Turnbaugh, P.J. et al. A core gut microbiome in obese and lean twins. Nature 457, 480–484 (2009).
15. Walter, J. & Ley, R. The human gut microbiome: ecology and recent evolutionary changes. Annu. Rev. Microbiol. 65, 411–429 (2011).
16. Palmer, C., Bik, E.M., DiGiulio, D.B., Relman, D.A. & Brown, P.O. Development of the human infant intestinal microbiota. PLoS Biol. 5, e177 (2007).
17. Reyes, A. et al. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334–338 (2010).
18. Goodrich, J.K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).
19. Oh, J. et al. Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014).
20. De Vlaminck, I. et al. Temporal response of the human virome to immunosuppression and antiviral therapy. Cell 155, 1178–1187 (2013).
21. Béland, K. et al. Torque teno virus in children who underwent orthotopic liver transplantation: new insights about a common pathogen. J. Infect. Dis. 209, 247–254 (2014).
22. McElvania TeKippe, E. et al. Increased prevalence of anellovirus in pediatric patients with fever. PLoS ONE 7, e50937 (2012).
23. Li, L. et al. AIDS alters the commensal plasma virome. J. Virol. 87, 10912–10915 (2013).
24. Handley, S.A. et al. Pathogenic simian immunodeficiency virus infection is associated with expansion of the enteric virome. Cell 151, 253–266 (2012).
25. Barton, E.S. et al. Herpesvirus latency confers symbiotic protection from bacterial infection. Nature 447, 326–329 (2007).
26. Minot, S. et al. Rapid evolution of the human gut virome. Proc. Natl. Acad. Sci. USA 110, 12450–12455 (2013).
27. Breitbart, M. et al. Metagenomic analyses of an uncultured viral community from human feces. J. Bacteriol. 185, 6220–6223 (2003).
28. Minot, S. et al. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 21, 1616–1625 (2011).
29. Norman, J.M. et al. Disease-specific alterations in the enteric virome in inflammatory bowel disease. Cell 160, 447–460 (2015).
30. Parsons, R.J., Breitbart, M., Lomas, M.W. & Carlson, C.A. Ocean time-series reveals recurring seasonal patterns of virioplankton dynamics in the northwestern Sargasso Sea. ISME J. 6, 273–284 (2012).
31. Hennes, K.P. & Simon, M. Significance of bacteriophages for controlling bacterioplankton growth in a mesotrophic lake. Appl. Environ. Microbiol. 61, 333–340 (1995).
32. Cortez, M.H. & Weitz, J.S. Coevolution can reverse predator-prey cycles. Proc. Natl. Acad. Sci. USA 111, 7486–7491 (2014).
33. Breitbart, M. et al. Viral diversity and dynamics in an infant gut. Res. Microbiol. 159, 367–373 (2008).
34. Kapusinszky, B., Minor, P. & Delwart, E. Nearly constant shedding of diverse enteric viruses by two healthy infants. J. Clin. Microbiol. 50, 3427–3434 (2012).
35. Finkbeiner, S.R. et al. Metagenomic analysis of human diarrhea: viral detection and discovery. PLoS Pathog. 4, e1000011 (2008).
36. Holtz, L.R. et al. Geographic variation in the eukaryotic virome of human diarrhea. Virology 468–470, 556–564 (2014).
37. Kapoor, A. et al. A highly prevalent and genetically diversified Picornaviridae genus in South Asian children. Proc. Natl. Acad. Sci. USA 105, 20482–20487 (2008).
38. Olszak, T. et al. Microbial exposure during early life has persistent effects on natural killer T cell function. Science 336, 489–493 (2012).
39. Gurnee, E.A. et al. Gut colonization of healthy children and their mothers with pathogenic ciprofloxacin-resistant Escherichia coli. J. Infect. Dis. doi:10.1093/infdis/jiv279 (2015).
40. Edwards, R.A. & Rohwer, F. Viral metagenomics. Nat. Rev. Microbiol. 3, 504–510 (2005).
41. Dutilh, B.E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014).
42. Ninomiya, M., Takahashi, M., Nishizawa, T., Shimosegawa, T. & Okamoto, H. Development of PCR assays with nested primers specific for differential detection of three human anelloviruses and early acquisition of dual or triple infection during infancy. J. Clin. Microbiol. 46, 507–514 (2008).
43. Rodriguez-Valera, F. et al. Explaining microbial population genomics through phage predation. Nat. Rev. Microbiol. 7, 828–836 (2009).
44. Thingstad, T.F. Elements of a theory for the mechanisms controlling abundance, diversity, and biogeochemical role of lytic bacterial viruses in aquatic systems. Limnol. Oceanogr. 45, 1320–1328 (2000).

© 2015 Nature America, Inc. All rights reserved. h p viral databases for this study can be downloaded at 449,469the following: sequences and(viral a viral NT) NR database of 621,095 sequences. Theover customized98% of the sequence length), resulting in acustomized viral NT database of CD-HIT 2013). publiclythe available databaseandNCBINRNT (downloaded November7on ofallsequences withthe“Viruses” superkingdom taxonomic classification from againsta customized virus database. The customized virus database is comprised filter of Q30 Phred quality score. Candidate viral readsutils package were identified by querying sequences were trimmed. Overlapping reads were joineduptotaxonomic usingassignment. fastq-joinSequencing reads werein demultiplexed the ea- andadaptor pair,point)duringprocessingtwintimeviromesequences of age, (i.e., cation processing. sequence Virome 4 sequencing MiSeq runs, 2 × 250 paired-end reads, v2 MiSeq reagent kit). of (total UniversityWashington at Biology Systems & Sciences Genome for per sequencing run and sequenced on the Illumina MiSeq platform at the Center Hence, 24 libraries and 1 Orsaycontrol virus wereatlibrary pooled equimolar Orsay virus RNA1 segment nematode the from derived cDNA of library indexed uniquely a etc.), error, misidentification index to lead that clusters contamination that might occur after library construction (for example, mixed from cross- separately specimen of level the Additionally, evaluate to sequenced. not was and sequenced and pooled multiplexed MDA were libraries. One SIA sample (C2-0) failed library construction libraries SIA Multiplexed Technologies). (Agilent Bioanalyzer 2100 a using quantification by followed andfied using size-selected Agencourt Ampure XP (Beckman-Coulter), beads and used for Nextera DNA library construction (Illumina). 
Libraries were puri (GenomiPhi V2 kit, GE Healthcare) according to the manufacturer’s polymerase phi29 with amplified was acid nucleic (MDA),amplificationtotal instructions displacement multiple For (Illumina). construction library DNA Nextera for random 15-mer (15 Ns) for random a of priming upstream as sequence previously specific (nt) describednucleotide 16 base-balanced a of sisting amplification (SIA) on was performed the total nucleic acid with primers con RNA and DNA independent sequence The methods. amplification following the to subjected then and run) sequencing per groupingsamplesof define (to recommendation. Samples were randomized using a random number generator a through on the COBAS filtered Ampliprep instrument (Roche) according and to the manufacturer’s ratio 1:6 a 0.45- at (PBS) saline phosphate-buffered in sequencing. Virome s.d. days, 718.5 (avg. s.d. days, 545.0 (avg. months 18 mont), 6 days), as 0 s.d. 2.6 months days, (avg. defined was life of age the manuscript, the in clarity of purpose the For age). to a from specimen the second infant in twin pair “A” at collected 24 months of the infant twin pair followed designation by the age (for example, A2-24 refers 1a Fig. ( illness acute of episodes had infants The criteria. exclusion had no apparent or genetic underlying chronic There disorders. were no other and mode of delivery. 
In this study, we defined these infants as ‘healthy’ females) as 2 they and males (6 sex status, breastfeeding zygosity, of terms in varied pairs (8 infants) in this study were chosen representative of healthy infants who medical of reviews records from the or physicians of the ( twins parents of interviews regular by vomiting/diarrhea) (fever/ illness of episodes and content feeding infants, to given medications −80 °C until analysis, as described at stored and packs frozen containing envelopes insulated in laboratory the years 3 age through their monthly from children specimens fecal collect to infants twin of mothers the from sent con We Louis. obtained St. in Medicine of School Washingtonof University Samples. ONLINE METHODS nature medicine nature a t t t p h : o / µ l / o p m-pore-size membrane. Total nucleic acid was extracted from the filtrate ). The nomenclature used to label specimens in the study begins with with begins study the in specimens label to used nomenclature The ). g a y t . h This study was approved by the Human Research Protection Office Office Protection approved was by study Human the This Research w o u l 4 o s 7 t g .Low quality nucleotides were trimmed and discarded at aquality l . y e 4 . d 8 w a ue t mnmz sqec rdnac (8 identity (98% redundancy sequence minimize to used was u u / v s t i Fecal specimens (approximately 200 mg) were diluted diluted were mg) 200 (approximately specimens Fecal ± r l . u e 11.6 days). 11.6 s d s u e / e 4 v k 6 i e was included in the pool for each sequencing run. r r u / Investigators were blinded to the group allo group the to blinded were Investigators d ± s s a 1.1 days), 3 s.d. days, 1.1 days), 98.0 months (avg. e t 3 a e 9 / k . . Data included collected mode of delivery, V e r i r / u 3 d 9 s a . Fecal specimens were couriered to to couriered were specimens Fecal . D Supplementary Fig. 
1a Supplementary t a B / N V 4 5 i T , bioinformatic demultiplexing demultiplexing bioinformatic , r u _ 2 s ± D 0 17.4 days) and 24 months months 24 and days) 17.4 1 B 3 N 1 1 R 0 _ 7 2 _ 0 I D 1 3 Supplementary Supplementary 9 1 8 1 . t 0 ). ). The 4 twin g 7 3 z 5 ; (viral NR) _ , and used I D 9 h 8 ± t . t t 2.7 p g : z / - - - - / .

generatedcontaining from plasmid a region of the interest MEGAscriptusing To s. assay, 30 forthis standardcurve generate a were used: 50 °C for 5 min, 95 °C for 20 s, 40 cycles of 95 °C for 3 s and 58 °C for 10 pmol of each primer, and 5 pmol of probe. The following cycling conditions Mix 20 The Biosystems). (Applied Master 1-Step Virus Fast TaqMan the using performed was qRT-PCR (5 (5 AN344R and primers were used: AN345F (5 parechovirus for samples all screen to used was genome the TaqManRT-PCR 5 the in sequences assay targeting conserved analysis. parechovirus Human at least twice. was assessed by 1,000 nonparametric bootstraps, and analyses were performed (version 2.16) and ProtTest (version 2.4) accordingly (version 3.00) PhyML with constructed were trees phylogenetic (ML) Maximum-likelihood Viromeanalysis. specimens yielded any Orsay sequencing virus reads. (s.d. reads 463,315 was itself library above. described as The average run number of Orsaysequencing virus sequencing MiSeq reads obtained each from the into control pooled was that virus each evaluated we demultiplexedfor presence of datathe set that reads map control to the Orsay cross-contamination, specimen for assess To reads). (2 the analyses: (3 reads), the (1 read) reads, and the in following regions false-positive virus family taxonomic complexity/repetitive assignments were low omitted from of presence the to Due Score: 40.0, Max Expected: 0.01, Top Percent: 10.0, Min-Complexity filter: 0.44 . Megan (version 5.8.6) (ref. was determined using the lowest-common-ancestor algorithm implemented in in-house perl and python scripts. Bacteriophage species taxonomic assignment output using BLAST the from parsed were assignments taxonomicgenus and ing reads was bydetermined the taxonomy ID of the top BLAST result. Family fungal, etc.) 
as previously described human, example, (for sequence nonviral a to corresponding hit BLAST top a 1E cutoff (e-value BLASTx using database 1E cutoff (e-value MegaBLAST using database NT NCBI the against reads viral candidate the 1E off BLASTn1E using cutoff (e-value tially sequen database viral customized the against queried were reads Sequencing ( reference genomes (human parechovirus 1 ( using Emperor using 500 permutations. Principal-coordinate analyses (PCoA) was performed the using performed ity), agglomerative were analyses hierarchical clustering and rarefaction curve andrichness diversity measurements (Shannon Bray-Curtis dissimilar index, MDA used have historically which ies stud virome bacteriophage other to comparable be to findings our allow to data MDA the it representation better a had because both used of DNA ( viruses we community, virome bacteriophage community. the virome analyze To eukaryotic the of analysis standalone a measurements, in diversity rarefaction) example, (for measurements ecological common was inadequate to accuratelyviruses The mostassess prevalence of eukaryotic and bacteriophages). viruses virome ofin analyses global (eukaryotic the used plotted it as an unweighted (presence/absence) heatmap.and This data merged data MDA was and SIA rarefied the merged we composition, virome global Thus, results obtained from a representative iteration are shown. To the define heatmap. Consistent results iterations. across were all analyses obtainedthe in ( depth Fig. 1b sequencing the on based iterations) (5 method ple sam per reads 200,000 to replacement) without (subsampling rarefied were NC_00147 ′ To examine inter-individual virus isolates, sequencing reads were mapped to /6-FAM/CCTRYGGGTACCTYCWGGGCATCCTTC/TAMRA/-3 −3 ). The ). number for of taxon was plottedreads detected a given viral as a ). False positive viral sequences were filtered by sequentially querying querying sequentially by filtered were sequences viral positive False ). 
2 ) and crAssphage ( crAssphage and ) 5 5 5 using appropriate evolution models as assessed by jModelTest2 2 ′ -GGCCCCWGRTCAGATCCAYAGT-3 . Sequencing reads generated by the SIA and MDA methods methods MDASIA and the by generated reads Sequencing −10 vegan , LSn evle uof 1E cutoff (e-value BLASTn ), 5 0 R package R ) with the following parameters: Min Support: 1, Min ′ -GTAACASWWGCCTCTGGGSCCAAAAG-3 µ L reaction included 5 5 included reaction L JQ99553 A previously described pan-parechovirus pan-parechovirus described previously A 4 9 . . The taxonomic assignment for sequenc ± 5 −10 106,337 reads). None of the 48 infant 48 the of None reads). 106,337 1 17 . Rarefaction curves were performed performed were curves Rarefaction . 7 , ), followed by BLASTx (e-value cut (e-value BLASTx by followed ), )) using bowtie 2 and geneious and 2 bowtie using )) 26 −3 , FM17855 2 ) to remove sequences that have that sequences remove to ) 9 . Ecological analyses including including analyses Ecological . invitro 56 −10 8 , µ 5 ), ), human enterovirus B 7 L of extracted sample, extracted of L transcribed RNA was transcribed ), and the NCBI NR NR NCBI the and ), . . Support for ML trees ′ ), with probe AN257 AN257 probe with ), doi: Fig. 1 5 10.1038/nm.3950 8 Supplementary Supplementary ′ . The following The .

UTR region UTR of b ) and so as and) so ′ ). The The ). 53 , 5 4 ′ ------) .

© 2015 Nature America, Inc. All rights reserved. were assigned to closed reference operational taxonomic units (OTUs) at a 97% were quality filtered at PhredQ20 quality score and demultiplexed. Sequences (Quantitative Insights Into Microbial Ecology, version 1.8.0) (ref. analysis. gene rRNA 16S Bacterial sequencing MiSeq run. libraries a subsequentin their andat re-sequenced cycles 4 specimens PCR 40 yielded these for PCR 16S the D2-0) repeated we Hence, reads). (<10,000 D1-0, reads insufficient A2-0, (A1-0, specimens 4 University. Washington at Biology Systems & reads, Sciences Genome for Center paired-end the at kit) 250 reagent v2 × MiSeq (2 sequencer MiSeq Illumina an using sequenced described previously as (F515/R806) previously described region V4 as forthe specific primers Golay-barcoded using performed was PCR beating bead by disrupted were that specimens sequencing. gene rRNA 16S Bacterial 10 min. Products were byvisualized electrophoresis using 2% agarose gels. for °C 72 byfollowed s, 23 for °C 72 s, 30 for °C 58 s, 30 for °C 95 of cycles 40 min, 5 for °C 95 conditions: cycling following the under Technologies) (Life detected across multiple infants. PCR was performed with Taq DNA polymerase infantoran from points multipletime present in wereeither contigs the data, onsequence based (2) and selected, was gamma) and beta, genus (alpha, each virus contigs were chosen based on the following criteria: (1) a representative of 5 reverse primer primer 5 (forward Contig2393 5 5 primer (forward CTAAAACCTGGAAGTTGC-3 5 primer (forward Contig2355 anellovirus alphatorquevirus an anelloviruses: the of ance genomeand 2 SAMtools contigs bowtie using anellovirus curated the to mapped were sample each from reads sequencing metric bootstraps. Analyses were performed at least twice. genomes. 
Support for ML trees (LG + I + G + F) was assessed by 1,000 nonpara abovereference thatanellovirus the included alignment acid aminoORF1 the from constructed were trees phylogenetic ML analyses. further for used were phylogenetic relationships of the anelloviruses, only contigs that encoded ORF1 identity were combined by taking the consensus. Additionally, to determine the 2 ( 01408 ( LIL-y1 TTV-like ( ( TLMV-CBD279 TTmV1 torquevirus ( reference anelloviusto genomes:aligned alphatorqueviruswere nt TTV1 ( 500 than greater contigs Anellovirus contigs. anellovirus againstqueried aboveusing the database BLASTx to identify viral customized bled we first sought to curate the anellovirus genome sequences. Contigs were assem to known suggesting anelloviruses, that werethey divergent.highly Therefore, Anellovirus analysis. parechovirus from specimens these by 3 human the amplify to unable were we copies), viral Taqmanlow (atthe assay Although human parechovirus was detected in specimens D1-12 and D2-12 by cific for human parechovirus (5 using an oligo (dT) 3 parechovirus-positive samples in order to perform a phylogenetic comparison. negative for parechovirus. We sought to obtain the orthologous 3D region from werecontrols negative 5 All controls.water-onlynegative 5 with formatplate and a limit of detection of 5 copies was defined. Samples were tested in a 96-well 10 × 5 RNAfrom scribed (Ambion) the per manufacturer’s protocol. dilutions Serial of the doi: AB03862 AF29858 ′ ′ ′ RACE was performed with ThermoScript reverse transcriptase (Life Technologies) -GGAACTCCTGGATTGTCCCATC-3 -GTAGCCAGAATAAGAACTATGCCC-3 AB30355 To determine the prevalence of the anelloviruses bioinformatically, bioinformatically, anelloviruses the of prevalence the determine To 10.1038/nm.3950 de novo 8 ); gammatorquevirus TTmV1 MD1-073 ( 5 5 ), TTV SIA109 ( SIA109 TTV ), ), ), TTV-like TTMV_LY2 ( 7 from all QC-filtered reads using Newbler (version 2.8) (ref. 
n silico in ), TTmV MDJN1 ( ′ -CATGAGCTTTGTTGCAGAAAGTC-3 20 ′ EF53888 -TCCAAGAGACTTTAAACCAGGCC-3 primer, and subsequently PCR amplified with a primer spe prevalence analysis with PCR assays for three curated curated three for assays PCR with analysis prevalence Many of the anellovirus sequences shared limited identity 6 to 5 copies were used to generate a standard curve curve standard a generate to used were copies 5 to FJ 0 , T-ie I-2 ( LIL-y2 TTV-like ), 42628 AB30355 ′ -CCAGGTTAACAATGAACTATGGCAG-3 ′ ); a betatorquevirus anellovirus Contig2737 Contig2737 anellovirus betatorquevirus a ); ′ -CTGATGTAGATGATGGACATGGC-3 16S analysis was performed with QIIME QIIME with performed was analysis 16S 0 JX1340 2 ), TTV8 genotype 22 ( 22 genotype TTV8 ), 9 AB02 . Equimolar libraries were pooled and pooled were libraries Equimolar . Nucleic acid was extracted from fecal fecal from extracted was acid Nucleic 8 ′

). Contigs that shared >95% nucleotide RACE and RT-PCR. ′ ); a gammatorquevirus anellovirus anellovirus gammatorquevirus a ); 4 693 ′ 5 53 rvre rmr 5 primer reverse , ), ), TTmV5 TGP96 ( AB29091 , 6 1 0 ), TTV-like TLMV-CLC062 TTV-like ), . We . concord the evaluated EF53888 AB00839 8 ′ ), TTmV MDJHem8- ). These three anello 1 AB05464 , TV ( TTmV3 ), ′ , reverse primer primer reverse , 6 4 1 ), TTV-P1C1 in vitro ). Sequences AB04196 ′ -TACTGT 7 5 ); beta ); 9 tran ) and NC_ 2 2 ′ 9 ), ), ). ′ ------. ,

[…] using the Greengenes database (version 13.8) (ref. 62) […] identity threshold. Investigators were blinded to the group allocation (i.e., age, twin pair, time point) during processing of 16S rRNA gene sequences up to OTU taxonomic assignment. To account for inter-sample sequencing depth variability, all samples were rarefied to 10,000 reads per sample (10 iterations), which exceeds the generally accepted minimum sequencing depth previously described (ref. 63). Consistent results were obtained in the 16S analyses across all iterations. Thus, results obtained from a representative iteration are shown. Alpha diversity (Faith's phylogenetic diversity), OTU richness and UniFrac distance were calculated using QIIME. Rarefaction curves were performed using the vegan R package using 500 permutations. PCoA plots were visualized using Emperor.

Correlation network. A linear mixed model was used to investigate the relative abundance changes of bacteria and bacteriophages over time. A linear mixed model takes into account repeated measurements from the same subjects at different time points. In the model, the relative abundance of bacteria and phage were log transformed, sample collection time was designated a fixed effect and subjects were designated as random effects. P values from multiple comparisons were corrected using false discovery rate (q value). A q value less than 0.05 was considered statistically significant. Bacteria and phage correlation was plotted using Cytoscape. The analyses were performed in R version 3.1.2.

Statistics. Student's t-test (two-sided) was performed with SAS. Normal distribution and equal variances between the groups were verified. Wilcoxon test (paired, nonparametric) was applied to compare the eukaryotic virus richness (Fig. 3a), bacteriophage richness (Fig. 4a), bacteriophage diversity (Fig. 4c), bacterial richness (Fig. 5a) and bacterial diversity (Fig. 5b) between matched samples at 0 months compared to 24 months. Wilcoxon test, linear regression and Spearman correlation were performed with GraphPad Prism. P values and 95% confidence intervals are shown accordingly.

45. Kircher, M., Heyn, P. & Kelso, J. Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics 12, 382 (2011).
46. Félix, M.A. et al. Natural and experimental infection of Caenorhabditis nematodes by novel viruses related to nodaviruses. PLoS Biol. 9, e1000586 (2011).
47. Aronesty, E. ea-utils: command-line tools for processing biological sequencing data. http://code.google.com/p/ea-utils (2011).
48. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
49. Zhao, G. et al. Identification of novel viruses using VirusHunter—an automated data analysis pipeline. PLoS ONE 8, e78470 (2013).
50. Huson, D.H., Mitra, S., Ruscheweyh, H.J., Weber, N. & Schuster, S.C. Integrative analysis of environmental sequences using MEGAN4. Genome Res. 21, 1552–1560 (2011).
51. Oksanen, J.F. et al. vegan: Community Ecology Package. R package version 2.0-10 https://cran.r-project.org/web/packages/vegan/index.html (2013).
52. Vázquez-Baeza, Y., Pirrung, M., Gonzalez, A. & Knight, R. EMPeror: a tool for visualizing high-throughput microbial community data. Gigascience 2, 16 (2013).
53. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
54. Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
55. Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003).
56. Abascal, F., Zardoya, R. & Posada, D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104–2105 (2005).
57. Darriba, D., Taboada, G.L., Doallo, R. & Posada, D. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772 (2012).
58. Nix, W.A. et al. Detection of all known parechoviruses by real-time PCR. J. Clin. Microbiol. 46, 2519–2524 (2008).
59. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).
60. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
61. Caporaso, J.G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).
62. McDonald, D. et al. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 6, 610–618 (2012).
63. Caporaso, J.G. et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc. Natl. Acad. Sci. USA 108 (suppl. 1), 4516–4522 (2011).

Supplementary Information

Early life dynamics of the human gut virome and bacterial microbiome in infants

Efrem S. Lim1,2, Yanjiao Zhou3,4, Guoyan Zhao1, Irma K. Bauer3, Lindsay Droit1,2, I. Malick Ndao3, Barbara B. Warner3, Phillip I. Tarr1,3, David Wang1,2, Lori R. Holtz3

1Department of Molecular Microbiology, Washington University School of Medicine, St Louis, Missouri

2 Department of Pathology & Immunology, Washington University School of Medicine, St Louis, Missouri

3Department of Pediatrics, Washington University School of Medicine, St Louis, Missouri

4Current address: The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts

Address correspondence to: L.R.H. ([email protected]), D.W. ([email protected])

Nature Medicine: doi:10.1038/nm.3950

Supplementary Figure 1

[Panel a: timeline of stool sampling for each infant; x-axis, age (days), 0 to 800. Symbols mark stool sampling, introduction of solid food, fever/vomiting/diarrhea and antibiotic use.]

Twin pair   Infants          Zygosity      Delivery mode      Diet                    Weaned
A           A1 (♀), A2 (♂)   dizygotic     cesarean section   breastfed only          Day 404
B           B1 (♂), B2 (♀)   dizygotic     vaginal            breastfed and formula   Day 67
C           C1 (♂), C2 (♂)   dizygotic     cesarean section   formula only            N/A
D           D1 (♂), D2 (♂)   monozygotic   cesarean section   breastfed and formula   Day 43

[Panel b: number of reads per sample (y-axis, 10^5 to 10^7) by timepoint (0, 3, 6, 12, 18 and 24 months). MDA: average reads per sample 549,301 (± 207,521 s.d.). SIA: average reads per sample 551,592 (± 229,210 s.d.).]

Supplementary Figure 1. Infant cohort description and sequencing depth. (a) Timeline represents stools collected from 8 healthy (4 twin pairs) infants, as indicated by closed circles. Grey diamonds indicate time when solid food was introduced. Reported cases of fever, vomiting or diarrhea are indicated with grey inverted triangles. Reported antibiotic use is indicated with a white triangle. (b) Number of sequencing reads obtained for each sample for the MDA method (left) and SIA method (right). SIA, sequence independent DNA and RNA amplification; MDA, multiple displacement amplification; s.d., standard deviation.

Supplementary Figure 2

[Panels a and b: PC2 (9.02%) plotted against PC1 (39.25%) and against PC3 (7.55%). Points are colored by twin pair (A, B, C, D) in a and by age (0, 3, 6, 12, 18 and 24 months) in b.]

Supplementary Figure 2. Principal coordinate analysis (PCoA) of viromes. PCoA of unweighted Bray-Curtis distance of virome communities (eukaryotic viruses and bacteriophages; genera) (n = 48 sampling time points). The variance explained by each PC is indicated on the axis. Specimens from the same twin pairs are indicated in (a), and age from 0 – 24 months is shown in the color gradient in (b).
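The distance underlying this PCoA can be illustrated with a minimal sketch. It assumes that "unweighted" means genus presence/absence rather than read counts, which the caption does not state explicitly. The genus counts are hypothetical and the function names are illustrative only, not taken from the authors' pipeline.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {taxon: abundance} dicts."""
    taxa = set(u) | set(v)
    numerator = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    denominator = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def to_presence(u):
    """Collapse abundances to presence/absence (assumed meaning of 'unweighted')."""
    return {t: 1 for t, count in u.items() if count > 0}


# Hypothetical genus-level read counts for two specimens (not study data).
specimen_a = {"Alphatorquevirus": 12, "Parechovirus": 3, "Microvirus": 40}
specimen_b = {"Alphatorquevirus": 2, "Microvirus": 10, "Lambdalikevirus": 5}

weighted = bray_curtis(specimen_a, specimen_b)
unweighted = bray_curtis(to_presence(specimen_a), to_presence(specimen_b))
print(round(weighted, 3), round(unweighted, 3))  # 0.667 0.333
```

A matrix of such pairwise values across all 48 specimens would be the input to the ordination. On presence/absence data the measure reduces to the Sørensen dissimilarity.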

Supplementary Figure 3

[Panel a: parechovirus genome schematic (5′ UTR, VP0, VP3, VP1, 2A, 2B, 2C, 3A, 3B, 3C, 3D, 3′ UTR) with mapped contigs from specimens B1-6, B2-6, C1-3, C2-3, A1-6 and A2-6; regions obtained by sequencing and by RT-PCR are distinguished.]

[Panel b: enterovirus genome schematic (5′ UTR, VP4, VP2, VP3, VP1, 2A, 2B, 2C, 3A, 3B, 3C, 3D, 3′ UTR) with mapped contigs from specimens B1-18, B2-18, C1-18, C2-18 and C2-3, and a phylogenetic tree including Human coxsackievirus A16 (JQ746660), Human coxsackievirus A7 (GU942820), Enterovirus A (NC_001612), Enterovirus B (KF878966), Human enterovirus B (NC_001472) and Rhinovirus B (DQ473485).]

[Panel c: parechovirus qRT-PCR standard curve; threshold cycle (Ct) versus copy number (10^0 to 10^7); y = -3.41x + 39.35, R2 = 0.998.]

Sample   Sequencing reads   qRT-PCR Ct   qRT-PCR copy no./15 mg fecal specimen
B1-6     210                25.69        10,110
B2-6     5,299              25.43        12,058
C1-3     307                24.17        28,191
C2-3     1,245              22.75        73,826
A1-6     31                 27.97        2,178
A2-6     0                  29.59        730
D2-12    0                  29.70        675
D1-12    0                  35.09        18

[Panel d: histogram of the number of pairwise comparisons versus % identity (0 to 100).]

[Panel e: number of anelloviruses versus age (months; 0, 3, 6, 12, 18, 24) for infants A1, A2, B1, B2, C1, C2, D1 and D2, colored by genus: Alphatorquevirus, Betatorquevirus, Gammatorquevirus, unclassified anellovirus.]

[Panel f: 2x2 contingency table.]

                 PCR (+)   PCR (-)
Sequencing (+)   7         1
Sequencing (-)   1         135

Supplementary Figure 3. Characterization of eukaryotic RNA and DNA viruses. (a) Contigs mapped to the parechovirus genome are shown. Contigs assembled from sequencing reads are indicated in white, and sequence regions obtained from RT-PCR fragments are indicated in grey. (b) Contigs mapped to the enterovirus genome are shown, together with a phylogenetic analysis of enterovirus amino acid sequences from concatenated 2BC and 3CD regions. Maximum likelihood analysis was performed; bootstrap support values are indicated above branches. (c) Standard curve of the qRT-PCR assay for human parechovirus, performed in triplicate. Error bars indicate the standard deviation. Best-fit line, equation and R-squared values are shown. A comparison of the number of parechovirus sequencing reads to the qRT-PCR assay threshold cycle (Ct) for parechovirus-positive samples, and the viral load measurement (copy number/15 mg of fecal specimen), are shown on the right. (d) Histogram plot of the pairwise % identity comparisons of the anellovirus genome contigs. (e) Distribution of anelloviruses from each infant is shown. The genera distribution is colored as indicated. (f) Concordance between the PCR assay and in silico sequence mapping was assessed using a 2x2 contingency table. The PCR assay was based on screening results of an alphatorquevirus, a betatorquevirus and a gammatorquevirus.
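The standard curve in panel (c) can be read as a conversion from Ct to copy number. The sketch below assumes that x in the best-fit line y = -3.41x + 39.35 is log10(copy number), consistent with the logarithmic copy-number axis, and uses the common qPCR efficiency estimate 10^(-1/slope) - 1. The function names are illustrative, and this is not the authors' analysis code.

```python
SLOPE = -3.41       # from the best-fit line in panel (c)
INTERCEPT = 39.35   # Ct at x = 0, i.e. a single copy under the log10 assumption


def copies_from_ct(ct, slope=SLOPE, intercept=INTERCEPT):
    """Invert Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope=SLOPE):
    """Standard qPCR efficiency estimate; 1.0 would mean perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# (sample, Ct, reported copy no./15 mg) taken from the panel (c) table.
reported = [
    ("B1-6", 25.69, 10110), ("B2-6", 25.43, 12058),
    ("C1-3", 24.17, 28191), ("C2-3", 22.75, 73826),
    ("A1-6", 27.97, 2178), ("A2-6", 29.59, 730),
    ("D2-12", 29.70, 675), ("D1-12", 35.09, 18),
]

for sample, ct, copies in reported:
    print(sample, ct, round(copies_from_ct(ct)), copies)
print("efficiency:", round(amplification_efficiency(), 3))
```

Under these assumptions the interpolated values fall close to the reported copy numbers (for example, Ct 35.09 gives about 18 copies), and the slope corresponds to an amplification efficiency of roughly 96%.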

Supplementary Figure 4

[Panel a: relative abundance (0 to 1.0) of bacteriophage taxa by age (months; 0, 3, 6, 12, 18, 24) for infants A1, A2, B1, B2, C1, C2, D1 and D2, grouped by family (Siphoviridae, Podoviridae, Inoviridae, Microviridae, Myoviridae, Caudovirales, unclassified bacteriophages). Panel b: Microviridae abundance, MDA versus SIA. Panel c: Microviridae richness versus age (months). Panel d: phylogenetic tree of study contigs with reference Microviridae, including PhiX174 (NC_001422), Microvirus CA82 (NC_015785), Bdellovibrio phage phiMH2K (NC_002643), Chlamydia phages and marine gokushoviruses.]

Supplementary Figure 4. Analysis of the bacteriophage community. (a) Relative abundance of bacteriophage genera. Only bacteriophage taxa with greater than 0.05 relative abundance are shown. (b) Comparison of Microviridae abundance between the MDA and SIA methods for specimens at 24 months of age (n = 8 infants). The abundance observed for each sample is indicated with a line connecting the paired methods. (c) Richness of Microviridae bacteriophage species is shown (n = 8 infants). Statistical significance was assessed by Wilcoxon test (paired, non-parametric); * P = 0.01 - 0.05, ** P < 0.01. (d) Maximum likelihood phylogenetic analyses of the Microviridae bacteriophages identified in this study are shown. Phylogenetic relationships were inferred from the amino acid sequence alignment of the major protein.
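With n = 8 infants, the paired Wilcoxon test can be evaluated exactly by enumerating all sign assignments. The following is a minimal sketch of the exact two-sided signed-rank test, not the GraphPad Prism implementation named in the Methods. The paired values are hypothetical, and zero differences are simply dropped, which is one common convention.

```python
from itertools import product


def wilcoxon_signed_rank_exact(before, after):
    """Return (W+, exact two-sided P) for paired samples; zero differences are dropped."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Give tied absolute differences the average of the ranks they span.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical paired values for 8 infants (not study data).
month_0 = [1, 2, 3, 4, 5, 6, 7, 8]
month_24 = [2, 4, 6, 8, 10, 12, 14, 16]
w_plus, p_value = wilcoxon_signed_rank_exact(month_0, month_24)
print(w_plus, p_value)  # 36.0 0.0078125
```

When all eight paired differences point in the same direction, the smallest attainable two-sided P value is 2/2^8 ≈ 0.0078, which is the floor for any ** (P < 0.01) result at this sample size.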

Supplementary Figure 5

[Panels a and b: PC2 (9.47%) plotted against PC1 (15.98%) and against PC3 (5.72%); points colored by age (0, 3, 6, 12, 18 and 24 months) in a and by twin pair (A, B, C, D) in b. Panel c: bacterial richness (0 to 2,000) versus number of samples (0 to 8), one curve per age (0, 3, 6, 12, 18 and 24 months). Panel d: relative abundance (0 to 1.0) of bacterial genera by age (months) for infants A1, A2, B1, B2, C1, C2, D1 and D2.]

Supplementary Figure 5. Analysis of the bacterial community. (a) PCoA plots of bacterial UniFrac distance matrices (n = 48 sampling time points). The variance explained by each PC is indicated on the axis. Color gradient represents community progression with age. (b) PCoA plots of UniFrac distance matrices (n = 48 sampling time points). Twin pairs are colored as indicated. (c) Rarefaction curves of bacterial richness (OTUs) versus an increasing number of specimen subsamplings with replacement are shown. The curves represent the average of 500 iterations at each depth of samples. (d) Relative abundance of the 40 most abundant bacterial genera is shown.
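The curves in panel (c) can be approximated by repeatedly drawing specimens and counting the OTUs seen so far. The sketch below follows the caption's wording (subsampling with replacement, averaged over 500 iterations); it is an assumption-based illustration rather than the vegan R routine used in the study, and the OTU sets are hypothetical.

```python
import random


def accumulation_curve(samples, iterations=500, seed=0):
    """Mean OTU richness when k specimens are drawn with replacement, for k = 1..N.

    `samples` is a list of sets of OTU identifiers, one set per specimen.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = []
    for k in range(1, len(samples) + 1):
        total = 0
        for _ in range(iterations):
            drawn = rng.choices(samples, k=k)       # sampling with replacement
            total += len(set().union(*drawn))       # OTUs observed in this draw
        curve.append(total / iterations)
    return curve


# Hypothetical OTU sets for four specimens at one age (not study data).
specimens = [
    {"otu1", "otu2", "otu3"},
    {"otu2", "otu3", "otu4"},
    {"otu1", "otu5"},
    {"otu3", "otu6", "otu7"},
]
print(accumulation_curve(specimens, iterations=500))
```

A curve that flattens as specimens are added suggests that most OTUs present at that age have been captured; a curve that keeps rising suggests under-sampling.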

Supplementary Figure 6

[Network panels, one per bacteriophage family: Siphoviridae, Podoviridae, Microviridae, Unclassified Caudovirales, Inoviridae, Myoviridae and Unclassified phages. Legend: bacteria node, bacteriophage node, negative correlation, positive correlation.]

Supplementary Figure 6. Bacteriophage-bacteria network analysis. The network between bacteriophage genera (red nodes) and bacterial genera (grey nodes) is plotted by the indicated bacteriophage family. A linear mixed model was applied to identify significant changes in relative abundance over time. Negative correlations are shown with blue lines and positive correlations are shown with orange lines. UC, unclassified.
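The Methods state that P values from multiple comparisons were corrected using the false discovery rate, with q < 0.05 considered significant, but do not name the procedure. The sketch below assumes the Benjamini-Hochberg step-up adjustment, which is a common choice; the P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical P values for four bacteriophage-bacteria pairs (not study data).
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
print([round(q, 3) for q in q_values])  # [0.02, 0.04, 0.04, 0.02]
significant = [q < 0.05 for q in q_values]
```

Only pairs whose q value falls below 0.05 would be drawn as edges in a network of this kind.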
