(19) United States
(12) Patent Application Publication
Gordon et al.
(10) Pub. No.: US 2014/0128289 A1
(43) Pub. Date: May 8, 2014

(54) METHODS OF PROMOTING WEIGHT LOSS AND ASSOCIATED ARRAYS
(71) Applicant: The Washington University, St. Louis, MO (US)
(72) Inventors: Jeffrey I. Gordon, St. Louis, MO (US); Peter Turnbaugh, St. Louis, MO (US)
(73) Assignee: The Washington University, St. Louis, MO (US)
(21) Appl. No.: 14/147,163
(22) Filed: Jan. 3, 2014

Related U.S. Application Data
(63) Continuation of application No. 13/002,137, filed on Mar. 29, 2011, now abandoned, filed as application No. PCT/US2009/049253 on Jun. 30, 2009.
(60) Provisional application No. 61/076,887, filed on Jun. 30, 2008, provisional application No. 61/101,011, filed on Sep. 29, 2008.

Publication Classification
(51) Int. Cl.
C12Q 1/68 (2006.01)
G01N 33/569 (2006.01)
(52) U.S. Cl.
CPC ...... C12Q 1/689 (2013.01); G01N 33/56911 (2013.01)
USPC ...... 506/16; 506/18

(57) ABSTRACT
The invention encompasses methods of modulating body fat or weight loss. In addition, the invention encompasses arrays that comprise biomolecules associated with an obese host microbiome or a lean host microbiome.

[Front-page drawing: relative abundance by phylum at the initial and second timepoints; legend: Actinobacteria, Firmicutes, Bacteroidetes, Proteobacteria, Other.]

[Sheets 1-6 of 50: FIG. 1 (overview) with detail panels FIGS. 1.1, 1.2, 1.3, 1.4, and 1.5.]

[Sheet 7 of 50: FIG. 2, panels A and B. Panel A compares related and unrelated groups; the x-axis of Panel B is the number of sequences.]

[Sheets 8-10 of 50: FIG. 3, panels A-F. Panel labels include "Full-length"; comparison panels contrast related and unrelated individuals; x-axis of the curve panels: number of sequences.]

[Sheet 11 of 50: FIG. 4, panels A and B, each comparing related and unrelated individuals.]

[Sheets 12-18 of 50: FIG. 5 (overview) with detail panels FIGS. 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6.]

[Sheet 19 of 50: FIG. 6. Panels: initial timepoint and second timepoint. Axis: relative abundance (% of 16S rRNA sequences). Legend: Actinobacteria, Firmicutes, Bacteroidetes, Proteobacteria, Other.]

[Sheets 21-25 of 50: FIG. 8, multiple panels. X-axes: % identity and bit-score (20 to 100); the legend includes a "full gene" series.]

[Sheet 27 of 50: FIG. 10. Series: Bacteroidetes and Firmicutes. X-axis: percent identity to reference genomes (50 to 100).]

[Sheets 28-31 of 50: FIG. 11, parts P.1-P.4. Columns: Lean/Overweight and Obese. X-axis unit: Mb. Reference genomes shown: Alistipes putredinis v2.1; Bacteroides caccae v4.1; Bacteroides ovatus v2.1; Bacteroides stercoris; Bacteroides thetaiotaomicron VPI5482; Bifidobacterium adolescentis; Bifidobacterium longum NCC2705; Collinsella aerofaciens v3.1; Dorea longicatena v2.1; Eubacterium siraeum; Faecalibacterium prausnitzii M21/2 v2.1; Methanobrevibacter smithii; Bacteroides distasonis ATCC8503; Ruminococcus torques v2.1.]

[Sheets 32-33 of 50: FIG. 12, panels A-D. X-axis: read length. The key lists the BLAST thresholds (e-value, bit score, % identity) used for each series.]

[Sheets 34-35 of 50: FIG. 13. Panel groups: "PCR-based 16S rRNA gene sequences" and "Microbiome sequences". Axis: relative abundance (0% to 100%). Legend: Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, Other.]

[Sheets 36-37 of 50: FIG. 14, panels A-C. Panel B x-axis: PC1 (20%). Panel C categories: twin vs. twin, twin vs. mother, and unrelated pairs.]

[Sheet 38 of 50: FIG. 15. X-axis: number of sequences.]

[Sheets 39-41 of 50: FIG. 16B is labeled "Bacteroidetes", with x-axis "CAZy families" (classes including GT and CE). The content of the other sheets in this range is not recoverable.]

[Sheets 42-43 of 50: FIG. 18A, bacterial phyla (legend: Firmicutes, Bacteroidetes, Actinobacteria, Other); FIG. 18B, COG categories.]

[Sheet 45 of 50: FIG. 20. Groups: Bacteroidetes and Firmicutes.]

[Sheet 46 of 50: FIG. 21, panels A and B. Groups: Bacteroidetes genomes and Firmicutes genomes.]

[Sheets 47-48 of 50: FIG. 22. Panel A: KEGG orthologous groups; Panel B: CAZy families; Panel C: STRING orthologous groups. Each panel is annotated with the percentage of groups and of sequences shared among microbiomes. A further panel is labeled "variable microbiome", with x-axis: sequences (thousands).]

[Sheet 50 of 50: FIG. 24. Column headings: Variable microbiome; Core microbiome. Legend: Depleted; Enriched. Pathway labels, in the order extracted:]

Folate biosynthesis; Fatty acid biosynthesis; Membrane and intracellular structural molecules; Biosynthesis of siderophore group nonribosomal peptides; Methane metabolism; Ascorbate and aldarate metabolism; Arginine and proline metabolism; General function prediction only; Chemotaxis; Bacterial motility proteins; Flagellar assembly

Type secretion system; Other carbohydrate metabolism; Non-enzyme; Electron transfer carriers; Function unknown; Inositol metabolism; Phosphotransferase system (PTS); Transcription factors; Inorganic ion transport and metabolism; Other energy metabolism; Protein kinases; Two-component system; Other ion-coupled transporters; Protein folding and associated processing; Transporters

Other enzymes; Pyruvate/oxoglutarate oxidoreductases; Ubiquinone biosynthesis; Peptidases; Nitrogen metabolism; ABC transporters; Glyoxylate and dicarboxylate metabolism; Fructose and mannose metabolism; Cell division; Glycerolipid metabolism; Methionine metabolism; Aminosugars metabolism; Carbon fixation; Glycine, serine and threonine metabolism; Glycolysis / Gluconeogenesis; Cyanoamino acid metabolism; Cysteine metabolism

Histidine metabolism; DNA polymerase; Glutamate metabolism; Aminoacyl-tRNA biosynthesis; Replication complex; Pantothenate and CoA biosynthesis; Lysine biosynthesis; RNA polymerase; Phenylalanine, tyrosine and tryptophan biosynthesis; Alanine and aspartate metabolism; Valine, leucine and isoleucine biosynthesis; Ribosome; Galactose metabolism; N-Glycan degradation; Glycosphingolipid biosynthesis - ganglioseries; Sphingolipid metabolism; Translation factors; Nucleotide sugars metabolism; Tyrosine metabolism; Purine metabolism; Pyrimidine metabolism; Protein export; Peptidoglycan biosynthesis; Pyruvate metabolism; Pentose phosphate pathway; Selenoamino acid metabolism; Starch and sucrose metabolism

METHODS OF PROMOTING WEIGHT LOSS AND ASSOCIATED ARRAYS

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the priority of U.S. National application Ser. No. 13/002,137, filed Mar. 29, 2011; which claims the priority of PCT application number PCT/US2009/049253, filed Jun. 30, 2009; which claims the priority of U.S. provisional application No. 61/076,887, filed Jun. 30, 2008; and U.S. provisional application No. 61/101,011, filed Sep. 29, 2008, each of which is hereby incorporated by reference in its entirety.

GOVERNMENTAL RIGHTS

[0002] This invention was made in part with government support under grant DK078669 awarded by the National Institutes of Health. The government has certain rights in the invention.

FIELD OF THE INVENTION

[0003] The present invention encompasses methods and arrays associated with body fat and/or weight loss.

REFERENCE TO SEQUENCE LISTING

[0004] A paper copy of the sequence listing and a computer readable form of the same sequence listing are appended below and herein incorporated by reference. Additionally, the sequence listing filed with the provisional application is also hereby incorporated by reference.

BACKGROUND OF THE INVENTION

[0005] According to the Centers for Disease Control (CDC), over sixty percent of the United States population is overweight, and greater than thirty percent are obese. This translates into more than 50 million adults in the United States with a Body Mass Index (BMI) of 30 or above. Obesity is also a worldwide health problem with an estimated 500 million overweight adult humans [body mass index (BMI) of 25.0-29.9 kg/m²] and 250 million obese adults (Bouchard, C (2000) N Engl J Med. 343, 1888-9). This epidemic of obesity is leading to worldwide increases in the prevalence of obesity-related disorders, such as diabetes, hypertension, cardiac pathology, and non-alcoholic fatty liver disease (NAFLD; Wanless and Lentz (1990) Hepatology 12, 1106-1110; Silverman, et al. (1990) Am. J. Gastroenterol. 85, 1349-1355; Neuschwander-Tetri and Caldwell (2003) Hepatology 37, 1202-1219). According to the National Institute of Diabetes, Digestive and Kidney Diseases (NIDDK), approximately 280,000 deaths annually are directly related to obesity. The NIDDK further estimated that the direct cost of healthcare in the U.S. associated with obesity is $51 billion. In addition, Americans spend $33 billion per year on weight loss products. In spite of this economic cost and consumer commitment, the prevalence of obesity continues to rise at alarming rates. From 1991 to 2000, obesity in the U.S. grew by 61%.

[0006] Although the physiologic mechanisms that support development of obesity are complex, the medical consensus is that the root cause relates to an excess intake of calories compared to caloric expenditure. While the treatment seems quite intuitive, dieting is not an adequate long-term solution for most people; about 90 to 95 percent of persons who lose weight subsequently regain it. Although surgical intervention has had some measured success, the various types of surgeries have relatively high rates of morbidity and mortality.

[0007] Pharmacotherapeutic principles are limited. In addition, because of undesirable side effects, the FDA has had to recall several obesity drugs from the market. Those that are approved also have side effects. Currently, two FDA-approved anti-obesity drugs are orlistat, a lipase inhibitor, and sibutramine, a serotonin reuptake inhibitor. Orlistat acts by blocking the absorption of fat into the body. An unpleasant side effect with orlistat, however, is the passage of undigested oily fat from the body. Sibutramine is an appetite suppressant that acts by altering brain levels of serotonin. In the process, it also causes elevation of blood pressure and an increase in heart rate. Other appetite suppressants, such as amphetamine derivatives, are highly addictive and have the potential for abuse. Moreover, different subjects respond differently and unpredictably to weight-loss medications.

[0008] Because surgical and pharmacotherapy treatments are problematic, new non-cognitive strategies are needed to prevent and treat obesity and obesity-related disorders.

SUMMARY OF THE INVENTION

[0009] One aspect of the present invention encompasses an array comprising a substrate. The substrate has disposed thereon at least one nucleic acid indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. Alternatively, the substrate has disposed thereon at least one nucleic acid indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

[0010] Another aspect of the present invention encompasses an array comprising a substrate. The substrate has disposed thereon at least one polypeptide indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. Alternatively, the substrate has disposed thereon at least one polypeptide indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

[0011] Yet another aspect of the invention encompasses a method for modulating body fat or for modulating weight loss in a subject. The method typically comprises altering the microbiota population in the subject's gastrointestinal tract by modulating the relative abundance of Actinobacteria. In some embodiments, the relative abundance is increased; in other embodiments, the relative abundance is decreased.

[0012] Still another aspect of the invention encompasses a composition. The composition usually comprises an antibiotic having efficacy against Actinobacteria but not against Bacteroidetes; and a probiotic comprising Bacteroidetes.

[0013] Other aspects and iterations of the invention are described more thoroughly below.

REFERENCE TO COLOR FIGURES

[0014] The application file contains at least one photograph executed in color. Copies of this patent application publication with color photographs will be provided by the Office upon request and payment of the necessary fee.

BRIEF DESCRIPTION OF THE FIGURES

[0015] FIG. 1 depicts how the technical replicates (analyzed at four different sequencing centers) cluster. Fecal DNA samples were split and sequenced separately at four different sequencing centers. Abbreviations: usc, Environmental
Genomics Core Facility, University of South Carolina; ok, Advanced Center for Genome Technology, University of Oklahoma; ct, 454 Life Sciences, Branford, Conn.; and ma, Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, Mass. Unweighted UniFrac-based clustering was performed on the combined dataset. Colored boxes enclose samples from the same individual (also indicated by identical IDs followed by the number 1 or 2. The location of the sequencing facility follows each sample ID.) Randomly selected sequences were analyzed (s.500 per replicate). FIGS. 1.1, 1.2, 1.3, 1.4, and 1.5 show details from FIG. 1.

[0016] FIG. 2 depicts 16S rRNA gene surveys revealing familial similarity and reduced diversity of the gut microbiota in obese individuals. (A) Comparison of the average UniFrac distance (a measure of differences in bacterial community structure) between individuals over time (self), twin-pairs, twins and their mother, and unrelated individuals. Briefly, 1,000 sequences were randomly sampled from each V2/3 dataset, OTUs were chosen, a UniFrac tree was built from representative sequences, and random permutations were done on the resulting UniFrac distance matrix. Asterisks indicate significant differences between the indicated categories [Student's t-test with Monte Carlo (1,000 permutations); *p<10⁻⁵; **p<10⁻¹⁴; ***p<10⁻⁴¹]. (B) Evidence of reduced diversity in the fecal microbiota of obese individuals. Phylogenetic diversity curves were generated by randomly sampling 1 to 10,000 sequences from each V6 16S rRNA dataset, and then calculating the total branch length leading to the sampled sequences (mean ± 95% CI shown).

[0017] FIG. 3 depicts 16S rRNA gene surveys revealing evidence for familial aggregation and reduced diversity in the obese gut microbiome. (A,B) Comparison of the average UniFrac distance (a measure of differences in bacterial community structure) between related and unrelated individuals. Briefly, 10,000 sequences were randomly sampled from each V6 dataset (Panel A) and 200 sequences were randomly sampled from each full-length dataset (Panel B), OTUs were chosen, a UniFrac tree was built from representative sequences, and random permutations were done on the resulting UniFrac distance matrix. Asterisks indicate significant differences between related and unrelated individuals [Student's t-test with Monte Carlo (1,000 permutations); p<0.001]. (C,D) Phylogenetic diversity curves for the obese and lean gut microbiome. Briefly, 1 to 1,000 sequences were randomly sampled from each V2/3 dataset (Panel C), and 1 to 200 sequences were randomly sampled from each full-length dataset (Panel D), and the average branch length leading to the sampled sequences was calculated. (E,F) Rarefaction curves for the obese and lean fecal microbiota. Briefly, 1 to 10,000 sequences were randomly sampled from each V6 dataset (Panel E), and 1 to 200 sequences were randomly sampled from each full-length dataset (Panel F). The average number of OTUs in each sample was then calculated (mean ± 95% CI shown).

[0018] FIG. 4 depicts a graph illustrating that the stratification of related and unrelated individuals concordant for physiological states of obesity versus leanness confirms familial similarity. (A,B) Comparison of the average UniFrac distance (a measure of differences in bacterial community structure) between related and unrelated individuals concordant for leanness (Panel A) or obesity (Panel B). Briefly, 1,000 sequences were randomly sampled from each V2/3 dataset, OTUs were chosen, a UniFrac tree was built from representative sequences, and random permutations were done on the resulting UniFrac distance matrix. Asterisks indicate significant differences between related and unrelated individuals [Student's t-test with Monte Carlo (1,000 permutations); *p<10⁻⁵].

[0019] FIG. 5 depicts clustering of the fecal microbiotas of monozygotic (MZ) and dizygotic (DZ) twins and their mothers sampled at the beginning of the study and two months later. Unweighted UniFrac-based clustering. Colored boxes link samples from the same individual (also indicated by identical IDs followed by the number 1 or 2). 34 of the individuals were only sampled once. 1,000 randomly selected V2/3 16S rRNA gene sequences were analyzed per sample. FIGS. 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6 show details from FIG. 5.

[0020] FIG. 6 depicts the relative abundance of the major gut bacterial phyla across 120 gut samples obtained at two different timepoints. Fecal samples were collected at the initial and second timepoints (average interval between sample collection: 57 ± 4 days). The relative abundance of the major gut bacterial phyla is based on analysis of V2/3 16S rRNA gene sequences. Samples are organized based on the rank order abundance of Firmicutes in the initial timepoint.

[0021] FIG. 7 depicts the number of shared phylotypes (OTUs) as a function of the number of sequences per sample. 50-3,000 sequences were randomly selected from each sample, obtained from 93 different individuals. All sequences were binned into species-level phylotypes using a 97% identity threshold. Less stringent parameters were used for OTU binning at all levels of coverage to allow for analysis of 3,000 sequences per sample (density cutoff 0.65, maximum of 3000 nodes).

[0022] FIG. 8 depicts the validation of annotation parameters using control datasets. (A-C) Percent of randomly fragmented annotated genes (KEGG v44) assigned to the correct KEGG orthologous group as a function of the (A) e-value, (B) % identity, or (C) bit-score cutoff used. (D-F) Sensitivity [true positives (TP) divided by true positives plus false negatives (FN)] as a function of the (D) e-value, (E) % identity, or (F) bit-score cutoff used. (G-I) Precision [true positives divided by true positives plus false positives (FP)] as a function of the (G) e-value, (H) % identity, or (I) bit-score cutoff used. The vertical gray line and circle indicate the cutoff values used in this analysis.

[0023] FIG. 9 depicts the taxonomic profiles of microbial gene content in the human gut (fecal) microbiome. Full-length 16S sequences were obtained for each reference genome, likelihood parameters were determined using Modeltest, and a maximum-likelihood tree was generated using PAUP. Bootstrap values represent nodes found in >70 of 100 repetitions. Branches and distributions are colored by phylum: Bacteroidetes (orange), Firmicutes (blue), and Actinobacteria (green). Proteobacteria (E. coli) and Archaea (M. smithii and M. stadtmanae) are uncolored. The relative abundance of sequences homologous to each genome is depicted on a scale of 0 to 30% (BLASTX comparisons of microbiome datasets to reference genomes). Sample ID nomenclature: Family number, Twin number or mom, and BMI category (Le=lean, Ov=overweight, Ob=obese; e.g. F1T1Le stands for family 1, twin 1, lean).

[0024] FIG. 10 depicts the assignment of fecal microbiome reads to sequenced reference human gut-derived Bacteroidetes and Firmicutes genomes. Histogram of the percent identity (mean ± SEM) obtained from sequence alignments between gut microbiome reads (n=18 datasets) and Firmicutes or Bacteroidetes reference genomes.
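The sensitivity and precision plotted in FIG. 8 follow directly from the definitions given in its description: TP/(TP+FN) and TP/(TP+FP). The Python sketch below is illustrative only and is not the applicants' code. It assumes each control read carries a BLAST bit score, its known (true) KEGG orthologous group, and the group assigned by the best hit; the `Hit` record, the function names, and the rule that a read below the cutoff counts as a false negative are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Hit:
    """Hypothetical record for one control read and its best BLAST hit."""
    bit_score: float
    true_group: str      # group the fragment was generated from
    assigned_group: str  # group of the best-scoring hit


def confusion_counts(hits: Iterable[Hit], cutoff: float) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) at a bit-score cutoff.

    Assumption: reads scoring below the cutoff are left unassigned and
    counted as false negatives; reads at or above it are TP when the
    assigned group matches the true group, FP otherwise.
    """
    tp = fp = fn = 0
    for hit in hits:
        if hit.bit_score < cutoff:
            fn += 1
        elif hit.assigned_group == hit.true_group:
            tp += 1
        else:
            fp += 1
    return tp, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN); returns 0.0 when the denominator is zero."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); returns 0.0 when the denominator is zero."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sweep(hits: List[Hit], cutoffs: Iterable[float]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, sensitivity, precision) for each cutoff."""
    rows = []
    for cutoff in cutoffs:
        tp, fp, fn = confusion_counts(hits, cutoff)
        rows.append((cutoff, sensitivity(tp, fn), precision(tp, fp)))
    return rows


if __name__ == "__main__":
    demo = [
        Hit(60.0, "K1", "K1"),
        Hit(45.0, "K2", "K2"),
        Hit(30.0, "K3", "K9"),
        Hit(20.0, "K4", "K4"),
    ]
    for cutoff, sens, prec in sweep(demo, [25.0, 40.0]):
        print(f"cutoff={cutoff:g} sensitivity={sens:.2f} precision={prec:.2f}")
```

The same sweep could be run over e-value or % identity cutoffs by swapping the field that is compared against the cutoff (with the inequality reversed for e-values, where smaller is stricter).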

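The rarefaction curves described for FIG. 3 (panels E and F) report the average number of OTUs observed when a fixed number of sequences is drawn at random from a sample. A minimal Python sketch of that subsampling step is given below. It assumes the sequences have already been binned into OTUs upstream, so each sequence is represented only by its OTU label; the function name, the number of repetitions, and the fixed random seed are assumptions for illustration and do not come from the specification.

```python
import random
from typing import List, Sequence, Tuple


def rarefaction_curve(
    otu_labels: Sequence[str],
    depths: Sequence[int],
    n_reps: int = 10,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Average number of distinct OTUs seen at each subsampling depth.

    otu_labels: one OTU label per sequence in the sample (binning is
                assumed to have been done beforehand).
    depths:     numbers of sequences to draw; depths larger than the
                sample are skipped.
    n_reps:     random draws averaged at each depth (an assumption).
    """
    rng = random.Random(seed)
    pool = list(otu_labels)
    curve = []
    for depth in depths:
        if depth > len(pool):
            continue
        # Draw without replacement and count the distinct OTUs observed.
        observed = [len(set(rng.sample(pool, depth))) for _ in range(n_reps)]
        curve.append((depth, sum(observed) / n_reps))
    return curve


if __name__ == "__main__":
    sample = ["otu_a"] * 5 + ["otu_b"] * 3 + ["otu_c"] * 2
    for depth, mean_otus in rarefaction_curve(sample, [1, 2, 5, 10]):
        print(depth, round(mean_otus, 2))
```

A curve that flattens as depth increases suggests that additional sequencing would reveal few new OTUs in that sample; comparing curves between groups at matched depths avoids confounding diversity with sequencing effort.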
[0025] FIG. 11 depicts the percent identity plots of the fecal microbiomes versus reference genomes. Each row (x-axis) represents a different genome. The y-axis shows the percent identity to microbiome sequences (red dots). The combined data from lean/overweight individuals are in the left column while the combined data from obese individuals are displayed in the right column. Supercontigs were used for draft genomes; the assembly version (v) can be found after the strain name. The lines found at 10% identity on each plot depict the sum of all sequences mapped across each genome.

[0026] FIG. 12 depicts the dependence of percentage (A), quality (B), and accuracy (C-D) of sequence assignments on read-length. Two fecal samples were processed using extra long read pyrosequencing (454 FLX Titanium kit; samples TS28 and TS29). 10,000 sequences from the maximum of each read-length distribution (between 490 and 505 nt) were randomly selected from each sample. Simulated reads were created by sampling the first 50-500 nt of each of these 10,000 sequences, and each simulated read was compared using NCBI-BLASTX against our custom gut genome database. Multiple BLAST thresholds were used (see key in panel A). (A) Percent of sequences assigned to the reference genomes as a function of read-length. (B) Average BLAST bit score as a function of read-length. (C) Percent of gene assignments (from the gut genome database) identical to full-length sequence as a function of read-length. (D) Percent of group assignments (same assigned COG as the full-length sequence) as a function of read-length.

[0027] FIG. 13 depicts the relative abundance of bacterial phyla in 18 human gut microbiomes. (A-C) PCR-based 16S rRNA gene sequences: (A) full-length, (B) V2/3 region, and (C) V6. (D-E) Microbiome data analyzed by BLAST comparisons: (D) NCBI non-redundant database and (E) a custom 42 gut genome database. (F) Analysis of 16S rRNA gene fragments identified in each microbiome. (G) Correlation matrix based on all pairwise comparisons (R²) of the relative abundance of the four major phyla (Actinobacteria, Firmicutes, Bacteroidetes, and Proteobacteria) across all six methods.

[0028] FIG. 14 depicts the metabolic pathway-based clustering and analysis of the human gut microbiome of MZ twins. (A) Metabolic pathways were tallied using the KEGG database and annotation scheme. Functional profiles were clustered using a single-linkage hierarchical clustering with a Pearson's distance metric. All pairwise comparisons were made of the profiles by calculating each R² value. (B) A linear regression of the relative abundance of Bacteroidetes versus the first principal component derived from a PCA analysis of KEGG metabolic profiles. (C) Comparisons of functional similarity between twin pairs, between twins and their mother, and between unrelated individuals. Asterisks indicate significant differences (Student's t-test with Monte Carlo; p<0.01) and bars represent mean ± SEM.

[0029] FIG. 15 depicts the functional profiles of MZ fecal microbiomes, based on the relative abundance of KEGG pathways, which stabilize after ~20,000 sequences are collected for a given sample. Datasets were randomly subsampled between 500 and 25,000 sequences. The average functional similarity (R²) between the subsampled dataset and the full dataset is shown as a function of sequencing effort.

[0030] FIG. 16 depicts the KEGG pathways and Carbohydrate Active Enzymes (CAZy) families whose representation is significantly different between Firmicutes and Bacteroidetes bins. Sequences from each of the 18 fecal microbiomes were binned based on sequence homology to the custom 42-member reference human gut genome database. (A) The frequency of each KEGG pathway was tallied for each bin and significantly different pathways were identified using a bootstrap re-sampling analysis (Xipe v2.4). Significantly different pathways reaching at least 0.6% relative abundance in at least two microbiomes were clustered using single-linkage hierarchical clustering and the Pearson's correlation distance metric. (B) The relative abundance of CAZy families in the Bacteroidetes and Firmicutes sequence bins. Asterisks indicate significant differences (Mann-Whitney test, p<0.0001).

[0031] FIG. 17 depicts the functional clustering of phylum-wide sequence bins and reference genomes from 36 human gut-derived Bacteroidetes and Firmicutes. The frequency of each KEGG pathway in phylum-wide sequence bins, and in 10,000 simulated reads generated from each of the reference genomes (Readsim v0.10; ref. 56), was tallied and pathways reaching at least 0.6% relative abundance in at least two fecal microbiomes were clustered using principal components analysis (PCA). An average Firmicutes and Bacteroidetes genome was generated by pooling all reads generated from genomes within each phylum.

[0032] FIG. 18 depicts the comparison of taxonomic and functional variations in the human gut microbiome. (A) Relative abundance of major phyla across 18 fecal microbiomes from MZ twins and their mothers, based on BLASTX comparisons of microbiomes and the NCBI non-redundant database. (B) Relative abundance of COG categories across each sampled gut microbiome.

[0033] FIG. 19 depicts the relative abundance of KEGG pathways and COG categories in the gut microbiomes of 18 individuals (6 MZ twin pairs and their mothers), plus 9 previously published adult microbiomes. Simulated reads were generated from each of the 9 previously published microbiome datasets obtained by capillary sequencing to mimic pyrosequencing reads, then re-annotated using the KEGG and STRING-extended COG databases. (A) The average relative abundance of KEGG pathways in MZ twin pairs and their mothers graphed as a function of the average relative abundance of KEGG pathways in the 9 previously published adult gut microbiome datasets. (B) The distribution of COG categories across all 27 datasets.

[0034] FIG. 20 depicts the relative abundance of COG categories in 36 sequenced reference human gut-derived Firmicutes and Bacteroidetes genomes. 10,000 simulated reads, generated from each of the reference genomes (Readsim v0.10), were annotated using the STRING-extended COG database.

[0035] FIG. 21 depicts the average functional diversity and evenness of simulated reads generated from reference genomes from gut Firmicutes or Bacteroidetes. (A) Functional diversity was calculated in EstimateS (v8.0), based on the abundance of each metabolic pathway across 10,000 simulated reads generated from each of the 36 reference genomes (Readsim v0.10). (B) Shannon evenness. Asterisks indicate significant differences (Mann-Whitney test, p<0.01).

[0036] FIG. 22 depicts the enzyme-level functional groups shared between all or a subset of the sampled gut microbiomes. Sequences from each of the 18 microbiomes characterized in this study were assigned to (A) KEGG groups, (B) CAZy families, and (C) STRING annotations. Functional groups (inner circle), and the sequences assigned
to each group (outer circle) were then tallied based on their co-occurrence in any combination of 1 to 18 microbiomes. For example, the outer aqua-colored segment in Panel A demonstrates that 96.2% of the total sequences generated from all 18 samples were assigned to functional groups that were common to all 18 microbiomes. (D) KEGG categories enriched or depleted in the core versus variable components of the gut microbiome. Sequences from each of the 18 fecal microbiomes were binned into the "core" or "variable" microbiome based on the co-occurrence of KEGG orthologous groups (core groups were found in all 18 microbiomes while variable groups were present in fewer (<18) microbiomes; see FIG. 20A). General categories are shown. Asterisks indicate significant differences (Student's t-test, *p<0.05, **p<0.001, ***p<10⁻⁵).
[0037] FIG. 23 depicts the KEGG categories enriched or depleted in the core versus variable components of the gut microbiome. Sequences from each of the 18 fecal microbiomes were binned into the core or variable microbiome based on the co-occurrence of KEGG orthologous groups (core groups were found in all 18 microbiomes while variable groups were present in fewer (<18) microbiomes; see FIG. 20A). General categories are shown. Asterisks indicate significant differences (Student's t-test, *p<0.05, **p<0.001, ***p<10⁻⁵).
[0038] FIG. 24 depicts the clustering of pathways enriched or depleted in the core microbiome. Sequences from each of the 18 distal gut microbiomes were binned into the core or variable microbiome based on the co-occurrence of KEGG orthologous groups (core groups were found in all 18 microbiomes while variable groups were present in fewer (<18) microbiomes; see FIG. 20A). The frequency of each KEGG pathway was tallied for each bin and significantly different pathways were identified using a bootstrap re-sampling analysis (Xipe v2.4). Pathways significantly enriched (yellow) or depleted (blue), reaching at least 0.6% relative abundance in at least two microbiomes, were clustered using single-linkage hierarchical clustering and the Pearson's correlation distance metric.
DETAILED DESCRIPTION OF THE INVENTION
[0039] It has been discovered, as demonstrated in the Examples, that there is a relationship between the human gut microbiota and obesity. In particular, an obese human subject typically has fewer Bacteroidetes and more Actinobacteria compared to a lean subject. In some embodiments, an obese human subject has proportionately fewer Bacteroidetes and more Actinobacteria and Firmicutes compared to a lean subject. Taking advantage of these discoveries, the present invention provides compositions and methods to regulate energy balance in a subject. In particular, the invention provides nucleic acid sequences that are associated with obesity in humans. These sequences may be used as diagnostic or prognostic biomarkers for obesity risk, biomarkers for drug discovery, biomarkers for the discovery of therapeutic targets involved in the regulation of energy balance, and biomarkers for the efficacy of a weight loss program.
I. Modulation of Energy Balance in a Subject
[0040] The energy balance of a subject may be modulated by altering the subject's gut microbiota population. Generally speaking, to decrease energy harvesting, decrease body fat, or promote weight loss, the relative abundance of bacteria within the Bacteroidetes phylum (phylum is also known as a division) is increased and optionally, the relative abundance of bacteria within the Actinobacteria and/or Firmicutes phylum is decreased. Alternatively, to increase energy harvesting, to increase body fat, or promote weight gain, the relative abundance of Bacteroidetes is decreased and optionally, the relative abundance of Actinobacteria and/or Firmicutes is increased. Additional agents may also be utilized to achieve either weight loss or weight gain. Examples of these agents are detailed in section I(d).
(a) Altering the Abundance of Bacteroidetes
[0041] The relative abundance of Bacteroidetes may be altered by increasing or decreasing the presence of one or more Bacteroidetes species that reside in the gut. Additionally, non-limiting examples of species may include B. thetaiotaomicron, B. vulgatus, B. ovatus, P. distasonis, B. uniformis, B. stercoris, B. eggerthii, B. merdae, and B. caccae. In one embodiment, the population of B. thetaiotaomicron is altered. In still another embodiment, the population of B. vulgatus is altered. In an additional embodiment, the population of B. ovatus is altered. In another embodiment, the population of P. distasonis is altered. In yet another embodiment, the population of B. uniformis is altered. In an additional embodiment, the population of B. stercoris is altered. In a further embodiment, the population of B. eggerthii is altered. In still another embodiment, the population of B. merdae is altered. In another embodiment, the population of B. caccae is altered. In a further embodiment, the species within the Bacteroidetes phylum may be as of yet unnamed.
[0042] The present invention also includes altering various combinations of Bacteroidetes species, such as at least two species, at least three species, at least four species, at least five species, at least six species, at least seven species, at least eight species, at least nine species, at least ten Bacteroidetes species, or more than ten species of Bacteroidetes. For example, the combination of B. thetaiotaomicron, B. vulgatus, B. ovatus, P. distasonis, and B. uniformis may be altered.
[0043] In an exemplary embodiment, the relative abundance of Bacteroidetes is increased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Increased abundance of Bacteroidetes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, a food supplement that increases the abundance of Bacteroidetes may be administered to the subject. By way of example, one such food supplement is psyllium husks as described in U.S. Patent Application Publication No. 2006/0229905, which is hereby incorporated by reference in its entirety. In an exemplary embodiment, a probiotic comprising one or more Bacteroidetes species or strains may be administered to the subject. The amount of probiotic administered to the subject can and will vary depending upon the embodiment. The probiotic may comprise from about one thousand to about ten billion cfu/g (colony forming units per gram) of the total composition, or of the part of the composition comprising the probiotic. In one embodiment, the probiotic may comprise from about one hundred million to about 10 billion organisms. The probiotic microorganism may be in any suitable form, for example in a powdered dry form. In addition, the probiotic microorganism may have undergone processing in order for it to increase its survival. For example, the microorganism may be coated or encapsulated in a polysaccharide, fat, starch, protein or in a sugar matrix. Standard encapsulation techniques known in

the art can be used. For example, techniques discussed in U.S. Pat. No. 6,190,591, which is hereby incorporated by reference in its entirety, may be used.
[0044] Alternatively, the relative abundance of Bacteroidetes is decreased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Decreased abundance of Bacteroidetes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Bacteroidetes may be administered. Generally speaking, antimicrobial agents may target several areas of bacterial physiology: protein translation, nucleic acid synthesis, cell wall synthesis or potentially, the polysaccharide acquisition machinery. In an exemplary embodiment, the antibiotic will have efficacy against Bacteroidetes but not against Firmicutes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.
[0045] It is contemplated that the abundance of gut Bacteroidetes within an individual subject may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)) and the individual subject. A method for determining the relative abundance of gut Bacteroidetes is described in the examples; alternatively, an array of the invention, described below, may be used to determine the relative abundance.
[0046] Stated another way, it is contemplated that the abundance of gut Bacteroidetes within an individual subject may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)) and the individual subject. For weight loss, the abundance may be altered by an increase of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Bacteroidetes is described in the examples; alternatively, an array of the invention, described below, may be used to determine the relative abundance.
(b) Altering the Abundance of Actinobacteria
[0047] The relative abundance of Actinobacteria may be altered by increasing or decreasing the presence of one or more species that reside in the gut. Representative, non-limiting species include B. longum, B. breve, B. catenulatum, B. dentium, B. gallicum, B. pseudocatenulatum, C. aerofaciens, C. stercoris, C. intestinalis, and S. variabile.
[0048] In an exemplary embodiment, the relative abundance of Actinobacteria is decreased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Decreased abundance of Actinobacteria in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Actinobacteria may be administered. In an exemplary embodiment, the antibiotic will have efficacy against Actinobacteria but not against Bacteroidetes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.
[0049] Alternatively, the relative abundance of Actinobacteria is increased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Increased abundance of Actinobacteria in the gut may be accomplished by several suitable means generally known in the art. In an exemplary embodiment, a probiotic comprising one or more Actinobacteria strains or species may be administered to the subject.
[0050] It is contemplated that the abundance of gut Actinobacteria may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). A method for determining the relative abundance of gut Actinobacteria is described in the examples.
[0051] Stated another way, it is contemplated that the abundance of gut Actinobacteria may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). For weight loss, the abundance may be altered by a decrease of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Actinobacteria is described in the examples.
(c) Altering the Abundance of Firmicutes
[0052] The relative abundance of Firmicutes may be altered by increasing or decreasing the presence of one or more species that reside in the gut. Representative species include species from Clostridia, Bacilli, and Mollicutes. In one embodiment, the relative abundance of one or more Clostridia species is altered. In another embodiment, the relative abundance of one or more Bacilli species is altered. In yet another embodiment, the relative abundance of one or more Mollicutes species is altered. It is also contemplated that the relative abundance of several species of Firmicutes may be altered without departing from the scope of the invention. By way of non-limiting examples, a combination of one or more Clostridia species, one or more Bacilli species, and one or more Mollicutes species may be altered. In a further embodiment, the species within the Firmicutes phylum may be as of yet unnamed.
[0053] In some embodiments, the Mollicutes class is altered. For instance, E. dolichum, E. cylindroides, E. biforme, or C. innocuum may be altered. In one embodiment, the species of the Mollicutes class may possess the genetic information to create a cell wall. In another embodiment, the species of the Mollicutes class may produce a cell wall. In a further embodiment, the species within the class Mollicutes may be as of yet unnamed.
[0054] In an exemplary embodiment, the relative abundance of Firmicutes is decreased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Decreased abundance of Firmicutes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Firmicutes may be administered. In an exemplary embodiment, the antibiotic will have efficacy against Firmicutes but not against Bacteroidetes. In another exemplary embodiment, the antibiotic will have efficacy against Mollicutes, but not Bacteroidetes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.
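By way of illustration only, the fold and percentage changes in relative abundance contemplated above for a gut phylum may be computed as in the following sketch. It is a minimal example and not the method of the examples: the function names and the per-phylum sequence counts are hypothetical, and the only assumption is that relative abundance is a phylum's share of all classified sequences in a sample.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-phylum sequence counts into fractions of the sample total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample contains no classified sequences")
    return {phylum: n / total for phylum, n in counts.items()}


def percent_change(before: float, after: float) -> float:
    """Signed percentage change in relative abundance between two timepoints."""
    if before <= 0:
        raise ValueError("baseline abundance must be positive")
    return (after - before) / before * 100.0


def fold_change(before: float, after: float) -> float:
    """Ratio of follow-up to baseline relative abundance."""
    if before <= 0:
        raise ValueError("baseline abundance must be positive")
    return after / before


# Hypothetical counts for one subject at two timepoints (illustrative only).
baseline = relative_abundance(
    {"Bacteroidetes": 200, "Firmicutes": 700, "Actinobacteria": 100}
)
followup = relative_abundance(
    {"Bacteroidetes": 300, "Firmicutes": 650, "Actinobacteria": 50}
)

for phylum in baseline:
    pct = percent_change(baseline[phylum], followup[phylum])
    fold = fold_change(baseline[phylum], followup[phylum])
    print(f"{phylum}: {pct:+.1f}% ({fold:.2f}-fold)")
```

In this hypothetical sample the Bacteroidetes share rises from 20% to 30% of sequences, an increase of about 50%, while the Actinobacteria share falls by about 50%; both changes lie within the ranges recited above for weight loss.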

[0055] Alternatively, the relative abundance of Firmicutes is increased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Increased abundance of Firmicutes in the gut may be accomplished by several suitable means generally known in the art. In an exemplary embodiment, a probiotic comprising Firmicutes may be administered to the subject.
[0056] It is contemplated that the abundance of gut Firmicutes may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). A method for determining the relative abundance of gut Firmicutes is described in the examples.
[0057] Stated another way, it is contemplated that the abundance of gut Firmicutes may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). For weight loss, the abundance may be altered by a decrease of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Firmicutes is described in the examples.
(d) Additional Weight Modulating Agents
[0058] Another aspect of the invention encompasses a combination therapy to regulate fat storage, energy harvesting, and/or weight loss or gain in a subject. In an exemplary embodiment, a combination for decreasing energy harvesting, decreasing body fat or for promoting weight loss is provided. For this embodiment, a composition comprising an antibiotic having efficacy against Firmicutes and/or Actinobacteria but not against Bacteroidetes; and a probiotic comprising Bacteroidetes may be administered to the subject. Additionally, an anti-archaeal compound may be included in the aforementioned composition to reduce the representation of gut methanogens and the efficiency of methanogenesis, thereby reducing the efficiency of fermentation of dietary polysaccharides by saccharolytic bacteria, such as Bacteroidetes. Other agents that may be included with the aforementioned composition are detailed below.
[0059] The compositions utilized in this invention may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intraarterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means. The actual effective amounts of compounds comprising a weight loss composition of the invention can and will vary according to the specific compounds being utilized, the mode of administration, and the age, weight and condition of the subject. Dosages for a particular individual subject can be determined by one of ordinary skill in the art using conventional considerations. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.
i. Fiaf Polypeptide
[0060] A composition of the invention for promoting weight loss may optionally include either increasing the amount of a Fiaf polypeptide or the activity of a Fiaf polypeptide. Typically, a suitable Fiaf polypeptide is one that can substantially inhibit LPL when administered to the subject. Several Fiaf polypeptides known in the art are suitable for use in the present invention. Generally speaking, the Fiaf polypeptide is from a mammal. By way of non-limiting example, suitable Fiaf polypeptides and nucleotides are delineated in Table A.
TABLE A
Species              PubMed Ref.
Homo sapiens         NM_139314
                     NM_016109
Mus musculus         NM_020581
Rattus norvegicus    NM_199115
Sus scrofa           AY307772
Bos taurus           AY192008
Pan troglodytes      AY411895
[0061] In certain aspects, a polypeptide that is a homolog, ortholog, mimic or degenerative variant of a Fiaf polypeptide is also suitable for use in the present invention. In particular, the subject polypeptide will typically inhibit LPL when administered to the subject. A variety of methods may be employed to determine whether a particular homolog, mimic or degenerative variant possesses substantially similar biological activity relative to a Fiaf polypeptide. Specific activity or function may be determined by convenient in vitro, cell-based, or in vivo assays, such as measurement of LPL activity in white adipose tissue. In order to determine whether a particular Fiaf polypeptide inhibits LPL, the procedure detailed in the examples of U.S. Patent Application No. 20050239706, which is hereby incorporated by reference in its entirety, may be followed.
[0062] Fiaf polypeptides suitable for use in the invention are typically isolated or pure and are generally administered as a composition in conjunction with a suitable pharmaceutical carrier, as detailed below. A pure polypeptide constitutes at least about 90%, preferably, 95% and even more preferably, at least about 99% by weight of the total polypeptide in a given sample.
[0063] The Fiaf polypeptide may be synthesized, produced by recombinant technology, or purified from cells using any of the molecular and biochemical methods known in the art that are available for biochemical synthesis, molecular expression and purification of the Fiaf polypeptides [see e.g., Molecular Cloning, A Laboratory Manual (Sambrook, et al., Cold Spring Harbor Laboratory), Current Protocols in Molecular Biology (Eds. Ausubel, et al., Greene Publ. Assoc., Wiley-Interscience, New York)].
[0064] The invention also contemplates use of an agent that increases Fiaf transcription or its activity. For example, an agent may be delivered that specifically activates Fiaf expression: this agent may be a natural or synthetic compound that directly activates Fiaf gene transcription, or indirectly activates expression through interactions with components of host regulatory networks that control Fiaf transcription. Suitable agents may be identified by methods generally known in the art, such as by screening natural product and/or chemical libraries using the gnotobiotic zebrafish model described in the examples of U.S. Patent Application No. 20050239706. In another embodiment, a chemical entity may be used that

interacts with Fiaf targets, such as LPL, to reproduce the effects of Fiaf (e.g., in this case inhibition of LPL activity). In an alternative of this embodiment, administering a Fiaf agonist to the subject may increase Fiaf expression and/or activity. In one embodiment, the Fiaf agonist is a peroxisome proliferator-activated receptor (PPAR) agonist. Suitable PPARs include PPARα, PPARβ/δ, and PPARγ. Fenofibrate is another suitable example of a Fiaf agonist. Additional suitable Fiaf agonists and methods of administration are further described in Mandard, et al., J. Biol Chem, 279, 34411 (2004), and U.S. Patent Publication No. 2003/0220373, which are both hereby incorporated by reference in their entirety.
ii. Other Compounds
[0065] The compositions of the invention that decrease energy harvesting, decrease body fat, or promote weight loss may also include several additional agents suitable for use in weight loss regimes. Generally speaking, exemplary combinations of therapeutic agents may act synergistically to decrease energy harvesting, decrease body fat, or promote weight loss. Using this approach, one may be able to achieve therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects. In one embodiment, acarbose may be administered with a composition of the invention. Acarbose is an inhibitor of α-glucosidases, which are required to break down carbohydrates into simple sugars within the gastrointestinal tract of the subject. In another embodiment, an appetite suppressant, such as an amphetamine, or a selective serotonin reuptake inhibitor, such as sibutramine, may be administered with a composition of the invention. In still another embodiment, a lipase inhibitor such as orlistat, or an inhibitor of lipid absorption such as Xenical, may be administered with a composition of the invention.
iii. Restricted Calorie Diet
[0066] Optionally, in addition to administration of a composition of the invention for weight loss, a subject may also be placed on a restricted calorie diet. Restricted calorie diets may be helpful for increasing the relative abundance of Bacteroidetes and decreasing the relative abundance of Firmicutes and/or Actinobacteria. Several restricted calorie diets known in the art are suitable for use in combination with the compositions of the invention. Representative diets include a reduced fat diet, a reduced protein diet, or a reduced carbohydrate diet.
iv. Alteration of the Gastrointestinal Archaeon Population
[0067] An anti-archaeal compound may be included in a composition of the invention to decrease energy harvesting, decrease fat storage, and/or decrease weight gain. To promote weight loss in a subject, the gut archaeon population is altered such that microbial-mediated carbohydrate metabolism or its efficiency is decreased in the subject, whereby decreasing microbial-mediated carbohydrate metabolism or its efficiency promotes weight loss in the subject.
[0068] Accordingly, in one embodiment, the subject's gastrointestinal archaeal population is altered so as to promote weight loss in the subject. Typically, the presence of at least one genus of archaeon that resides in the gastrointestinal tract of the subject is decreased. In most embodiments, the archaeon is generally a mesophilic methanogenic archaeon. In one alternative of this embodiment, the presence of at least one species from the genera Methanobrevibacter or Methanosphaera is decreased. In another alternative embodiment, the presence of Methanobrevibacter smithii is decreased. In still another embodiment, the presence of Methanosphaera stadtmanae is decreased. In yet another embodiment, the presence of a combination of archaeon genera or species is decreased. By way of non-limiting example, the presence of Methanobrevibacter smithii and Methanosphaera stadtmanae is decreased.
[0069] To decrease the presence of any of the archaeon detailed above, methods generally known in the art may be utilized. In one embodiment, a compound having anti-microbial activities against the archaeon is administered to the subject. Non-limiting examples of suitable anti-microbial compounds include metronidazole, clindamycin, tinidazole, macrolides, and fluoroquinolones. In another embodiment, a compound that inhibits methanogenesis by the archaeon is administered to the subject. Non-limiting examples include 2-bromoethanesulfonate (inhibitor of methyl-coenzyme M reductase), N-alkyl derivatives of para-aminobenzoic acid (inhibitor of tetrahydromethanopterin biosynthesis), ionophore monensin, nitroethane, lumazine, propynoic acid and ethyl 2-butynoate. In yet another embodiment, a hydroxymethylglutaryl-CoA reductase inhibitor is administered to the subject. Non-limiting examples of suitable hydroxymethylglutaryl-CoA reductase inhibitors include lovastatin, atorvastatin, fluvastatin, pravastatin, simvastatin, and rosuvastatin. Alternatively, the diet of the subject may be formulated by changing the composition of glycans (e.g., polyfructose-containing oligosaccharides) in the diet that are preferred by polysaccharide degrading bacterial components of the microbiota (e.g., Bacteroides spp.) when in the presence of mesophilic methanogenic archaeal species such as Methanobrevibacter smithii.
[0070] Generally speaking, when the archaeal population in the subject's gastrointestinal tract is decreased in accordance with the methods described above, the polysaccharide degrading properties of the subject's gastrointestinal microbiota are altered such that microbial-mediated carbohydrate metabolism or its efficiency is decreased. Typically, depending upon the embodiment, the transcriptome and the metabolome of the gastrointestinal microbiota are altered. In one embodiment, the microbe is a saccharolytic bacterium. In one alternative of this embodiment, the saccharolytic bacterium is a Bacteroides species. In a further alternative embodiment, the bacterium is Bacteroides thetaiotaomicron. Typically, the carbohydrate will be a plant polysaccharide or dietary fiber. Plant polysaccharides may include starch, fructan, cellulose, hemicellulose, and pectin.
[0071] The compounds utilized in this invention to alter the archaeon population may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.
[0072] The actual effective amounts of compound described herein can and will vary according to the specific composition being utilized, the mode of administration and the age, weight and condition of the subject. Dosages for a particular individual subject can be determined by one of ordinary skill in the art using conventional considerations. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.
[0073] By way of non-limiting example, weight loss may be promoted by administering an HMG-CoA reductase

inhibitor to a subject. In an exemplary embodiment, the inhibitor will selectively inhibit the HMG-CoA reductase expressed by M. smithii and not the HMG-CoA reductase expressed by the subject. In another embodiment, a second HMG-CoA reductase inhibitor may be administered that selectively inhibits the HMG-CoA reductase expressed by the subject in lieu of the HMG-CoA reductase expressed by M. smithii. In yet another embodiment, an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by the subject may be administered in combination with an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by M. smithii. One means that may be utilized to achieve such selectivity is via the use of time-release formulations as discussed below or by otherwise altering the properties of the compounds so that they will not, or will, be efficiently absorbed from the gastrointestinal tract. Alternatively, the compound that selectively inhibits the HMG-CoA reductase expressed by M. smithii may be poorly absorbed by the gastrointestinal tract of the subject. Compounds that inhibit HMG-CoA reductase are well known in the art. For instance, non-limiting examples include atorvastatin, pravastatin, rosuvastatin, and other statins.
[0074] These compounds, for example HMG-CoA reductase inhibitors, may be formulated into pharmaceutical compositions and administered to subjects to promote weight loss. According to the present invention, a pharmaceutical composition includes, but is not limited to, pharmaceutically acceptable salts, esters, salts of such esters, or any other adduct or derivative which upon administration to a subject in need is capable of providing, directly or indirectly, a composition as otherwise described herein, or a metabolite or residue thereof, e.g., a prodrug.
[0075] The pharmaceutical compositions may be administered by several different means that will deliver a therapeutically effective dose. Such compositions can be administered orally, parenterally, by inhalation spray, rectally, intradermally, intracisternally, intraperitoneally, transdermally, buccally, as an oral or nasal spray, or topically (i.e., powders, ointments or drops) in dosage unit formulations containing conventional nontoxic pharmaceutically acceptable carriers, adjuvants, and vehicles as desired. Topical administration may also involve the use of transdermal administration such as transdermal patches or iontophoresis devices. The term parenteral as used herein includes subcutaneous, intravenous, intramuscular, or intrasternal injection, or infusion techniques. In an exemplary embodiment, the pharmaceutical composition will be administered in an oral dosage form. Formulation of drugs is discussed in, for example, Hoover, John E., Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. (1975), and Lieberman, H. A. and Lachman, L., Eds., Pharmaceutical Dosage Forms, Marcel Dekker, New York, N.Y. (1980).
[0076] The amount of an HMG-CoA reductase inhibitor that constitutes an "effective amount" can and will vary. The amount will depend upon a variety of factors, including whether the administration is in single or multiple doses, and individual subject parameters including age, physical condition, size, and weight. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.
[0077] As described above, an HMG-CoA reductase inhibitor may be specific for the M. smithii enzyme, or for the subject's enzyme, depending, in part, on the selectivity of the particular inhibitor and the area the inhibitor is targeted for release in the subject. For example, an inhibitor may be targeted for release in the upper portion of the gastrointestinal tract of a subject to substantially inhibit the subject's enzyme. In contrast, if the inhibitor is targeted for release in the lower portion of the gastrointestinal tract of a subject, i.e., where M. smithii resides, then the inhibitor may substantially inhibit M. smithii's enzyme.
[0078] In order to selectively control the release of an inhibitor to a particular region of the gastrointestinal tract, the pharmaceutical compositions of the invention may be manufactured into one or several dosage forms for the controlled, sustained or timed release of one or more of the ingredients. In this context, typically one or more of the ingredients forming the pharmaceutical composition is microencapsulated or dry coated prior to being formulated into one of the above forms. By varying the amount and type of coating and its thickness, the timing and location of release of a given ingredient or several ingredients (in either the same dosage form, such as a multi-layered capsule, or different dosage forms) may be varied.
[0079] In an exemplary embodiment, the coating may be an enteric coating. The enteric coating generally will provide for controlled release of the ingredient, such that drug release can be accomplished at some generally predictable location in the lower intestinal tract below the point at which drug release would occur without the enteric coating. In certain embodiments, multiple enteric coatings may be utilized. Multiple enteric coatings, in certain embodiments, may be selected to release the ingredient or combination of ingredients at various regions in the lower gastrointestinal tract and at various times.
[0080] As will be appreciated by a skilled artisan, the encapsulation or coating method can and will vary depending upon the ingredients used to form the pharmaceutical composition and coating, and the desired physical characteristics of the microcapsules themselves. Additionally, more than one encapsulation method may be employed so as to create a multi-layered microcapsule, or the same encapsulation method may be employed sequentially so as to create a multi-layered microcapsule. Suitable methods of microencapsulation may include spray drying, spinning disk encapsulation (also known as rotational suspension separation encapsulation), supercritical fluid encapsulation, air suspension microencapsulation, fluidized bed encapsulation, spray cooling/chilling (including matrix encapsulation), extrusion encapsulation, centrifugal extrusion, coacervation, alginate beads, liposome encapsulation, inclusion encapsulation, colloidosome encapsulation, sol-gel microencapsulation, and other methods of microencapsulation known in the art. Detailed information concerning materials, equipment and processes for preparing coated dosage forms may be found in Pharmaceutical Dosage Forms: Tablets, eds. Lieberman et al. (New York: Marcel Dekker, Inc., 1989), and in Ansel et al., Pharmaceutical Dosage Forms and Drug Delivery Systems, 6th Ed. (Media, Pa.: Williams & Wilkins, 1995).

II. Biomarkers Comprising the Gut Microbiome

0081 Another aspect of the invention encompasses use of the gut microbiome as a biomarker for obesity. The biomarker may be utilized to construct arrays that may be used for several applications including as a diagnostic or prognostic tool to determine obesity risk, judge the efficacy of existing weight loss regimes, aid in drug discovery, identify additional biomarkers involved in obesity or an obesity-related disorder, and aid in the discovery of therapeutic targets involved in the regulation of energy balance, including but not limited to those that may directly affect the composition of the gut microbiome. Generally speaking, the array may comprise biomolecules modulated in an obese host microbiome or a lean host microbiome.

(a) Array

0082 The array may be comprised of a substrate having disposed thereon at least one biomolecule that is modulated in an obese host microbiome compared to a lean host microbiome. Several substrates suitable for the construction of arrays are known in the art, and one skilled in the art will appreciate that other substrates may become available as the art progresses. The substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the biomolecules and is amenable to at least one detection method. Non-limiting examples of substrate materials include glass, modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), nylon or nitrocellulose, polysaccharides, nylon, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics. In an exemplary embodiment, the substrates may allow optical detection without appreciably fluorescing.

0083 A substrate may be planar, a substrate may be a well, i.e. a 364 well plate, or alternatively, a substrate may be a bead. Additionally, the substrate may be the inner surface of a tube for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as a flexible foam, including closed cell foams made of particular plastics.

0084 The biomolecule or biomolecules may be attached to the substrate in a wide variety of ways, as will be appreciated by those in the art. The biomolecule may either be synthesized first, with subsequent attachment to the substrate, or may be directly synthesized on the substrate. The substrate and the biomolecule may be derivatized with chemical functional groups for subsequent attachment of the two. For example, the substrate may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups. Using these functional groups, the biomolecule may be attached using functional groups on the biomolecule either directly or indirectly using linkers.

0085 The biomolecule may also be attached to the substrate non-covalently. For example, a biotinylated biomolecule can be prepared, which may bind to surfaces covalently coated with streptavidin, resulting in attachment. Alternatively, a biomolecule or biomolecules may be synthesized on the surface using techniques such as photopolymerization and photolithography. Additional methods of attaching biomolecules to arrays and methods of synthesizing biomolecules on substrates are well known in the art, i.e. VLSIPS technology from Affymetrix (e.g., see U.S. Pat. No. 6,566,495, and Rockett and Dix, "DNA arrays: technology, options and toxicological applications," Xenobiotica 30(2): 155-177, all of which are hereby incorporated by reference in their entirety).

0086 In one embodiment, the biomolecule or biomolecules attached to the substrate are located at a spatially defined address of the array. Arrays may comprise from about 1 to about several hundred thousand addresses or more. In one embodiment, the array may be comprised of less than 10,000 addresses. In another alternative embodiment, the array may be comprised of at least 10,000 addresses. In yet another alternative embodiment, the array may be comprised of less than 5,000 addresses. In still another alternative embodiment, the array may be comprised of at least 5,000 addresses. In a further embodiment, the array may be comprised of less than 500 addresses. In yet a further embodiment, the array may be comprised of at least 500 addresses.

0087 A biomolecule may be represented more than once on a given array. In other words, more than one address of an array may be comprised of the same biomolecule. In some embodiments, two, three, or more than three addresses of the array may be comprised of the same biomolecule. In certain embodiments, the array may comprise control biomolecules and/or control addresses. The controls may be internal controls, positive controls, negative controls, or background controls.

0088 The array may be comprised of biomolecules indicative of an obese host microbiome (e.g. the nucleic acid sequences listed in Table 13). Alternatively, the array may be comprised of biomolecules indicative of a lean host microbiome (e.g. the nucleic acid sequences listed in Table 14). A biomolecule is "indicative" of an obese or lean microbiome if it tends to appear more often in one type of microbiome compared to the other. Additionally, the array may be comprised of biomolecules that are modulated in the obese host microbiome compared to the lean host microbiome. As used herein, "modulated" may refer to a biomolecule whose representation or activity is different in an obese host microbiome compared to a lean host microbiome. For instance, modulated may refer to a biomolecule that is enriched, depleted, up-regulated, down-regulated, degraded, or stabilized in the obese host microbiome compared to a lean host microbiome. In one embodiment, the array may be comprised of a biomolecule enriched in the obese host microbiome compared to the lean host microbiome. In another embodiment, the array may be comprised of a biomolecule depleted in the obese host microbiome compared to the lean host microbiome. In yet another embodiment, the array may be comprised of a biomolecule up-regulated in the obese host microbiome compared to the lean host microbiome. In still another embodiment, the array may be comprised of a biomolecule down-regulated in the obese host microbiome compared to the lean host microbiome. In still yet another embodiment, the array may be comprised of a biomolecule degraded in the obese host microbiome compared to the lean host microbiome. In an alternative embodiment, the array may be comprised of a biomolecule stabilized in the obese host microbiome compared to the lean host microbiome.

0089 Generally speaking, an array of the invention may comprise at least one biomolecule indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. In one embodiment, the array may comprise at least 5,

10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, or 400 biomolecules indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. In another embodiment, the array may comprise at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or at least 900 biomolecules indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome.

0090 As used herein, "biomolecule" may refer to a nucleic acid, an oligonucleic acid, an amino acid, a peptide, a polypeptide, a protein, a lipid, a carbohydrate, a metabolite, or a fragment thereof. Nucleic acids may include RNA, DNA, and naturally occurring or synthetically created derivatives. A biomolecule may be present in, produced by, or modified by a microorganism within the gut.

0091 In one embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 13. For instance, the biomolecules of the array may be selected from the group comprising nucleic acids corresponding to SEQ ID NO:1 through SEQ ID NO:273. In another embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 14. For instance, the biomolecules of the array may be selected from the group comprising nucleic acids corresponding to SEQ ID NO:274 through SEQ ID NO:383. In yet another embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 13 and Table 14, for instance, the nucleic acids corresponding to SEQ ID NO:1 through SEQ ID NO:383.

0092 Additionally, the biomolecule may be at least 70, 75, 80, 85, 90, or 95% homologous to a biomolecule listed in Table 13 or Table 14 above. In one embodiment, the biomolecule may be at least 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% homologous to a biomolecule derived from an accession number detailed above. In another embodiment, the biomolecule may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% homologous to a biomolecule derived from an accession number detailed above.

0093 In determining whether a biomolecule is substantially homologous or shares a certain percentage of sequence identity with a sequence of the invention, sequence similarity may be defined by conventional algorithms, which typically allow introduction of a small number of gaps in order to achieve the best fit. In particular, "percent identity" of two polypeptides or two nucleic acid sequences is determined using the algorithm of Karlin and Altschul (Proc. Natl. Acad. Sci. USA 87:2264-2268, 1990). Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (J. Mol. Biol. 215:403-410, 1990). BLAST nucleotide searches may be performed with the BLASTN program to obtain nucleotide sequences homologous to a nucleic acid molecule of the invention. Equally, BLAST protein searches may be performed with the BLASTX program to obtain amino acid sequences that are homologous to a polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST is utilized as described in Altschul et al. (Nucleic Acids Res. 25:3389-3402, 1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) are employed. See http://www.ncbi.nlm.nih.gov for more details.

0094 For each of the above embodiments, biomolecules that are indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome may be determined using methods detailed in the Examples.

0095 The arrays may be utilized in several suitable applications. For example, the arrays may be used in methods for detecting association between two or more biomolecules. This method typically comprises incubating a sample with the array under conditions such that the biomolecules comprising the sample may associate with the biomolecules attached to the array. The association is then detected, using means commonly known in the art, such as fluorescence. "Association," as used in this context, may refer to hybridization, covalent binding, or ionic binding. A skilled artisan will appreciate that conditions under which association may occur will vary depending on the biomolecules, the substrate, and the detection method utilized. As such, suitable conditions may have to be optimized for each individual array created.

0096 In yet another embodiment, the array may be used as a tool in a method to determine whether a compound has efficacy for treatment of obesity or an obesity-related disorder in a host. Alternatively, the array may be used as a tool in a method to determine whether a compound increases or decreases the relative abundance of Bacteroides, Actinobacteria, or Firmicutes in a subject. Typically, such methods comprise comparing a plurality of biomolecules of the host's microbiome before and after administration of a compound, such that if the abundance of biomolecules associated with obesity decreases after treatment, or the abundance of biomolecules indicative of Bacteroides increases, or the abundance of biomolecules indicative of Firmicutes and/or Actinobacteria decreases, the compound may be efficacious in treating obesity in a host.

0097 The array may also be used to quantitate the plurality of biomolecules of the host microbiome before and after administration of a compound. The abundance of each biomolecule in the plurality may then be compared to determine if there is a decrease in the abundance of biomolecules associated with obesity after treatment.

0098 In some embodiments, the array may be used as a diagnostic or prognostic tool to identify subjects that are susceptible to more efficient energy harvesting, and therefore, more susceptible to weight gain and/or obesity. Such a method may generally comprise incubating the array with biomolecules derived from the subject's gut microbiome to determine the relative abundance of nucleic acids or nucleic acid products associated with Bacteroidetes, Actinobacteria, or Firmicutes. In some embodiments, the array may be used to determine the relative abundance of Mollicutes, Mollicute-associated nucleic acids, or Mollicute-associated nucleic acid products in a subject's gut microbiome. Methods to collect, isolate, and/or purify biomolecules from the gut microbiome of a subject to be used in the above methods are known in the art, and are detailed in the examples.

(b) Microbiome Profiles

0099 The present invention also encompasses use of the microbiome as a biomarker to construct microbiome profiles. Generally speaking, a microbiome profile is comprised of a plurality of values with each value representing the abundance of a microbiome biomolecule. The abundance of a microbiome biomolecule may be determined, for instance, by sequencing the nucleic acids of the microbiome as detailed in the examples. This sequencing data may then be analyzed by known software, as detailed in the examples, to determine the abundance of a microbiome biomolecule in the analyzed sample. The abundance of a microbiome biomolecule may also be determined using an array described above. For instance, by detecting the association between the biomolecules comprising a microbiome sample and the biomolecules comprising the array, the abundance of a microbiome biomolecule in the sample may be determined.

0100 A profile may be digitally encoded on a computer-readable medium. The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media may include, for example, optical or magnetic disks. Volatile media may include dynamic memory. Transmission media may include coaxial cables, copper wire and fiber optics. Transmission media may also take the form of acoustic, optical, or electromagnetic waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or other magnetic medium, a CD-ROM, CDRW, DVD, or other optical medium, punch cards, paper tape, optical mark sheets, or other physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, a carrier wave, or other medium from which a computer can read.

0101 A particular profile may be coupled with additional data about that profile on a computer-readable medium. For instance, a profile may be coupled with data about what therapeutics, compounds, or drugs may be efficacious for that profile, or about other features of the subject's digestive health when consuming a given diet or set of diets. Conversely, a profile may be coupled with data about what therapeutics, compounds, or drugs may not be efficacious for that profile. Alternatively, a profile may be coupled with known risks associated with that profile. Non-limiting examples of the type of risks that might be coupled with a profile include disease or disorder risks associated with a profile. The computer-readable medium may also comprise a database of at least two distinct profiles.

0102 Such a profile may be used, for instance, in a method of selecting a compound for treating obesity or an obesity-related disorder in a host. Generally speaking, such a method would comprise providing a microbiome profile from the host and providing a plurality of reference microbiome profiles, each associated with a compound, and selecting the reference profile most similar to the host microbiome profile, to thereby select a compound for treating obesity or an obesity-related disorder in the host. The host profile and each reference profile may comprise a plurality of values, each value representing the abundance of a microbiome biomolecule.

0103 The microbiome profiles may be utilized in a variety of applications. For example, the microbiome profiles may be used in a method for predicting risk for obesity or an obesity-related disorder in a host. The method comprises, in part, providing a microbiome profile from a host, and providing a plurality of reference microbiome profiles, then selecting the reference profile most similar to the host microbiome profile, such that if the host's microbiome is most similar to a reference obese microbiome, the host is at risk for obesity or an obesity-related disorder. The microbiome profile from the host may be determined using an array of the invention. The reference profiles may be stored on a computer-readable medium such that software known in the art and detailed in the examples may be used to compare the microbiome profile and the reference profiles.

0104 The host microbiome may be derived from a subject that is a rodent, a human, a livestock animal, a companion animal, or a zoological animal. In one embodiment, the host microbiome is derived from a rodent, i.e. a mouse, a rat, a guinea pig, etc. In another embodiment, the host microbiome is derived from a human. In yet another embodiment, the host microbiome is derived from a livestock animal. Non-limiting examples of livestock animals include pigs, cows, horses, goats, sheep, llamas and alpacas. In still another embodiment, the host microbiome is derived from a companion animal. Non-limiting examples of companion animals include pets, such as dogs, cats, rabbits, and birds. In still yet another embodiment, the host microbiome is derived from a zoological animal. As used herein, a "zoological animal" refers to an animal that may be found in a zoo. Such animals may include non-human primates, large cats, wolves, and bears.

III. Kits

0105 The present invention also encompasses a kit for evaluating a compound, therapeutic, or drug. Typically, the kit comprises an array and a computer-readable medium. The array may comprise a substrate, the substrate having disposed thereon at least one biomolecule that is modulated in an obese host microbiome compared to a lean host microbiome. The computer-readable medium may have a plurality of digitally encoded profiles wherein each profile of the plurality has a plurality of values, each value representing the abundance of a biomolecule in a host microbiome detected by the array. The array may be used to determine a profile for a particular host under particular conditions, and then the computer-readable medium may be used to determine if the profile is similar to a known profile stored on the computer-readable medium. Non-limiting examples of possible known profiles include obese and lean profiles for several different hosts, for example, rodents, humans, livestock animals, companion animals, or zoological animals.

Definitions

0106 The term "abundance" refers to the representation of a given taxonomic group (e.g. phylum, order, family, genera, or species) of microorganism present in the gastrointestinal tract of a subject.

0107 The term "activity of the microbiota population" refers to the microbiome's ability to harvest energy and nutrients.

0108 The term "antagonist" refers to a molecule that inhibits or attenuates the biological activity of a Fiaf polypeptide and in particular, the ability of Fiaf to inhibit LPL, and/or the ability of the microbiota to regulate Fiaf. Antagonists may include proteins such as antibodies, nucleic acids, carbohydrates, small molecules, or other compounds or compositions that modulate the activity of a Fiaf polypeptide either by directly interacting with the polypeptide or by acting on components of the biological pathway in which Fiaf participates.

0109 The term "agonist" refers to a molecule that enhances or increases the biological activity of a Fiaf polypeptide and in particular, the ability of Fiaf to inhibit LPL. Agonists may include proteins, peptides, nucleic acids, carbohydrates, small molecules (e.g., such as metabolites), or other compounds or compositions that modulate the activity of a Fiaf polypeptide either by directly interacting with the polypeptide or by acting on components of the biological pathway in which Fiaf participates.

0110 The term "altering" as used in the phrase "altering the microbiota population" is to be construed in its broadest interpretation to mean a change in the representation of microbes or the functions/activities of microbial communities in the gastrointestinal tract of a subject. The change may be a decrease or an increase in the presence of a particular microbial species, genus, family, order, or class, or a change in the expression of microbial community associated nucleic acids or a change in the protein and metabolic products produced by members of the community.

0111 "BMI" as used herein is defined as a human subject's weight (in kilograms) divided by height (in meters) squared.

0112 An "effective amount" is a therapeutically-effective amount that is intended to qualify the amount of agent that will achieve the goal of a decrease in body fat, or in promoting weight loss.

0113 Fas stands for fatty acid synthase.

0114 Fiaf stands for fasting-induced adipocyte factor, also known as angiopoietin-like protein 4 (Angptl4).

0115 LPL stands for lipoprotein lipase.

0116 The term "obesity-related disorder" includes disorders resulting from, at least in part, obesity. Representative disorders include metabolic syndrome, type II diabetes, hypertension, cardiovascular disease, and nonalcoholic fatty liver disease.

0117 The term "metagenomics" refers to the application of modern genomic techniques to the study of the composition and operations of communities of microbial organisms sampled directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.

0118 PPAR stands for peroxisome proliferator-activated receptor.

0119 A "subject in need of treatment for obesity" generally will have at least one of three criteria: (i) BMI over 30; (ii) 100 pounds overweight; or (iii) 100% above an "ideal" body weight as determined by generally recognized weight charts.

0120 As various changes could be made in the above compounds, products and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and in the examples given below, shall be interpreted as illustrative and not in a limiting sense.

0121 The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent techniques discovered by the inventors to function well in the practice of the invention. Those of skill in the art should, however, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments that are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention. Therefore all matter set forth or shown in the accompanying drawings is to be interpreted as illustrative and not in a limiting sense.

EXAMPLES

0122 The following examples illustrate various iterations of the invention.

Example 1

The Gut Microbiota is Linked to Family and BMI

0123 The bacterial lineages of the human gut microbiota are largely unexplored. In this study, the lineages of gut microbiota of 31 monozygotic (MZ) twin pairs, 23 dizygotic (DZ) twin pairs, and where available their mothers (n=46), were characterized (Tables 1-5). MZ and DZ co-twins and parent-offspring pairs provide an attractive paradigm for assessing the impact of genotype and shared early environment exposures on the gut microbiome. Moreover, genetically identical MZ twin pairs gain weight in response to overfeeding in a more reproducible way than do unrelated individuals and are more concordant for body mass index (BMI) than dizygotic twin pairs, suggesting shared features of their energy balance influenced by host genotype.
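The profile comparison described in paragraphs 0102 and 0103 (provide a host profile, provide reference profiles, select the most similar reference) can be illustrated with a minimal Python sketch. The description does not name a similarity measure, so the Euclidean distance used here is an assumption, as are the function names, the biomolecule labels, and the abundance values, which are invented for illustration and are not data from this application.

```python
import math


def relative_abundance(counts):
    """Convert raw per-biomolecule counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("profile contains no counts")
    return {name: value / total for name, value in counts.items()}


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two profiles (an assumed similarity measure).

    A biomolecule missing from one profile is treated as abundance 0.
    """
    names = set(profile_a) | set(profile_b)
    return math.sqrt(
        sum((profile_a.get(n, 0.0) - profile_b.get(n, 0.0)) ** 2 for n in names)
    )


def most_similar_reference(host_profile, reference_profiles):
    """Return the label of the reference profile closest to the host profile."""
    if not reference_profiles:
        raise ValueError("at least one reference profile is required")
    return min(
        reference_profiles,
        key=lambda label: profile_distance(host_profile, reference_profiles[label]),
    )


# Hypothetical values, invented for illustration only.
host = relative_abundance(
    {"biomolecule_a": 800, "biomolecule_b": 150, "biomolecule_c": 50}
)
references = {
    "obese_reference": {
        "biomolecule_a": 0.75, "biomolecule_b": 0.20, "biomolecule_c": 0.05,
    },
    "lean_reference": {
        "biomolecule_a": 0.55, "biomolecule_b": 0.43, "biomolecule_c": 0.02,
    },
}
closest = most_similar_reference(host, references)
print(closest)
```

In this sketch the reference labels stand in for whatever is coupled with each stored profile, for example a risk category or a candidate compound. Any other distance or similarity measure could be substituted in `profile_distance` without changing the selection step.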
TABLE 1
V2/3 16S rRNA gene sequencing statistics

            Data ID                                                        Months
            time-    Family   Twin/                        BMI             without       Total
Subject ID  point    number   Mom    Ancestry   Zygosity   category        antibiotics   sequences
F1T1Le1     TS1      1        Twin   EA         MZ         Lean            >6            6415
F1T1Le2     TS1.2    1        Twin   EA         MZ         Lean            >6            1627
F1T2Le1     TS2      1        Twin   EA         MZ         Lean            NA            15495
F1T2Le2     TS2.2    1        Twin   EA         MZ         Lean            >6            1957
F1MOv1      TS3      1        Mom    EA         NA         Overweight      >6            7870
F1MOv2      TS3.2    1        Mom    EA         NA         Overweight      >6            1799
F2T1Le1     TS4      2        Twin   EA         MZ         Lean            >6            9343
F2T1Le2     TS4.2    2        Twin   EA         MZ         Lean            >6            2886
F2T2Le1     TS5      2        Twin   EA         MZ         Lean            >6            13991
F2T2Le2     TS5.2    2        Twin   EA         MZ         Lean            >6            3606
F2MOb1      TS6      2        Mom    EA         NA         Obese           >6            7717
F2MOb2      TS6.2    2        Mom    EA         NA         Obese           >6            4325
F3T1Le1     TS7      3        Twin   EA         MZ         Lean            >6            11808
F3T1Le2     TS7.2    3        Twin   EA         MZ         Lean            >6            2962

TABLE 1-continued
V2/3 16S rRNA gene sequencing statistics

Data ID
time-     Twin/               BMI             Total
point     Mom     Ancestry    category        sequences
TS49      Twin    EA          Obese           11555
TS50      Mom                 Obese           8045
TS51      Twin                Obese           3800
TS51.2    Twin                Obese           3210
TS52      Twin                Obese           3326
TS52.2    Twin                Obese           2742
TS53      Mom                 Overweight      4118
TS54.2    Twin                Lean            1466
TS55      Twin                Lean            2267
TS55.2    Twin                Lean            2361
TS56      Mom                 Obese           1694
TS56.2    Mom                 Obese           1906
TS60      Twin                Obese           2367
TS60.2    Twin                Obese           2049
TS61      Twin                Obese           2185
TS62      Mom                 Obese           3564
TS62.2    Mom                 Obese           4041
TS63      Twin                Lean            1624
TS63.2    Twin                Lean            2495
TS64      Twin                Lean            2651
TS64.2    Twin                Lean            3018
TS65      Mom                 Lean            2767
TS65.2    Mom                 Lean            2839
TS66      Twin                Obese           3628
TS66.2    Twin                Obese           3252
TS67      Twin                Obese           2822
TS67.2    Twin                Obese           4538
TS68      Mom                 Obese           2882
TS68.2    Mom                 Obese           4569
TS69      Twin                Obese           4217
TS69.2    Twin                Obese           3644
TS70      Twin                Obese           2117
TS70.2    Twin                Obese           2785
TS78      Twin                Obese           2378
TS78.2    Twin                Obese           2894
TS79      Twin                Obese           2122
TS79.2    Twin                Obese           3189
TS80      Mom                 [illegible]     2132
TS81      Twin                Obese           3455
TS81.2    Twin                Obese           2812
TS82      Twin                Overweight      7014
TS82.2    Twin                Obese           6903
TS83      Mom                 Obese           3243
TS83.2    Mom                 Obese           2884
TS84      Twin                Lean            1925
TS85      Twin                Lean            2545
TS85.2    Twin                Lean            2538
TS86      Mom                 Overweight      1735
TS90      Twin                Obese           3165
TS91      Twin                Obese           2720
TS92      Mom                 Overweight      5067
TS93      Twin    EA          Lean            1799
TS93.2    Twin                Lean            1739
TS94      Twin                Lean            2291
TS94.2    Twin                Lean            1612
TS95      Mom                 Lean            2782
TS95.2    Mom                 Lean            2462

TOTAL                                         119519

TABLE 2
V6 16S rRNA gene sequencing statistics

Subject ID   Data ID   Twin/Mom   Family   BMI          Sequences
F1T1Le1      TS1       Twin       1        Lean         25,140
F1T2Le1      TS2       Twin       1        Lean         42,186
F1MOv1       TS3       Mom        1        Overweight   17,726
F2T1Le1      TS4       Twin       2        Lean         25,705
F2T2Le1      TS5       Twin       2        Lean         26,608
F2MOb1       TS6       Mom        2        Obese        27,007
F3T1Le1      TS7       Twin       3        Lean         17,469
F3T2Le1      TS8       Twin       3        Lean         17,170
F3MOv1       TS9       Mom        3        Overweight   14,787
F5T1Le1      TS13      Twin       5        Lean         15,296
F5T2Le1      TS14      Twin       5        Lean         14,220
F5MOv1       TS15      Mom        5        Overweight   14,244
[two rows illegible]
F7MOb1       TS21      Mom        7        Obese        23,714
F9T1Le1      TS25      Twin       9        Lean         20,491
F9T2Le1      TS26      Twin       9        Lean         27,626
F9MOb1       TS27      Mom        9        Obese        25,494
F10T1Ob1     TS28      Twin       10       Obese        20,905
F10T2Ob1     TS29      Twin       10       Obese        15,698
F10MOv1      TS30      Mom        10       Overweight   32,083
[two rows illegible]
F11MOv1      TS33      Mom        11       Overweight   28,962
F15T1Ob1     TS49      Twin       15       Obese        22,201
F15T2Ob1     TS50      Twin       15       Obese        30,498
F15MOb1      TS51      Mom        15       Obese        22,691
F16T1Ob1     TS55      Twin       16       Obese        37,027
F16T2Ob1     TS56      Twin       16       Obese        31,512
F16MOb1      TS57      Mom        16       Obese        30,392
[two rows illegible]
F43MOb1      TS150     Mom        43       Obese        23,463

TOTAL                                                   817,942

TABLE 3
Full-length 16S rRNA gene sequencing statistics

Subject ID   Data ID   Twin/Mom   Family   BMI          Sequences
F1T1Le1      TS1       Twin       1        Lean         349
F1T2Le1      TS2       Twin       1        Lean         351
F1MOv1       TS3       Mom        1        Overweight   331
F2T1Le1      TS4       Twin       2        Lean         351
F2T2Le1      TS5       Twin       2        Lean         345
F2MOb1       TS6       Mom        2        Obese        348
F3T1Le1      TS7       Twin       3        Lean         237
F3T2Le1      TS8       Twin       3        Lean         354
F3MOv1       TS9       Mom        3        Overweight   357
F5T1Le1      TS13      Twin       5        Lean         337
F5T2Le1      TS14      Twin       5        Lean         350
F5MOv1       TS15      Mom        5        Overweight   338
F7T1Ob1      TS19      Twin       7        Obese        333
F7T2Ob1      TS20      Twin       7        Obese        340
F7MOb1       TS21      Mom        7        Obese        332
F9T1Le1      TS25      Twin       9        Lean         351
F9T2Le1      TS26      Twin       9        Lean         252
F9MOb1       TS27      Mom        9        Obese        343
F10T1Ob1     TS28      Twin       10       Obese        344
F10T2Ob1     TS29      Twin       10       Obese        337
F10MOv1      TS30      Mom        10       Overweight   261
F15T1Ob1     TS49      Twin       15       Obese        338
F15T2Ob1     TS50      Twin       15       Obese        319
F15MOb1      TS51      Mom        15       Obese        331
F16T1Ob1     TS55      Twin       16       Obese        353
F16T2Ob1     TS56      Twin       16       Obese        278
F16MOb1      TS57      Mom        16       Obese        348
F43T1Ob1     TS148     Twin       43       Obese        323
F43T2Ob1     TS149     Twin       43       Obese        340
F43MOb1      TS150     Mom        43       Obese        [illegible]

TOTAL                                                   9,920

ID nomenclature (Tables 2 and 3): Family number, Twin number or mother, and BMI category (Le = lean; Ov = overweight; Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean)

TABLE 4
Phylotypes shared across ≥70% of all individuals (V2/3 dataset; 1,000 random sequences/individual)

Columns: (1) Phylotype ID; (2) individuals with phylotype; (3) % of individuals with phylotype; (4) number of reads grouped into phylotype; (5) highest relative abundance across all individuals; (6) lowest relative abundance across all individuals; (7) mean ± SEM % of 16S rRNA gene sequences across all individuals; (8) taxonomic classification. [?] = value illegible in the source.

(1)  (2)   (3)    (4)     (5)    (6)   (7)           (8)
1    151   98.1   7942    28.7   0     6.53 ± 0.41   Bacteria; Firmicutes; Clostridia; Faecalibacterium
2    151   98.1   5375    25.5   0     4.41 ± 0.34   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
3    144   93.5   2518    14.7   0     2.06 ± 0.16   Bacteria; Firmicutes; Clostridia; Clostridiales
4    143   92.9   5606    30.5   0     4.56 ± 0.41   Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale
5    140   90.9   1629    8.1    0     1.34 ± 0.11   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium clostridioforme
6    134   87.0   757     12.7   0     0.62 ± 0.09   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus; Ruminococcus schinkii
7    133   86.4   1485    12.2   [?]   1.23 ± 0.14   Bacteria; Firmicutes; Clostridia; Clostridiales; Coprococcus
8    133   86.4   1392    6.5    [?]   1.14 ± 0.10   Bacteria; Firmicutes; Clostridia; Clostridiales
9    133   86.4   [?]     [?]    [?]   0.99 ± 0.12   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
10   128   83.1   819     5.2    [?]   0.68 ± 0.06   Bacteria; Firmicutes; Clostridia; Clostridiales
11   127   82.5   747     3.7    [?]   0.62 ± 0.05   Bacteria; Firmicutes; Clostridia; Faecalibacterium
12   126   81.8   11598   51.6   [?]   9.39 ± 0.79   Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae
13   125   81.2   2585    34.3   [?]   2.15 ± 0.31   Bacteria; Firmicutes; Clostridia; Faecalibacterium
14   123   79.9   3512    15.3   [?]   2.89 ± 0.25   Bacteria; Firmicutes; [remainder illegible]
15   [?]   77.9   792     8.4    [?]   0.66 ± 0.08   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
16   [?]   76.6   632     2.7    [?]   [?]           Bacteria; Firmicutes; [remainder illegible]
17   [?]   74.7   3422    43.3   [?]   2.79 ± 0.41   Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae
18   [?]   73.4   441     2.3    [?]   0.37 ± 0.03   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
19   [?]   72.7   1168    17.4   [?]   0.98 ± 0.16   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
20   [?]   72.1   749     5.2    [?]   [?]           Bacteria; Firmicutes; Clostridia; Clostridiales
21   [?]   70.1   640     3.5    [?]   0.53 ± 0.06   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus

1,000 sequences were randomly sampled from a single timepoint for each individual.
Taxonomic classification is based on the consensus of ≥90% of sequences within each phylotype (best-BLAST-hit against the Greengenes database).

TABLE 5
Phylotypes shared across >90% of all individuals (V6 dataset; 10,000 random sequences/individual)

Columns: (1) Phylotype ID; (2) individuals with phylotype; (3) % of individuals with phylotype; (4) number of reads grouped into phylotype; (5) highest relative abundance across all individuals; (6) lowest relative abundance across all individuals; (7) mean ± SEM % of 16S rRNA gene sequences across all individuals; (8) taxonomic classification. [?] = value illegible in the source.

(1)  (2)   (3)     (4)     (5)    (6)     (7)           (8)
1    33    100.0   10400   9.7    [?]     3.40 ± 0.45   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
2    33    100.0   5161    5.9    [?]     1.67 ± 0.23   Bacteria; Firmicutes; Clostridiales; Clostridium nexile; Clostridium fusiformis
3    33    100.0   6077    6.7    [?]     1.97 ± 0.32   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
4    33    100.0   16600   26.8   [?]     5.36 ± 1.02   Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale
5    33    100.0   11654   12.5   [?]     3.78 ± 0.58   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
6    32    97.0    3113    5.8    0.000   1.01 ± 0.23   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
7    32    97.0    2908    4.2    0.000   0.96 ± 0.21   Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae
8    32    97.0    2382    3.7    0.000   0.78 ± 0.13   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
9    32    97.0    1712    4.4    0.000   0.56 ± 0.14   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus; Ruminococcus schinkii
10   31    93.9    3940    6.6    0.000   1.29 ± 0.26   Bacteria; Firmicutes; Clostridia; Faecalibacterium
11   31    93.9    3729    4.9    0.000   1.21 ± 0.18   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
12   30    90.9    454     0.7    0.000   0.15 ± 0.03   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
13   30    90.9    687     1.1    0.000   0.23 ± 0.04   Bacteria; Firmicutes; Clostridia
14   30    90.9    999     2.3    0.000   0.33 ± 0.08   Bacteria; Firmicutes; Clostridia; Peptostreptococcaceae; Peptostreptococcus anaerobius; Clostridium bifermentans
15   30    90.9    1241    5.3    0.000   [?]           Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium bolteae
16   30    90.9    160     [?]    0.000   0.05 ± 0.01   Bacteria; Actinobacteria; Actinobacteridae; Actinomycineae
17   30    90.9    1417    2.0    0.000   0.46 ± 0.09   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
18   30    90.9    1014    1.2    0.000   0.33 ± 0.06   Bacteria; Firmicutes; Clostridia; Clostridiales
19   30    90.9    1353    1.6    0.000   0.44 ± 0.08   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus; Ruminococcus luti
20   30    90.9    2686    6.0    0.000   0.88 ± 0.22   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium clostridioforme
21   30    90.9    7454    12.2   0.000   2.43 ± 0.63   Bacteria; Firmicutes; Clostridia; Faecalibacterium

Taxonomic classification is based on the consensus taxonomy of >90% of sequences within each phylotype (best-BLAST-hit against the Greengenes database).

TABLE 6
Phylotypes shared across ≥70% of all individuals (Full-length dataset; 200 random sequences/individual)

Columns: (1) Phylotype ID; (2) individuals with phylotype; (3) % of individuals with phylotype; (4) number of reads grouped into phylotype; (5) highest relative abundance across all individuals; (6) lowest relative abundance across all individuals; (7) mean ± SEM % of 16S rRNA gene sequences across all individuals; (8) taxonomic classification. [?] = value illegible in the source.

(1)  (2)   (3)    (4)   (5)    (6)   (7)           (8)
1    28    93.3   378   17.9   [?]   7.81 ± 1.04   Bacteria; Firmicutes; Clostridia; Faecalibacterium
2    27    90.0   347   [?]    [?]   6.90 ± 1.20   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
3    26    86.7   128   9.9    [?]   2.62 ± 0.47   Bacteria; Firmicutes; Clostridia; Clostridiales
4    26    86.7   298   23.1   [?]   6.00 ± 1.14   Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale
5    26    86.7   127   12.0   [?]   2.64 ± 0.49   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium clostridioforme
6    22    73.3   110   10.9   [?]   2.33 ± 0.55   Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae
7    22    73.3   87    5.7    [?]   1.76 ± 0.29   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile; Clostridium fusiformis
8    21    70.0   112   11.9   [?]   2.32 ± 0.49   Bacteria; Firmicutes; Clostridia; Clostridiales; Coprococcus
9    21    70.0   75    6.9    0.0   1.53 ± 0.32   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
10   21    70.0   54    5.7    0.0   1.14 ± 0.23   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile

Taxonomic classification is based on the consensus taxonomy of >90% of sequences within each phylotype (best-BLAST-hit against the Greengenes database).
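As an illustration of how the summary columns of Tables 4 to 6 relate to the underlying reads, the following Python sketch subsamples each individual to an equal depth and then computes, per phylotype, the number and percentage of individuals carrying it, the reads grouped into it, and the highest, lowest, and mean ± SEM relative abundance. It is a minimal sketch, not the code used to build the tables: the function name `summarize_phylotypes`, the input layout (one list of phylotype labels per individual), and the toy data are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def summarize_phylotypes(reads_by_individual, depth, seed=0):
    """Per-phylotype summary after subsampling every individual to `depth` reads.

    `reads_by_individual` maps an individual ID to a list of phylotype labels,
    one label per read (a hypothetical input layout chosen for this sketch).
    """
    rng = random.Random(seed)
    # Draw `depth` reads at random, without replacement, from each individual.
    counts = [Counter(rng.sample(reads, depth))
              for reads in reads_by_individual.values()]
    n = len(counts)
    summary = {}
    for phylotype in set().union(*counts):
        # Relative abundance (%) of this phylotype in each individual.
        pct = [100.0 * c[phylotype] / depth for c in counts]
        mean = sum(pct) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in pct) / (n - 1)) if n > 1 else 0.0
        present = sum(1 for x in pct if x > 0)
        summary[phylotype] = {
            "individuals": present,
            "pct_individuals": 100.0 * present / n,
            "reads": sum(c[phylotype] for c in counts),
            "highest": max(pct),
            "lowest": min(pct),
            "mean": mean,
            "sem": sd / math.sqrt(n),
        }
    return summary


# Toy data: two individuals, four reads each (depth equals the read count,
# so the subsample is simply the full set of reads).
toy = {
    "A": ["p1", "p1", "p1", "p2"],
    "B": ["p1", "p1", "p1", "p1"],
}
table = summarize_phylotypes(toy, depth=4)
# Keep phylotypes found in at least 70% of individuals, as in Tables 4 and 6.
shared = {name: row for name, row in table.items() if row["pct_individuals"] >= 70.0}
```

With real data the depth would be 200, 1,000, or 10,000 reads per individual, matching the table titles; the 70% cutoff in the last line is the threshold named in the titles of Tables 4 and 6.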

Sample Characteristics

[0124] Twin pairs who had been enrolled in the Missouri Adolescent Female Twin Study (MOAFTS) were recruited for this study (mean period of enrollment, 11.7±1.2 years; range, 4.4-13.0 years). The MOAFTS twin cohort, comprised of female like-sex twin pairs, was identified from Missouri birth records over the period 1994-1999, when the twins were median age 15. A total of 350 twins from the larger MOAFTS cohort completed screening interviews for the present study. Pairs most likely to meet study criteria were identified at the wave five interview of the MOAFTS twin cohort (which has 90% retention of wave four participants). Eligibility was then confirmed at screening interview. All twins were 25-32 years old, of European or African ancestry (EA and AA, respectively), were generally concordant for obesity (BMI≥30 kg/m²) or leanness (BMI=18.5-24.9 kg/m²) (1 twin pair was lean/overweight (overweight defined as BMI≥25 and <30) and 6 pairs were overweight/obese), and had not taken antibiotics for at least 5.49±0.09 months. Each participant completed a detailed medical, lifestyle, and dietary questionnaire. Participants were broadly representative of the overall Missouri population with respect to BMI, parity, education, and marital status. Although all were born in Missouri, they currently live throughout the USA: 29% live in the same house, but some live >800 km apart. Since fecal samples are readily attainable and representative of interpersonal differences in gut microbial ecology, they were collected from each individual and frozen immediately. The collection procedure was repeated with an average interval between sample collections of 57±4 days.

Community DNA Preparation

[0125] Frozen de-identified fecal samples were stored at -80° C. before processing. In order to homogenize each sample, a 10-20 g aliquot of each sample was pulverized in liquid nitrogen with a mortar and pestle. An aliquot (~500 mg) of each sample was then suspended, while frozen, in a solution containing 500 µl of extraction buffer [200 mM Tris (pH 8.0), 200 mM NaCl, 20 mM EDTA], 210 µl of 20% SDS, 500 µl of a mixture of phenol:chloroform:isoamyl alcohol (25:24:1, pH 7.9), and 500 µl of a slurry of 0.1 mm-diameter zirconia/silica beads (BioSpec Products, Bartlesville, Okla.). Microbial cells were subsequently lysed by mechanical disruption with a bead beater (BioSpec Products) set on high for 2 min at room temperature, followed by extraction with phenol:chloroform:isoamyl alcohol, and precipitation with isopropanol. DNA obtained from three separate 10 mg frozen aliquots of each fecal sample was pooled (>200 µg DNA) and used for pyrosequencing (see below).

Full-Length 16S rRNA Sequence-Based Surveys

[0126] Five replicate PCR reactions were performed for each fecal DNA sample. To generate full-length or near full-length bacterial 16S rRNA amplicons, each 25 µl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), 10 mM Tris (pH 8.3), 50 mM KCl, 2 mM MgSO4, 0.16 µM dNTPs, 0.4 µM of the bacteria-specific primer 8F (5'-AGAGTTTGATCCTGGCTCAG-3'), 0.4 µM of the universal primer 1391R (5'-GACGGGCGGTGWGTRCA-3'), 0.4 M betaine, and 3 units of Taq polymerase (Invitrogen). Cycling conditions were 94° C. for 2 min, followed by 25 cycles of 94° C. for 1 min, 55° C. for 45 sec, and 72° C. for 2 min. Replicate PCRs were pooled and concentrated (Millipore; Montage PCR filter columns). Full-length 16S rRNA gene amplicons (1.3 kb) were then gel-purified using the Qiaquick kit (Qiagen), subcloned into TOPO TA pCR4.0 (Invitrogen), and the ligated DNA transformed into E. coli TOP10 (Invitrogen). For each sample, 384 colonies containing cloned 16S rRNA nucleic acid amplicons were processed for sequencing. Plasmid inserts were sequenced bi-directionally using vector specific primers plus the internal primer 907R (5'-CCGTCAATTCCTTTRAGTTT-3').

[0127] 16S rRNA gene sequences were edited and assembled into consensus sequences using the PHRED and PHRAP software packages within the Xplorseq program. Sequences that did not assemble were discarded and bases with PHRED quality scores <20 were trimmed. Sequences were checked for chimeras using Bellerophon program version 3 with the default parameters (final dataset n=8,941 near full-length 16S rRNA gene sequences; for sequence designations see Table 1). Alignments for reference genome 16S rRNA gene sequences were manually edited in ARB.

V2/3 16S rRNA Sequence-Based Surveys

[0128] Four replicate PCR reactions targeting the V2/3 region of bacterial 16S rRNA genes were performed on the same fecal DNA samples used above. Each 20 µl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), 8 µl 2.5x HotMaster PCR Mix (Eppendorf), 0.3 µM of the primer 8F [5'-GCCTTGCCAGCCCGCTCAG-TCAGAGTTTGATCCTGGCTCAG-3'; composite of 454 primer B (underlined), linker nucleotides (TC), and the universal bacterial primer 8F (italics)], and 0.3 µM of the primer 338R [5'-GCCTCCCTCGCGCCATCAGNNNNNNNNCATGCTGCCTCCCGTAGGAGT-3'; 454 Life Sciences primer A (underlined), a unique 8 base barcode (Ns), linker nucleotides (CA), and the broad-range bacterial primer 338R (italics)]. Cycling conditions were 95° C. for 2 min, followed by 30 cycles of 95° C. for 20 sec, 52° C. for 20 sec, and 65° C. for 1 min. Replicate PCRs were pooled and purified with Ampure magnetic purification beads (Agencourt).

[0129] PCR products were quantified with the bisbenzimide H assay. An aliquot of each PCR product was incubated for 5 min at room temperature in TNE reagent [10 mM Trizma HCl pH 8.1, 100 mM NaCl, 1 mM EDTA, and 50 ng/ml freshly prepared bisbenzimide H (Sigma)]. Samples were read on a fluorometer or plate reader (excitation at 365 nm, emission at 460 nm) relative to a standard curve constructed using E. coli DNA (Sigma). Multiple pools, each containing approximately equimolar amounts of PCR products, were assembled for 454 FLX amplicon pyrosequencing (n=33-100 barcoded samples/pool). Technical replicates were analyzed from selected representatives of each pool across four different sequencing centers; results were highly reproducible, discriminating between individuals and between samples from the same individual over time (FIG. 1).

V6 16S rRNA Sequence-Based Surveys

[0130] PCR reactions targeting the V6 region of bacterial 16S rRNA genes were performed on the same fecal DNA samples used above. Each 32 µl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), PCR buffer (PurePeak DNA polymerization mix, Thermo-Fisher), 0.625 mM PurePeak dNTPs (Thermo-Scientific), 0.625 µM Fusion Primer A, 0.625 µM Fusion Primer B, and 5 U Pfu polymerase (Stratagene). The primer set included 5 forward primers (Fusion A) and 4 reverse primers (Fusion B) fused to the 454 Life Sciences adaptors A and B respectively. Cycling conditions were 94° C. for 3 min, followed by 30 cycles of 94° C. for 30 sec, 57° C. for 45 sec, and 72° C. for 1 min, with a final extension period of 72° C. for 2 min. PCR products were purified with MinElute columns (Qiagen), and DNA was quantified using a Bioanalyzer (Agilent) and the PicoGreen assay (Invitrogen). Two pools of PCR products were constructed for 454 FLX amplicon pyrosequencing, composed of 18 and 20 samples, respectively (the second run contained 3 samples from the V2/3 region and 3 technical replicates; one additional sample (TS30) was sequenced in a third run, bringing the total number of V6 samples processed to 33). Since technical replicates were highly reproducible (see above and FIG. 5), datasets for a given individual's biospecimen were pooled for all subsequent analyses. Any sequences that did not have an exact match to the proximal primer or that contained one or more ambiguous bases were removed as low quality. The proximal primer and any fuzzy matches (identified with BLAST and the fuzznuc program) to the distal primer were then trimmed from the sequences. Finally, any trimmed sequences shorter than 50 nucleotides were also removed as low quality.

Picking Operational Taxonomic Units (OTUs)

[0131] Pyrosequencing data was pre-processed to remove sequences with low quality scores, sequences with ambiguous characters, or sequences outside of the length bounds (V6<50 nt, V2/3<200 nt) and binned according to sample based on the error-correcting barcodes. Similar sequences were identified using the Megablast software and the following parameters: E-value, 1e[exponent illegible in source]; minimum coverage, 99%; and minimum pairwise identity, 97%. Candidate OTUs were identified as sets of sequences connected to each other at this level using the top 4000 hits per sequence. Each candidate OTU was considered valid if the average density of connection was above threshold; otherwise it was broken up into smaller connected components.

Tree Building and UniFrac Clustering for PCA Analysis

[0132] A relaxed neighbor-joining tree was built from one representative sequence per OTU using Clearcut, employing the Kimura correction (the PH lanemask was applied to V2/3 data), but otherwise with default comparisons. Unweighted UniFrac was run using the resulting tree and the counts of each sequence in each sample. Principal component analysis (PCA) was performed on the resulting matrix of distances between each pair of samples. To determine if the UniFrac distances were on average significantly different for pairs of samples (i.e. between twin-pairs, between twins and their mother, or between unrelated individuals), a t-test was performed on the UniFrac distance matrix, and a p-value was generated for the t-statistic by permutation of the rows and columns as in the Mantel test, regenerating the t-statistic for 1000 random samples, and using the distribution to obtain an empirical p-value.

Taxonomy Assignment

[0133] Taxonomy was assigned using the best-BLAST-hit against Greengenes (E-value cutoff of 1e[exponent illegible in source], minimum 88% coverage, 88% identity) and the Hugenholtz taxonomy, downloaded May 12, 2008, excluding sequences annotated as chimeric (http://greengenes.lbl.gov/Download/Sequence_Data/Greengenes_format/).

Rarefaction and Phylogenetic Diversity Measurements

[0134] To determine which individuals had the most diverse communities of gut bacteria, rarefaction plots and Phylogenetic Diversity (PD) measurements, as described by Faith (Biological Conservation 1992), were made for each sample. PD is the total amount of branch length in a phylogenetic tree constructed from the combined 16S rRNA dataset, leading to the sequences in a given sample. To account for differences in sampling effort between individuals, and to estimate the thoroughness of sampling of each individual, the accumulation of PD (branch length) with sampling effort was plotted in a manner analogous to rarefaction curves. The PD rarefaction curve for each individual was generated by applying custom Python code, which can be downloaded from http://bayes.colorado.edu/unifrac, to the Arb parsimony insertion tree.

Results

[0135] To characterize the bacterial lineages present in the fecal microbiotas of these 44 individuals, 16S rRNA sequencing was performed, targeting the full-length gene with an ABI 3730xl capillary sequencer. Additionally, multiplex sequencing with a 454 FLX pyrosequencer was used to survey the V2/3 variable region and the V6 hypervariable region (Tables 1, 2 and 3). Complementary phylogenetic and taxon-based methods were used to compare 16S rRNA sequences among fecal communities. Phylogenetic clustering with UniFrac is based on the principle that communities can be compared in terms of their shared evolutionary history, as measured by the degree to which they share branch length on a phylogenetic tree. This approach was complemented with taxon-based methods; these methods disregard some of the information contained in the phylogenetic tree of the taxa in question, but have the advantage that specific taxa unique to, or shared among, groups of samples can be identified (e.g., those from lean or obese individuals). Prior to both types of analyses, 16S rRNA gene sequences were grouped into Operational Taxonomic Units (OTUs/phylotypes) using the furthest-neighbor like algorithm and a sequence identity threshold of 97%, which is commonly used to define species-level phylotypes. Taxonomic assignments were made using BLAST and Hugenholtz taxonomy annotations in the Greengenes database.

[0136] No matter which region of the 16S rRNA gene was examined (V2/3 or V6 pyrosequencing reads, or the near complete gene from Sanger reads), individuals from the same family (a twin and her co-twin, or twins and their mother) had a more similar bacterial community structure than unrelated individuals (FIGS. 2A and 3A, B) and shared significantly more phylotypes [G=55.2, p<10^(exponent illegible in source) (V2/3); G=112.3, p<0.001 (V6); G=11.3, p<0.001 (full-length)]. No significant correlation was seen between the degree of physical separation of family members' current homes and the degree of similarity between their microbial communities (defined by UniFrac). The observed familial similarity was not due to an indirect effect of the physiologic states of obesity versus leanness; similar results were observed after stratifying twin pairs and their mothers by BMI category (concordant lean or concordant obese individuals; FIG. 4). Surprisingly, there was no significant difference in the degree of similarity in the gut microbiotas of adult MZ versus DZ twin-pairs (FIG. 2A). However, in the present study it was not assessed whether MZ and DZ twin pairs had different degrees of similarities at earlier stages of their lives.

[0137] Multiplex pyrosequencing of V2/3 and V6 amplicons allowed higher levels of coverage of community diversity compared to what was feasible using Sanger sequencing, reaching on average 3,984±232 (V2/3) and 24,786±1,403 (V6) sequences per sample. To control for differences in coverage between samples, all analyses were performed on an equal number of randomly selected sequences [200 full-length, 1,000 V2/3, and 10,000 V6]. At this level of coverage, there was little overlap between the sampled fecal communities: only 2, 5, and 21 phylotypes were found in >90% of the individuals surveyed (full-length, V2/3, and V6 data respectively). Moreover, the number of 16S rRNA gene sequences belonging to these phylotypes varied greatly between fecal microbiotas (Tables 4, 5 and 6).

[0138] Samples taken from the same individual at the initial collection point and 57±4 days later were remarkably consistent with respect to the specific phylotypes found (FIGS. 1 and 5), but showed variations in the relative abundance of the major gut bacterial phyla (FIG. 6). There was no significant association between UniFrac distance and the time between sample collections. Overall, fecal samples from the same individual were much more similar to one another than samples from family members or unrelated individuals (FIG. 2A), demonstrating that short-term temporal changes in community structure within an individual are minor compared to inter-personal differences.

[0139] After assigning V2/3, V6 and full-length 16S rRNA gene sequences to bacterial taxa (see Example 3 below), it was found that obese individuals generally had a lower relative abundance of the Bacteroidetes and a higher relative abundance of the Firmicutes and Actinobacteria; the statistical significance of these observations varied depending upon the sequencing methods used (Table 7), likely due to differences in PCR conditions (for example, the 8F primer has a known bias against Actinobacteria).

[0140] In summary, across all methods, obesity was associated with a significant decrease in the level of diversity (FIG. 2B and FIGS. 3C-F). This reduced diversity suggests an analogy: the obese gut microbiota is not like a rainforest or reef, which are adapted to high energy flux and are highly diverse, but rather may be more like a fertilizer runoff where a reduced diversity microbial community blooms with abnormal energy input.

TABLE 7

Phylum-level taxonomic assignments

                                       lean                   obese
Dataset            Phylum              mean    sem    N       mean    sem    N      p-value
V2/3 (EA)          % Bacteroidetes     26.76   2.46   26      24.39   1.89   42     0.22
                   % Firmicutes        71.48   2.50   26      72.57   1.92   42     0.36
                   % Actinobacteria    0.72    0.14   26      1.70    0.58   42     0.05
V2/3 (AA)          % Bacteroidetes     37.52   3.05   8       29.41   1.49   62     0.02
                   % Firmicutes        60.74   3.04   8       68.14   1.42   62     0.03
                   % Actinobacteria    0.97    0.40   8       1.27    0.21   62     0.26
V6 (EA)            % Bacteroidetes     6.85    1.25   12      3.15    0.93   16     0.01
                   % Firmicutes        81.72   2.41   12      75.99   4.60   16     0.14
                   % Actinobacteria    7.14    1.76   12      17.91   5.01   16     0.03
Full-length (EA)   % Bacteroidetes     11.44   2.77   10      7.58    2.35   16     0.15
                   % Firmicutes        83.50   2.28   10      84.60   3.03   16     0.39
                   % Actinobacteria    2.78    0.78   10      4.41    1.14   16     0.13
BLAST (EA)         % Bacteroidetes     42.60   8.75   6       34.69   8.16   9      0.26
                   % Firmicutes        51.54   8.35   6       51.25   5.47   9      0.49
                   % Actinobacteria    2.07    0.33   6       10.34   3.35   9      0.02

A subset of each dataset was included in the analysis: 10,000 sequences/sample (V6), 1,000 sequences/sample (V2/3) and 200 sequences/sample (full-length). Sequences from the same individual across both timepoints were pooled.
p-values are from a Student's t-test of the obese versus lean distribution.
The AA lean individuals surveyed have significantly more Bacteroidetes and less Firmicutes than the lean EA individuals (p<0.05).
BLAST (EA): BLASTX comparisons between microbiomes and the NCBI non-redundant database.
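The Table 7 footnote attributes the p-values to a Student's t-test of the obese versus lean distributions. As a rough illustration of how such a value relates to the tabulated mean, SEM, and N, the sketch below recomputes a t statistic from summary statistics alone, using the V6 (EA) Bacteroidetes row (lean 6.85 ± 1.25, N=12; obese 3.15 ± 0.93, N=16). Two things are assumptions of this example and are not stated in the source: that the test pooled the two variances, and that the p-value is one-tailed. The function names are likewise illustrative, and only the Python standard library is used.

```python
import math


def pooled_t_statistic(mean1, sem1, n1, mean2, sem2, n2):
    """Two-sample Student's t statistic (pooled variance) from mean, SEM and N."""
    var1 = sem1 ** 2 * n1          # SEM = SD / sqrt(N), so variance = SEM^2 * N
    var2 = sem2 ** 2 * n2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """Upper-tail probability P(T >= |t|), integrating the density with Simpson's rule."""
    upper = abs(t)
    h = upper / steps              # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# V6 (EA) % Bacteroidetes, lean versus obese, as read from Table 7.
t, df = pooled_t_statistic(6.85, 1.25, 12, 3.15, 0.93, 16)
p = one_tailed_p(t, df)
print(f"t = {t:.2f}, df = {df}, one-tailed p = {p:.3f}")
```

Under these assumptions the statistic is about 2.4 on 26 degrees of freedom, and the one-tailed p-value falls between 0.01 and 0.025, close to the 0.01 listed for that row. A Welch (unequal-variance) or two-tailed test would give different numbers.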

Example 2

Distribution of Phylotypes in Individuals

[0141] All hosts were searched for bacterial phylotypes present at high abundance using a sampling model based on a combination of standard Poisson and binomial sampling statistics.

Phylotype Sampling Model

[0142] A sampling model was developed that allows placement of bounds on the maximum abundance of any phylotype found across all samples. The principle here is that if a given phylotype made up not less than some proportion p of the microbiome of all humans, it is then possible to calculate (i) the number of samples of a given size expected to lack that phylotype due to sampling error, and (ii) the probability that an actual proportion p-hat as low as the minimum abundance would be observed in any sample.

[0143] The probability P of failing to observe a given microbe at proportion p in a sample of size n is given by Poisson statistics as simply P = e^{-pn}. For equal sample sizes, the probability of observing the phylotype in at least k samples using binomial sampling with Pr(success)=(1-P) can therefore be calculated. Then, the inverse binomial can be used to ask what value of P, and therefore of p, gives a specified probability (say, 5%) of observing a given phylotype in as few samples as actually observed for the most abundant phylotype. This calculation yields an upper bound for p (i.e. the value of p at which we can reject the idea that we would have seen the phylotype in as few samples as actually observed at the 95% confidence level).

[0144] For unequal sizes, there is no analytical solution to the equivalent of the binomial in which Pr(success) differs for each trial. Therefore, numerical optimization must be used to solve for p. Because the function relating p and the probability of observing the phylotype in at least a given number of samples is monotonic, a bisection search (bounded by p=0 and p=1) can be used to find the appropriate value of p for a desired confidence level. In practice, P was calculated for each sample, a vector of random numbers between 0 and 1 was chosen, and the number of times the random number at a given position was less than P was counted. Repeating this procedure for a fixed number of iterations (100,000 for the reported values) gives sufficiently smooth values to approximate the monotonic function and to allow the bisection search to converge on the same value of p to three significant figures across repeated trials.

[0145] In the case where a phylotype was found in all samples, a similar procedure could be used to identify the maximum value of p consistent with the observed minimum abundance of the phylotype whose minimum abundance across all samples is highest. In this case, instead of calculating the fraction of samples in which the phylotype was absent, (i) binomial sampling could be used to randomly sample the number of observed counts of a phylotype given the parametric value of p and the sample size of each sample, (ii) the minimum abundance across all samples could be measured, and (iii) this minimum abundance compared to the minimum abundance actually observed. Again, an analytical solution using extreme-value statistics is possible if sample sizes are equal, but the solution must be obtained by numerical methods (in this case, the same type of bisection search used above). The sampling model was implemented in Python using PyCogent.

Results

[0146] Using this model the full-length 16S rRNA dataset described in Example 1 was first analyzed. The most abundant species-level phylotype in each sample made up 11% of that sample on average (range: 4.2%-22.0%), and the most abundant phylotype found across the combined dataset was found in 25 of the 27 fecal microbiotas (taxonomy assignment=Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus). These data are consistent with no phylotype being present at more than 1.3% abundance in all samples.

[0147] The deeper pyrosequencing data confirmed this result. In the V6 dataset, using even sampling of 10,000 sequences/sample, the most abundant phylotype in each sample made up 12% of that sample on average (range: 5.0%-36.6%). The overall most abundant phylotype was found in all 33 samples (Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale). However, in some samples, this phylotype was present in frequencies as low as 0.01%.

[0148] The sampling model allows one to ask what level of abundance in every individual the most abundant phylotype could have before its absence from, or limited representation in, some samples becomes surprising. For example, with 1,000 sequences/sample, it would be very surprising if a species at 50% abundance across all samples was missed in any of 30 samples, but it would not be surprising if a species at 0.00001% abundance were missed.

[0149] The sampling model (using 1000 random sequences per sample) indicated that this minimum observed abundance was consistent with a true abundance of no more than 0.66%. In the V2/3 dataset, the most abundant phylotype in each sample made up 14.6% of that sample on average (range: 3.8%-47.1%). The overall most abundant phylotype was present in 270 of 274 samples at this depth of coverage (Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae). The sampling model indicated that this frequency was consistent with a true abundance of no more than 0.53%. These results were confirmed, with excellent agreement, by the V6 data: at 1,000 sequences/sample, the maximum abundance OTU is found in 32 of 33 samples, consistent with an abundance of no more than 0.66%. However, at a coverage depth of 10,000 sequences/sample, this OTU is found in all 33 samples but at a minimum observed abundance of 0.02%, consistent with a true abundance of no more than 0.1%. Using all the V6 data without controlling for sampling effort, the minimum observed abundance is consistent with a true abundance of no more than 0.07% (the estimate of the true abundance falls with increased sample size because it is less likely that the low frequency would be observed due to sampling error when more total sequences contribute to the result). Thus, we conclude, with 95% confidence, based on the even sampling used for the other analyses in this study (i.e., 1,000 sequences/sample from V2/3, 10,000 sequences/sample for V6) that the maximum abundance of any OTU across all samples cannot exceed the V2/3 result of 0.53%, although the true maximum abundance might be as much as an order of magnitude lower than this based on the greater depth of coverage in the V6 samples.

[0150] In summary, the analysis showed that no phylotype is present at more than ~0.5% abundance in all of the samples in this study, and that although individual microbiotas are dominated by a few abundant phylotypes, these groups vary dramatically in their proportional representation in the sampled gut communities. Also, no phylotypes were detectable in all individuals sampled within this range of coverage (FIG. 7).

Example 3

Taxonomic Assignments of Metagenomic Reads

[0151] The International Human Microbiome Project has emphasized the importance of sequencing the genomes of a panel of reference microbial strains. Therefore, shotgun pyrosequencing was used to sample the fecal microbiomes of 18 individuals representing 6 of the families described in Example 1.

Pyrosequencing of Total Community DNA

[0152] Shotgun sequencing runs were performed on the 454 FLX pyrosequencer from total community DNA of 3 lean European American MZ twin-pairs and their mothers plus 3 obese European American MZ twin pairs and their mothers, yielding 8,294,835 reads and 14,730 16S rRNA fragments. Two samples were also analyzed on a single run employing 454/Roche GS FLX Titanium extra long read sequencing technology (Tables 8 and 9). Sequencing reads with degenerate bases ("Ns") were removed along with all duplicate sequences, as sequences of identical length and content are a common artifact of the pyrosequencing methodology. Finally, human sequences were removed by identifying sequences homologous to the H. sapiens reference genome (BLASTN e-value<10^-5, % identity>75, and score>50).

TABLE 8

Microbiome sequencing statistics

Subject ID  Data ID  Twin/Mom  Family  BMI         Platform       Total nt  Number Reads  Filtered Reads  16S rRNA gene fragments

F1T1Le      TS1      Twin      1       Lean        FLX          60,016,519       254,044         217,386     439
F1T2Le      TS2      Twin      1       Lean        FLX          90,271,969       514,022         443,640     512
F1MOv       TS3      Mom       1       Overweight  FLX         113,506,401       571,301         510,972     723
F2T1Le      TS4      Twin      2       Lean        FLX         107,008,761       472,154         414,754     626
F2T2Le      TS5      Twin      2       Lean        FLX         112,835,879       553,142         490,776     928
F2MOb       TS6      Mom       2       Obese       FLX         135,976,476       623,027         535,763   1,039
F3T1Le      TS7      Twin      3       Lean        FLX         146,946,832       607,386         555,853   1,188
F3T2Le      TS8      Twin      3       Lean        FLX         113,177,766       468,769         414,497     976
F3MOv       TS9      Mom       3       Overweight  FLX         137,564,473       552,870         499,499     934
F7T1Ob1     TS19     Twin      7       Obese       FLX          95,538,760       583,989         498,880     569
F7T2Ob1     TS20     Twin      7       Obese       FLX         108,342,331       550,695         495,040     829
F7MOb       TS21     Mom       7       Obese       FLX          95,960,723       451,177         413,772     774
F10T1Ob1    TS28     Twin      10      Obese       Titanium    138,364,927       399,717         302,780     652
F10T2Ob1    TS29     Twin      10      Obese       Titanium    239,971,702       672,196         502,399   1,190
F10MOv1     TS30     Mom       10      Overweight  FLX         105,932,316       564,184         495,865     791
F15T1Ob1    TS49     Twin      15      Obese       FLX         104,449,087       596,149         519,072     769
F15T2Ob1    TS50     Twin      15      Obese       FLX         129,037,456       642,191         549,700   1,209
F15MOb1     TS51     Mom       15      Obese       FLX         101,531,105       557,165         434,187     582

SUM                                                          2,136,433,483     9,634,178       8,294,835  14,730

Subject ID nomenclature: Family number, Twin number or mom, and BMI category (Le = lean, Ov = overweight, Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean).
Filtered Reads: sequences used after removing low quality, duplicate, and human sequences.
16S rRNA gene fragments: fragments identified in microbiome sequencing reads.
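The read-level filtering behind the "Filtered Reads" column (removal of reads with degenerate bases and of duplicate sequences) can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `filter_reads` and the in-memory `(read_id, sequence)` input are hypothetical, keeping the first copy of each duplicated sequence is an assumption, and the BLASTN-based removal of human sequences is not shown.

```python
def filter_reads(reads):
    """Drop reads containing 'N' and exact duplicate sequences.

    `reads` is an iterable of (read_id, sequence) pairs (hypothetical
    input format). The first occurrence of each distinct sequence is
    kept; that choice is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for read_id, seq in reads:
        seq = seq.upper()
        if "N" in seq:
            # Degenerate base call: discard the read.
            continue
        if seq in seen:
            # Identical length and content: treated as a sequencing artifact.
            continue
        seen.add(seq)
        kept.append((read_id, seq))
    return kept
```

Applied to a whole run, `len(kept)` divided by the raw read count gives the kind of retention figure reported per sample, before the human-sequence screen.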

TABLE 9

Microbiome BLAST statistics

Subject ID  Data ID  Raw Reads  Reads Used  % Sequences Used  Nucleotides Used  Mean Read-length  % Hsa  % RDP  % KEGG  % STRING  % NR  % Gut

F1T1Le1     TS1        254,044     217,386  85.6         51,708,794  237.9  0.42  0.21  29.1  34.5  54.9  57.9
F1T2Le1     TS2        514,022     443,640  86.3         78,853,892  177.7  0.08  0.12  20.3  28.7  46.9  51.7
F1MOv       TS3        571,301     510,972  89.4        102,717,417  201.0  0.16  0.15  23.8  33.6  56.5  61.2
F2T1Le1     TS4        472,154     414,754  87.8         95,003,113  229.1  0.14  0.15  26.2  44.5  72.3  74.9
F2T2Le1     TS5        553,142     490,776  88.7        100,599,979  205.0  0.22  0.19  23.0  27.8  54.1  62.1
F2MOb       TS6        623,027     535,763  86.0        118,207,161  220.6  0.62  0.20  26.9  37.2  58.9  62.1
F3T1Le1     TS7        607,386     555,853  91.5        134,889,015  242.7  0.13  0.22  26.9  34.0  58.4  61.7
F3T2Le1     TS8        468,769     414,497  88.4        100,520,072  242.5  0.20  0.24  28.5  35.7  61.1  64.4
F3MOv       TS9        552,870     499,499  90.3        124,768,172  249.8  0.14  0.19  26.8  36.6  63.2  66.3
F7T1Ob1     TS19       583,989     498,880  85.4         82,117,565  164.6  0.06  0.12  19.1  30.6  52.9  57.1
F7T2Ob1     TS20       550,695     495,040  89.9         98,053,098  198.1  0.32  0.17  22.3  29.3  47.2  49.9
F7MOb       TS21       451,177     413,772  91.7         88,786,017  214.6  0.09  0.19  25.5  37.6  62.8  66.3
F10T1Ob1    TS28       399,717     302,780  75.7        101,434,082  335.0  0.06  0.36  24.5  28.4  53.2  55.5
F10T2Ob1    TS29       672,196     502,399  74.7        173,386,030  345.1  0.11  0.29  27.5  34.8  63.2  63.9
F10MOv1     TS30       564,184     495,865  87.9         94,405,318  190.4  0.21  0.16  22.4  32.0  54.7  60.7
F15T1Ob1    TS49       596,149     519,072  87.1         91,987,878  177.2  0.29  0.15  18.6  23.0  43.7  46.4
F15T2Ob1    TS50       642,191     549,700  85.6        111,999,603  203.7  0.24  0.22  24.6  29.4  51.9  57.9
F15MOb1     TS51       557,165     434,187  77.9         81,330,211  187.3  0.40  0.14  21.0  26.3  44.2  43.9

Average                535,232     460,824  86.1        101,709,301  223.5  0.22  0.19  24.3  32.5  55.6  59.1
Sum                  9,634,178   8,294,835            1,830,767,417

Key: % sequences used = percentage of sequences remaining after removing low quality, duplicate, and human sequences; Hsa = reads matching the H. sapiens genome; % RDP = percentage of reads matching the RDP 16S rRNA database; % KEGG, % STRING, % NR = percentage of reads that were assignable to entries in these various databases; % Gut = percentage of reads assigned to the database of 42 reference genomes.
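The per-database percentages in Table 9 count reads that have an acceptable BLAST hit. A minimal sketch of that kind of thresholded best-hit selection over tabular BLAST output follows. It assumes the standard 12-column tabular layout (query, subject, % identity, ..., e-value, bit score) and the cutoffs quoted in this example (e-value<10^-5, % identity>50, bit score>50). The function names are hypothetical, and the special KEGG handling (using the closest hit that carries an annotation) is not implemented.

```python
def best_hits(blast_lines, max_evalue=1e-5, min_identity=50.0, min_bits=50.0):
    """Return {query_id: (subject_id, bit_score)} for the best passing hit.

    Expects tab-separated BLAST tabular lines; lines starting with '#'
    (comment lines in -m 9 style output) are skipped.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bits = float(fields[11])
        # Apply all three cutoffs; a hit must pass every one.
        if evalue >= max_evalue or identity <= min_identity or bits <= min_bits:
            continue
        # Keep the highest-scoring hit seen so far for this read.
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


def percent_assigned(best, total_reads):
    """Percentage of reads with at least one passing hit."""
    return 100.0 * len(best) / total_reads
```

Selecting by bit score rather than by e-value is a design choice of this sketch; ties are resolved in favor of the first hit encountered.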

Database Searches and Metabolic Reconstructions

[0153] The distributions of taxa, genes, orthologs, metabolic pathways, and high-level gene categories were tallied based on the corresponding annotation of the best-BLAST-hit sequence found in each reference database. For KEGG analysis, the closest matching gene with an annotation was used, since many genes in the database remain unannotated, including all KEGG Orthologous groups (KOs) assigned to genes with an identical e-value (commands -e 0.00001 -m 9 -b 100 were used to run NCBI BLASTX). Custom Perl scripts were used for all KEGG, STRING, and NCBI NR analyses. Selected genes from recently sequenced reference genomes were manually annotated using NCBI-BLASTP searches against the KEGG, STRING, and NR databases. The 42 reference genome database includes predicted proteins from draft or complete assemblies of Alistipes putredinis, Bacteroides WH2, Bacteroides thetaiotaomicron 3731, Bacteroides thetaiotaomicron 7330, Bacteroides thetaiotaomicron 5482, Bacteroides fragilis, Bacteroides caccae, Bacteroides distasonis, Bacteroides ovatus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Parabacteroides merdae, Anaerostipes caccae, Anaerotruncus colihominis, Anaerofustis stercorihominis, Bacteroides capillosus, Clostridium bartlettii, Clostridium bolteae, Clostridium eutactus, Clostridium leptum, Clostridium ramosum, Clostridium scindens, Clostridium sp. L2-50, Clostridium spiroforme, Dorea longicatena, Eubacterium dolichum, Eubacterium eligens, Eubacterium rectale, Eubacterium siraeum, Eubacterium ventriosum, Faecalibacterium prausnitzii M212, Peptostreptococcus micros, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus torques, Collinsella aerofaciens, Bifidobacterium adolescentis, Bifidobacterium longum, Escherichia coli K12, Methanobrevibacter smithii, and Methanobrevibacter stadtmanae (see http://genome.wustl.edu/pub/ and NCBI GenBank). Draft assemblies of Clostridium sp. SS2-1 and Clostridium symbiosum were also used for functional clustering and diversity analyses (http://genome.wustl.edu/pub/). Coverage plots (percent identity plots) were generated using nucmer and mummerplot (part of the MUMmer v3.19 package) with default parameters.

[0154] Annotations were validated with simulated datasets (FIG. 8). To do so, the frequency of annotated genes from the KEGG database (v44) was first tallied across the aggregate human gut microbiomes (n=18 datasets). The 1,000 most frequent microbial genes were then used to generate simulated reads between 50 and 500 nt long. The simulated reads were subsequently annotated (BLASTX against the KEGG database), with self-hits excluded. This analysis revealed a low rate of false positives (i.e. high precision), but using very short sequences (e.g. 50-100 nt) increased the rate of false negatives (lower sensitivity) (FIG. 8). Given the increased read-length relative to 454 GS20 pyrosequencing data, simulated reads with an average length comparable to our data (200-250 nt) demonstrated robust assignments with an e-value<10^-5, % identity>50, and/or bit-score>50. Using all three cutoffs, sequences 200 nt in length returned 81.5% of the correct assignments, with a precision of 0.93 and sensitivity of 0.88, similar to what was observed by re-annotating the original full-length gene sequences after ignoring self hits. The KEGG cutoff criteria were also applied to BLASTX analysis results for STRING-based predictions, given the similar size of the databases.

[0155] ABI 3730xl capillary sequencing reads from 9 previously published adult human gut microbiomes were obtained from the NCBI Trace Archive. The full dataset from each sample was annotated by BLASTX comparisons against the KEGG and STRING databases (see above; BLASTX e-value<10^-5, % identity>50, and score>50). To allow quantitative comparisons between these datasets and pyrosequencing data, all forward sequencing reads were first extracted and then one simulated pyrosequencer read was generated from each longer capillary read. Nucleotides spanning positions 100 to 322 were used from all capillary reads of suitable length, to avoid low quality regions that commonly occur at the beginning and end of the reads. These simulated reads were then annotated as described above.

[0156] 16S rRNA gene fragments were identified in each microbiome through BLASTN searches of the RDP database (version 9.33; e-value<10^-5; bit-score>50; % identity>50; alignment length>100). Putative 16S rRNA gene fragments were then aligned using the NAST multi-aligner with a minimum template length of 100 bases and minimum % identity of 75%. Taxonomy was assessed after insertion into an ARB neighbor-joining tree.

[0157] Microbiomes were clustered based on their profiles after normalizing across all sampled communities (Z-score), using the Pearson's correlation distance metric, followed by single-linkage hierarchical clustering in addition to Principal Components Analysis (Cluster3.0). Results were visualized using the Treeview Java applet. Functional diversity (Shannon index and evenness) was calculated using the number of assignments in each microbiome to each of the 254 pathways present in the KEGG database (EstimateS 8.0). The maximum possible index is the natural log of the total number of pathways: ln(254) or 5.54. Shannon evenness was calculated by dividing the Shannon index for a given microbiome by the maximum possible index (scale of 0 to 1, with 1 representing a microbiome with all pathways found at an equal abundance). Results were compared to simulated metagenomic reads generated from 36 recently sequenced reference human gut-derived Bacteroidetes and Firmicutes genomes (http://genome.wustl.edu/pub/organism/). Reads were produced by Readsim v0.10, using the following options: -n 10000 -modlr normal -meanlr 223 -stdlr 0.3. The mean and standard deviation for length of the simulated reads were based on the observed read-length distribution of the 18 fecal microbiome datasets (Table 9).

Results

[0158] One fundamental parameter that governs the utility of reference genomes is the ability to accurately assign fragmentary reads from metagenomic datasets to these genomes. Therefore, the filtered pyrosequencing reads from the fecal microbiomes of 18 individuals from the 6 different families described in Example 1 (3 lean twin-pairs and their mothers; 3 obese twin pairs and their mothers; Tables 1 and 2) were compared to a custom database of 42 human gut associated bacterial and archaeal genomes (FIG. 7) using BLASTX, and these assignments were validated independently against NCBI's non-redundant protein database. The relative abundance of sequences from the 18 individual microbiome datasets assigned to each reference genome was highly variable (see FIG. 9; R=0.26±0.02 for all pairwise comparisons of taxonomic profiles), consistent with the considerable heterogeneity in microbial community structure among the fecal microbiomes observed from sequencing 16S rRNA gene amplicons.

[0159] The custom database of 42 reference genomes included 23 Firmicutes but only 13 Bacteroidetes. Since the Firmicutes dominate the gut microbiotas of subjects (FIG. 6) and the reference genome database, it might be expected that reads assigned to Firmicutes would match the reference genomes more closely than reads assigned to Bacteroidetes. The opposite was true: on average, 46.3±2.6% of the pyrosequencing reads assigned to Bacteroidetes matched the reference genomes at 100% identity, as compared to only 16.7±1. of 1% of the reads assigned to Firmicutes (p<10, Mann-Whitney; FIGS. 10 and 11). This observation underscores the high level of phylogenetic and genomic diversity within the gut-associated Firmicutes, indicates that the readily culturable sequenced gut Firmicutes are not closely related to the abundant gut genomes present in the 18 gut microbiomes, and suggests that future reference microbial genome sequencing efforts should be directed towards representatives of this dominant phylum.

[0160] The effect of technical advances that produce longer reads on improving these assignments was also tested by sequencing fecal community samples from one twin pair using next-generation Titanium pyrosequencing methods (average read length of 341±134 nt (SD) versus 208±68 for the standard FLX platform). FIG. 12 shows that the frequency and quality of sequence assignments is improved as read length increases from 200 to 350 nt.

[0161] FIG. 13 summarizes the relative abundance of the major bacterial phyla present in these 18 microbiomes, as defined by six different approaches (sequencing full-length, V2/3 and V6 amplicons; BLAST comparisons of shotgun pyrosequencer reads with the NCBI non-redundant and the custom 42 gut genome databases, plus analysis of 16S rRNA gene fragments). Pairwise comparisons of relative abundance data from 16S rRNA gene fragments generated from shotgun sequencing reads correlate most closely with V2/3 PCR data (FIG. 13 and Table 7).

Example 4

In Silico Functional Analysis of Gut Microbiomes

[0162] The filtered sequences obtained in Example 3 from the 18 microbiomes were used to conduct a functional analysis of gut microbiomes.

CAZyme Analysis

[0163] Metagenomic sequence reads described in Example 3 were searched against a library of modules derived from all entries in the Carbohydrate-Active enZymes (CAZy) database (www.cazy.org using FASTY, e-value<10). This library consists of ~180,000 previously annotated modules (catalytic modules, carbohydrate binding modules (CBMs) and other non-catalytic modules or domains of unknown function) derived from 80,000 protein sequences. The number of sequencing reads matching each CAZy family was divided by the number of total sequences assigned to CAZymes and multiplied by 100 to calculate a relative abundance. An R value was calculated for each pair of CAZy profiles. The distribution of glycoside hydrolase similarity scores was then compared to the distribution of glycosyltransferase similarity scores.

Statistical Analyses

[0164] Xipe (version 2.4) was employed for bootstrap analyses of pathway enrichment and depletion, using the parameters sample size=10,000 and confidence level=0.95. Linear regressions were performed in Excel (version 11.0, Microsoft). Mann-Whitney and Student's t-tests were utilized to identify statistically significant differences between two groups (Prism v4.0, GraphPad; Excel version 11.0, Microsoft). The Bonferroni correction was used to correct for multiple hypotheses. The Mantel test was used to compare distance matrices: the matrix of each pairwise comparison of the abundance of each reference genome, and the abundance of each metabolic pathway, were compared (Mantel program in Python using PyCogent; 10,000 replicates). Data are represented as mean±SEM unless otherwise indicated.

[0165] Odds ratios were used to identify commonly-enriched genes in the gut microbiome. In short, all gut microbiome sequences were compared against the custom database of 42 gut genomes (BLASTX e-value<10^-5, bitscore>50, and % identity>50). A gene by sample matrix was then screened to identify genes commonly-enriched in either the obese or lean gut microbiome (defined by an odds ratio greater than 2 or less than 0.5 when comparing the pooled obese twin microbiomes to the pooled lean twin microbiomes and when comparing each individual obese twin microbiome to the aggregate lean twin microbiome, or vice versa). The statistical significance of enriched or depleted genes was then calculated using a modified t-test (q-value<0.05; calculated with code kindly supplied by Mihai Pop and J. R. White, University of Maryland). To search for genes that were consistently enriched or depleted in all six MZ twin-pairs, a gene-by-sample matrix was generated based on BLASTX comparisons of each microbiome with our custom 42-genome database, and an odds ratio was calculated by directly comparing the frequency of each gene in each twin versus the respective co-twin. The analysis revealed only 49 genes (odds ratio>2 or <0.5): they represent a variety of taxonomic groups, including Firmicutes, Bacteroidetes, and Actinobacteria, and did not show any clear functional trends.

Results

[0166] Sequences matching 156 total CAZyme families were found within at least one human gut microbiome, including 77 glycoside hydrolase, 21 carbohydrate-binding module, 35 glycosyltransferase, 12 polysaccharide lyase, and 11 carbohydrate-esterase families (Table 10A and B). On average 2.62±0.13% of the gut microbiome could be assigned to CAZymes (a total of 217,615 sequences), a percentage that is greater than the most abundant KEGG pathway in the gut microbiome (Transporters; 1.20±0.06%), and indicative of the abundant and diverse set of microbial genes in the distal gut microbiome directed towards accessing a wide range of polysaccharides.

[0167] Category-based clustering of the functions from each microbiome was performed using Principal Components Analysis (PCA) and hierarchical clustering. This analysis revealed two distinct clusters of gut microbiomes based on metabolic profile, corresponding to samples with an increased abundance of Firmicutes and Actinobacteria, and samples with a high abundance of Bacteroidetes (FIG. 14A). A linear regression of the first principal component (PC1, explaining 20% of the functional variance) and the relative abundance of the Bacteroidetes showed a highly significant correlation (R=0.96, p<10^-12; FIG. 14B). Functional profiles stabilized within each individual's microbiome after 20,000 sequences had been accumulated (FIG. 15). Family members had more similar functional profiles than unrelated individuals (FIG. 14C), suggesting that shared bacterial community structure ("who's there" based on 16S rRNA analyses) also translates into shared community-wide relative abundance of metabolic pathways. Accordingly, a direct comparison of functional and taxonomic similarity disclosed a significant association: individuals that share similar taxonomic profiles also share similar metabolic profiles (p<0.001; Mantel test).
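The Mantel comparison of taxonomic and functional distance matrices can be illustrated with a small permutation test. The sketch below is a generic one-sided Mantel test written with the standard library only; it is not the PyCogent implementation named in the text, and the function names, the one-sided p-value, and the add-one correction are assumptions of this illustration.

```python
import random
from statistics import mean


def _upper(matrix, order):
    """Upper-triangle entries of `matrix` after relabeling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=10000, seed=0):
    """Return (r_observed, p_value) for two symmetric distance matrices."""
    n = len(d1)
    identity = list(range(n))
    x = _upper(d1, identity)
    r_obs = _pearson(x, _upper(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        # Permute sample labels of the second matrix only.
        if _pearson(x, _upper(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

With 10,000 permutations, as in the text, the smallest p-value this sketch can report is 1/10,001, so a reported p<0.001 means that fewer than about ten permuted correlations reached the observed one.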
TABLE 10A

Relative abundance of CAZymes across 9 gut microbiomes (% of sequence assignments across all identified CAZymes)

Subject ID (one column per microbiome)

Glycoside hydrolases            70.56  73.96  72.14  72.40  68.38  67.37  68.69  67.84  69.92
GH13                             8.96   6.31   6.37   3.97  10.78   8.04   8.63   9.97   8.02
GH2                              7.40   7.10   7.01   6.51   5.13   5.49   5.81   6.02   5.94
GH43                             3.48   5.78   5.63   6.61   4.39   4.69   5.05   4.14   5.75
GH92                             3.44   6.25   5.00   7.70   3.25   5.47   3.28   2.65   4.50
GH3                              5.72   5.37   4.31   4.47   3.20   3.94   4.03   4.70   4.09
GH97                             1.97   5.45   4.01   4.67   1.18   3.38   3.51   2.23   3.91
GH31                             2.98   2.48   2.53   2.41   3.84   2.11   2.16   3.04   2.13
GH20                             2.40   2.30   2.35   3.34   1.93   2.93   1.99   1.92   2.19
GH29                             1.99   1.51   2.12   2.54   2.94   2.52   2.53   2.19   1.83
GH77                             2.13   1.39   1.43   0.86   2.18   2.18   2.18   2.45   1.99
GH28                             1.58   2.44   3.71   3.07   1.46   2.24   2.25   1.79   2.00
GH51                             1.18   1.51   1.38   1.44   2.12   ?.58   1.73   1.68   1.31
GH36                             1.62   1.12   1.19   0.99   1.80   ?.23   1.64   2.02   1.37
GH1                              1.51   0.87   1.02   0.34   2.90   ?.08   1.50   1.50   1.67
GH5                              1.95   2.41   1.75   1.53   1.07   0.98   2.62   1.45   1.95
GH42                             0.91   0.49   0.83   0.90   2.43   0.62   1.09   1.10   1.03
GH105                            1.56   1.65   2.07   2.07   1.01   ?.38   1.46   1.27   1.83
GH95                             1.56   1.18   1.36   1.24   0.91   ?.21   1.22   1.04   0.99
GH32                             0.91   0.61   0.70   0.75   2.12   ?.18   1.05   0.91   0.84
GH78                             1.91   1.09   1.22   1.61   0.60   0.70   1.05   0.89   1.25
Glycosyltransferases            20.25  17.20  17.49  16.26  23.34  21.64  22.09  22.78  19.66
GT2                              5.66   6.26   6.31   5.58   7.68   7.91   7.14   7.48   7.39
GT4                              3.55   3.76   3.96   4.44   4.93   4.43   4.64   4.60   4.20
GT35                             4.75   2.47   2.07   1.62   4.75   2.85   3.58   3.91   2.90
GT28                             1.51   0.85   0.89   0.53   1.51   ?.00   1.34   1.48   1.00
GT5                              1.74   0.77   0.79   0.33   1.72   0.81   1.38   1.62   1.15
GT51                             0.77   0.78   0.75   0.74   0.99   ?.08   0.92   1.17   0.80
Carbohydrate binding molecules   1.76   2.40   2.15   2.02   2.05   2.22   2.38   2.25   2.11
Carbohydrate esterases           5.89   4.70   5.45   5.53   5.00   5.81   5.64   5.36   6.04
CE4                              1.53   1.01   1.03   0.78   1.41   1.04   1.16   1.27   1.20
Polysaccharide lyases            1.55   1.74   2.77   3.79   1.22   2.95   1.20   1.78   2.27

Groups found at an average relative abundance >1% are shown.
ID nomenclature: Family number, Twin number or mother, and BMI category (Le = lean, Ov = overweight, Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean).

TABLE 10B

Relative abundance of CAZymes across 9 gut microbiomes (% of sequence assignments across all identified CAZymes)

Subject ID (one column per microbiome)

Glycoside hydrolases            73.46  70.45  71.57  64.19  69.11  69.96  68.15  69.61  71.50
GH13                             4.68   8.36   6.37  11.17  11.80   7.05  12.34  16.84  11.19
GH2                              6.43   6.53   6.53   5.52   5.40   5.93   5.69   5.64   6.21
GH43                             5.80   6.49   5.00   4.34   6.57   5.04   5.05   5.59   4.56
GH92                             7.66   4.36   6.72   ?.71   ?.73   5.70   1.93   0.60   3.59
GH3                              3.46   3.77   4.27   3.89   5.07   3.75   3.75   4.29   3.41
GH97                             4.06   3.95   3.62   0.96   ?.25   3.96   1.22   0.28   1.87
GH31                             2.67   2.06   2.49   2.86   3.37   2.52   2.81   3.99   2.79
GH20                             3.33   2.45   3.32   ?.09   ?.17   3.12   1.66   0.92   3.18
GH29                             3.93   1.53   3.31   ?.80   ?.47   2.59   1.51   0.93   1.81
GH77                             ?.32   1.95   1.49   2.87   2.95   1.62   2.64   3.47   2.04
GH28                             2.63   1.99   2.49   ?.64   ?.01   2.31   1.44   0.54   1.11
GH51                             ?.73   2.29   1.51   ?.80   2.74   1.40   1.71   2.34   1.60
GH36                             ?.24   1.79   1.39   ?.52   ?.92   1.28   2.20   2.63   2.37
GH1                              0.72   0.79   0.71   2.01   2.50   1.35   3.74   2.29   2.25
GH5                              ?.37   2.56   1.30   ?.29   ?.37   0.90   0.84   1.22   0.95
GH42                             0.94   0.44   0.98   ?.80   2.82   0.93   2.26   3.87   2.06
GH105                            ?.77   0.83   1.63   0.95   0.50   1.65   0.98   0.39   0.83
GH95                             ?.33   1.90   1.12   0.68   0.75   1.35   1.01   0.48   1.44
GH32                             0.99   1.15   0.82   ?.15   ?.52   0.99   1.47   2.04   1.00
GH78                             ?.43   1.45   0.98   ?.03   ?.39   0.80   0.90   0.58   1.21
Glycosyltransferases            16.68  20.34  18.24  26.36  23.15  19.53  23.54  23.99  21.50
GT2                              6.19   6.80   6.97   9.41   9.80   6.74   7.98   7.14   6.78
GT4                              4.17   3.99   4.08   5.62   4.43   4.50   4.42   4.18   4.80
GT35                             ?.81   2.76   2.13   4.50   3.78   2.59   4.42   5.25   3.66
GT28                             0.58   0.94   0.83   1.31   1.00   1.01   1.48   2.12   1.33
GT5                              0.46   0.83   0.65   1.54   1.24   0.96   1.74   1.90   0.96
GT51                             0.68   1.06   0.72   1.82   1.27   0.88   1.06   1.63   1.02
Carbohydrate binding molecules   1.90   2.06   2.15   2.66   2.88   2.08   2.22   2.28   1.98
Carbohydrate esterases           5.19   5.19   5.02   5.24   3.94   6.01   4.68   3.84   4.15
CE4                              0.73   0.84   0.92   1.35   0.96   1.04   1.31   1.51   0.91
Polysaccharide lyases            2.78   1.95   3.02   1.55   0.93   2.43   1.43   0.28   0.87

Groups found at an average relative abundance >1% are shown.
ID nomenclature: Family number, Twin number or mother, and BMI category (Le = lean, Ov = overweight, Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean).
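The percentages in Tables 10A and 10B follow the calculation described in the CAZyme Analysis section: reads matching a family, divided by all reads assigned to CAZymes, times 100. A minimal sketch of that calculation and of a pairwise profile comparison is given below. The function names are hypothetical, and treating the "R value" as a Pearson correlation between two profiles is an assumption of this sketch, since the text does not name the statistic.

```python
def relative_abundance(counts):
    """Convert {family: read_count} into {family: percent of CAZyme-assigned reads}."""
    total = sum(counts.values())
    return {family: 100.0 * n / total for family, n in counts.items()}


def pearson_r(profile_a, profile_b):
    """Pearson correlation between two abundance profiles.

    Families missing from one profile are counted as 0 there.
    Profiles must not be constant across families.
    """
    families = sorted(set(profile_a) | set(profile_b))
    x = [profile_a.get(f, 0.0) for f in families]
    y = [profile_b.get(f, 0.0) for f in families]
    n = len(families)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Computing `pearson_r` for every pair of microbiomes, once over glycoside hydrolase families and once over glycosyltransferase families, yields the two distributions of similarity scores that the text compares.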

Example 5

Different Functions for Bacteroides and Firmicutes

[0168] Functional clustering of phylum-wide sequence bins representing reads from the Firmicutes or the Bacteroidetes showed discrete clustering by phylum (FIG. 16A). A direct comparison of the Firmicutes and Bacteroidetes sequence bins to simulated reads generated from 36 reference Bacteroides and Firmicute genomes represented in the 42 member custom database described in Example 3 revealed that the metabolic profile of each microbiome was similar to the average metabolic profile of each phylum (FIG. 17). Bootstrap analyses of the relative abundance of metabolic pathways in the Firmicutes and Bacteroidetes disclosed 26 pathways with a significantly different relative abundance (FIG. 16A). The Bacteroidetes were enriched for a number of carbohydrate metabolism pathways, while the Firmicutes were enriched for transport systems. The finding is consistent with information gleaned from a number of sequenced Bacteroidetes genomes that demonstrate expansive families of genes involved in carbohydrate metabolism, as well as the CAZyme analysis in Example 3, which revealed a significantly higher relative abundance of glycoside hydrolases, carbohydrate-binding modules, glycosyltransferases, polysaccharide lyases, and carbohydrate esterases in the Bacteroidetes sequence bins (FIG. 16B).

Example 6

Identifying a Core Human Gut Microbiome

[0169] One of the major goals of the international human microbiome project is to determine whether there is an identifiable core microbiome of shared organisms, genes, or functional capabilities found in a given body habitat of all or the vast majority of humans. Although all of the 18 gut microbiomes surveyed showed a high level of beta-diversity with respect to the relative abundance of bacterial phyla (FIG. 18A), analysis of the relative abundance of broad functional categories of genes (COG) and metabolic pathways (KEGG) revealed a generally consistent pattern regardless of the sample surveyed (FIG. 18B and Table 11): the pattern is also consistent with results obtained from a meta-analysis of previously published gut microbiome datasets from 9 adult individuals (FIG. 19). This consistency was not simply due to the broad level of these annotations, as a similar analysis of Bacteroidetes and Firmicutes reference genomes revealed substantial variation in the relative abundance of each category (FIG. 20). Furthermore, pair-wise comparisons of metabolic profiles revealed an average R of 0.97±0.0023 (FIG. 14A), indicating a high level of functional similarity between adult human gut microbiomes.

TABLE 11

Relative abundance of metabolic pathways in the gut microbiome (% of KEGG assignments)

KEGG Metabolic Pathway                                  Mean ± sem across all 18 microbiomes

Transporters                                            4.93 ± 0.21
Other replication, recombination and repair proteins    3.35 ± 0.04
ABC transporters                                        3.24 ± 0.13
General function prediction only                        2.60 ± 0.06
Purine metabolism                                       2.29 ± 0.02
Other enzymes                                           2.16 ± 0.03
Aminoacyl-tRNA biosynthesis                             2.14 ± 0.05
Glutamate metabolism                                    1.98 ± 0.03
Starch and sucrose metabolism                           1.92 ± 0.03
Pyruvate metabolism                                     1.73 ± 0.02
Pyrimidine metabolism                                   1.70 ± 0.02
Peptidases                                              1.69 ± 0.05
Alanine and aspartate metabolism                        1.58 ± 0.02
Glycine, serine and threonine metabolism                1.53 ± 0.02
Other translation proteins                              1.37 ± 0.02
Galactose metabolism                                    1.37 ± 0.03
Glycolysis / Gluconeogenesis                            1.35 ± 0.02
Other ion-coupled transporters                          1.34 ± 0.06
Fructose and mannose metabolism                         1.31 ± 0.03
Two-component system                                    1.31 ± 0.03
Ribosome                                                1.27 ± 0.03
Replication complex                                     1.18 ± 0.02
Phenylalanine, tyrosine and tryptophan biosynthesis     1.17 ± 0.02
Valine, leucine and isoleucine biosynthesis             1.15 ± 0.02
Carbon fixation                                         1.15 ± 0.01
Nitrogen metabolism                                     1.13 ± 0.02
Glycerolipid metabolism                                 1.07 ± 0.02
Oxidative phosphorylation                               1.07 ± 0.03
Butanoate metabolism                                    1.05 ± 0.02
Chaperones and folding catalysts                        0.99 ± 0.01
Pentose phosphate pathway                               0.95 ± 0.01
Tyrosine metabolism                                     0.95 ± 0.02
Histidine metabolism                                    0.92 ± 0.02
Cell division                                           0.91 ± 0.01
Aminosugars metabolism                                  0.89 ± 0.03
Arginine and proline metabolism                         0.85 ± 0.01
Citrate cycle (TCA cycle)                               0.84 ± 0.02
Methionine metabolism                                   0.83 ± 0.02
Lysine biosynthesis                                     0.82 ± 0.01
RNA polymerase                                          0.81 ± 0.02
Reductive carboxylate cycle (CO2 fixation)              0.80 ± 0.03
Propanoate metabolism                                   0.80 ± 0.0?
Peptidoglycan biosynthesis                              0.79 ± 0.0?
N-Glycan degradation                                    0.78 ± 0.05
Urea cycle and metabolism of amino groups               0.78 ± 0.0?
Translation factors                                     0.78 ± 0.02
Selenoamino acid metabolism                             0.77 ± 0.02
Glyoxylate and dicarboxylate metabolism                 0.73 ± 0.0?
DNA polymerase                                          0.72 ± 0.0?
Pentose and glucuronate interconversions                0.70 ± 0.02
Cysteine metabolism                                     0.68 ± 0.02
Pantothenate and CoA biosynthesis                       0.67 ± 0.0?
Nucleotide sugars metabolism                            0.67 ± 0.02
Glycosaminoglycan degradation                           0.66 ± 0.04
Function unknown                                        0.66 ± 0.0?
One carbon pool by folate                               0.65 ± 0.0?
Sphingolipid metabolism                                 0.64 ± 0.03
Protein export                                          0.62 ± 0.0?

Pathways with an average relative abundance of >0.6% are shown.

[0170] Overall functional diversity was compared using the Shannon index, a measurement that combines diversity (the number of different types of metabolic pathways) and evenness (the relative abundance of each pathway). The human gut microbiomes surveyed had a stable and high Shannon index value (4.63±0.01), close to the maximum possible level of functional diversity (5.54; see Example 4). Despite the presence of a small number of abundant metabolic pathways (listed in Table 11), the overall functional profile of each gut microbiome is quite even (Shannon evenness of 0.84±0.001 on a scale of 0 to 1), demonstrating that most metabolic pathways are found at a similar level of abundance. Interestingly, the level of functional diversity in each microbiome was significantly linked to the relative abundance of the Bacteroidetes (R=0.81, p<10); microbiomes enriched for Firmicutes/Actinobacteria had a decreased level of functional diversity. This observation is consistent with an analysis of simulated metagenomic reads generated from each of 36 Bacteroidetes and Firmicutes genomes (FIG. 21): on average, the Bacteroidetes genomes have a significantly higher level of both functional diversity and evenness (Mann-Whitney, p<0.01).

[0171] At a finer level, 26-53% of enzyme-level functional groups were shared across all 18 microbiomes, while 8-22% of the groups were unique to a single microbiome (FIGS. 22A-C). The core functional groups present in all microbiomes were also highly abundant, representing 93-98% of the sequences found in the gut (fecal) microbiome. Given the higher relative abundance of these core groups, >95% were found after 26.11±2.02 Mb of sequence was collected from a given microbiome, whereas the variable groups continue to increase substantially with each additional Mb of sequence. Of course, any estimate of the total size of the core microbiome will be dependent upon sequencing effort, especially for functional groups found at a low abundance. On average, this survey achieved greater than 450,000 sequences per fecal sample, which, assuming an even distribution, would allow us to sample groups found at a relative abundance of 10. In order to estimate the total size of the core microbiome based on the 18 sampled individuals, each microbiome was randomly sub-sampled in 1,000 sequence intervals (FIG. 22D). Based on this analysis, the core microbiome is approaching a total of 2,142 total orthologous groups (one site binding hyperbola curve fit to the resulting rarefaction curve, R=0.9966), indicating that 93% of functional groups (defined by STRING) found within the core microbiome were already identified. Of these core groups, 64% (KEGG) and 56% (STRING) were also found in 9 previously published but much lower coverage datasets generated by capillary sequencing of adult fecal DNA (average of 78,413±2,044 bidirectional reads/sample).

[0172] Metabolic reconstructions of the core microbiome revealed significant enrichment for a number of expected functional categories, including those involved in transcription, translation, and amino acid metabolism (FIG. 23). Metabolic profile-based clustering indicated that the representation of core functional groups was highly consistent across samples (FIG. 24), and includes a number of pathways likely important for life in the gut, such as those for carbohydrate and amino acid metabolism (e.g. fructose/mannose metabolism, aminosugars metabolism, and N-Glycan degradation). Variably represented pathways and categories include cell motility (only a subset of Firmicutes produce flagella), secretion systems, and membrane transport such as phosphotransferase systems involved in the import of nutrients, including sugars (FIGS. 23 and 24).

[0173] CAZyme profiles of glycoside hydrolases and glycosyltransferases were compared by calculating the R value between each pair of microbiomes (see Table 10 for families with a relative abundance >1%). This analysis revealed that all individuals have a similar profile of glycosyltransferases (mean R=0.96±0.003), while the profiles of glycoside hydrolases were significantly more variable, even between family members (mean R=0.80±0.01; p<10^-30, paired Student's t-test). This suggests that the number and spectrum of glycoside hydrolases is probably affected by external factors such as diet more than the glycosyltransferases.

Example 7

Obesity Associated Pathways

[0174] To identify metabolic pathways associated with obesity, only non-core associated (variable) functional groups were included in a comparison of the gut microbiomes of lean and obese twin pairs. A bootstrap analysis was used to identify metabolic pathways that were enriched or depleted in the variable obese gut microbiome. For example, similar to a mouse model of diet-induced obesity, the obese human gut microbiome was enriched for phosphotransferase systems involved in microbial processing of carbohydrates (Table 12). To identify specific genes that were significantly associated with obesity, all gut microbiome sequences were compared against the custom database of 42 gut genomes described in Example 3. A gene-by-sample matrix was then screened to identify genes commonly-enriched in either the obese or lean gut microbiome (defined by an odds ratio>2 or <0.5 when comparing all obese twin microbiomes to the aggregate lean twin microbiome or vice versa). The analysis yielded 383 genes that were significantly different between the obese and lean gut microbiome (q-value<0.05; 273 enriched and 110 depleted in the obese microbiome; see Tables 13 and 14). By contrast, only 49 genes were consistently enriched or depleted between all twin-pairs.

[0175] These obesity-associated genes were representative of the taxonomic differences described above: 75% of the obesity-enriched genes were from Actinobacteria (vs. 0% of lean-enriched genes; the other 25% are from Firmicutes) while 42% of the lean-enriched genes were from Bacteroidetes (vs. 0% of the obesity-enriched genes). Their functional annotation indicated that many are involved in carbohydrate, lipid, and amino acid metabolism (Tables 13-14). Together, they comprise an initial set of microbial biomarkers of the obese gut microbiome.

TABLE 12

Pathways enriched or depleted in obese gut microbiomes

Enriched
  Fatty acid biosynthesis
  Nicotinate and nicotinamide metabolism
  Other ion-coupled transporters
  Pentose and glucuronate interconversions
  Phosphotransferase system (PTS)
  Protein folding and associated processing
  Signal transduction mechanisms
  Transcription factors

Depleted
  Bacterial chemotaxis
  Bacterial motility proteins
  Benzoate degradation via CoA ligation
  Butanoate metabolism
  Citrate cycle (TCA cycle)
  Glycosaminoglycan degradation
  Other enzymes
  Oxidative phosphorylation
  Pyruvate/Oxoglutarate oxidoreductases
  Starch and sucrose metabolism
  Tryptophan metabolism

TABLE 13

Bacterial genes enriched in the gut microbiomes of obese MZ twins

SEQ. ID No.  Genome and NCBI protein ID              Annotation                                                COG      COG Categories  KEGG Orthologous groups

1            Bifidobacterium adolescentis 154486403  tRNA-ribosyltransferase                                   COG0343  J               K00773
2            Bifidobacterium longum 23465114         Transcriptional regulators                                COG1609  K
3            Bifidobacterium longum 23466186         ABC-type sugar transport system, periplasmic component    COG1653  G
4            Bifidobacterium adolescentis 154488903  Superfamily I DNA and RNA helicases                       COG3973  R
5            Bifidobacterium adolescentis 154486727  DNA polymerase IV                                         COG0389  L               K02346
6            Bifidobacterium adolescentis 154488882  peptide/nickel transport system ATP-binding protein       COG1123  R               K02031.2
7            Bifidobacterium adolescentis 154488633  Trk-type K+ transport systems                             COG0168  P
8            Bifidobacterium adolescentis 154488131  Asp-tRNAAsn/Glu-tRNAGln amidotransferase B subunit        COG0064  J               K02434
Depleted Bacterial chemotaxis Bacterial motility proteins Example 7 Benzoate degradation via CoA ligation Butanoate metabolism Obesity Associated Pathways Citrate cycle (TCA cycle) Glycosaminoglycan degradation 0.174. To identify metabolic pathways associated with Other enzymes Oxidative phosphorylation obesity, only non-core associated (variable) functional Pyruvate/Oxoglutarate oxidoreductases groups were included in a comparison of the gut microbiomes Starch and Sucrose metabolism of lean and obese twin pairs. A bootstrap analysis was used to Tryptophan metabolism identify metabolic pathways that were enriched or depleted in the variable obese gut microbiome. For example, similar to a TABLE 13 Bacterial genes enriched in the gut microbiomes of obese MZ twins SEQ. ID COG KEGG Orthologous No: Genome and NCBI proteinID Annotation COG Categories groups 1 Bifidobacterium adolescentis 15448.6403 tRNA-ribosyltransferase COGO343 J KOO773 2 Bifidobacterium longum 23465114 Transcriptional regulators COG1609 K 3 Bifidobacterium longum 23466 186 ABC-type Sugar transport system, COG1653 G periplasmic component 4 Bifidobacterium adolescentis 154488903 Superfamily I DNA and RNA COG3973 R helicases 5 Bifidobacterium adolescentis 154486,727 DNA polymerase IV COGO389 L KO2346 6 Bifidobacterium adolescentis 154488.882 peptide?nickel transport system ATP- COG1123 R KO2O31.2 binding protein 7 Bifidobacterium adolescentis 154488.633 Trk-type K+ transport systems COGO168 P 8 Bifidobacterium adolescentis 154488131 Asp-tRNAASn/Glu-tRNAGln COGOO64 J KO2434 amidotransferase B subunit US 2014/O128289 A1 May 8, 2014 32

9 | Bifidobacterium adolescentis 154487571 | Threonine dehydratase | COG1171 | E | K01754
10 | Bifidobacterium adolescentis 154486641 | Glucose-6-phosphate isomerase | COG0166 | G | K01810
11 | Bifidobacterium adolescentis 154488790 | ATP-dependent helicase Lhr and Lhr-like helicase | COG1201 | R | K03724
12 | Bifidobacterium adolescentis 119025482 | Predicted ATPase involved in cell division | COG2884 | D | K09812
13 | Bifidobacterium adolescentis 154486531 | Predicted phosphohydrolases | COG1409 | R |
14 | Bifidobacterium adolescentis 154486606 | tRNA-(guanine-N1)-methyltransferase | COG0336 | J | K00554
15 | Bifidobacterium adolescentis 154486895 | IMP dehydrogenase/GMP reductase | COG0516/7 | FR | K00088
16 | Bifidobacterium adolescentis 154486720 | Aspartate/tyrosine/aromatic aminotransferase | COG0436 | E | K00812
17 | Bifidobacterium adolescentis 119026599 | Cation transport ATPase | COG0474 | P | K01529
18 | Bifidobacterium adolescentis 154486334 | hypothetical protein | | |
19 | Bifidobacterium adolescentis 119025743 | NAD/NADP transhydrogenase alpha subunit | COG3288 | C | K00324
20 | Bifidobacterium longum 23336617 | UspA and related nucleotide-binding proteins | COG0589 | T |
21 | Bifidobacterium adolescentis 154486937 | ABC-type sugar transport system | COG1653 | G | K02027
22 | Bifidobacterium longum 23465912 | hypothetical protein | | |
23 | Bifidobacterium longum 23335963 | K+ transporter | COG3158 | P | K03549
24 | Bifidobacterium adolescentis 119025729 | ABC-type transport system, Fe–S cluster assembly | COG0719 | O |
25 | Bifidobacterium adolescentis 154487396 | Glutamine synthetase adenylyltransferase | COG1391 | OT | K00982
26 | Bifidobacterium adolescentis 154488156 | hypothetical protein | | |
27 | Bifidobacterium adolescentis 154486668 | Acetyl/propionyl-CoA carboxylase | COG4770 | I | K01946
28 | Bifidobacterium adolescentis 154487299 | Nuclease subunit of the excinuclease complex | COG0322 | L | K03703
29 | Bifidobacterium longum 23465540 | Acetate kinase | COG0282 | C | K00925
30 | Clostridium bartlettii 164687465 | putative conjugative transposon protein | NOG13238 | |
31 | Bifidobacterium longum 23465037 | Dipeptidase | COG4690 | E | K08659
32 | Bifidobacterium adolescentis 154488210 | Predicted hydrolase of the metallo-beta-lactamase superfamily | COG0595 | R | K07021
33 | Bifidobacterium adolescentis 154487598 | tRNA/rRNA methyltransferase protein | | | K00599
34 | Bifidobacterium adolescentis 119025149 | hypothetical protein | | |
35 | Bifidobacterium adolescentis 154487052 | hypothetical protein | NOG07592 | |
36 | Bifidobacterium adolescentis 154486554 | PTS system, enzyme I | | | K00935
37 | Bifidobacterium longum 23335005 | Selenocysteine lyase | COG0520 | E | K01763
38 | Bifidobacterium longum 23465294 | Branched-chain amino acid permeases | COG1114 | E | K03311
39 | Bifidobacterium adolescentis 119025432 | Acyl-CoA thioesterase | COG1946 | I | K01076
40 | Bifidobacterium adolescentis 154486528 | Aspartate-semialdehyde dehydrogenase | COG0136 | E | K00133
41 | Bifidobacterium adolescentis 154487076 | Predicted ATPase with chaperone activity | COG0606 | O | K07391
42 | Bifidobacterium longum 23466221 | Alcohol dehydrogenase, class IV | COG1454 | C | K00048
43 | Bifidobacterium adolescentis 119025541 | Phosphoribosylformylglycinamidine synthase | COG0046/7 | | K01952
44 | Bifidobacterium adolescentis 119026031 | Geranylgeranyl pyrophosphate synthase | COG0142 | H |
45 | Bifidobacterium longum 23465502 | Signal transduction histidine kinase | COG4585 | T |
46 | Bifidobacterium adolescentis 154486631 | Predicted metal-binding, possibly nucleic acid-binding protein | COG1399 | R |
47 | Bifidobacterium adolescentis 154488013 | Sugar (pentulose and hexulose) kinases | COG1070 | G | K00853
48 | Bifidobacterium adolescentis 119025777 | Aspartate carbamoyltransferase | COG0540 | | K00609
49 | Bifidobacterium adolescentis 119025510 | Superfamily II DNA helicase | COG0514 | | K03654
50 | Bifidobacterium adolescentis 119026360 | Protease II | COG1770 | E | K01354
51 | Bifidobacterium adolescentis 119025672 | Signal transduction histidine kinase | COG3920 | T |
52 | Bifidobacterium adolescentis 154487392 | Orotidine-5'-phosphate decarboxylase | COG0284 | | K01591
53 | Bifidobacterium adolescentis 154487114 | Permeases of the major facilitator superfamily | COG0477 | GEPR |
54 | Bifidobacterium adolescentis 119025804 | Predicted Fe–S-cluster redox enzyme | COG0820 | R | K06941
55 | Bifidobacterium longum 23465197 | Permeases of the major facilitator superfamily | COG0477 | GEPR |
56 | Bifidobacterium adolescentis 154487064 | Superfamily II RNA helicase | COG4581 | | K01529
57 | Bifidobacterium longum 23465727 | ABC-type dipeptide transport system | COG0747 | E | K02035
58 | Bifidobacterium adolescentis 154486507 | hypothetical protein | | |
59 | Bifidobacterium longum 23465472 | Predicted transcriptional regulator | COG2865 | K |
60 | Bifidobacterium adolescentis 154486695 | ABC-type phosphate transport system | COG0226 | C | K02040

61 | Bifidobacterium longum 23466332 | Dihydroxyacid dehydratase/phosphogluconate dehydratase | COG0129 | EG | K01687
62 | Bifidobacterium adolescentis 154489143 | Predicted phosphatase/phosphohexomutase | COG0637 | R |
63 | Bifidobacterium adolescentis 154486988 | Phosphoribosylaminoimidazole carboxylase | COG0026 | | K01589
64 | Bifidobacterium adolescentis 154486732 | glycoside hydrolase family 77 | COG1640 | G | K00705
65 | Bifidobacterium adolescentis 154487590 | Uncharacterized conserved protein | COG3247 | S |
66 | Bifidobacterium adolescentis 154486669 | Acetyl-CoA carboxylase | COG4799 | | K01966
67 | Bifidobacterium adolescentis 154488016 | Homoserine kinase | COG0083 | E | K00872
68 | Bifidobacterium adolescentis 119026221 | glycoside hydrolase family 43 | | |
69 | Bifidobacterium adolescentis 119025727 | CTP synthase (UTP-ammonia lyase) | COG0504 | | K01937
70 | Bifidobacterium adolescentis 154486325 | Uncharacterized protein conserved in bacteria | COG3583 | S |
71 | Bifidobacterium adolescentis 119025371 | Transcription elongation factor | COG0195 | K | K02600
72 | Bifidobacterium adolescentis 154486867 | Sugar (pentulose and hexulose) kinases | COG1070 | G | K00854
73 | Bifidobacterium adolescentis 154487511 | putative cell division protein | | |
74 | Bifidobacterium adolescentis 154487124 | hypothetical protein | | |
75 | Bifidobacterium adolescentis 119025212 | hypothetical protein | | |
76 | Bifidobacterium adolescentis 154487481 | hypothetical protein | | |
77 | Bifidobacterium adolescentis 154488824 | putative two-component sensor kinase | | |
78 | Bifidobacterium adolescentis 154488224 | serine/threonine protein kinase | | |
79 | Bifidobacterium adolescentis 154487149 | carbohydrate esterase family 1 | | |
80 | Bifidobacterium adolescentis 154488135 | rRNA methylases | COG0566 | J | K00599
81 | Bifidobacterium adolescentis 154489172 | glycoside hydrolase family 77 | COG1640 | G | K00705
82 | Bifidobacterium adolescentis 154487327 | Superfamily II RNA helicase | COG4581 | L | K03727
83 | Bifidobacterium adolescentis 119025670 | Transcription elongation factor | COG0782 | K | K03624
84 | Bifidobacterium adolescentis 154486326 | Dimethyladenosine transferase | COG0030 | J | K02528
85 | Bifidobacterium longum 23465077 | glycosyl-transferase family 51 | COG0744 | M | K03693
86 | Bifidobacterium longum 23464647 | hypothetical protein | NOG25707 | |
87 | Bifidobacterium adolescentis 154486363 | hypothetical protein | | |
88 | Bifidobacterium adolescentis 154486438 | Permeases of the major facilitator superfamily | COG0477 | GEPR |
89 | Bifidobacterium longum 23335686 | ABC-type antimicrobial peptide transport system | COG0577 | V | K02004
90 | Bifidobacterium adolescentis 154486327 | 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate synthase | COG1947 | | K00919
91 | Bifidobacterium adolescentis 154488959 | twitching motility protein PilT | | | K02669
92 | Bifidobacterium adolescentis 154486273 | Leucyl-tRNA synthetase | COG0495 | | K01869
93 | Bifidobacterium adolescentis 154486329 | tRNA nucleotidyltransferase/poly(A) polymerase | COG0617 | | K00970
94 | Bifidobacterium adolescentis 154487191 | putative phage protein | | |
95 | Bifidobacterium adolescentis 154486270 | DNA polymerase III, delta subunit | COG1466 | | K02340
96 | Bifidobacterium adolescentis 154486380 | hypothetical protein | | |
97 | Anaerostipes caccae 167747544 | Non-ribosomal peptide synthetase modules and related proteins | COG1020 | Q |
98 | Bifidobacterium adolescentis 154486501 | Predicted unusual protein kinase | COG0661 | R |
99 | Bifidobacterium adolescentis 154486855 | LacI-family transcriptional regulator | | |
100 | Bifidobacterium adolescentis 154486358 | Hemolysins and related proteins | COG1253 | R | K03699
101 | Bifidobacterium adolescentis 154486649 | Acetylornithine deacetylase/Succinyl-diaminopimelate desuccinylase | COG0624 | E | K01439
102 | Bifidobacterium adolescentis 119025555 | Orotidine-5'-phosphate decarboxylase | COG0284 | | K01591
103 | Bifidobacterium longum 23465600 | Gamma-glutamyl phosphate reductase | COG0014 | E | K00147
104 | Bifidobacterium adolescentis 154486786 | FAD synthase/riboflavin kinase/FMN adenylyltransferase | COG0196 | H | K00861,0953
105 | Bifidobacterium adolescentis 154488712 | Ribonuclease D | COG0349 | | K03684
106 | Bifidobacterium adolescentis 154488649 | N-acetylglutamate synthase (N-acetylornithine aminotransferase) | COG1364 | E | K00620,0642
107 | Bifidobacterium adolescentis 154489082 | Ribonucleoside-triphosphate reductase | COG1328 | | K00527
108 | Bifidobacterium adolescentis 154487141 | transcriptional regulator, AraC family | | |
109 | Bifidobacterium longum 23335562 | Acetyltransferase (isoleucine patch superfamily) | COG0110 | R | K00680
110 | Bifidobacterium adolescentis 119025600 | ABC-type amino acid transport system, permease component | COG0765 | E |
111 | Bifidobacterium adolescentis 154486349 | Recombinational DNA repair ATPase (RecF pathway) | COG1195 | | K03629

112 | Bifidobacterium adolescentis 154487341 | Succinyl-CoA synthetase | COG0045 | C | K01903
113 | Bifidobacterium adolescentis 154486419 | Adenylosuccinate synthase | COG0104 | F | K01939
114 | Bifidobacterium adolescentis 154486323 | transcriptional regulator, AraC family | | |
115 | Bifidobacterium adolescentis 119025197 | 3-isopropylmalate dehydratase large subunit | COG0065 | E | K01702/3
116 | Bifidobacterium adolescentis 154489094 | Predicted dehydrogenases and related proteins | COG0673 | R |
117 | Bifidobacterium longum 23336262 | O-acetylhomoserine sulfhydrylase | COG2873 | E | K01740
118 | Bifidobacterium longum 23465907 | ABC-type dipeptide/oligopeptide/nickel transport systems | COG0601 | EP | K02033
119 | Bifidobacterium adolescentis 154487000 | Threonine aldolase | COG2008 | E | K01620
120 | Bifidobacterium adolescentis 154487167 | Sortase and related acyltransferases | COG1247 | M | K03823
121 | Bifidobacterium longum 23465198 | Thioredoxin reductase | COG0492,0526 | OC | K00384
122 | Bifidobacterium adolescentis 154488926 | Arabinose efflux permease | COG2814 | G |
123 | Bifidobacterium longum 23465931 | ABC-type antimicrobial peptide transport system, ATPase component | COG1136 | V | K02003/4
124 | Bifidobacterium adolescentis 154486352 | Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV) | COG0188 | | K01863,2469
125 | Bifidobacterium adolescentis 119026009 | Pyruvate-formate lyase-activating enzyme | COG1180 | O | K04069
126 | Bifidobacterium adolescentis 154487279 | Methionine synthase II (cobalamin-independent) | COG0620 | E | K00549
127 | Bifidobacterium adolescentis 119025238 | Acetolactate synthase | COG0440 | E | K01653
128 | Bifidobacterium adolescentis 119025129 | Signal recognition particle GTPase | COG0552 | U | K03110
129 | Bifidobacterium adolescentis 154488132 | Asp-tRNAAsn/Glu-tRNAGln amidotransferase | COG0154 | | K02433
130 | Bifidobacterium adolescentis 154486940 | ABC-type dipeptide transport system | COG0747 | E | K02035
131 | Bifidobacterium adolescentis 154488789 | Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV) | COG0188 | | K01863,2469
132 | Bifidobacterium adolescentis 154487377 | Long-chain acyl-CoA synthetases | COG1022 | | K01897
133 | Bifidobacterium adolescentis 154488794 | DNA-directed RNA polymerase, sigma subunit | COG0568 | K | K03086
134 | Bifidobacterium adolescentis 154488989 | Superfamily I DNA and RNA helicases | COG0210 | | K01529
135 | Bifidobacterium adolescentis 154486903 | Prolyl-tRNA synthetase | COG0442 | | K01881
136 | Bifidobacterium adolescentis 154488684 | putative helicase | | |
137 | Bifidobacterium adolescentis 154486399 | Lysophospholipase | COG2267 | |
138 | Bifidobacterium adolescentis 119026611 | ABC-type sugar transport systems, ATPase components | COG3839 | G | K05816
139 | Bifidobacterium adolescentis 154486670 | Putative fatty acid synthase/reductase | COG0304/0331/2030/4981/4982 | IQ | K00059,209,665,666,680
140 | Bifidobacterium adolescentis 154488852 | ABC-type oligopeptide transport system | COG4166 | E | K02035
141 | Bifidobacterium adolescentis 154486664 | putative ABC-type sugar transport system | | |
142 | Bifidobacterium adolescentis 119025257 | Ribonucleases G and E | COG1530 | | K01128
143 | Bifidobacterium adolescentis 154486472 | ABC-type antimicrobial peptide transport system | COG0577 | V | K02004
144 | Bifidobacterium adolescentis 154487036 | hypothetical protein | | |
145 | Bifidobacterium adolescentis 154487636 | glycoside hydrolase family 2 | COG3250 | G | K01190
146 | Eubacterium dolichum 160915695 | glycoside hydrolase family 31 | | |
147 | Bifidobacterium adolescentis 154489092 | Aspartate/tyrosine/aromatic aminotransferase | COG0436 | E | K00812
148 | Bifidobacterium adolescentis 119026440 | hypothetical protein | NOG21350 | |
149 | Bifidobacterium adolescentis 119025397 | Myosin-crossreactive antigen | COG4716 | S |
150 | Bifidobacterium adolescentis 119026143 | Glutamine amidotransferase | COG0118 | E | K02501
151 | Bifidobacterium adolescentis 154487050 | Universal stress protein UspA | COG0589 | T |
152 | Bifidobacterium adolescentis 154486729 | Phosphoglycerate dehydrogenase | COG0111 | HE |
153 | Bifidobacterium adolescentis 154488261 | Predicted hydrolases or acyltransferases | COG0596 | R |
154 | Bifidobacterium adolescentis 154489101 | hypothetical protein | | |
155 | Bifidobacterium adolescentis 154487476 | Phosphotransacetylase | COG0280,0857 | CR | K00625
156 | Bifidobacterium adolescentis 154488788 | Uncharacterized proteins of the AP superfamily | COG1524 | R |
157 | Ruminococcus obeum 153809835 | putative ketose-bisphosphate aldolase | | |
158 | Clostridium leptum 160933115 | hypothetical protein | | |
159 | Bifidobacterium adolescentis 119026429 | Ribulose-5-phosphate 4-epimerase | COG0235 | G | K03080
160 | Bifidobacterium adolescentis 154487579 | glycoside hydrolase family 36 | COG3345 | G | K07407
161 | Bifidobacterium longum 23464678 | hypothetical protein | | |

162 | Bifidobacterium adolescentis 154486391 | Serine/threonine protein phosphatase | COG0631 | T | K01090
163 | Bifidobacterium adolescentis 154486962 | ABC-type amino acid transport/signal transduction systems | COG0834 | ET | K02030
164 | Bifidobacterium adolescentis 154486954 | DNA primase | COG0358 | L | K02316
165 | Bifidobacterium adolescentis 154486993 | Glutamine phosphoribosylpyrophosphate amidotransferase | COG0034 | F | K00764
166 | Bifidobacterium adolescentis 154488913 | HrpA-like helicases | COG1643 | L | K03578
167 | Bifidobacterium adolescentis 154486787 | Predicted ATP-dependent serine protease | COG1066 | O | K04485
168 | Bifidobacterium adolescentis 154486493 | Ammonia permease | COG0004 | P | K03320
169 | Bifidobacterium adolescentis 154487494 | Methenyl tetrahydrofolate cyclohydrolase | COG0190 | H | K00288,1491
170 | Bifidobacterium adolescentis 119025196 | Transcriptional regulator | COG1414 | K |
171 | Dorea longicatena 153853202 | hypothetical protein | | |
172 | Bifidobacterium adolescentis 154487329 | putative transcriptional regulator | | |
173 | Bifidobacterium adolescentis 154487591 | LacI-family transcriptional regulator | | |
174 | Bifidobacterium adolescentis 154486321 | glycoside hydrolase family 3 | | |
175 | Bifidobacterium adolescentis 119025741 | GTPase | COG1159 | R | K03595
176 | Clostridium scindens 167758922 | dUTPase | COG0756 | F | K01520
177 | Bifidobacterium adolescentis 119025587 | Signal transduction histidine kinase | COG0642 | T |
178 | Bifidobacterium adolescentis 154486470 | Predicted membrane protein | COG4393 | S |
179 | Clostridium scindens 167760262 | putative sporulation protein | | |
180 | Bacteroides stercoris 167763769 | hypothetical protein | | |
181 | Anaerostipes caccae 167746872 | putative ABC transporter | | |
182 | Bifidobacterium adolescentis 154486920 | ABC-type amino acid transport/signal transduction systems | COG0834 | ET | K02030
183 | Bifidobacterium adolescentis 154487063 | Uncharacterized conserved protein | COG2326 | S |
184 | Bifidobacterium adolescentis 119025989 | glycoside hydrolase family 13 | COG0366 | G | K01187
185 | Clostridium bartlettii 164687864 | Lactoylglutathione lyase | COG0346 | E | K01759
186 | Bifidobacterium adolescentis 154486443 | ABC-type antimicrobial peptide transport system | COG0577 | V | K02004
187 | Bifidobacterium adolescentis 154488245 | NADH:flavin oxidoreductases, NADPH2 dehydrogenase | COG1902 | C | K00354
188 | Bifidobacterium longum 23465963 | atypical histidine kinase sensor of two-component system | NOG21560 | |
189 | Bifidobacterium adolescentis 154488949 | hypothetical protein | | |
190 | Bifidobacterium adolescentis 154486865 | maltose O-acetyltransferase | | |
191 | Clostridium scindens 167759009 | cytidylate kinase | | | K00945
192 | Bifidobacterium adolescentis 154486901 | ATP-dependent exoDNAse | COG0507 | L |
193 | Ruminococcus torques 153814251 | hypothetical protein | | |
194 | Bifidobacterium adolescentis 119025327 | Ribosomal protein L13 | COG0102 | J | K02871
195 | Bifidobacterium adolescentis 154488916 | ABC-type antimicrobial peptide transport system | COG1136 | V |
196 | Bifidobacterium adolescentis 119025389 | putative histidine kinase sensor of two component system | | |
197 | Ruminococcus gnavus 154504598 | Translation elongation factor P (EF-P)/translation initiation factor 5A (eIF-5A) | COG0231 | J | K02356
198 | Bifidobacterium adolescentis 119026648 | ribonuclease P | NOG21633 | | K03536
199 | Clostridium scindens 167760715 | hypothetical protein | | |
200 | Bifidobacterium adolescentis 119026098 | Uncharacterized conserved protein | COG2606 | S |
201 | Clostridium scindens 167761320 | ABC-type antimicrobial peptide transport system | COG1136 | V | K02003
202 | Bacteroides stercoris 167762249 | hypothetical protein | | |
203 | Anaerostipes caccae 167746530 | putative ion channel | | |
204 | Bifidobacterium adolescentis 119025057 | Serine/threonine protein kinase | COG0515 | RTKL |
205 | Clostridium bartlettii 164686672 | Molybdopterin biosynthesis enzymes | COG0521 | H | K03638
206 | Ruminococcus obeum 153811887 | hypothetical protein | | |
207 | Clostridium spiroforme 169349879 | protein-Np-phosphohistidine-sugar phosphotransferase | | | K00890
208 | Clostridium ramosum 167756439 | type I restriction enzyme, S subunit | | | K01154
209 | Bifidobacterium adolescentis 119025640 | Short-chain alcohol dehydrogenase of unknown specificity | COG4221 | R |
210 | Eubacterium ventriosum 154483925 | Uncharacterized conserved protein | COG2501 | S |
211 | Bifidobacterium adolescentis 154487477 | Phosphoketolase | COG3957 | G | K01621,32.36
212 | Bifidobacterium adolescentis 154489149 | Putative molecular chaperone | COG0443 | O | K01529,4043,8070
213 | Bifidobacterium adolescentis 119025585 | hypothetical protein | | |
214 | Clostridium scindens 167759334 | ABC-type antimicrobial peptide transport system | COG1136 | V | K02003

215 | Anaerostipes caccae 167748732 | Serine-pyruvate aminotransferase/archaeal aspartate aminotransferase | COG0075 | E | K03430
216 | Ruminococcus gnavus 154505702 | Putative phage replication protein RstA | COG2946 | L | K07467
217 | Bifidobacterium adolescentis 154486389 | Cell division protein FtsI | COG0768 | M |
218 | Bifidobacterium adolescentis 154488668 | ABC-type cobalt transport system | COG1122 | P | K02006
219 | Bifidobacterium adolescentis 154486277 | Fructose-2,6-bisphosphatase/phosphoglycerate mutase | COG0406 | G | K01834
220 | Clostridium scindens 167758556 | hypothetical protein | | |
221 | Dorea longicatena 153855715 | putative acetyltransferase | | |
222 | Eubacterium dolichum 160915136 | ABC-type antimicrobial peptide transport system | COG1136 | V | K02003
223 | Bifidobacterium adolescentis 119026205 | Isoleucyl-tRNA synthetase | COG0060 | J | K01870
224 | Ruminococcus obeum 153810514 | glycoside hydrolase family 23 | COG0741,91 | M |
225 | Eubacterium eligens Contig2011.538 | putative phosphohydrolase | | |
226 | Bifidobacterium adolescentis 154487387 | Transcriptional regulator | COG0583 | K |
227 | Ruminococcus obeum 153812199 | putative flavodoxin | | |
228 | Bifidobacterium adolescentis 154486996 | Phosphoribosylformylglycinamidine (FGAM) synthase | COG0046/7 | F | K01952
229 | Dorea longicatena 153854194 | Ornithine/acetylornithine aminotransferase | COG4992 | E | K00818
230 | Ruminococcus gnavus 154505209 | Predicted GTPases | COG1160 | R |
231 | Dorea longicatena 153853531 | Predicted transcriptional regulators | COG1695 | K |
232 | Ruminococcus torques 153814203 | Acetyltransferases | COG0456 | R | K03826
233 | Clostridium scindens 167761371 | putative ABC-type transport system | | |
234 | Bifidobacterium longum 38906105 | F0F1-type ATP synthase | COG0055 | C | K02112
235 | Collinsella aerofaciens 139439837 | hypothetical protein | | |
236 | Clostridium leptum 160933570 | ABC-type antimicrobial peptide transport system | COG0577/1136 | V | K02003
237 | Eubacterium rectale 2731 | putative sensor histidine kinase | | |
238 | Bifidobacterium adolescentis 154489126 | ABC-type multidrug transport system | COG1132 | V | K06147
239 | Ruminococcus obeum 153812105 | putative conjugative transposon protein | NOG05968 | |
240 | Dorea longicatena 153853999 | hypothetical protein | | |
241 | Clostridium bolteae 160937390 | hypothetical protein | | |
242 | Ruminococcus torques 153814809 | cytidylate kinase | | | K00945
243 | Ruminococcus obeum 153810530 | hypothetical protein | | |
244 | Clostridium scindens 167758273 | putative alanine racemase | | |
245 | Clostridium scindens 167760222 | putative ABC transporter | | |
246 | Dorea longicatena 153854759 | Sporulation protein | COG2088 | M | K06412
247 | Bifidobacterium adolescentis 119025414 | glycosyl-transferase family 4 | | |
248 | Ruminococcus obeum 153813075 | hypothetical protein | | |
249 | Eubacterium ventriosum 154482695 | Queuine/archaeosine tRNA-ribosyltransferase | COG0343 | J | K00773
250 | Ruminococcus obeum 153811892 | hypothetical protein | | |
251 | Ruminococcus obeum 153810246 | Type IV secretory pathway, VirB4 components | COG3451 | U |
252 | Dorea longicatena 153854838 | Ribosomal protein S16 | COG0228 | J | K02959
253 | Dorea longicatena 153855241 | putative DNA gyrase, subunit A | | |
254 | Collinsella aerofaciens 139438412 | putative transcriptional regulator | | |
255 | Clostridium leptum 160934853 | putative ribosomal-protein-alanine acetyltransferase | | |
256 | Eubacterium rectale 3602 | Type IV secretory pathway, VirD4 components | COG3505 | U |
257 | Bifidobacterium adolescentis 154486460 | ABC-type multidrug transport system | COG1132 | V | K06147
258 | Anaerostipes caccae 167746203 | exonuclease SbcC | | | K03546
259 | Ruminococcus obeum 153813732 | hypothetical protein | | |
260 | Eubacterium ventriosum 154484729 | protein-Np-phosphohistidine-sugar phosphotransferase | | | K00890
261 | Eubacterium rectale 3363 | putative ABC transporter | | |
262 | Ruminococcus obeum 153809913 | hypothetical protein | | |
263 | Anaerostipes caccae 167748861 | putative arylsulfate sulfotransferase | | |
264 | Eubacterium eligens Contig2011.154 | Uncharacterized conserved protein | COG4283 | S |
265 | Clostridium scindens 167759418 | putative competence protein ComEA | | |
266 | Eubacterium rectale 3439 | putative RNA-directed DNA polymerase | | |
267 | Clostridium bolteae 160940954 | SAM-dependent methyltransferases | COG0500 | QR | K00599
268 | Ruminococcus obeum 153811726 | putative DNA topoisomerase | | |
269 | Ruminococcus obeum 153813044 | putative transposase | | |

270 | Eubacterium rectale 2410 | type I restriction enzyme, R subunit | | | K01152/3
271 | Clostridium bolteae 160941795 | putative recombination protein | | |
272 | Bifidobacterium adolescentis 154486724 | putative esterase | | |
273 | Collinsella aerofaciens 139438485 | putative amidohydrolase | | |
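
The odds-ratio screen that Example 7 describes for selecting the genes of Tables 13 and 14 (odds ratio >2 or <0.5 between the obese and lean microbiomes) can be illustrated with a minimal Python sketch. The sketch is illustrative only and is not the procedure actually used: the function names, the pooled per-gene read counts, and the 0.5 pseudocount are assumptions introduced here, and the q-value <0.05 significance filter mentioned in the text is not reproduced.

```python
from typing import Dict, Tuple


def odds_ratio(hits_obese: int, total_obese: int,
               hits_lean: int, total_lean: int,
               pseudocount: float = 0.5) -> float:
    """Odds that a read matches the gene in obese samples relative to lean samples."""
    # The pseudocount is an assumption of this sketch; it avoids division by zero
    # when a gene has no hits in one of the two groups.
    a = hits_obese + pseudocount
    b = (total_obese - hits_obese) + pseudocount
    c = hits_lean + pseudocount
    d = (total_lean - hits_lean) + pseudocount
    return (a / b) / (c / d)


def classify(ratio: float, upper: float = 2.0, lower: float = 0.5) -> str:
    """Apply the >2 / <0.5 thresholds quoted in the text."""
    if ratio > upper:
        return "obese-enriched"
    if ratio < lower:
        return "lean-enriched"
    return "unchanged"


def screen_genes(counts: Dict[str, Tuple[int, int]],
                 total_obese: int, total_lean: int) -> Dict[str, str]:
    """counts maps a gene name to (reads in obese pool, reads in lean pool)."""
    return {
        gene: classify(odds_ratio(obese, total_obese, lean, total_lean))
        for gene, (obese, lean) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    example_counts = {"geneA": (30, 10), "geneB": (5, 20), "geneC": (10, 10)}
    for gene, label in screen_genes(example_counts, 1000, 1000).items():
        print(gene, label)
```

In practice each candidate gene would also need a significance test with multiple-testing correction before being reported, which is the role the q-value threshold plays in the text.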

TABLE 14
Bacterial genes enriched in gut microbiomes of lean MZ twins
SEQ ID No. | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
274 | Bacteroides capillosus 154500567 | putative amidohydrolase | | |
275 | Clostridium leptum 160934848 | putative acetyltransferase | | |
276 | Ruminococcus obeum 153810033 | phosphocarrier protein HPr | | | K02784
277 | Eubacterium siraeum 167749283 | putative ABC transporter related protein | | |
278 | Bacteroides capillosus 154497054 | Polyribonucleotide nucleotidyltransferase | COG1185 | J | K00962
279 | Eubacterium siraeum 167749675 | Isoleucyl-tRNA synthetase | COG0060 | J | K01870
280 | Eubacterium rectale 3617 | hypothetical protein | | |
281 | Bacteroides capillosus 154498345 | putative sporulation protein | | |
282 | Parabacteroides merdae 154490921 | hypothetical protein | | |
283 | Bacteroides capillosus 154500960 | putative chromosome segregation protein | | |
284 | Ruminococcus torques 153814925 | putative sporulation protein | | |
285 | Clostridium scindens 167758815 | glycosyl-transferase family 4 | | |
286 | Clostridium sp. L2-50 160893842 | Protease subunit of ATP-dependent Clp proteases | COG0740 | OU | K01358
287 | B theta WH2 000545 | putative type I restriction enzyme EcoAI specificity protein | | |
288 | Bacteroides capillosus 154500843 | Trk system potassium uptake protein TrkA | | | K03499
289 | Clostridium bolteae 160936948 | putative two-component transcriptional regulator | | |
290 | Bacteroides capillosus 154498005 | ATP-dependent serine protease/cysteine S methyltransferase | COG1066 | O | K00567
291 | Parabacteroides merdae 154492394 | hypothetical protein | | |
292 | Bacteroides capillosus 154498009 | Fructose/tagatose bisphosphate aldolase | COG0191 | G | K01622
293 | B theta 3731 000845 | hypothetical protein | | |
294 | Anaerotruncus colihominis 167769594 | Predicted ATPase (AAA+ superfamily) | COG1373 | R |
295 | Bacteroides capillosus 154500228 | putative translation protein | | |
296 | Anaerofustis stercorihominis 169334667 | putative DNA recombinase | | |
297 | B theta 3731 003400 | hypothetical protein | | |
298 | Parabacteroides distasonis 150008749 | hypothetical protein | | |
299 | Bacteroides fragilis 19068109 | mobilization protein BmgA | NOG11714 | |
300 | Eubacterium dolichum 160914154 | glycoside hydrolase family 20 | COG3525 | G | K01207
301 | Bacteroides capillosus 154497125 | RNA methyltransferase, TrmH family | | | K03218
302 | Clostridium sp. L2-50 160894658 | NTP pyrophosphohydrolases | COG0494,3323 | LRS | K03574
303 | Parabacteroides merdae 154494925 | Glyceraldehyde-3-phosphate dehydrogenase | COG0057 | G | K00134
304 | Bacteroides capillosus 154496139 | Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV) | COG0188 | L | K01863,2469
305 | Clostridium ramosum 167755346 | MoxR-like ATPase | | | K03924
306 | Bacteroides uniformis 160888848 | hypothetical protein | | |
307 | Ruminococcus gnavus 154504651 | Putative translation initiation inhibitor | COG0251 | J | K07567
308 | Bacteroides uniformis 160890270 | putative phage protein | | |
309 | Bacteroides capillosus 154500164 | putative DNA recombinase | | |
310 | B theta WH2 000807 | sulfotransferase/FAD synthetase | COG0175 | EH | K00957
311 | Bacteroides uniformis 160892052 | carbohydrate esterase family 4 and 12 | | |
312 | Clostridium sp. L2-50 160893671 | hypothetical protein | | |
313 | Bacteroides capillosus 154500952 | hypothetical protein | | |
314 | Clostridium scindens 167759293 | putative ribonucleoside-triphosphate reductase activating protein | | |

315 | Bacteroides capillosus 154498134 | Predicted GTPases | COG1160 | R | K03977
316 | Bacteroides capillosus 154500412 | ribosomal protein | | |
317 | Bacteroides fragilis 60683403 | Imidazolonepropionase and related amidohydrolases | COG1228 | | K01468
318 | Peptostreptococcus micros 160946111 | hypothetical protein | NOG15344 | |
319 | B theta 7330 001524 | putative transposase | | |
320 | Bacteroides capillosus 154500229 | putative peptidase | | |
321 | Bacteroides vulgatus 150006208 | integrase | COG0582 | |
322 | Bacteroides capillosus 154501540 | hypothetical protein | | |
323 | Bacteroides stercoris 167762500 | Site-specific recombinase XerD | COG4974 | |
324 | Bacteroides fragilis 60679880 | glycoside hydrolase family 38 | COG0383 | | K01191
325 | Bacteroides capillosus 154497979 | putative replication protein | | |
326 | Bacteroides capillosus 154500160 | putative helicase | | |
327 | Bacteroides stercoris 167752230 | Retron-type reverse transcriptase | COG3344 | |
328 | B theta WH2 003792 | hypothetical protein | NOG14996 | |
329 | Bacteroides capillosus 154497731 | hypothetical protein | | |
330 | Parabacteroides merdae 154494117 | UDP-N-acetyl-D-mannosaminuronate dehydrogenase | COG0677 | | K02472
331 | Bacteroides caccae 153807847 | 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase | COG1165 | |
332 | Anaerotruncus colihominis 167771309 | N-acetylglutamate synthase (N-acetylornithine aminotransferase) | COG1364 | | K00618
333 | B theta WH2 003808 | putative outer membrane protein | | |
334 | Eubacterium dolichum 160914195 | putative copper-translocating P-type ATPase | | | K01529
335 | Bacteroides fragilis 53715551 | Predicted ATPase | COG1373 | |
336 | Clostridium bolteae 160937654 | putative phage protein | | |
337 | Bacteroides fragilis 53712550 | Alkyl hydroperoxide reductase | COG3634 | | K03387
338 | Parabacteroides merdae 154492101 | hypothetical protein | | |
339 | Clostridium bolteae 160936352 | Uncharacterized conserved protein | COG2606 | |
340 | Bacteroides uniformis 160889340 | TraM | | |
341 | B theta 7330 002089 | Adenine-specific DNA methylase | COG0827,4646 | KL |
342 | B theta WH2 003982 | putative outer membrane protein | | |
343 | Bacteroides capillosus 154496743 | hypothetical protein | | |
344 | Clostridium bolteae 160941240 | putative citrate lyase | | |
345 | Bacteroides capillosus 154496327 | putative v-type ATPase | | |
346 | Bacteroides capillosus 154496839 | putative cobalamin biosynthesis protein | | |
347 | Bacteroides fragilis 60683742 | Small-conductance mechanosensitive channel | COG0668 | |
348 | Eubacterium siraeum 167749611 | putative transcriptional regulator | | |
349 | Parabacteroides distasonis 150007998 | Cobyric acid synthase | COG1492 | | K02232
350 | Parabacteroides distasonis 150008480 | putative pyruvate formate-lyase 3 activating enzyme | | |
351 | Bacteroides capillosus 154496329 | Na+-transporting two-sector ATPase/ATP synthase | | | K01549;50
352 | Bacteroides capillosus 154496850 | hypothetical protein | | |
353 | Bacteroides capillosus 154496749 | putative spore maturation protein | | |
354 | Bacteroides capillosus 154496148 | putative spore protease | | |
355 | Clostridium bolteae 160937655 | DNA polymerase | | | K00961
356 | Bacteroides fragilis 60683107 | Putative copper/silver efflux pump | COG3696 | | K07239,7787
357 | Bacteroides capillosus 154496295 | putative short-chain dehydrogenase/reductase | | |
358 | Anaerotruncus colihominis 167771023 | stage V sporulation protein AC | | | K06405
359 | B theta WH2 004992 | ABC-type multidrug transport system | COG0842 | | K09686
360 | Bacteroides capillosus 154500409 | Transcription antiterminator | COG0250 | K | K02601
361 | B theta 3731 003445 | putative tyrosine type site-specific recombinase | NOG36763 | |
362 | B theta WH2 003671 | putative 3-oxoacyl-acyl-carrier protein synthase | | |
363 | Parabacteroides distasonis 150010457 | hypothetical protein | | |
364 | Bacteroides fragilis 60681723 | putative hydrolase lipoprotein | NOG09493 | |
365 | Clostridium scindens 167758928 | putative transcriptional regulator | | |
366 | Bacteroides capillosus 154498046 | Exonuclease VII small subunit | COG1722 | |
367 | Ruminococcus gnavus 154504691 | putative phage protein | | |
368 | Anaerotruncus colihominis 167772969 | hypothetical protein | | |
369 | Bacteroides caccae 153808785 | Predicted nucleoside-diphosphate sugar epimerases | COG1086 | MG |
370 | Alistipes putredinis 167751920 | phosphoglycolate phosphatase | | | K01091
371 | Anaerotruncus colihominis 167772790 | hypothetical protein | | |

TABLE 14-continued Bacterial genes enriched in gut microbiomes of lean MZ twins KEGG SEQ. COG orthologous No. Genome and NCBI proteinID Annotation COG Categories groups 372 Parabacteroides merdae 1544.94124 putative transcriptional regulator 373 Bacteroides caccae 1538.09523 glycoside hydrolase family 29 COG3669 G KO12O6 374 Bacteroides fragilis 46242778 Trad conjugation protein 375 Bacteroides capillosus 154499075 putative site-specific recombinase 376 Anaerotruncus colihominis 163816273 putative DNA helicase 377 Bacteroides capillosus 154495881 Pentose-5-phosphate-3-epimerase COGOO36 G KO1783 378 Bacteroides uniformis 160887913 hypothetical protein 379 Dorea longicatena 153853397 putative phage protein 380 Bacteroides vulgatus 150003721 putative outer membrane protein 381 B theta WH2O02145 putative outer membrane protein 382 Bacteroides capillosus 1545.00525 hypothetical protein Lean 383 Alistipes putredinis 167752229 putative DNA primase NOG22337

Example 8

BMI Categorization by Ethnicity in Participants in Missouri Adolescent Female Twin Study

[0176] BMI category by ethnicity for the entire MOAFTS wave 5 cohort, based on 3326 twins with complete data on height and weight, is summarized in Table 15. Dizygotic (DZ) twins had a significantly higher mean BMI than monozygotic (MZ) twins (25.8±6.5 vs. 24.8±5.9, p<0.001, mean±sd), and a higher prevalence of overweight (22.8 vs 20.9%) and obese (20.7 vs 16.1%; χ²=31.6, p<0.001). This may reflect a higher dizygotic twinning rate among obese women (MZ twinning occurs randomly³⁹). BMI was more highly correlated in MZ twins than in DZ twins, both in EA pairs (rMZ=0.80, rDZ=0.48) and in AA pairs (rMZ=0.73, rDZ=0.26), and this remained true when analysis was restricted to pairs concordant for obesity (EA: rMZ=0.61, rDZ=0.27; AA: rMZ=0.62, rDZ=-0.11) or concordant for leanness (EA: rMZ=0.43, rDZ=0.14; AA: rMZ=0.55, rDZ=0.39). After age-adjustment, quantitative genetic modeling yielded an estimated additive genetic variance for BMI of 68% (95% Confidence Interval [CI]: 57-79%), shared environmental variance of 14% (95% CI: 2-24%), and non-shared environmental variance of 14% (95% CI: 17-21%). Data from the Behavioral Risk Factor Surveillance System for Missouri women of comparable age in 2006 yield higher rates of overweight and obesity in EA women (23.8% overweight and 25% obese) compared to rates observed in MOAFTS (19.6% overweight EA, 14.8% obese EA).

TABLE 15
BMI category in the Missouri Adolescent Female Twin Study

Ethnicity | Underweight (n = 138) | Lean (n = 1893) | Overweight (n = 711) | Obese I (n = 309) | Obese II (n = 174) | Obese III (n = 113)
EA (n = 2860) | 4.79 | 60.87 | 19.58 | 8.08 | 4.27 | 2.41
AA (n = 478) | 0.21 | 31.80 | 31.59 | 16.32 | 10.88 | 9.21

All numbers are percentages. Underweight: <18.5 kg/m². Lean: 18.5-24.9 kg/m². Overweight: 25-29.9 kg/m². Obese I: 30-34.9 kg/m². Obese II: 35-39.9 kg/m². Obese III: ≥40 kg/m².

[0177] Lean and obese women selected for inclusion in the biospecimen collection project were representative of the entire cohort of lean and obese MOAFTS twins in terms of parity (nulliparous/parous), educational attainment (more than high school education/high school education or less) and marital status (married or living with someone as married/not married; p>0.05 for all comparisons). Obese EA women providing biospecimens had a mean BMI at wave 5 of 36.9±4.7 compared with a mean among EA lean women of 21.4±1.5 (mean±sd). EA twins were selected as being stably lean across all waves of data collection (i.e., baseline at median age 15, one-year follow-up, 5-year follow-up and 7-year follow-up), with a self-reported BMI of 18.5-24.9 kg/m².

Example 9

Comparison of Amplification Methods in Taxonomic Assignments

[0178] A frequently reported result from any 16S rRNA gene sequence-based survey is the relative abundance of bacterial phyla. Given the broad nature of these phyla and the fact that relatively few phyla dominate the human distal gut microbiota, it might be expected that the relative abundance of each phylum would be consistent regardless of the amplification and sequencing methods used. However, differences were observed between methods in this study (FIGS. 13A-E). Relative to the sampled gut microbiomes (defined by pyrosequencing of total community DNA), the full-length, V2/3, and V6 16S rRNA gene datasets were all significantly depleted for Bacteroidetes (paired Student's t-test, p<0.001), and significantly enriched for Firmicutes (p<0.01). One possible explanation for these differences is that the Bacteroidetes reference genomes are more closely related to those in the microbiomes than the Firmicutes reference genomes, thereby inflating estimates of the relative abundance of this phylum (FIG. 10). To address this potential confounding factor, 16S rRNA gene fragments from all 18 microbiome datasets were identified and classified taxonomically. The results of this analysis confirmed that the three PCR-based methods underestimate the relative abundance of the Bacteroidetes (FIG. 13F). Moreover, results obtained from shotgun sequencing 16S rRNA gene fragments and PCR amplification of the V2/3 region showed the strongest correlation (FIG. 13G).
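By way of illustration only, the paired Student's t-test used above to compare phylum abundance between methods can be sketched as follows. This is a minimal sketch and not the analysis code used in the study: the function name `paired_t_statistic` and all relative-abundance values are hypothetical, chosen only to show how per-sample differences between two methods applied to the same samples yield a t statistic and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired Student's t-test on a - b.

    a and b hold measurements of the same samples by two methods,
    in the same sample order.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    if len(a) < 2:
        raise ValueError("need at least two pairs")
    # The paired test works on the per-sample differences.
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical Bacteroidetes relative abundances for six samples
# (illustrative values only, not data from the study).
shotgun_bacteroidetes = [0.42, 0.35, 0.51, 0.47, 0.39, 0.44]
pcr_v2_bacteroidetes = [0.30, 0.28, 0.40, 0.33, 0.31, 0.36]

t_stat, dof = paired_t_statistic(shotgun_bacteroidetes, pcr_v2_bacteroidetes)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A positive t statistic here indicates that the first method reports higher abundance than the second across the paired samples. The corresponding p-value would be obtained from a t distribution with the returned degrees of freedom.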

SEQUENCE LISTING

The patent application contains a lengthy “Sequence Listing” section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US2014.0128289A1). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

What is claimed is:

1. An array comprising a substrate, the substrate having disposed thereon
(a) at least one nucleic acid indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome, or
(b) at least one nucleic acid indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

2. The array of claim 1, wherein the nucleic acid comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table 13 or Table 14, or a nucleic acid sequence capable of hybridizing to a nucleic acid sequence listed in Table 13 or 14.

3. The array of claim 1, wherein the nucleic acid or nucleic acids are located at a spatially defined address of the array.

4. The array of claim 3, wherein the array has no more than 500 spatially defined addresses.

5. The array of claim 3, wherein the array has at least 500 spatially defined addresses.

6. The array of claim 1, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:1-273.

7. The array of claim 1, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:274-383.

8. An array comprising a substrate, the substrate having disposed thereon
(a) at least one polypeptide indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome, or
(b) at least one polypeptide indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

9. The array of claim 8, wherein the polypeptide is encoded by a nucleic acid sequence selected from the nucleic acid sequences listed in Table 13 or Table 14.

10. The array of claim 8, wherein the polypeptide or polypeptides are located at a spatially defined address of the array.

11. The array of claim 10, wherein the array has no more than 500 spatially defined addresses.

12. The array of claim 10, wherein the array has at least 500 spatially defined addresses.

13. The array of claim 9, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:1-273.

14. The array of claim 9, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:274-383.

* * * * *