US 20110177976A1

(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2011/0177976 A1
Gordon et al.
(43) Pub. Date: Jul. 21, 2011

(54) METHODS FOR PROMOTING WEIGHT LOSS AND ASSOCIATED ARRAYS
(75) Inventors: Jeffrey I. Gordon, St. Louis, MO (US); Peter Turnbaugh, St. Louis, MO (US)
(73) Assignee: THE WASHINGTON UNIVERSITY, St. Louis, MO (US)
(21) Appl. No.: 13/002,137
(22) PCT Filed: Jun. 30, 2009
(86) PCT No.: PCT/US09/49253
§ 371 (c)(1), (2), (4) Date: Mar. 29, 2011

Related U.S. Application Data
(60) Provisional application No. 61/076,887, filed on Jun. 30, 2008, provisional application No. 61/101,011, filed on Sep. 29, 2008.

Publication Classification
(51) Int. Cl.
C40B 40/08 (2006.01)
C40B 40/10 (2006.01)
(52) U.S. Cl. ...... 506/17; 506/18

(57) ABSTRACT
Methods of modulating body fat or weight loss are presented. Nucleic acid and protein microarrays that comprise biomolecules associated with an obese host microbiome or a lean host microbiome are utilized for analysis.

Patent Application Publication Jul. 21, 2011 Sheet 1 of 50 US 2011/0177976 A1

[FIG. 1 (Sheets 1-6): drawing not reproduced. Sheet 1 is an overview keyed to detail views FIGS. 1.1 through 1.5, which continue from one to the next on Sheets 2-6.]

[FIG. 2 (Sheet 7): drawing not reproduced. Legible x-axis label: Number of sequences (ticks from 2,000 to 10,000).]

[FIG. 3 (Sheets 8-10): drawing not reproduced. Legible labels: panel letters A-D and F; related and unrelated categories; lean and obese series; V2/3 and full-length datasets; x-axis: Number of sequences.]

[FIG. 4 (Sheet 11): drawing not reproduced. Legible labels: panel A; unrelated and related categories; obese.]

[FIG. 5 (Sheets 12-18): drawing not reproduced. Sheet 12 is an overview keyed to detail views FIGS. 5.1 through 5.6, which continue from one to the next on Sheets 13-18.]

[FIG. 6 (Sheet 19): drawing not reproduced. Panels: Initial timepoint; Second timepoint. Axis: Relative abundance (% of 16S rRNA sequences), 0 to 100. Legend: Actinobacteria, Firmicutes, Bacteroidetes, Proteobacteria, Other.]

[FIG. 7 (Sheet 20): drawing not reproduced. No axis labels are reliably legible.]

[FIG. 8 (Sheets 21-25): drawing not reproduced. Legible axis labels: % identity; Bit-score. A series label that appears to read "full gene" is also present.]

[Sheet 26: no legible text survives. By sheet order this is presumably FIG. 9.]

[FIG. 10 (Sheet 27): drawing not reproduced. Series: Bacteroidetes; Firmicutes. X-axis: Percent identity to reference genomes.]

[FIG. 11 (Sheets 28-31 legible): drawing not reproduced. Columns: Lean/Overweight; Obese. Y-axis ticks: 20, 60, 100. X-axis unit: Mb. Legible genome labels: Alistipes putredinis v2.1; Bacteroides fragilis NCTC9343; Bacteroides fragilis YCH46; Bacteroides ovatus (assembly version partly illegible); Bacteroides stercoris v2.1; Clostridium leptum v3.1; Collinsella aerofaciens v3.1; Methanobrevibacter smithii ATCC35061; Bacteroides distasonis ATCC8503; Ruminococcus gnavus (assembly version illegible); Ruminococcus obeum v3.1.]

[FIG. 12 (Sheet 33): drawing not reproduced. Legible labels: panels C and D; an axis that appears to read "Read length"; Sample 1; Sample 2; a key of BLAST thresholds (e-value, bit score, % identity) whose values are illegible.]

[FIG. 13 (Sheets 34-35): drawing not reproduced. Legible labels: panels A and D; Microbiome sequences; legend: Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, Other.]

[FIG. 14 (Sheets 36-37): drawing not reproduced. Legible labels: panels A and B; x-axis PC1 (20%); comparison categories for twin and unrelated pairs that are only partly legible.]

[FIG. 15 (Sheet 38): drawing not reproduced. The x-axis appears to read "Number of sequences" (ticks at 20,000 and 25,000 legible).]

[Sheet 39: no legible text survives. By sheet order this is presumably FIG. 16A.]

[FIG. 16B (Sheet 40): drawing not reproduced. Legible labels: Bacteroidetes; x-axis: CAZy families; y-axis ticks from 0.00 to 4.50.]

[Sheet 41: drawing not reproduced. One legible fragment reads "CA2 (5%)", likely a principal component axis label. By sheet order this is presumably FIG. 17.]

[FIG. 18A (Sheet 42): drawing not reproduced. Legible labels: Bacterial phylum; a legend that includes Bacteroidetes, Actinobacteria, Proteobacteria, Other.]

[FIG. 18B (Sheet 43): drawing not reproduced. Legible label: COG categories.]

[FIG. 19 (Sheet 44): drawing not reproduced. Legible labels: Present study; Previously reported.]

[Sheet 45 does not appear in the extracted text.]

[FIG. 21 (Sheet 46): drawing not reproduced. Legible labels: panels A and B; Firmicutes genomes; Bacteroidetes genomes.]

[FIG. 22 (Sheets 47-48): drawing not reproduced. Legible labels: panels A, B and C, with titles referring to orthologous groups and CAZymes; a curve labeled variable microbiome with x-axis Sequences (thousands), 0 to 250. The percentages printed on the panels are not reliably legible.]

[FIG. 23 (Sheet 49): drawing not reproduced. Axis labels: Relative abundance (% of KEGG assignments); KEGG category. Most category names are illegible; recognizable fragments include Biosynthesis of Secondary Metabolites, Replication and Repair, Carbohydrate Metabolism, Biosynthesis of Polyketides, Metabolism of Cofactors and Vitamins, and Energy Metabolism.]

[FIG. 24 (Sheet 50): drawing not reproduced. Column headings: Variable microbiome; Core microbiome. Key: Depleted; Enriched. Legible pathway labels (their assignment to clusters is not recoverable from the extracted text): Folate biosynthesis; Fatty acid biosynthesis; Membrane and intracellular structural molecules; Biosynthesis of siderophore group nonribosomal peptides; Methane metabolism; Ascorbate and aldarate metabolism; Arginine and proline metabolism; General function prediction only; Bacterial chemotaxis; Flagellar assembly; Bacterial motility proteins; Type III secretion system; Other carbohydrate metabolism; Non-enzyme; Electron transfer carriers; Function unknown; Inositol metabolism; Phosphotransferase system (PTS); Transcription factors; Inorganic ion transport and metabolism; Other energy metabolism; Protein kinases; Two-component system; Other ion-coupled transporters; Protein folding and associated processing; Transporters; Other enzymes; Pyruvate/Oxoglutarate oxidoreductases; Ubiquinone biosynthesis; Peptidases; Nitrogen metabolism; ABC transporters; Glyoxylate and dicarboxylate metabolism; Fructose and mannose metabolism; Cell division; Glycerolipid metabolism; Methionine metabolism; Aminosugars metabolism; Carbon fixation; Glycine, serine and threonine metabolism; Glycolysis / Gluconeogenesis; Cyanoamino acid metabolism; Cysteine metabolism; Histidine metabolism; DNA polymerase; Glutamate metabolism; Aminoacyl-tRNA biosynthesis; Replication complex; Pantothenate and CoA biosynthesis; Lysine biosynthesis; RNA polymerase; Phenylalanine, tyrosine and tryptophan biosynthesis; Alanine and aspartate metabolism; Valine, leucine and isoleucine biosynthesis; Ribosome; Galactose metabolism; N-Glycan degradation; Sphingolipid metabolism; Glycosphingolipid biosynthesis - ganglioseries; Translation factors; Nucleotide sugars metabolism; Tyrosine metabolism; Purine metabolism; Pyrimidine metabolism; Protein export; Peptidoglycan biosynthesis; Pyruvate metabolism; Pentose phosphate pathway; Selenoamino acid metabolism; Starch and sucrose metabolism.]

METHODS FOR PROMOTING WEIGHT LOSS AND ASSOCIATED ARRAYS

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the priority of U.S. provisional application No. 61/076,887, filed Jun. 30, 2008, and provisional application No. 61/101,011, filed Sep. 29, 2008, each of which is hereby incorporated by reference in its entirety.

GOVERNMENTAL RIGHTS

[0002] This invention was made in part with government support under grant DK078669 awarded by the National Institutes of Health. The government has certain rights in the invention.

FIELD OF THE INVENTION

[0003] The present invention encompasses methods and arrays associated with body fat and/or weight loss.

REFERENCE TO SEQUENCE LISTING

[0004] A paper copy of the sequence listing and a computer readable form of the same sequence listing are appended below and herein incorporated by reference. Additionally, the sequence listing filed with the provisional application is also hereby incorporated by reference.

BACKGROUND OF THE INVENTION

[0005] According to the Centers for Disease Control (CDC), over sixty percent of the United States population is overweight, and greater than thirty percent are obese. This translates into more than 50 million adults in the United States with a Body Mass Index (BMI) of 30 or above. Obesity is also a worldwide health problem with an estimated 500 million overweight adult humans [body mass index (BMI) of 25.0-29.9 kg/m²] and 250 million obese adults (Bouchard, C (2000) N Engl J Med. 343, 1888-9). This epidemic of obesity is leading to worldwide increases in the prevalence of obesity-related disorders, such as diabetes, hypertension, cardiac pathology, and non-alcoholic fatty liver disease (NAFLD; Wanless and Lentz (1990) Hepatology 12, 1106-1110; Silverman et al. (1990) Am. J. Gastroenterol. 85, 1349-1355; Neuschwander-Tetri and Caldwell (2003) Hepatology 37, 1202-1219). According to the National Institute of Diabetes, Digestive and Kidney Diseases (NIDDK), approximately 280,000 deaths annually are directly related to obesity. The NIDDK further estimated that the direct cost of healthcare in the U.S. associated with obesity is $51 billion. In addition, Americans spend $33 billion per year on weight loss products. In spite of this economic cost and consumer commitment, the prevalence of obesity continues to rise at alarming rates. From 1991 to 2000, obesity in the U.S. grew by 61%.

[0006] Although the physiologic mechanisms that support development of obesity are complex, the medical consensus is that the root cause relates to an excess intake of calories compared to caloric expenditure. While the treatment seems quite intuitive, dieting is not an adequate long-term solution for most people; about 90 to 95 percent of persons who lose weight subsequently regain it. Although surgical intervention has had some measured success, the various types of surgeries have relatively high rates of morbidity and mortality.

[0007] Pharmacotherapeutic principles are limited. In addition, because of undesirable side effects, the FDA has had to recall several obesity drugs from the market. Those that are approved also have side effects. Currently, two FDA-approved anti-obesity drugs are orlistat, a lipase inhibitor, and sibutramine, a serotonin reuptake inhibitor. Orlistat acts by blocking the absorption of fat into the body. An unpleasant side effect with orlistat, however, is the passage of undigested oily fat from the body. Sibutramine is an appetite suppressant that acts by altering brain levels of serotonin. In the process, it also causes elevation of blood pressure and an increase in heart rate. Other appetite suppressants, such as amphetamine derivatives, are highly addictive and have the potential for abuse. Moreover, different subjects respond differently and unpredictably to weight-loss medications.

[0008] Because surgical and pharmacotherapy treatments are problematic, new non-cognitive strategies are needed to prevent and treat obesity and obesity-related disorders.

SUMMARY OF THE INVENTION

[0009] One aspect of the present invention encompasses an array comprising a substrate. The substrate has disposed thereon at least one nucleic acid indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. Alternatively, the substrate has disposed thereon at least one nucleic acid indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

[0010] Another aspect of the present invention encompasses an array comprising a substrate. The substrate has disposed thereon at least one polypeptide indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. Alternatively, the substrate has disposed thereon at least one polypeptide indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

[0011] Yet another aspect of the invention encompasses a method for modulating body fat or for modulating weight loss in a subject. The method typically comprises altering the microbiota population in the subject's gastrointestinal tract by modulating the relative abundance of Actinobacteria. In some embodiments, the relative abundance is increased; in other embodiments, the relative abundance is decreased.

[0012] Still another aspect of the invention encompasses a composition. The composition usually comprises an antibiotic having efficacy against Actinobacteria but not against Bacteroidetes; and a probiotic comprising Bacteroidetes.

[0013] Other aspects and iterations of the invention are described more thoroughly below.

REFERENCE TO COLOR FIGURES

[0014] The application file contains at least one photograph executed in color. Copies of this patent application publication with color photographs will be provided by the Office upon request and payment of the necessary fee.

BRIEF DESCRIPTION OF THE FIGURES

[0015] FIG. 1 depicts how the technical replicates (analyzed at four different sequencing centers) cluster. Fecal DNA samples were split and sequenced separately at four different sequencing centers. Abbreviations: usc, Environmental Genomics Core Facility, University of South Carolina; ok, Advanced Center for Genome Technology, University of Oklahoma; ct, 454 Life Sciences, Branford, Conn.; and ma,

Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, Mass. Unweighted UniFrac-based clustering was performed on the combined dataset. Colored boxes enclose samples from the same individual (also indicated by identical IDs followed by the number 1 or 2). The location of the sequencing facility follows each sample ID. Randomly selected sequences were analyzed (500 per replicate). FIGS. 1.1, 1.2, 1.3, 1.4, and 1.5 show details from FIG. 1.

[0016] FIG. 2 depicts 16S rRNA gene surveys revealing familial similarity and reduced diversity of the gut microbiota in obese individuals. (A) Comparison of the average UniFrac distance (a measure of differences in bacterial community structure) between individuals over time (self), twin-pairs, twins and their mother, and unrelated individuals. Briefly, 1,000 sequences were randomly sampled from each V2/3 dataset, OTUs were chosen, a UniFrac tree was built from representative sequences, and random permutations were done on the resulting UniFrac distance matrix. Asterisks indicate significant differences between the indicated categories [Student's t-test with Monte Carlo (1,000 permutations); *p<10⁻⁵; **p<10⁻¹⁴; ***p<10⁻⁴¹]. (B) Evidence of reduced diversity in the fecal microbiota of obese individuals. Phylogenetic diversity curves were generated by randomly sampling 1 to 10,000 sequences from each V6 16S rRNA dataset, and then calculating the total branch length leading to the sampled sequences (mean±95% CI shown).

[0017] FIG. 3 depicts 16S rRNA gene surveys revealing evidence for familial aggregation and reduced diversity in the obese gut microbiome. (A,B) Comparison of the average UniFrac distance (a measure of differences in bacterial community structure) between related and unrelated individuals. Briefly, 10,000 sequences were randomly sampled from each V6 dataset (Panel A) and 200 sequences were randomly sampled from each full-length dataset (Panel B), OTUs were chosen, a UniFrac tree was built from representative sequences, and random permutations were done on the resulting UniFrac distance matrix. Asterisks indicate significant differences between related and unrelated individuals [Student's t-test with Monte Carlo (1,000 permutations); p<0.001]. (C,D) Phylogenetic diversity curves for the obese and lean gut microbiome. Briefly, 1 to 1,000 sequences were randomly sampled from each V2/3 dataset (Panel C), and 1 to 200 sequences were randomly sampled from each full-length dataset (Panel D), and the average branch length leading to the sampled sequences was calculated. (E,F) Rarefaction curves for the obese and lean fecal microbiota. Briefly, 1 to 10,000 sequences were randomly sampled from each V6 dataset (Panel E), and 1 to 200 sequences were randomly sampled from each full-length dataset (Panel F). The average number of OTUs in each sample was then calculated (mean±95% CI shown).

[0018] FIG. 4 depicts a graph illustrating that the stratification of related and unrelated individuals concordant for physiological states of obesity versus leanness confirms familial similarity. (A,B) Comparison of the average UniFrac distance (a measure of differences in bacterial community structure) between related and unrelated individuals concordant for leanness (Panel A) or obesity (Panel B). Briefly, 1,000 sequences were randomly sampled from each V2/3 dataset, OTUs were chosen, a UniFrac tree was built from representative sequences, and random permutations were done on the resulting UniFrac distance matrix. Asterisks indicate significant differences between related and unrelated individuals [Student's t-test with Monte Carlo (1,000 permutations); *p<10⁻⁵].

[0019] FIG. 5 depicts clustering of the fecal microbiotas of monozygotic (MZ) and dizygotic (DZ) twins and their mothers sampled at the beginning of the study and two months later. Unweighted UniFrac-based clustering. Colored boxes link samples from the same individual (also indicated by identical IDs followed by the number 1 or 2). 34 of the individuals were only sampled once. 1,000 randomly selected V2/3 16S rRNA gene sequences were analyzed per sample. FIGS. 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6 show details from FIG. 5.

[0020] FIG. 6 depicts the relative abundance of the major gut bacterial phyla across 120 gut samples obtained at two different timepoints. Fecal samples were collected at the initial and second timepoints (average interval between sample collection: 57±4 days). The relative abundance of the major gut bacterial phyla is based on analysis of V2/3 16S rRNA gene sequences. Samples are organized based on the rank order abundance of Firmicutes in the initial timepoint.

[0021] FIG. 7 depicts the number of shared phylotypes (OTUs) as a function of the number of sequences per sample. 50-3,000 sequences were randomly selected from each sample, obtained from 93 different individuals. All sequences were binned into species-level phylotypes using a 97% identity threshold. Less stringent parameters were used for OTU binning at all levels of coverage to allow for analysis of 3,000 sequences per sample (density cutoff=0.65, maximum of 3000 nodes).

[0022] FIG. 8 depicts the validation of annotation parameters using control datasets. (A-C) Percent of randomly fragmented annotated genes (KEGG v44) assigned to the correct KEGG orthologous group as a function of the (A) e-value, (B) % identity, or (C) bit-score cutoff used. (D-F) Sensitivity [true positives (TP) divided by true positives plus false negatives (FN)] as a function of the (D) e-value, (E) % identity, or (F) bit-score cutoff used. (G-I) Precision [true positives divided by true positives plus false positives (FP)] as a function of the (G) e-value, (H) % identity, or (I) bit-score cutoff used. The vertical gray line and circle indicate the cutoff values used in this analysis.

[0023] FIG. 9 depicts the taxonomic profiles of microbial gene content in the human gut (fecal) microbiome. Full-length 16S sequences were obtained for each reference genome, likelihood parameters were determined using Modeltest, and a maximum-likelihood tree was generated using PAUP. Bootstrap values represent nodes found in >70 of 100 repetitions. Branches and distributions are colored by phylum: Bacteroidetes (orange), Firmicutes (blue), and Actinobacteria (green). Proteobacteria (E. coli) and Archaea (M. smithii and M. stadtmanae) are uncolored. The relative abundance of sequences homologous to each genome is depicted on a scale of 0 to 30% (BLASTX comparisons of microbiome datasets to reference genomes). Sample ID nomenclature: Family number, Twin number or mom, and BMI category (Le=lean, Ov=overweight, Ob=obese; e.g. F1T1Le stands for family 1, twin 1, lean).

[0024] FIG. 10 depicts the assignment of fecal microbiome reads to sequenced reference human gut-derived Bacteroidetes and Firmicutes genomes. Histogram of the percent identity (mean±SEM) obtained from sequence alignments between gut microbiome reads (n=18 datasets) and Firmicutes or Bacteroidetes reference genomes.
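The sensitivity and precision measures defined in the description of FIG. 8 above reduce to simple ratios of counts. A minimal Python sketch follows. It assumes, which the source does not state, that a fragment assigned to the wrong group counts as a false positive and that a fragment left unassigned at the chosen cutoff counts as a false negative. The function names and the dictionary-based input format are hypothetical and serve only as illustration.

```python
def count_outcomes(predicted, expected):
    """Count (TP, FP, FN) for fragment-to-group assignments.

    predicted: dict mapping fragment ID -> group assigned at some cutoff
               (fragments with no hit passing the cutoff are absent).
    expected:  dict mapping every fragment ID -> its known group.
    Assumption (not from the source): wrong assignment = FP,
    missing assignment = FN.
    """
    tp = sum(1 for frag, grp in predicted.items() if expected.get(frag) == grp)
    fp = len(predicted) - tp
    fn = sum(1 for frag in expected if frag not in predicted)
    return tp, fp, fn


def sensitivity(tp, fn):
    """True positives divided by true positives plus false negatives."""
    return tp / (tp + fn)


def precision(tp, fp):
    """True positives divided by true positives plus false positives."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Toy data: four fragments with known groups, three assigned at a cutoff.
    expected = {"f1": "K1", "f2": "K2", "f3": "K3", "f4": "K4"}
    predicted = {"f1": "K1", "f2": "K2", "f3": "K9"}
    tp, fp, fn = count_outcomes(predicted, expected)
    print(tp, fp, fn, sensitivity(tp, fn), precision(tp, fp))
```

Repeating the count at a series of e-value, % identity, or bit-score cutoffs would give curves of the kind the figure description refers to. The ratio functions raise a ZeroDivisionError when a denominator is zero, so a caller would need to handle cutoffs at which nothing is assigned.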

[0025] FIG. 11 depicts the percent identity plots of the fecal microbiomes versus reference genomes. Each row (x-axis) represents a different genome. The y-axis shows the percent identity to microbiome sequences (red dots). The combined data from lean/overweight individuals are in the left column while the combined data from obese individuals are displayed in the right column. Supercontigs were used for draft genomes; the assembly version (v) can be found after the strain name. The lines found at 10% identity on each plot depict the sum of all sequences mapped across each genome.

[0026] FIG. 12 depicts the dependence of percentage (A), quality (B), and accuracy (C-D) of sequence assignments on read-length. Two fecal samples were processed using extra long read pyrosequencing (454 FLX Titanium kit; Samples TS28 and TS29). 10,000 sequences from the maximum of each read-length distribution (between 490 and 505 nt) were randomly selected from each sample. Simulated reads were created by sampling the first 50-500 nt of each of these 10,000 sequences, and each simulated read was compared using NCBI-BLASTX against our custom gut genome database. Multiple BLAST thresholds were used (see key in panel A). (A) Percent of sequences assigned to the reference genomes as a function of read-length. (B) Average BLAST bit score as a function of read-length. (C) Percent of gene assignments (from the gut genome database) identical to full-length sequence as a function of read-length. (D) Percent of group assignments (same assigned COG as the full-length sequence) as a function of read-length.

[0027] FIG. 13 depicts the relative abundance of bacterial phyla in 18 human gut microbiomes. (A-C) PCR-based 16S rRNA gene sequences: (A) full-length, (B) V2/3 region, and (C) V6. (D-E) Microbiome data analyzed by BLAST comparisons: (D) NCBI non-redundant database and (E) a custom 42 gut genome database. (F) Analysis of 16S rRNA gene fragments identified in each microbiome. (G) Correlation matrix based on all pairwise comparisons (R) of the relative abundance of the four major phyla (Actinobacteria, Firmicutes, Bacteroidetes, and Proteobacteria) across all six methods.

[0028] FIG. 14 depicts the metabolic pathway-based clustering and analysis of the human gut microbiome of MZ twins. (A) Metabolic pathways were tallied using the KEGG database and annotation scheme. Functional profiles were clustered using a single-linkage hierarchical clustering with a Pearson's distance metric. All pairwise comparisons were made of the profiles by calculating each R value. (B) A linear regression of the relative abundance of Bacteroidetes versus the first principal component derived from a PCA analysis of KEGG metabolic profiles. (C) Comparisons of functional similarity between twin pairs, between twins and their mother, and between unrelated individuals. Asterisks indicate significant differences (Student's t-test with Monte Carlo; p<0.01) and bars represent mean±SEM.

[0029] FIG. 15 depicts the functional profiles of MZ fecal microbiomes, based on the relative abundance of KEGG pathways, which stabilize after ~20,000 sequences are collected for a given sample. Datasets were randomly subsampled between 500 and 25,000 sequences. The average functional similarity (R) between the subsampled dataset and the full dataset is shown as a function of sequencing effort.

[0030] FIG. 16 depicts the KEGG pathways and Carbohydrate Active Enzymes (CAZy) families whose representation is significantly different between Firmicutes and Bacteroidetes bins. Sequences from each of the 18 fecal microbiomes were binned based on sequence homology to the custom 42-member reference human gut genome database. (A) The frequency of each KEGG pathway was tallied for each bin and significantly different pathways were identified using a bootstrap re-sampling analysis (Xipe v2.4). Significantly different pathways reaching at least 0.6% relative abundance in at least two microbiomes were clustered using single-linkage hierarchical clustering and the Pearson's correlation distance metric. (B) The relative abundance of CAZy families in the Bacteroidetes and Firmicutes sequence bins. Asterisks indicate significant differences (Mann-Whitney test, p<0.0001).

[0031] FIG. 17 depicts the functional clustering of phylum-wide sequence bins and reference genomes from 36 human gut-derived Bacteroidetes and Firmicutes. The frequency of each KEGG pathway in phylum-wide sequence bins, and in 10,000 simulated reads generated from each of the reference genomes (Readsim v0.10; ref. 56), was tallied and pathways reaching at least 0.6% relative abundance in at least two fecal microbiomes were clustered using principal components analysis (PCA). An average Firmicutes and Bacteroidetes genome was generated by pooling all reads generated from genomes within each phylum.

[0032] FIG. 18 depicts the comparison of taxonomic and functional variations in the human gut microbiome. (A) Relative abundance of major phyla across 18 fecal microbiomes from MZ twins and their mothers, based on BLASTX comparisons of microbiomes and the NCBI non-redundant database. (B) Relative abundance of COG categories across each sampled gut microbiome.

[0033] FIG. 19 depicts the relative abundance of KEGG pathways and COG categories in the gut microbiomes of 18 individuals (6 MZ twin pairs and their mothers), plus 9 previously published adult microbiomes. Simulated reads were generated from each of the 9 previously published microbiome datasets obtained by capillary sequencing to mimic pyrosequencing reads, then re-annotated using the KEGG and STRING-extended COG databases. (A) The average relative abundance of KEGG pathways in MZ twin pairs and their mothers graphed as a function of the average relative abundance of KEGG pathways in the 9 previously published adult gut microbiome datasets. (B) The distribution of COG categories across all 27 datasets.

[0034] FIG. 20 depicts the relative abundance of COG categories in 36 sequenced reference human gut-derived Firmicutes and Bacteroidetes genomes. 10,000 simulated reads, generated from each of the reference genomes (Readsim v0.10), were annotated using the STRING-extended COG database.

[0035] FIG. 21 depicts the average functional diversity and evenness of simulated reads generated from reference genomes from gut Firmicutes or Bacteroidetes. (A) Functional diversity was calculated in EstimateS (v8.0), based on the abundance of each metabolic pathway across 10,000 simulated reads generated from each of the 36 reference genomes (Readsim v0.10). (B) Shannon evenness. Asterisks indicate significant differences (Mann-Whitney test, p<0.01).

[0036] FIG. 22 depicts the enzyme-level functional groups shared between all or a subset of the sampled gut microbiomes. Sequences from each of the 18 microbiomes characterized in this study were assigned to (A) KEGG groups, (B) CAZy families, and (C) STRING annotations. Functional groups (inner circle), and the sequences assigned

to each group (outer circle) were then tallied based on their co-occurrence in any combination of 1 to 18 microbiomes. For example, the outer aqua-colored segment in Panel A demonstrates that 96.2% of the total sequences generated from all 18 samples were assigned to functional groups that were common to all 18 microbiomes. (D) KEGG categories enriched or depleted in the core versus variable components of the gut microbiome. Sequences from each of the 18 fecal microbiomes were binned into the "core" or "variable" microbiome based on the co-occurrence of KEGG Orthologous groups (core groups were found in all 18 microbiomes while variable groups were present in fewer (<18) microbiomes; see FIG. 20A). General categories are shown. Asterisks indicate significant differences (Student's t-test, *p<0.05, **p<0.001, ***p<10⁻⁵).

[0037] FIG. 23 depicts the KEGG categories enriched or depleted in the core versus variable components of the gut microbiome. Sequences from each of the 18 fecal microbiomes were binned into the core or variable microbiome based on the co-occurrence of KEGG Orthologous groups (core groups were found in all 18 microbiomes while variable groups were present in fewer (<18) microbiomes; see FIG. 20A). General categories are shown. Asterisks indicate significant differences (Student's t-test, *p<0.05, **p<0.001, ***p<10⁻⁵).

[0038] FIG. 24 depicts the clustering of pathways enriched or depleted in the core microbiome. Sequences from each of the 18 distal gut microbiomes were binned into the core or variable microbiome based on the co-occurrence of KEGG orthologous groups (core groups were found in all 18 microbiomes while variable groups were present in fewer (<18) microbiomes; see FIG. 20A). The frequency of each KEGG pathway was tallied for each bin and significantly different pathways were identified using a bootstrap re-sampling analysis (Xipe v2.4). Pathways significantly enriched (yellow) or depleted (blue), reaching at least 0.6% relative abundance in at least two microbiomes, were clustered using single-linkage hierarchical clustering and the Pearson's correlation distance metric.

DETAILED DESCRIPTION OF THE INVENTION

[0039] It has been discovered, as demonstrated in the Examples, that there is a relationship between the human gut microbiota and obesity. In particular, an obese human subject typically has fewer Bacteroidetes and more Actinobacteria compared to a lean subject. In some embodiments, an obese human subject has proportionately fewer Bacteroidetes and more Actinobacteria and Firmicutes compared to a lean subject. Taking advantage of these discoveries, the present invention provides compositions and methods to regulate energy balance in a subject. In particular, the invention provides nucleic acid sequences that are associated with obesity in humans. These sequences may be used as diagnostic or prognostic biomarkers for obesity risk, biomarkers for drug discovery, biomarkers for the discovery of therapeutic targets involved in the regulation of energy balance, and biomarkers for the efficacy of a weight loss program.

I. Modulation of Energy Balance in a Subject

[0040] The energy balance of a subject may be modulated by altering the subject's gut microbiota population. Generally speaking, to decrease energy harvesting, decrease body fat, or promote weight loss, the relative abundance of bacteria within the Bacteroidetes phylum (phylum is also known as a division) is increased and optionally, the relative abundance of bacteria within the Actinobacteria and/or Firmicutes phylum is decreased. Alternatively, to increase energy harvesting, to increase body fat, or promote weight gain, the relative abundance of Bacteroidetes is decreased and optionally, the relative abundance of Actinobacteria and/or Firmicutes is increased. Additional agents may also be utilized to achieve either weight loss or weight gain. Examples of these agents are detailed in section I(d).

(a) Altering the Abundance of Bacteroidetes

[0041] The relative abundance of Bacteroidetes may be altered by increasing or decreasing the presence of one or more Bacteroidetes species that reside in the gut. Additionally, non-limiting examples of species may include B. thetaiotaomicron, B. vulgatus, B. ovatus, P. distasonis, B. uniformis, B. stercoris, B. eggerthii, B. merdae, and B. caccae. In one embodiment, the population of B. thetaiotaomicron is altered. In still another embodiment, the population of B. vulgatus is altered. In an additional embodiment, the population of B. ovatus is altered. In another embodiment, the population of P. distasonis is altered. In yet another embodiment, the population of B. uniformis is altered. In an additional embodiment, the population of B. stercoris is altered. In a further embodiment, the population of B. eggerthii is altered. In still another embodiment, the population of B. merdae is altered. In another embodiment, the population of B. caccae is altered. In a further embodiment, the species within the Bacteroidetes phylum may be as of yet unnamed.

[0042] The present invention also includes altering various combinations of Bacteroidetes species, such as at least two species, at least three species, at least four species, at least five species, at least six species, at least seven species, at least eight species, at least nine species, at least ten Bacteroidetes species, or more than ten species of Bacteroidetes. For example, the combination of B. thetaiotaomicron, B. vulgatus, B. ovatus, P. distasonis, and B. uniformis may be altered.

[0043] In an exemplary embodiment, the relative abundance of Bacteroidetes is increased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Increased abundance of Bacteroidetes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, a food supplement that increases the abundance of Bacteroidetes may be administered to the subject. By way of example, one such food supplement is psyllium husks as described in U.S. Patent Application Publication No. 2006/0229905, which is hereby incorporated by reference in its entirety. In an exemplary embodiment, a probiotic comprising one or more Bacteroidetes species or strains may be administered to the subject. The amount of probiotic administered to the subject can and will vary depending upon the embodiment. The probiotic may comprise from about one thousand to about ten billion cfu/g (colony forming units per gram) of the total composition, or of the part of the composition comprising the probiotic. In one embodiment, the probiotic may comprise from about one hundred million to about 10 billion organisms. The probiotic microorganism may be in any suitable form, for example in a powdered dry form. In addition, the probiotic microorganism may have undergone processing in order to increase its survival. For example, the microorganism may be coated or encapsulated in a polysaccharide, fat, starch, protein or in a sugar matrix. Standard encapsulation techniques known in the art can be used. For example, techniques discussed in U.S.
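The relative abundance of a phylum, and the fold or percent change in that abundance, can be illustrated with a short Python sketch. The sketch assumes that per-phylum sequence counts are already available for a gut sample. The function names and the counts are hypothetical, and this is neither the method of the Examples nor an array of the invention.

```python
def relative_abundance(counts):
    """Convert per-phylum sequence counts into fractions of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {phylum: n / total for phylum, n in counts.items()}


def fold_change(before, after):
    """Fold difference of a relative abundance (after / before)."""
    if before <= 0:
        raise ValueError("baseline abundance must be positive")
    return after / before


def percent_change(before, after):
    """Percent increase (positive) or decrease (negative) relative to baseline."""
    return (fold_change(before, after) - 1.0) * 100.0


# Hypothetical counts before and after an intervention (illustrative only).
baseline = {"Bacteroidetes": 200, "Firmicutes": 700, "Actinobacteria": 100}
followup = {"Bacteroidetes": 300, "Firmicutes": 620, "Actinobacteria": 80}

before = relative_abundance(baseline)
after = relative_abundance(followup)
for phylum in baseline:
    print(phylum, round(percent_change(before[phylum], after[phylum]), 1))
```

In these made-up numbers the Bacteroidetes fraction rises from 20% to 30% of all sequences, which is a 1.5-fold difference or a 50% increase.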

Pat. No. 6,190,591, which is hereby incorporated by reference in its entirety, may be used.

[0044] Alternatively, the relative abundance of Bacteroidetes is decreased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Decreased abundance of Bacteroidetes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Bacteroidetes may be administered. Generally speaking, antimicrobial agents may target several areas of bacterial physiology: protein translation, nucleic acid synthesis, cell wall synthesis or potentially, the polysaccharide acquisition machinery. In an exemplary embodiment, the antibiotic will have efficacy against Bacteroidetes but not against Firmicutes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.

[0045] It is contemplated that the abundance of gut Bacteroidetes within an individual subject may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)) and the individual subject. A method for determining the relative abundance of gut Bacteroidetes is described in the examples; alternatively, an array of the invention, described below, may be used to determine the relative abundance.

[0046] Stated another way, it is contemplated that the abundance of gut Bacteroidetes within an individual subject may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)) and the individual subject. For weight loss, the abundance may be altered by an increase of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Bacteroidetes is described in the examples; alternatively, an array of the invention, described below, may be used to determine the relative abundance.

(b) Altering the Abundance of Actinobacteria

[0047] The relative abundance of Actinobacteria may be altered by increasing or decreasing the presence of one or more species that reside in the gut. Representative, non-limiting species include B. longum, B. breve, B. catenulatum, B. dentium, B. gallicum, B. pseudocatenulatum, C. aerofaciens, C. stercoris, C. intestinalis, and S. variabile.

[0048] In an exemplary embodiment, the relative abundance of Actinobacteria is decreased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Decreased abundance of Actinobacteria in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Actinobacteria may be administered. In an exemplary embodiment, the antibiotic will have efficacy against Actinobacteria but not against Bacteroidetes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.

[0049] Alternatively, the relative abundance of Actinobacteria is increased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Increased abundance of Actinobacteria in the gut may be accomplished by several suitable means generally known in the art. In an exemplary embodiment, a probiotic comprising one or more Actinobacteria strains or species may be administered to the subject.

[0050] It is contemplated that the abundance of gut Actinobacteria may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). A method for determining the relative abundance of gut Actinobacteria is described in the examples.

[0051] Stated another way, it is contemplated that the abundance of gut Actinobacteria may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). For weight loss, the abundance may be altered by a decrease of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Actinobacteria is described in the examples.

(c) Altering the Abundance of Firmicutes

[0052] The relative abundance of Firmicutes may be altered by increasing or decreasing the presence of one or more species that reside in the gut. Representative species include species from Clostridia, Bacilli, and Mollicutes. In one embodiment, the relative abundance of one or more Clostridia species is altered. In another embodiment, the relative abundance of one or more Bacilli species is altered. In yet another embodiment, the relative abundance of one or more Mollicutes species is altered. It is also contemplated that the relative abundance of several species of Firmicutes may be altered without departing from the scope of the invention. By way of non-limiting examples, a combination of one or more Clostridia species, one or more Bacilli species, and one or more Mollicutes species may be altered. In a further embodiment, the species within the Firmicutes phylum may be as of yet unnamed.

[0053] In some embodiments, the Mollicutes class is altered. For instance, E. dolichum, E. cylindroides, E. biforme, or C. innocuum may be altered. In one embodiment, the species of the Mollicutes class may possess the genetic information to create a cell wall. In another embodiment, the species of the Mollicutes class may produce a cell wall. In a further embodiment, the species within the class Mollicutes may be as of yet unnamed.

[0054] In an exemplary embodiment, the relative abundance of Firmicutes is decreased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Decreased abundance of Firmicutes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Firmicutes may be administered. In an exemplary embodiment, the antibiotic will have efficacy against Firmicutes but not against Bacteroidetes. In another exemplary embodiment, the antibiotic will have efficacy against Mollicutes, but not Bacteroidetes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.

[0055] Alternatively, the relative abundance of Firmicutes is increased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Increased abundance of Firmicutes in the gut may be accomplished by several suitable means generally known in the art. In an exemplary embodiment, a probiotic comprising Firmicutes may be administered to the subject.

[0056] It is contemplated that the abundance of gut Firmicutes may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). A method for determining the relative abundance of gut Firmicutes is described in the examples.

[0057] Stated another way, it is contemplated that the abundance of gut Firmicutes may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). For weight loss, the abundance may be altered by a decrease of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Firmicutes is described in the examples.

(d) Additional Weight Modulating Agents

[0058] Another aspect of the invention encompasses a combination therapy to regulate fat storage, energy harvesting, and/or weight loss or gain in a subject. In an exemplary embodiment, a combination for decreasing energy harvesting, decreasing body fat or for promoting weight loss is provided. For this embodiment, a composition comprising an antibiotic having efficacy against Firmicutes and/or Actinobacteria but not against Bacteroidetes; and a probiotic comprising Bacteroidetes may be administered to the subject. Additionally, an anti-archaeal compound may be included in the aforementioned composition to reduce the representation of gut methanogens and the efficiency of methanogenesis, thereby reducing the efficiency of fermentation of dietary polysaccharides by saccharolytic bacteria, such as Bacteroidetes. Other agents that may be included with the aforementioned composition are detailed below.

[0059] The compositions utilized in this invention may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means. The actual effective amounts of compounds comprising a weight loss composition of the invention can and will vary according to the specific compounds being utilized, the mode of administration, and the age, weight and condition of the subject. Dosages for a particular individual subject can be determined by one of ordinary skill in the art using conventional considerations. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.

i. Fiaf Polypeptide

[0060] A composition of the invention for promoting weight loss may optionally include either increasing the amount of a Fiaf polypeptide or the activity of a Fiaf polypeptide. Typically, a suitable Fiaf polypeptide is one that can substantially inhibit LPL when administered to the subject. Several Fiaf polypeptides known in the art are suitable for use in the present invention. Generally speaking, the Fiaf polypeptide is from a mammal. By way of non-limiting example, suitable Fiaf polypeptides and nucleotides are delineated in Table A.

TABLE A

Species              PubMed Ref.
Homo sapiens         NM_139314
                     NM_016109
Mus musculus         NM_020581
Rattus norvegicus    NM_199115
Sus scrofa           AY307772
Bos taurus           AY192008
Pan troglodytes      AY411895

[0061] In certain aspects, a polypeptide that is a homolog, ortholog, mimic or degenerative variant of a Fiaf polypeptide is also suitable for use in the present invention. In particular, the subject polypeptide will typically inhibit LPL when administered to the subject. A variety of methods may be employed to determine whether a particular homolog, mimic or degenerative variant possesses substantially similar biological activity relative to a Fiaf polypeptide. Specific activity or function may be determined by convenient in vitro, cell-based, or in vivo assays, such as measurement of LPL activity in white adipose tissue. In order to determine whether a particular Fiaf polypeptide inhibits LPL, the procedure detailed in the examples of U.S. Patent Application No. 20050239706, which is hereby incorporated by reference in its entirety, may be followed.

[0062] Fiaf polypeptides suitable for use in the invention are typically isolated or pure and are generally administered as a composition in conjunction with a suitable pharmaceutical carrier, as detailed below. A pure polypeptide constitutes at least about 90%, preferably, 95% and even more preferably, at least about 99% by weight of the total polypeptide in a given sample.

[0063] The Fiaf polypeptide may be synthesized, produced by recombinant technology, or purified from cells using any of the molecular and biochemical methods known in the art that are available for biochemical synthesis, molecular expression and purification of the Fiaf polypeptides (see, e.g., Molecular Cloning, A Laboratory Manual (Sambrook, et al., Cold Spring Harbor Laboratory), Current Protocols in Molecular Biology (Eds. Ausubel, et al., Greene Publ. Assoc., Wiley-Interscience, New York)).

[0064] The invention also contemplates use of an agent that increases Fiaf transcription or its activity. For example, an agent may be delivered that specifically activates Fiaf expression: this agent may be a natural or synthetic compound that directly activates Fiaf gene transcription, or indirectly activates expression through interactions with components of host regulatory networks that control Fiaf transcription. Suitable agents may be identified by methods generally known in the art, such as by screening natural product and/or chemical libraries using the gnotobiotic zebrafish model described in the examples of U.S. Patent Application No. 20050239706. In another embodiment, a chemical entity may be used that interacts with Fiaf targets, such as LPL, to reproduce the effects of Fiaf (e.g., in this case inhibition of LPL activity). In an alternative of this embodiment, administering a Fiaf agonist to the subject may increase Fiaf expression and/or activity. In one embodiment, the Fiaf agonist is a peroxisome proliferator-activated receptor (PPARs) agonist. Suitable PPARs include PPARα, PPARβ/δ, and PPARγ. Fenofibrate is another suitable example of a Fiaf agonist. Additional suitable Fiaf agonists and methods of administration are further described in Mandard, et al., J. Biol Chem, 279, 34411 (2004), and U.S. Patent Publication No. 2003/0220373, which are both hereby incorporated by reference in their entirety.

ii. Other Compounds

[0065] The compositions of the invention that decrease energy harvesting, decrease body fat, or promote weight loss may also include several additional agents suitable for use in weight loss regimes. Generally speaking, exemplary combinations of therapeutic agents may act synergistically to decrease energy harvesting, decrease body fat, or promote weight loss. Using this approach, one may be able to achieve therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects. In one embodiment, acarbose may be administered with a composition of the invention. Acarbose is an inhibitor of α-glucosidases, which are required to break down carbohydrates into simple sugars within the gastrointestinal tract of the subject. In another embodiment, an appetite suppressant, such as an amphetamine, or a selective serotonin reuptake inhibitor, such as sibutramine, may be administered with a composition of the invention. In still another embodiment, a lipase inhibitor such as orlistat, or an inhibitor of lipid absorption such as Xenical, may be administered with a composition of the invention.

iii. Restricted Calorie Diet

[0066] Optionally, in addition to administration of a composition of the invention for weight loss, a subject may also be placed on a restricted calorie diet. Restricted calorie diets may be helpful for increasing the relative abundance of Bacteroidetes and decreasing the relative abundance of Firmicutes and/or Actinobacteria. Several restricted calorie diets known in the art are suitable for use in combination with the compositions of the invention. Representative diets include a reduced fat diet, a reduced protein diet, or a reduced carbohydrate diet.

iv. Alteration of the Gastrointestinal Archaeon Population

[0067] An anti-archaeal compound may be included in a composition of the invention to decrease energy harvesting, decrease fat storage, and/or decrease weight gain. To promote weight loss in a subject, the gut archaeon population is altered such that microbial-mediated carbohydrate metabolism or its efficiency is decreased in the subject, whereby decreasing microbial-mediated carbohydrate metabolism or its efficiency promotes weight loss in the subject.

[0068] Accordingly, in one embodiment, the subject's gastrointestinal archaeal population is altered so as to promote weight loss in the subject. Typically, the presence of at least one genus of archaeon that resides in the gastrointestinal tract of the subject is decreased. In most embodiments, the archaeon is generally a mesophilic methanogenic archaeon. In one alternative of this embodiment, the presence of at least one species from the genera Methanobrevibacter or Methanosphaera is decreased. In another alternative embodiment, the presence of Methanobrevibacter smithii is decreased. In still another embodiment, the presence of Methanosphaera stadtmanae is decreased. In yet another embodiment, the presence of a combination of archaeon genera or species is decreased. By way of non-limiting example, the presence of Methanobrevibacter smithii and Methanosphaera stadtmanae is decreased.

[0069] To decrease the presence of any of the archaeon detailed above, methods generally known in the art may be utilized. In one embodiment, a compound having anti-microbial activities against the archaeon is administered to the subject. Non-limiting examples of suitable anti-microbial compounds include metronidazole, clindamycin, tinidazole, macrolides, and fluoroquinolones. In another embodiment, a compound that inhibits methanogenesis by the archaeon is administered to the subject. Non-limiting examples include 2-bromoethanesulfonate (inhibitor of methyl-coenzyme M reductase), N-alkyl derivatives of para-aminobenzoic acid (inhibitor of tetrahydromethanopterin biosynthesis), ionophore monensin, nitroethane, lumazine, propynoic acid and ethyl 2-butynoate. In yet another embodiment, a hydroxymethylglutaryl-CoA reductase inhibitor is administered to the subject. Non-limiting examples of suitable hydroxymethylglutaryl-CoA reductase inhibitors include lovastatin, atorvastatin, fluvastatin, pravastatin, simvastatin, and rosuvastatin. Alternatively, the diet of the subject may be formulated by changing the composition of glycans (e.g., polyfructose-containing oligosaccharides) in the diet that are preferred by polysaccharide degrading bacterial components of the microbiota (e.g., Bacteroides spp) when in the presence of mesophilic methanogenic archaeal species such as Methanobrevibacter smithii.

[0070] Generally speaking, when the archaeal population in the subject's gastrointestinal tract is decreased in accordance with the methods described above, the polysaccharide degrading properties of the subject's gastrointestinal microbiota are altered such that microbial-mediated carbohydrate metabolism or its efficiency is decreased. Typically, depending upon the embodiment, the transcriptome and the metabolome of the gastrointestinal microbiota are altered. In one embodiment, the microbe is a saccharolytic bacterium. In one alternative of this embodiment, the saccharolytic bacterium is a Bacteroides species. In a further alternative embodiment, the bacterium is Bacteroides thetaiotaomicron. Typically, the carbohydrate will be a plant polysaccharide or dietary fiber. Plant polysaccharides may include starch, fructan, cellulose, hemicellulose, and pectin.

[0071] The compounds utilized in this invention to alter the archaeon population may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.

[0072] The actual effective amounts of compound described herein can and will vary according to the specific composition being utilized, the mode of administration and the age, weight and condition of the subject. Dosages for a particular individual subject can be determined by one of ordinary skill in the art using conventional considerations. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.

[0073] By way of non-limiting example, weight loss may be promoted by administering an HMG-CoA reductase inhibitor to a subject. In an exemplary embodiment, the

inhibitor will selectively inhibit the HMG-CoA reductase expressed by M. smithii and not the HMG-CoA reductase expressed by the subject. In another embodiment, a second HMG-CoA reductase inhibitor may be administered that selectively inhibits the HMG-CoA reductase expressed by the subject in lieu of the HMG-CoA reductase expressed by M. smithii. In yet another embodiment, an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by the subject may be administered in combination with an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by M. smithii. One means that may be utilized to achieve such selectivity is via the use of time-release formulations as discussed below or by otherwise altering the properties of the compounds so that they will not, or will, be efficiently absorbed from the gastrointestinal tract. Alternatively, the compound that selectively inhibits the HMG-CoA reductase expressed by M. smithii may be poorly absorbed by the gastrointestinal tract of the subject. Compounds that inhibit HMG-CoA reductase are well known in the art. For instance, non-limiting examples include atorvastatin, pravastatin, rosuvastatin, and other statins.

[0074] These compounds, for example HMG-CoA reductase inhibitors, may be formulated into pharmaceutical compositions and administered to subjects to promote weight loss. According to the present invention, a pharmaceutical composition includes, but is not limited to, pharmaceutically acceptable salts, esters, salts of such esters, or any other adduct or derivative which upon administration to a subject in need is capable of providing, directly or indirectly, a composition as otherwise described herein, or a metabolite or residue thereof, e.g., a prodrug.

[0075] The pharmaceutical compositions may be administered by several different means that will deliver a therapeutically effective dose. Such compositions can be administered orally, parenterally, by inhalation spray, rectally, intradermally, intracisternally, intraperitoneally, transdermally, bucally, as an oral or nasal spray, or topically (i.e., powders, ointments or drops) in dosage unit formulations containing conventional nontoxic pharmaceutically acceptable carriers, adjuvants, and vehicles as desired. Topical administration may also involve the use of transdermal administration such as transdermal patches or iontophoresis devices. The term parenteral as used herein includes subcutaneous, intravenous, intramuscular, or intrasternal injection, or infusion techniques. In an exemplary embodiment, the pharmaceutical composition will be administered in an oral dosage form. Formulation of drugs is discussed in, for example, Hoover, John E., Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. (1975), and Lieberman, H. A. and Lachman, L., Eds., Pharmaceutical Dosage Forms, Marcel Dekker, New York, N.Y. (1980).

[0077] As described above, an HMG-CoA reductase inhibitor may be specific for the M. smithii enzyme, or for the subject's enzyme, depending, in part, on the selectivity of the particular inhibitor and the area the inhibitor is targeted for release in the subject. For example, an inhibitor may be targeted for release in the upper portion of the gastrointestinal tract of a subject to substantially inhibit the subject's enzyme. In contrast, if the inhibitor is targeted for release in the lower portion of the gastrointestinal tract of a subject, i.e., where M. smithii resides, then the inhibitor may substantially inhibit M. smithii's enzyme.

[0078] In order to selectively control the release of an inhibitor to a particular region of the gastrointestinal tract, the pharmaceutical compositions of the invention may be manufactured into one or several dosage forms for the controlled, sustained or timed release of one or more of the ingredients. In this context, typically one or more of the ingredients forming the pharmaceutical composition is microencapsulated or dry coated prior to being formulated into one of the above forms. By varying the amount and type of coating and its thickness, the timing and location of release of a given ingredient or several ingredients (in either the same dosage form, such as a multi-layered capsule, or different dosage forms) may be varied.

[0079] In an exemplary embodiment, the coating may be an enteric coating. The enteric coating generally will provide for controlled release of the ingredient, such that drug release can be accomplished at some generally predictable location in the lower intestinal tract below the point at which drug release would occur without the enteric coating. In certain embodiments, multiple enteric coatings may be utilized. Multiple enteric coatings, in certain embodiments, may be selected to release the ingredient or combination of ingredients at various regions in the lower gastrointestinal tract and at various times.

[0080] As will be appreciated by a skilled artisan, the encapsulation or coating method can and will vary depending upon the ingredients used to form the pharmaceutical composition and coating, and the desired physical characteristics of the microcapsules themselves. Additionally, more than one encapsulation method may be employed so as to create a multi-layered microcapsule, or the same encapsulation method may be employed sequentially so as to create a multi-layered microcapsule. Suitable methods of microencapsulation may include spray drying, spinning disk encapsulation (also known as rotational suspension separation encapsulation), supercritical fluid encapsulation, air suspension microencapsulation, fluidized bed encapsulation, spray cooling/chilling (including matrix encapsulation), extrusion encapsulation, centrifugal extrusion, coacervation, alginate beads, liposome encapsulation, inclusion encapsulation, colloidosome encapsulation, sol-gel microencapsulation, and
other methods of microencapsulation known in the art. 0076. The amount of an HMG-CoA reductase inhibitor Detailed information concerning materials, equipment and that constitutes an “effective amount can and will vary. The processes for preparing coated dosage forms may be found in amount will depend upon a variety of factors, including Pharmaceutical Dosage Forms: Tablets, eds. Lieberman et al. whether the administration is in single or multiple doses, and (New York: Marcel Dekker, Inc., 1989), and in Ansel et al., individual Subject parameters including age, physical condi Pharmaceutical Dosage Forms and Drug Delivery Systems, tion, size, and weight. Those skilled in the art will appreciate 6th Ed. (Media, Pa.. Williams & Wilkins, 1995). that dosages may also be determined with guidance from Goodman & Goldman's The Pharmacological Basis of II. Biomarkers Comprising the Gut Microbiome Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707 I0081. Another aspect of the invention encompasses use of 1711 and from Goodman & Goldman's The Pharmacological the gut microbiome as a biomarker for obesity. The biomarker Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. may be utilized to construct arrays that may be used for 475-493. several applications including as a diagnostic or prognostic US 2011/0177976 A1 Jul. 21, 2011

tool to determine obesity risk, judge the efficacy of existing weight loss regimes, aid in drug discovery, identify additional biomarkers involved in obesity or an obesity related disorder, and aid in the discovery of therapeutic targets involved in the regulation of energy balance, including but not limited to those that may directly affect the composition of the gut microbiome. Generally speaking, the array may comprise biomolecules modulated in an obese host microbiome or a lean host microbiome.

(a) Array

[0082] The array may be comprised of a substrate having disposed thereon at least one biomolecule that is modulated in an obese host microbiome compared to a lean host microbiome. Several substrates suitable for the construction of arrays are known in the art, and one skilled in the art will appreciate that other substrates may become available as the art progresses. The substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the biomolecules and is amenable to at least one detection method. Non-limiting examples of substrate materials include glass, modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), nylon or nitrocellulose, polysaccharides, nylon, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics. In an exemplary embodiment, the substrates may allow optical detection without appreciably fluorescing.

[0083] A substrate may be planar, a substrate may be a well, i.e. a 364 well plate, or alternatively, a substrate may be a bead. Additionally, the substrate may be the inner surface of a tube for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as a flexible foam, including closed cell foams made of particular plastics.

[0084] The biomolecule or biomolecules may be attached to the substrate in a wide variety of ways, as will be appreciated by those in the art. The biomolecule may either be synthesized first, with subsequent attachment to the substrate, or may be directly synthesized on the substrate. The substrate and the biomolecule may be derivatized with chemical functional groups for subsequent attachment of the two. For example, the substrate may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups. Using these functional groups, the biomolecule may be attached using functional groups on the biomolecule either directly or indirectly using linkers.

[0085] The biomolecule may also be attached to the substrate non-covalently. For example, a biotinylated biomolecule can be prepared, which may bind to surfaces covalently coated with streptavidin, resulting in attachment. Alternatively, a biomolecule or biomolecules may be synthesized on the surface using techniques such as photopolymerization and photolithography. Additional methods of attaching biomolecules to arrays and methods of synthesizing biomolecules on substrates are well known in the art, i.e. VLSIPS technology from Affymetrix (e.g., see U.S. Pat. No. 6,566,495, and Rockett and Dix, "DNA arrays: technology, options and toxicological applications," Xenobiotica 30(2):155-177, all of which are hereby incorporated by reference in their entirety).

[0086] In one embodiment, the biomolecule or biomolecules attached to the substrate are located at a spatially defined address of the array. Arrays may comprise from about 1 to about several hundred thousand addresses or more. In one embodiment, the array may be comprised of less than 10,000 addresses. In another alternative embodiment, the array may be comprised of at least 10,000 addresses. In yet another alternative embodiment, the array may be comprised of less than 5,000 addresses. In still another alternative embodiment, the array may be comprised of at least 5,000 addresses. In a further embodiment, the array may be comprised of less than 500 addresses. In yet a further embodiment, the array may be comprised of at least 500 addresses.

[0087] A biomolecule may be represented more than once on a given array. In other words, more than one address of an array may be comprised of the same biomolecule. In some embodiments, two, three, or more than three addresses of the array may be comprised of the same biomolecule. In certain embodiments, the array may comprise control biomolecules and/or control addresses. The controls may be internal controls, positive controls, negative controls, or background controls.

[0088] The array may be comprised of biomolecules indicative of an obese host microbiome (e.g. the nucleic acid sequences listed in Table 13). Alternatively, the array may be comprised of biomolecules indicative of a lean host microbiome (e.g. the nucleic acid sequences listed in Table 14). A biomolecule is "indicative" of an obese or lean microbiome if it tends to appear more often in one type of microbiome compared to the other. Additionally, the array may be comprised of biomolecules that are modulated in the obese host microbiome compared to the lean host microbiome. As used herein, "modulated" may refer to a biomolecule whose representation or activity is different in an obese host microbiome compared to a lean host microbiome. For instance, modulated may refer to a biomolecule that is enriched, depleted, up-regulated, down-regulated, degraded, or stabilized in the obese host microbiome compared to a lean host microbiome. In one embodiment, the array may be comprised of a biomolecule enriched in the obese host microbiome compared to the lean host microbiome. In another embodiment, the array may be comprised of a biomolecule depleted in the obese host microbiome compared to the lean host microbiome. In yet another embodiment, the array may be comprised of a biomolecule up-regulated in the obese host microbiome compared to the lean host microbiome. In still another embodiment, the array may be comprised of a biomolecule down-regulated in the obese host microbiome compared to the lean host microbiome. In still yet another embodiment, the array may be comprised of a biomolecule degraded in the obese host microbiome compared to the lean host microbiome. In an alternative embodiment, the array may be comprised of a biomolecule stabilized in the obese host microbiome compared to the lean host microbiome.

[0089] Generally speaking, an array of the invention may comprise at least one biomolecule indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. In one embodiment, the array may comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, or 400

biomolecules indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. In another embodiment, the array may comprise at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or at least 900 biomolecules indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome.

[0090] As used herein, "biomolecule" may refer to a nucleic acid, an oligonucleic acid, an amino acid, a peptide, a polypeptide, a protein, a lipid, a carbohydrate, a metabolite, or a fragment thereof. Nucleic acids may include RNA, DNA, and naturally occurring or synthetically created derivatives. A biomolecule may be present in, produced by, or modified by a microorganism within the gut.

[0091] In one embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 13. For instance, the biomolecules of the array may be selected from the group comprising nucleic acids corresponding to SEQ ID NO:1 through SEQ ID NO:273. In another embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 14. For instance, the biomolecules of the array may be selected from the group comprising nucleic acids corresponding to SEQ ID NO:274 through SEQ ID NO:383. In yet another embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 13 and Table 14, for instance, the nucleic acids corresponding to SEQ ID NO:1 through SEQ ID NO:383.

[0092] Additionally, the biomolecule may be at least 70, 75, 80, 85, 90, or 95% homologous to a biomolecule listed in Table 13 or Table 14 above. In one embodiment, the biomolecule may be at least 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% homologous to a biomolecule derived from an accession number detailed above. In another embodiment, the biomolecule may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% homologous to a biomolecule derived from an accession number detailed above.

[0093] In determining whether a biomolecule is substantially homologous or shares a certain percentage of sequence identity with a sequence of the invention, sequence similarity may be defined by conventional algorithms, which typically allow introduction of a small number of gaps in order to achieve the best fit. In particular, "percent identity" of two polypeptides or two nucleic acid sequences is determined using the algorithm of Karlin and Altschul (Proc. Natl. Acad. Sci. USA 87:2264-2268, 1993). Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (J. Mol. Biol. 215:403-410, 1990). BLAST nucleotide searches may be performed with the BLASTN program to obtain nucleotide sequences homologous to a nucleic acid molecule of the invention. Equally, BLAST protein searches may be performed with the BLASTX program to obtain amino acid sequences that are homologous to a polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST is utilized as described in Altschul et al. (Nucleic Acids Res. 25:3389-3402, 1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) are employed. See http://www.ncbi.nlm.nih.gov for more details.

[0094] For each of the above embodiments, methods of determining biomolecules that are indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome may be determined using methods detailed in the Examples.

[0095] The arrays may be utilized in several suitable applications. For example, the arrays may be used in methods for detecting association between two or more biomolecules. This method typically comprises incubating a sample with the array under conditions such that the biomolecules comprising the sample may associate with the biomolecules attached to the array. The association is then detected, using means commonly known in the art, such as fluorescence. "Association," as used in this context, may refer to hybridization, covalent binding, or ionic binding. A skilled artisan will appreciate that conditions under which association may occur will vary depending on the biomolecules, the substrate, and the detection method utilized. As such, suitable conditions may have to be optimized for each individual array created.

[0096] In yet another embodiment, the array may be used as a tool in a method to determine whether a compound has efficacy for treatment of obesity or an obesity-related disorder in a host. Alternatively, the array may be used as a tool in a method to determine whether a compound increases or decreases the relative abundance of Bacteroides, Actinobacteria, or Firmicutes in a subject. Typically, such methods comprise comparing a plurality of biomolecules of the host's microbiome before and after administration of a compound, such that if the abundance of biomolecules associated with obesity decreased after treatment, or the abundance of biomolecules indicative of Bacteroides increases, or the abundance of biomolecules indicative of Firmicutes and/or Actinobacteria decreases, the compound may be efficacious in treating obesity in a host.

[0097] The array may also be used to quantitate the plurality of biomolecules of the host microbiome before and after administration of a compound. The abundance of each biomolecule in the plurality may then be compared to determine if there is a decrease in the abundance of biomolecules associated with obesity after treatment.

[0098] In some embodiments, the array may be used as a diagnostic or prognostic tool to identify subjects that are susceptible to more efficient energy harvesting, and therefore, more susceptible to weight gain and/or obesity. Such a method may generally comprise incubating the array with biomolecules derived from the subject's gut microbiome to determine the relative abundance of nucleic acids or nucleic acid products associated with Bacteroidetes, Actinobacteria, or Firmicutes. In some embodiments, the array may be used to determine the relative abundance of Mollicutes, Mollicute-associated nucleic acids, or Mollicute-associated nucleic acid products in a subject's gut microbiome. Methods to collect, isolate, and/or purify biomolecules from the gut microbiome of a subject to be used in the above methods are known in the art, and are detailed in the examples.

(b) Microbiome Profiles

[0099] The present invention also encompasses use of the microbiome as a biomarker to construct microbiome profiles. Generally speaking, a microbiome profile is comprised of a plurality of values with each value representing the abundance of a microbiome biomolecule. The abundance of a microbiome biomolecule may be determined, for instance, by sequencing the nucleic acids of the microbiome as detailed in the examples. This sequencing data may then be analyzed by known software, as detailed in the examples, to determine the abundance of a microbiome biomolecule in the analyzed sample. The abundance of a microbiome biomolecule may also be determined using an array described above. For

instance, by detecting the association between the biomolecules comprising a microbiome sample and the biomolecules comprising the array, the abundance of a microbiome biomolecule in the sample may be determined.

[0100] A profile may be digitally-encoded on a computer-readable medium. The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media may include, for example, optical or magnetic disks. Volatile media may include dynamic memory. Transmission media may include coaxial cables, copper wire and fiber optics. Transmission media may also take the form of acoustic, optical, or electromagnetic waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or other magnetic medium, a CD-ROM, CDRW, DVD, or other optical medium, punch cards, paper tape, optical mark sheets, or other physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, a carrier wave, or other medium from which a computer can read.

[0101] A particular profile may be coupled with additional data about that profile on a computer-readable medium. For instance, a profile may be coupled with data about what therapeutics, compounds, or drugs may be efficacious for that profile, or about other features of the subject's digestive health when consuming a given diet or set of diets. Conversely, a profile may be coupled with data about what therapeutics, compounds, or drugs may not be efficacious for that profile. Alternatively, a profile may be coupled with known risks associated with that profile. Non-limiting examples of the type of risks that might be coupled with a profile include disease or disorder risks associated with a profile. The computer-readable medium may also comprise a database of at least two distinct profiles.

[0102] Such a profile may be used, for instance, in a method of selecting a compound for treating obesity or an obesity-related disorder in a host. Generally speaking, such a method would comprise providing a microbiome profile from the host and providing a plurality of reference microbiome profiles, each associated with a compound, and selecting the reference profile most similar to the host microbiome profile, to thereby select a compound for treating obesity or an obesity-related disorder in the host. The host profile and each reference profile may comprise a plurality of values, each value representing the abundance of a microbiome biomolecule.

[0103] The microbiome profiles may be utilized in a variety of applications. For example, the microbiome profiles may be used in a method for predicting risk for obesity or an obesity-related disorder in a host. The method comprises, in part, providing a microbiome profile from a host, and providing a plurality of reference microbiome profiles, then selecting the reference profile most similar to the host microbiome profile, such that if the host's microbiome is most similar to a reference obese microbiome, the host is at risk for obesity or an obesity-related disorder. The microbiome profile from the host may be determined using an array of the invention. The reference profiles may be stored on a computer-readable medium such that software known in the art and detailed in the examples may be used to compare the microbiome profile and the reference profiles.

[0104] The host microbiome may be derived from a subject that is a rodent, a human, a livestock animal, a companion animal, or a zoological animal. In one embodiment, the host microbiome is derived from a rodent, i.e. a mouse, a rat, a guinea pig, etc. In another embodiment, the host microbiome is derived from a human. In yet another embodiment, the host microbiome is derived from a livestock animal. Non-limiting examples of livestock animals include pigs, cows, horses, goats, sheep, llamas and alpacas. In still another embodiment, the host microbiome is derived from a companion animal. Non-limiting examples of companion animals include pets, such as dogs, cats, rabbits, and birds. In still yet another embodiment, the host microbiome is derived from a zoological animal. As used herein, a "zoological animal" refers to an animal that may be found in a zoo. Such animals may include non-human primates, large cats, wolves, and bears.

III. Kits

[0105] The present invention also encompasses a kit for evaluating a compound, therapeutic, or drug. Typically, the kit comprises an array and a computer-readable medium. The array may comprise a substrate, the substrate having disposed thereon at least one biomolecule that is modulated in an obese host microbiome compared to a lean host microbiome. The computer-readable medium may have a plurality of digitally encoded profiles wherein each profile of the plurality has a plurality of values, each value representing the abundance of a biomolecule in a host microbiome detected by the array. The array may be used to determine a profile for a particular host under particular conditions, and then the computer-readable medium may be used to determine if the profile is similar to a known profile stored on the computer-readable medium. Non-limiting examples of possible known profiles include obese and lean profiles for several different hosts, for example, rodents, humans, livestock animals, companion animals, or zoological animals.

DEFINITIONS

[0106] The term "abundance" refers to the representation of a given taxonomic group (e.g. phylum, order, family, genera, or species) of microorganism present in the gastrointestinal tract of a subject.

[0107] The term "activity of the microbiota population" refers to the microbiome's ability to harvest energy and nutrients.

[0108] The term "antagonist" refers to a molecule that inhibits or attenuates the biological activity of a Fiaf polypeptide and in particular, the ability of Fiaf to inhibit LPL, and/or the ability of the microbiota to regulate Fiaf. Antagonists may include proteins such as antibodies, nucleic acids, carbohydrates, small molecules, or other compounds or compositions that modulate the activity of a Fiaf polypeptide either by directly interacting with the polypeptide or by acting on components of the biological pathway in which Fiaf participates.

[0109] The term "agonist" refers to a molecule that enhances or increases the biological activity of a Fiaf polypeptide and in particular, the ability of Fiaf to inhibit LPL. Agonists may include proteins, peptides, nucleic acids, carbohydrates, small molecules (e.g., such as metabolites), or other compounds or compositions that modulate the activity of a Fiaf polypeptide either by directly interacting with the polypeptide or by acting on components of the biological pathway in which Fiaf participates.

[0110] The term "altering" as used in the phrase "altering the microbiota population" is to be construed in its broadest interpretation to mean a change in the representation of microbes or the functions/activities of microbial communities in the gastrointestinal tract of a subject. The change may be a decrease or an increase in the presence of a particular microbial species, genus, family, order, or class, or change in the expression of microbial community associated nucleic acids or a change in the protein and metabolic products produced by members of the community.

[0111] "BMI" as used herein is defined as a human subject's weight (in kilograms) divided by height (in meters) squared.

[0112] An "effective amount" is a therapeutically-effective amount that is intended to qualify the amount of agent that will achieve the goal of a decrease in body fat, or in promoting weight loss.

[0113] Fas stands for fatty acid synthase.

[0114] Fiaf stands for fasting-induced adipocyte factor, also known as angiopoietin-like protein 4 (Angptl4).

[0115] LPL stands for lipoprotein lipase.

[0116] The term "obesity-related disorder" includes disorders resulting from, at least in part, obesity. Representative disorders include metabolic syndrome, type II diabetes, hypertension, cardiovascular disease, and nonalcoholic fatty liver disease.

[0117] The term "metagenomics" refers to the application of modern genomic techniques to the study of the composition and operations of communities of microbial organisms sampled directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.

[0118] PPAR stands for peroxisome proliferator-activated receptor.

[0119] A "subject in need of treatment for obesity" generally will have at least one of three criteria: (i) BMI over 30; (ii) 100 pounds overweight; or (iii) 100% above an "ideal" body weight as determined by generally recognized weight charts.

[0120] As various changes could be made in the above compounds, products and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and in the examples given below, shall be interpreted as illustrative and not in a limiting sense.

[0121] The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent techniques discovered by the inventors to function well in the practice of the invention. Those of skill in the art should, however, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments that are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention. Therefore all matter set forth or shown in the accompanying drawings is to be interpreted as illustrative and not in a limiting sense.

EXAMPLES

[0122] The following examples illustrate various iterations of the invention.

Example 1

The Gut Microbiota is Linked to Family and BMI

[0123] The bacterial lineages of the human gut microbiota are largely unexplored. In this study, the lineages of gut microbiota of 31 monozygotic (MZ) twin pairs, 23 dizygotic (DZ) twin pairs, and where available their mothers (n=46), were characterized (Tables 1-5). MZ and DZ co-twins and parent-offspring pairs provide an attractive paradigm for assessing the impact of genotype and shared early environment exposures on the gut microbiome. Moreover, genetically identical MZ twin pairs gain weight in response to overfeeding in a more reproducible way than do unrelated individuals and are more concordant for body mass index (BMI) than dizygotic twin pairs, suggesting shared features of their energy balance influenced by host genotype.
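As an illustration of the profile-based methods described in paragraphs [0102] and [0103], in which a host microbiome profile is compared against a plurality of reference profiles and the most similar reference is selected, the following is a minimal sketch only. The description does not specify a similarity measure or any software interface; the use of Euclidean distance, the function names `profile_distance` and `most_similar_reference`, and all biomolecule labels and abundance values below are hypothetical assumptions made for illustration.

```python
from math import sqrt


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two profiles (assumed metric, not from the source).

    Each profile maps a biomolecule label to an abundance value. A biomolecule
    missing from one profile is treated as having abundance 0.0.
    """
    labels = set(profile_a) | set(profile_b)
    return sqrt(
        sum((profile_a.get(k, 0.0) - profile_b.get(k, 0.0)) ** 2 for k in labels)
    )


def most_similar_reference(host_profile, reference_profiles):
    """Return the name of the reference profile closest to the host profile."""
    return min(
        reference_profiles,
        key=lambda name: profile_distance(host_profile, reference_profiles[name]),
    )


# Hypothetical reference profiles and host profile (illustrative values only).
references = {
    "obese": {"biomolecule_A": 0.60, "biomolecule_B": 0.10, "biomolecule_C": 0.30},
    "lean": {"biomolecule_A": 0.20, "biomolecule_B": 0.50, "biomolecule_C": 0.30},
}
host = {"biomolecule_A": 0.55, "biomolecule_B": 0.15, "biomolecule_C": 0.30}

closest = most_similar_reference(host, references)
print(closest)
```

In this sketch each reference profile could equally be keyed by an associated compound rather than by a lean or obese label, which would correspond to the compound-selection method of paragraph [0102].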

TABLE 1
V2/3 16S rRNA gene sequencing statistics

Subject ID   Data ID/     Family   Twin/   Ancestry   Zygosity   BMI          Months without   Total
             time-point   number   Mom                           category     antibiotics      sequences
F1T1Le1      TS1          1        Twin    EA         MZ         Lean         >6               6415
F1T1Le2      TS1.2        1        Twin    EA         MZ         Lean         >6               1627
F1T2Le1      TS2          1        Twin    EA         MZ         Lean         NA               15495
F1T2Le2      TS2.2        1        Twin    EA         MZ         Lean         >6               1957
F1MOv1       TS3          1        Mom     EA         NA         Overweight   >6               7870
F1MOv2       TS3.2        1        Mom     EA         NA         Overweight   >6               1799
F2T1Le1      TS4          2        Twin    EA         MZ         Lean         >6               9343
F2T1Le2      TS4.2        2        Twin    EA         MZ         Lean         >6               2886
F2T2Le1      TS5          2        Twin    EA         MZ         Lean         >6               13991
F2T2Le2      TS5.2        2        Twin    EA         MZ         Lean         >6               3606
F2MOb1       TS6          2        Mom     EA         NA         Obese        >6               7717
F2MOb2       TS6.2        2        Mom     EA         NA         Obese        >6               4325
F3T1Le1      TS7          3        Twin    EA         MZ         Lean         >6               11808
F3T1Le2      TS7.2        3        Twin    EA         MZ         Lean         >6               2962


TABLE 2
V6 16S rRNA gene sequencing statistics

Subject ID   Data ID   Twin/Mom   Family   BMI          Sequences
F1T1Le       TS1       Twin       1        Lean         25,140
F1T2Le       TS2       Twin       1        Lean         42,186
F1MOv        TS3       Mom        1        Overweight   17,726
F2T1Le       TS4       Twin       2        Lean         25,705
F2T2Le       TS5       Twin       2        Lean         26,608
F2MOb        TS6       Mom        2        Obese        27,007
F3T1Le       TS7       Twin       3        Lean         17,469
F3T2Le       TS8       Twin       3        Lean         17,170
F3MOv        TS9       Mom        3        Overweight   14,787
F5T1Le       TS13      Twin       5        Lean         15,296
F5T2Le       TS14      Twin       5        Lean         14,220
F5MOv        TS15      Mom        5        Overweight   14,244
F7T1Ob1      TS19      Twin       7        Obese        43,635
F7T2Ob1      TS20      Twin       7        Obese        13,476
F7MOb        TS21      Mom        7        Obese        23,714
F9T1Le       TS25      Twin       9        Lean         20,491
F9T2Le       TS26      Twin       9        Lean         27,626
F9MOb        TS27      Mom        9        Obese        25,494
F10T1Ob1     TS28      Twin       10       Obese        20,905
F10T2Ob1     TS29      Twin       10       Obese        15,698
F10MOv1      TS30      Mom        10       Overweight   32,083
F11T1Le1     TS31      Twin       11       Lean         16,530
F11T2Le1     TS32      Twin       11       Lean         31,690
F11MOv1      TS33      Mom        11       Overweight   28,962
F15T1Ob1     TS49      Twin       15       Obese        22,201
F15T2Ob1     TS50      Twin       15       Obese        30,498
F15MOb1      TS51      Mom        15       Obese        22,691
F16T1Ob1     TS55      Twin       16       Obese        37,027
F16T2Ob1     TS56      Twin       16       Obese        31,512
F16MOb1      TS57      Mom        16       Obese        30,392
[rows illegible in source]
F43MOb1      TS150     Mom        43       Obese        23,465
TOTAL                                                   817,942

ID nomenclature: Family number, Twin number or mother, and BMI category (Le = lean; Ov = overweight; Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean)

TABLE 3
Full-length 16S rRNA gene sequencing statistics

Subject ID   Data ID   Twin/Mom   Family   BMI          Sequences
F1T1Le       TS1       Twin       1        Lean         349
F1T2Le       TS2       Twin       1        Lean         351
F1MOv        TS3       Mom        1        Overweight   331
F2T1Le       TS4       Twin       2        Lean         351
F2T2Le       TS5       Twin       2        Lean         345
F2MOb        TS6       Mom        2        Obese        348
F3T1Le       TS7       Twin       3        Lean         237
F3T2Le       TS8       Twin       3        Lean         354
F3MOv        TS9       Mom        3        Overweight   357
F5T1Le       TS13      Twin       5        Lean         337
F5T2Le       TS14      Twin       5        Lean         350
F5MOv        TS15      Mom        5        Overweight   338
F7T1Ob1      TS19      Twin       7        Obese        333
F7T2Ob1      TS20      Twin       7        Obese        340
F7MOb        TS21      Mom        7        Obese        332
F9T1Le       TS25      Twin       9        Lean         351
F9T2Le       TS26      Twin       9        Lean         252
F9MOb        TS27      Mom        9        Obese        343
F10T1Ob1     TS28      Twin       10       Obese        344
F10T2Ob1     TS29      Twin       10       Obese        337
F10MOv1      TS30      Mom        10       Overweight   261
F15T1Ob1     TS49      Twin       15       Obese        338
F15T2Ob1     TS50      Twin       15       Obese        319
F15MOb1      TS51      Mom        15       Obese        331
F16T1Ob1     TS55      Twin       16       Obese        353
F16T2Ob1     TS56      Twin       16       Obese        278
F16MOb1      TS57      Mom        16       Obese        348
F43T1Ob1     TS148     Twin       43       Obese        323
F43T2Ob1     TS149     Twin       43       Obese        340
F43MOb1      TS150     Mom        43       Obese        [illegible]
TOTAL                                                   9,920

ID nomenclature: Family number, Twin number or mother, and BMI category (Le = lean; Ov = overweight; Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean)
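The ID nomenclature in the table footnote (family number, twin number or mother, BMI category) lends itself to mechanical parsing. The following is a minimal sketch under stated assumptions: the regular expression is inferred from the IDs shown in the tables, the function name `parse_subject_id` is hypothetical, and the optional trailing digit that appears on some IDs (e.g. F7T1Ob1) is returned as an uninterpreted suffix because the footnote does not define it.

```python
import re

# BMI category codes as given in the table footnote.
BMI_CODES = {"Le": "lean", "Ov": "overweight", "Ob": "obese"}

# Assumed pattern: F<family>, then T<twin number> or M (mother),
# then a BMI code, then an optional trailing digit string.
ID_PATTERN = re.compile(r"^F(\d+)(?:T(\d+)|M)(Le|Ov|Ob)(\d*)$")


def parse_subject_id(subject_id):
    """Split a subject ID such as 'F1T1Le' into its documented components."""
    match = ID_PATTERN.match(subject_id)
    if match is None:
        raise ValueError(f"Unrecognized subject ID: {subject_id!r}")
    family, twin, bmi_code, suffix = match.groups()
    return {
        "family": int(family),
        "role": "twin" if twin else "mother",
        "twin_number": int(twin) if twin else None,
        "bmi_category": BMI_CODES[bmi_code],
        # The meaning of the trailing digit is not stated in the footnote.
        "suffix": suffix or None,
    }


example = parse_subject_id("F1T1Le")
print(example)
```

For instance, `parse_subject_id("F10MOv1")` yields family 10, role mother, BMI category overweight, with the trailing "1" kept as the suffix.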

TABLE 4
Phylotypes shared across ≥70% of all individuals (V2/3 dataset: 1,000 random sequences/individual)

Phylotype   Individuals   Individuals   Reads        Highest       Lowest        Mean ± SEM %     Taxonomic classification
ID          with          with          grouped      relative      relative      of 16S rRNA
            phylotype     phylotype     into         abundance     abundance     gene sequences
            (number)      (%)           phylotype    across all    across all    across all
                                                     individuals   individuals   individuals
1           151           98.1          7942         28.7          0             6.53 ± 0.41      Bacteria; Firmicutes; Clostridia; Faecalibacterium
2           151           98.1          5375         25.5          0             4.41 ± 0.34      Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
3           144           93.5          2518         14.7          0             2.06 ± 0.16      Bacteria; Firmicutes; Clostridia; Clostridiales
4           143           92.9          5606         30.5          0             4.56 ± 0.41      Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale

TABLE 4-continued Phytotypes shared across 270% of all individuals (V2/3 dataset: 1,000 randon sequences/individual) Number Highest Lowest Meant Semyo % of of reads relative relative of 16S rRNA Individuals individuals grouped abundance abundance gene Sequences Phylotype with with into across all across all across all Taxonomic ID phylotype phylotype phylotype individuals individuals individuals classification 5 140 90.9 1629 8.1 O 1.34 - 0.11 Bacteria: Firmicutes: Clostridia: Clostridiales: Cliostridium Cliostridioforme 134 87.O 757 12.7 O.62- 0.09 Bacteria: Firmicutes: Clostridia: Clostridiales: Riminococcus; Riminococcits Schinki 33 86.4 1485 12.2 1.23 - 0.14 Bacteria: Firmicutes: Clostridia: Clostridiales: Coprococcus 33 86.4 1392 6.5 1.14 - 0.10 Bacteria: Firmicutes: Clostridia: Clostridiales 33 86.4 O.99. O.12 Bacteria: Firmicutes: Clostridia: Clostridiales: Riminococcits 28 83.1 819 5.2 O.68 0.06 Bacteria: Firmicutes: Clostridia: Clostridiales 27 82.5 747 3.7 O.62- 0.05 Bacteria: Fimircutes: Clostridia: Faecalibacterium 26 81.8 11598 S1.6 9.39 O.79 Bacteria: Bacteroidetes: Bacteroidales: Bacteroidaceae 25 81.2 2585 34.3 2.15 0.31 Bacteria: Fimircutes: Clostridia: Faecalibacterium 23 79.9 3512 15.3 2.89 0.25 Bacteria: Fimircutes: Clostridia: Faecalibacterium 77.9 792 8.4 O66 0.08 Bacteria: Firmicutes: Clostridia: Clostridiales: Cliostridium nexile 76.6 632 2.7 Bacteria: Fimircutes: Clostridia: Faecalibacterium 74.7 3422 43.3 2.79 0.41 Bacteria: Bacteroidetes: Bacteroidales: Bacteroidaceae 73.4 441 2.3 O.37 0.03 Bacteria: Firmicutes: ostridia; ostridiales; iostridium nexile 72.7 11.68 17.4 O.98 0.16 8. C eria: Firmicutes; ostridia; ostridiales; timinococcits 72.1 749 5.2 8. 
C eria: Firmicutes; ostridia; ostridiales 21 70.1 640 3.5 O530.06 eria: Firmicutes; ostridia; ostridiales; timinococcits 1,000 sequences were randomly sampled from a single timepoint for each individual Based on the consensus of 290% sequences within each phylotype (best-BLAST-hit against the Greengenes database) US 2011/0177976 A1 Jul. 21, 2011 19

TABLE 5

Phylotypes shared across ≥90% of all individuals (V6 dataset: 10,000 random sequences/individual)

Column key: ID = phylotype ID; N = number of individuals with phylotype; % = % of individuals with phylotype; Reads = number of reads grouped into phylotype; High / Low = highest / lowest relative abundance across all individuals; Mean ± SEM = mean ± SEM % of 16S rRNA gene sequences across all individuals.

ID   N    %       Reads   High   Low     Mean ± SEM    Taxonomic classification
1    33   100.0   10400   9.7    0.011   3.40 ± 0.45   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
2    33   100.0   5161    5.9    0.011   1.67 ± 0.23   Bacteria; Firmicutes; Clostridiales; Clostridium nexile; Clostridium fusiformis
3    33   100.0   6077    6.7    0.021   1.97 ± 0.32   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
4    33   100.0   16600   26.8   0.011   5.36 ± 1.02   Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale
5    33   100.0   11654   12.5   0.011   3.78 ± 0.58   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
6    32   97.0    3113    5.8    0.000   1.01 ± 0.23   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
7    32   97.0    2908    4.2    0.000   0.96 ± 0.21   Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae
8    32   97.0    2382    3.7    0.000   0.78 ± 0.13   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
9    32   97.0    1712    4.4    0.000   0.56 ± 0.14   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus; Ruminococcus schinkii
10   31   93.9    3940    6.6    0.000   1.29 ± 0.26   Bacteria; Firmicutes; Clostridia; Faecalibacterium
11   31   93.9    3729    4.9    0.000   1.21 ± 0.18   Bacteria; Firmicutes; Clostridia; (?)
12   30   90.9    454     0.7    0.000   0.15 ± 0.03   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
13   30   90.9    687     1.1    0.000   0.23 ± 0.04   Bacteria; Firmicutes; Clostridia
14   30   90.9    999     2.3    0.000   0.33 ± 0.08   Bacteria; Firmicutes; Clostridia; Peptostreptococcaceae; Peptostreptococcus anaerobius; Clostridium bifermentans
15   30   90.9    1241    5.3    0.000   0.40 ± 0.16   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium bolteae
16   30   90.9    160     0.2    0.000   0.05 ± 0.01   Bacteria; Actinobacteria; Actinobacteridae; Actinomycineae
17   30   90.9    1417    2.0    0.000   0.46 ± 0.09   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
18   30   90.9    1014    1.2    0.000   0.33 ± 0.06   Bacteria; Firmicutes; Clostridia; Clostridiales
19   30   90.9    1353    1.6    0.000   0.44 ± 0.08   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus; Ruminococcus luti
20   30   90.9    2686    6.0    0.000   0.88 ± 0.22   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium clostridioforme
21   30   90.9    7454    12.2   0.000   2.43 ± 0.63   Bacteria; Firmicutes; Clostridia; Faecalibacterium

Taxonomic classification is based on the consensus taxonomy of >90% of sequences within each phylotype (best-BLAST-hit against the Greengenes database).
(?) indicates text missing or illegible.

TABLE 6

Phylotypes shared across ≥70% of all individuals (Full-length dataset: 200 random sequences/individual)

Column key: ID = phylotype ID; N = number of individuals with phylotype; % = % of individuals with phylotype; Reads = number of reads grouped into phylotype; High / Low = highest / lowest relative abundance across all individuals; Mean ± SEM = mean ± SEM % of 16S rRNA gene sequences across all individuals.

ID   N    %      Reads   High   Low   Mean ± SEM    Taxonomic classification
1    28   93.3   378     17.9   0.0   7.81 ± 1.04   Bacteria; Firmicutes; Clostridia; Faecalibacter(?)
2    27   90.0   347     25.0   0.0   6.90 ± 1.20   Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus
3    26   86.7   128     9.9    0.0   2.62 ± 0.47   Bacteria; Firmicutes; Clostridia; Clostridiales
4    26   86.7   298     23.1   0.0   6.00 ± 1.14   Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale
5    26   86.7   127     12.0   0.0   2.64 ± 0.49   Bacteria; Firmicutes; Clostridia; Clostridiales; clostridioform
6    22   73.3   110     10.9   0.0   2.33 ± 0.55   Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae
7    22   73.3   87      5.7    0.0   1.76 ± 0.29   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile; Clostridium fusiformis
8    21   70.0   112     11.9   0.0   2.32 ± 0.49   Bacteria; Firmicutes; Clostridia; Clostridiales; Coprococcus
9    21   70.0   75      6.9    0.0   1.53 ± 0.32   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile
10   21   70.0   54      5.7    0.0   1.14 ± 0.23   Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium nexile

Taxonomic classification is based on the consensus taxonomy of >90% of sequences within each phylotype (best-BLAST-hit against the Greengenes database).
(?) indicates text missing or illegible when filed.
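The per-phylotype columns reported in Tables 4, 5 and 6 (number and percentage of individuals carrying a phylotype, reads grouped into it, highest and lowest relative abundance, and mean ± SEM) can be illustrated with a minimal Python sketch. The function name `phylotype_summary`, the input layout (one read count per individual at a fixed sampling depth), and the use of the sample standard deviation for the SEM are assumptions made for illustration; the source does not state how the SEM was computed.

```python
import math


def phylotype_summary(counts, depth):
    """Summarise one phylotype across individuals.

    counts: reads assigned to the phylotype, one entry per individual
            (hypothetical input layout).
    depth:  number of sequences sampled per individual (e.g. 1,000 for V2/3).
    """
    n = len(counts)
    present = sum(1 for c in counts if c > 0)
    # Relative abundance of the phylotype in each individual, in percent.
    rel = [100.0 * c / depth for c in counts]
    mean = sum(rel) / n
    # Assumption: SEM = sample standard deviation / sqrt(n).
    variance = sum((r - mean) ** 2 for r in rel) / (n - 1)
    sem = math.sqrt(variance / n)
    return {
        "n_individuals": present,
        "pct_individuals": 100.0 * present / n,
        "reads": sum(counts),
        "highest": max(rel),
        "lowest": min(rel),
        "mean": mean,
        "sem": sem,
    }


if __name__ == "__main__":
    # Toy data: four individuals sampled at 100 sequences each.
    print(phylotype_summary([10, 0, 20, 30], depth=100))
```

A phylotype would be listed in Table 4 or 6 when `pct_individuals` is at least 70, and in Table 5 when it is at least 90.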

Sample Characteristics

[0124] Twin pairs who had been enrolled in the Missouri Adolescent Female Twin Study (MOAFTS) were recruited for this study (mean period of enrollment, 11.7±1.2 years; range, 4.4-13.0 years). The MOAFTS twin cohort, comprised of female like-sex twin pairs, was identified from Missouri birth records over the period 1994-1999, when the twins were median age 15. A total of 350 twins from the larger MOAFTS cohort completed screening interviews for the present study. Pairs most likely to meet study criteria were identified at the wave five interview of the MOAFTS twin cohort (which has 90% retention of wave four participants). Eligibility was then confirmed at screening interview. All twins were 25-32 years old, of European or African ancestry (EA and AA, respectively), were generally concordant for obesity (BMI≥30 kg/m²) or leanness (BMI=18.5-24.9 kg/m²) (1 twin pair was lean/overweight (overweight defined as BMI≥25 and <30) and 6 pairs were overweight/obese), and had not taken antibiotics for at least 5.49±0.09 months. Each participant completed a detailed medical, lifestyle, and dietary questionnaire. Participants were broadly representative of the overall Missouri population with respect to BMI, parity, education, and marital status. Although all were born in Missouri, they currently live throughout the USA: 29% live in the same house, but some live >800 km apart. Since fecal samples are readily attainable and representative of interpersonal differences in gut microbial ecology, they were collected from each individual and frozen immediately. The collection procedure was repeated again with an average interval between sample collections of 57±4 days.

Community DNA Preparation

[0125] Frozen de-identified fecal samples were stored at -80° C. before processing. In order to homogenize each sample, a 10-20 g aliquot of each sample was pulverized in liquid nitrogen with a mortar and pestle. An aliquot (~500 mg) of each sample was then suspended, while frozen, in a solution containing 500 µl of extraction buffer [200 mM Tris (pH 8.0), 200 mM NaCl, 20 mM EDTA], 210 µl of 20% SDS, 500 µl of a mixture of phenol:chloroform:isoamyl alcohol (25:24:1, pH 7.9), and 500 µl of a slurry of 0.1 mm-diameter zirconia/silica beads (BioSpec Products, Bartlesville, Okla.). Microbial cells were subsequently lysed by mechanical disruption with a bead beater (BioSpec Products) set on high for 2 min at room temperature, followed by extraction with phenol:chloroform:isoamyl alcohol, and precipitation with isopropanol. DNA obtained from three separate 10 mg frozen aliquots of each fecal sample was pooled (≥200 µg DNA) and used for pyrosequencing (see below).

Full-Length 16S rRNA Sequence-Based Surveys

[0126] Five replicate PCR reactions were performed for each fecal DNA sample. To generate full length or near full length bacterial 16S rRNA amplicons, each 25 µl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), 10 mM Tris (pH 8.3), 50 mM KCl, 2 mM MgSO4, 0.16 µM dNTPs, 0.4 µM of the bacteria-specific primer 8F (5'-AGAGTTTGATCCTGGCTCAG-3'), 0.4 µM of the universal primer 1391R (5'-GACGGGCGGTGWGTRCA-3'), 0.4 M betaine, and 3 units of Taq polymerase (Invitrogen). Cycling conditions were 94° C. for 2 min, followed by 25 cycles of 94° C. for 1 min, 55° C. for 45 sec, and 72° C. for 2 min. Replicate PCRs were pooled and concentrated (Millipore; Montage PCR filter columns). Full-length 16S rRNA gene amplicons (1.3 kb) were then gel-purified using the Qiaquick kit (Qiagen), subcloned into TOPO TA pCR4.0 (Invitrogen), and the ligated DNA transformed into E. coli TOP10 (Invitrogen). For each sample, 384 colonies containing cloned 16S rRNA nucleic acid amplicons were processed for sequencing. Plasmid inserts were sequenced bi-directionally using vector-specific primers plus the internal primer 907R (5'-CCGTCAATTCCTTTRAGTTT-3').

[0127] 16S rRNA gene sequences were edited and assembled into consensus sequences using the PHRED and PHRAP software packages within the Xplorseq program. Sequences that did not assemble were discarded and bases with PHRED quality scores <20 were trimmed. Sequences were checked for chimeras using Bellerophon program version 3 with the default parameters (final dataset n=8,941 near full-length 16S rRNA gene sequences; for sequence designations see Table 1). Alignments for reference genome 16S rRNA gene sequences were manually edited in ARB.

V2/3 16S rRNA Sequence-Based Surveys

[0128] Four replicate PCR reactions targeting the V2/3 region of bacterial 16S rRNA genes were performed on the same fecal DNA samples used above. Each 20 µl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), 8 µl 2.5x HotMaster PCR Mix (Eppendorf), 0.3 µM of the primer 8F [5'-GCCTTGCCAGCCCGCTCAG-TCAGAGTTTGATCCTGGCTCAG-3'; composite of 454 primer B (underlined), linker nucleotides (TC), and the universal bacterial primer 8F (italics)], and 0.3 µM of the primer 338R [5'-GCCTCCCTCGCGCCATCAGNNNNNNNNCA-TGCTGCCTCCCGTAGGAGT-3'; 454 Life Sciences primer A (underlined), a unique 8 base barcode (Ns), linker nucleotides (CA), and the broad-range bacterial primer 338R (italics)]. Cycling conditions were 95° C. for 2 min, followed by 30 cycles of 95° C. for 20 sec, 52° C. for 20 sec, and 65° C. for 1 min. Replicate PCRs were pooled and purified with Ampure magnetic purification beads (Agencourt).

[0129] PCR products were quantified with the bisbenzimide H assay. An aliquot of each PCR product was incubated for 5 min at room temperature in TNE reagent [10 mM Trizma HCl pH 8.1, 100 mM NaCl, 1 mM EDTA, and 50 ng/ml freshly prepared bisbenzimide H (Sigma)]. Samples were read on a fluorometer or plate reader (excitation at 365 nm, emission at 460 nm) relative to a standard curve constructed using E. coli DNA (Sigma). Multiple pools, each containing approximately equimolar amounts of PCR products, were assembled for 454 FLX amplicon pyrosequencing (n=33-100 barcoded samples/pool). Technical replicates were analyzed from selected representatives of each pool across four different sequencing centers; results were highly reproducible, discriminating between individuals and between samples from the same individual over time (FIG. 1).

V6 16S rRNA Sequence-Based Surveys

[0130] PCR reactions targeting the V6 region of bacterial 16S rRNA genes were performed on the same fecal DNA samples used above. Each 32 µl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), PCR buffer (PurePeak DNA polymerization mix, Thermo-Fisher), 0.625 mM PurePeak dNTPs (Thermo-Scientific), 0.625 µM Fusion Primer A, 0.625 µM Fusion Primer B, and 5 U Pfu polymerase (Stratagene). The primer set included 5 forward primers (Fusion A) and 4 reverse primers (Fusion B) fused to the 454 Life Sciences adaptors A and B respectively. Cycling conditions were 94° C. for 3 min, followed by 30 cycles of 94° C. for 30 sec,

57° C. for 45 sec, and 72° C. for 1 min, with a final extension period of 72° C. for 2 min. PCR products were purified with MinElute columns (Qiagen), and DNA was quantified using a Bioanalyzer (Agilent) and the PicoGreen assay (Invitrogen). Two pools of PCR products were constructed for 454 FLX amplicon pyrosequencing, composed of 18 and 20 samples, respectively (the second run contained 3 samples from the V2/3 region and 3 technical replicates; one additional sample (TS30) was sequenced in a third run, bringing the total number of V6 samples processed to 33). Since technical replicates were highly reproducible (see above and FIG. 5), datasets for a given individual's biospecimen were pooled for all subsequent analyses. Any sequences that did not have an exact match to the proximal primer or that contained one or more ambiguous bases were removed as low quality. The proximal primer and any fuzzy matches (identified with BLAST and the fuzznuc program) to the distal primer were then trimmed from the sequences. Finally, any trimmed sequences shorter than 50 nucleotides were also removed as low quality.

Picking Operational Taxonomic Units (OTUs)

[0131] Pyrosequencing data was pre-processed to remove sequences with low quality scores, sequences with ambiguous characters, or sequences outside of the length bounds (V6<50 nt, V2/3<200 nt) and binned according to sample based on the error-correcting barcodes. Similar sequences were identified using the Megablast software and the following parameters: E-value, 1e(?); minimum coverage, 99%; and minimum pairwise identity, 97%. Candidate OTUs were identified as sets of sequences connected to each other at this level using the top 4000 hits per sequence. Each candidate OTU was considered valid if the average density of connection was above threshold; otherwise it was broken up into smaller connected components.

Tree Building and UniFrac Clustering for PCA Analysis

[0132] A relaxed neighbor-joining tree was built from one representative sequence per OTU using Clearcut, employing the Kimura correction (the PH lanemask was applied to V2/3 data), but otherwise with default comparisons. Unweighted UniFrac was run using the resulting tree and the counts of each sequence in each sample. Principal component analysis (PCA) was performed on the resulting matrix of distances between each pair of samples. To determine if the UniFrac distances were on average significantly different for pairs of samples (i.e. between twin-pairs, between twins and their mother, or between unrelated individuals), a t-test was performed on the UniFrac distance matrix, and a p-value was generated for the t-statistic by permutation of the rows and columns as in the Mantel test, regenerating the t-statistic for 1000 random samples, and using the distribution to obtain an empirical p-value.

Taxonomy Assignment

[0133] Taxonomy was assigned using the best-BLAST-hit against Greengenes (E-value cutoff of 1e(?), minimum 88% coverage, 88% identity) and the Hugenholtz taxonomy, downloaded May 12, 2008, excluding sequences annotated as chimeric (http://greengenes.lbl.gov/Download/Sequence Data/Greengenes format/).

Rarefaction and Phylogenetic Diversity Measurements

[0134] To determine which individuals had the most diverse communities of gut bacteria, rarefaction plots and Phylogenetic Diversity (PD) measurements, as described by Faith (Biological Conservation 1992), were made for each sample. PD is the total amount of branch length in a phylogenetic tree constructed from the combined 16S rRNA dataset, leading to the sequences in a given sample. To account for differences in sampling effort between individuals, and to estimate the thoroughness of sampling of each individual, the accumulation of PD (branch length) with sampling effort was plotted in a manner analogous to rarefaction curves. The PD rarefaction curve for each individual was generated by applying custom python code, which can be downloaded from http://bayes.colorado.edu/unifrac, to the Arb parsimony insertion tree.

Results

[0135] To characterize the bacterial lineages present in the fecal microbiotas of these 44 individuals, 16S rRNA sequencing was performed, targeting the full-length gene with an ABI 3730xl capillary sequencer. Additionally, multiplex sequencing with a 454 FLX pyrosequencer was used to survey the V2/3 variable region and the V6 hypervariable region (Tables 1, 2 and 3). Complementary phylogenetic and taxon-based methods were used to compare 16S rRNA sequences among fecal communities. Phylogenetic clustering with UniFrac is based on the principle that communities can be compared in terms of their shared evolutionary history, as measured by the degree to which they share branch length on a phylogenetic tree. This approach was complemented with taxon-based methods; these methods disregard some of the information contained in the phylogenetic tree of the taxa in question, but have the advantage that specific taxa unique to, or shared among, groups of samples can be identified (e.g., those from lean or obese individuals). Prior to both types of analyses, 16S rRNA gene sequences were grouped into Operational Taxonomic Units (OTUs/phylotypes) using the furthest-neighbor-like algorithm and a sequence identity threshold of 97%, which is commonly used to define species-level phylotypes. Taxonomic assignments were made using BLAST and Hugenholtz taxonomy annotations in the Greengenes database.

[0136] No matter which region of the 16S rRNA gene was examined (V2/3 or V6 pyrosequencing reads, or the near complete gene from Sanger reads), individuals from the same family (a twin and her co-twin, or twins and their mother) had a more similar bacterial community structure than unrelated individuals (FIGS. 2A and 3A, B) and shared significantly more phylotypes [G=55.2, p<10^(?) (V2/3); G=112.3, p<0.001 (V6); G=11.3, p<0.001 (full-length)]. No significant correlation was seen between the degree of physical separation of family members' current homes and the degree of similarity between their microbial communities (defined by UniFrac). The observed familial similarity was not due to an indirect effect of the physiologic states of obesity versus leanness; similar results were observed after stratifying twin pairs and their mothers by BMI category (concordant lean or concordant obese individuals; FIG. 4). Surprisingly, there was no significant difference in the degree of similarity in the gut microbiotas of adult MZ versus DZ twin-pairs (FIG. 2A). However, in the present study it was not assessed whether MZ and DZ twin pairs had different degrees of similarities at earlier stages of their lives.

[0137] Multiplex pyrosequencing of V2/3 and V6 amplicons allowed higher levels of coverage of community diversity compared to what was feasible using Sanger sequencing, reaching on average 3,984±232 (V2/3) and 24,786±1,403 (V6) sequences per sample. To control for differences in coverage between samples, all analyses were performed on an equal number of randomly selected sequences (200 full length, 1,000 V2/3, and 10,000 V6). At this level of coverage, there was little overlap between the sampled fecal communities: only 2, 5, and 21 phylotypes were found in >90% of the individuals surveyed (full-length, V2/3, and V6 data respectively). Moreover, the number of 16S rRNA gene sequences belonging to these phylotypes varied greatly between fecal microbiotas (Tables 4, 5 and 6).

[0138] Samples taken from the same individual at the initial collection point and 57±4 days later were remarkably consistent with respect to the specific phylotypes found (FIGS. 1 and 5), but showed variations in the relative abundance of the major gut bacterial phyla (FIG. 6). There was no significant association between UniFrac distance and the time between sample collections. Overall, fecal samples from the same individual were much more similar to one another than samples from family members or unrelated individuals (FIG. 2A), demonstrating that short-term temporal changes in community structure within an individual are minor compared to inter-personal differences.

[0139] After assigning V2/3, V6 and full-length 16S rRNA gene sequences to bacterial taxa (see Example 3 below), it was found that obese individuals generally had a lower relative abundance of the Bacteroidetes and a higher relative abundance of the Firmicutes and Actinobacteria; the statistical significance of these observations varied depending upon the sequencing methods used (Table 7), likely due to differences in PCR conditions (for example, the 8F primer has a known bias against Actinobacteria).

[0140] In summary, across all methods, obesity was associated with a significant decrease in the level of diversity (FIG. 2B and FIGS. 3C-F). This reduced diversity suggests an analogy: the obese gut microbiota is not like a rainforest or reef, which are adapted to high energy flux and are highly diverse, but rather may be more like a fertilizer runoff where a reduced diversity microbial community blooms with abnormal energy input.

Example 2

Distribution of Phylotypes in Individuals

[0141] All hosts were searched for bacterial phylotypes present at high abundance using a sampling model based on a combination of standard Poisson and binomial sampling statistics.

Phylotype Sampling Model

[0142] A sampling model was developed that allows placement of bounds on the maximum abundance of any phylotype found across all samples. The principle here is that if a given phylotype made up not less than some proportion p of the microbiome of all humans, it is then possible to calculate (i) the number of samples of a given size expected to lack that phylotype due to sampling error, and (ii) the probability that an actual proportion p-hat as low as the minimum abundance would be observed in any sample.

[0143] The probability P of failing to observe a given microbe at proportion p in a sample of size n is given by Poisson statistics as simply e^(-np). For equal sample sizes, the probability of observing the phylotype in at least k samples using binomial sampling with Pr(success)=(1-P) can therefore be calculated. Then, the inverse binomial can be used to ask what value of P, and therefore of p, gives a specified probability (say, 5%) of observing a given phylotype in as few samples as actually observed for the most abundant phylotype. This calculation yields an upper bound for p (i.e. the value of p at which we can reject the idea that we would have seen the phylotype in as few samples as actually observed at the 95% confidence level).

[0144] For unequal sizes, there is no analytical solution to the equivalent of the binomial in which Pr(success) differs for each trial. Therefore, numerical optimization must be used to solve for p. Because the function relating p and the probability of observing the phylotype in at least a given number of

TABLE 7 Phylum-level taxonomic assignments' lean obese

mean sem N mean Sem N p-value' V2/3 (EA) % Bacteroidetes 26.76 246 26 24.39 1.89 42 % Firmicutes 71.48 2.50 26 72.57 1.92 42 % Actinobacteria O.72 0.14 26 17O O.S8 42 V2/3 (AA) % Bacteroidetes 37.52 3.05 8 29.41 149 62 % Firmicutes 60.74 3.04 8 68.14 142 62 % Actinobacteria O.97 0.40 8 1.27 O.21 62 V6 (EA) % Bacteroidetes 6.85 125 12 3.15 O.93 16 % Firmicutes 81.72 2.41 12 75.99 4.60 16 % Actinobacteria 7.14 1.76 12 17.91 S.O1 16 Full-length (EA) % Bacteroidetes 11.44 2.77 10 7.58 2.35 16 % Firmicutes 83.SO 2.28 10 84.60 3.03 16 % Actinobacteria 2.78 O.78 10 4.41 1.14 16 BLAST (EA): % Bacteroidetes 42.60 8.75 6 34.69 8.16 9 % Firmicutes S1.54 8.35 6 S1.25 S.47 9 % Actinobacteria 2.07 O.33 6 10.34 3.35 9 A subset of each dataset was included in the analysis; 10,000 sequences sample (V6), 1,000 sequences/sample (V2.3) and 200 sequences sample (full-length). Sequences from the same individual across both timepoints were pooled, Values are from a Student's t-test of the obese versus lean distribution The AA leanindividuals surveyed ave significantly more Bacteroidetes andless Firmicutes than the lean EA individuals (p<0.05) BLASTX comparisons between microbiomes and NCBI non-redundant database US 2011/0177976 A1 Jul. 21, 2011 24 samples is monotonic, a bisection search (bounded by p=0 3.8%-47.1%). The overall most abundant phylotype was and p=1) can be used to find the appropriate value of p for a present in 270 of 274 samples at this depth of coverage desired confidence level. In practice, P was calculated for (Bacteria; Bacteroidetes; Bacteroidales; Bacteroidaceae). each sample, a vector of random numbers between 0 and 1 The sampling model indicated that this frequency was con was chosen, and the number of times the random number at a given position was less than P was counted. Repeating this sistent with a true abundance of no more than 0.53%. 
These procedure for a fixed number of iterations (100,000 for the results were confirmed, with excellent agreement, by the V6 reported values) gives Sufficiently smooth values to approxi data: at 1,000 sequences/sample, the maximum abundance mate the monotonic function and to allow the bisection search OTU is found in 32 of 33 samples, consistent with an abun to converge on the same value of p to three significant figures dance of no more than 0.66%. However, at a coverage depth across repeated trials. of 10,000 sequences/sample, this OTU is found in all 33 0145. In the case where a phylotype was found in all samples but at a minimum observed abundance of 0.02%, samples, a similar procedure could be used to identify the consistent with a true abundance of no more than 0.1%. Using maximum value of p consistent with the observed minimum all the V6 data without controlling for sampling effort, the abundance of the phylotype whose minimum abundance minimum observed abundance is consistent with a true abun across all samples is highest. In this case, instead of calculat dance of no more than 0.07% (the estimate of the true abun ing the fraction of samples in which the phylotype was absent, dance falls with increased sample size because it is less likely (i) binomial sampling could be used to randomly sample the that the low frequency would be observed due to sampling number of observed counts of a phylotype given the paramet error when more total sequences contribute to the result). Thus, we conclude, with 95% confidence, based on the even ric value of p and the sample size of each sample, (ii) the sampling used for the other analyses in this study (i.e., 1,000 minimum abundance across all samples could be measured, sequences/sample from V2/3, 10,000 sequences/sample for and (iii) this minimum abundance compared to the minimum V6) that the maximum abundance of any OTU across all abundance actually observed. 
Again, an analytical Solution samples cannot exceed the V2/3 result of 0.53%, although the using extreme-value statistics is possible if sample sizes are true maximum abundance might be as much as an order of equal, but the solution must be obtained by numerical meth magnitude lower than this based on the greater depth of ods (in this case, the same type of bisection search used coverage in the V6 samples. above). The sampling model was implemented in Python 0150. In summary, the analysis showed that no phylotype using PyCogent. is present at more than ~0.5% abundance in all of the samples in this study, and that although individual microbiotas are Results dominated by a few abundant phylotypes, these groups vary 0146. Using this model the full-length 16S rRNA dataset dramatically in their proportional representation in the described in Example 1 was first analyzed. The most abun sampled gut communities. Also, no phylotypes were detect dant species-level phylotype in each sample made up 11% able in all individuals sampled within this range of coverage of that sample on average (range: 4.2%-22.0%), and the most (FIG. 7). abundant phylotype found across the combined dataset was found in 25 of the 27 fecal microbiotas (taxonomy Example 3 assignment=Bacteria: Firmicutes; Clostridia; Clostridiales: Ruminococcus). These data are consistent with no phylotype Taxonomic Assignments of Metagenomic Reads being present at more than 1.3% abundance in all samples. 0147 The deeper pyrosequencing data confirmed this 0151. The International Human Microbiome Project has result. In the V6 dataset, using even sampling of 10,000 emphasized the importance of sequencing the genomes of a sequences/sample, the most abundant phylotype in each panel of reference microbial strains. Therefore, shotgun sample made up 12% of that sample on average (range: 5.0%- pyrosequencing was used to sample the fecal microbiomes of 36.6%). 
The overall most abundant phylotype was found in 18 individuals representing 6 of the families described in all 33 samples (Bacteria; Firmicutes; Clostridia; Clostridi Example 1. ales; Eubacterium rectale). However, in Some samples, this phylotype was present in frequencies as low as 0.01%. Pyrosequencing of Total Community DNA 0148. The sampling model allows one to ask what level of 0152 Shotgun sequencing runs were performed on the abundance in every individual the most abundant phylotype 454 FLX pyrosequencer from total community DNA of 3 lean could have before its absence from, or limited representation European American MZ twin-pairs and their mothers plus 3 in some samples becomes Surprising. For example, with obese European American MZ twin pairs and their mothers, 1,000 sequences/samples, it would be very Surprising if a yielding 8,294,835 reads and 14,730 16S rRNA fragments. species at 50% abundance across all samples in any out of 30 Two Samples were also analyzed on a single run employing samples was missed, but it would not be surprising if a species 454/Roche GS FLX Titanium extra long read sequencing at 0.00001% abundance were missed. technology (Tables 8 and 9). Sequencing reads with degen 014.9 The sampling model (using 1000 random sequences erate bases (“Ns) were removed along with all duplicate per sample) indicated that this minimum observed abundance sequences, as sequences of identical length and content area was consistent with a true abundance of no more than common artifact of the pyrosequencing methodology. 0.66%. In the V2/3 dataset, the most abundant phylotype in Finally, human sequences were removed by identifying each sample made up 14.6% of that sample on average (range: sequences homologous to the H. US 2011/0177976 A1 Jul. 21, 2011 25

TABLE 8
Microbiome sequencing statistics

Subject ID  Data ID  Twin/Mom  Family  BMI         Platform  Total nt       Number of Reads  Filtered Reads  16S rRNA gene fragments
F1T1Le      TS1      Twin      1       Lean        FLX       60,016,519     254,044          217,386         439
F1T2Le      TS2      Twin      1       Lean        FLX       90,271,969     514,022          443,640         512
F1MOv       TS3      Mom       1       Overweight  FLX       113,506,401    571,301          510,972         723
F2T1Le      TS4      Twin      2       Lean        FLX       107,008,761    472,154          414,754         626
F2T2Le      TS5      Twin      2       Lean        FLX       112,835,879    553,142          490,776         928
F2MOb       TS6      Mom       2       Obese       FLX       135,976,476    623,027          535,763         1,039
F3T1Le      TS7      Twin      3       Lean        FLX       146,946,832    607,386          555,853         1,188
F3T2Le      TS8      Twin      3       Lean        FLX       113,177,766    468,769          414,497         976
F3MOv       TS9      Mom       3       Overweight  FLX       137,564,473    552,870          499,499         934
F7T1Ob1     TS19     Twin      7       Obese       FLX       95,538,760     583,989          498,880         569
F7T2Ob1     TS20     Twin      7       Obese       FLX       108,342,331    550,695          495,040         829
F7MOb       TS21     Mom       7       Obese       FLX       95,960,723     451,177          413,772         774
F10T1Ob1    TS28     Twin      10      Obese       Titanium  138,364,927    399,717          302,780         652
F10T2Ob1    TS29     Twin      10      Obese       Titanium  239,971,702    672,196          502,399         1,190
F10MOv1     TS30     Mom       10      Overweight  FLX       105,932,316    564,184          495,865         791
F15T1Ob1    TS49     Twin      15      Obese       FLX       104,449,087    596,149          519,072         769
F15T2Ob1    TS50     Twin      15      Obese       FLX       129,037,456    642,191          549,700         1,209
F15MOb1     TS51     Mom       15      Obese       FLX       101,531,105    557,165          434,187         582
SUM                                                          2,136,433,483  9,634,178        8,294,835       14,730

ID nomenclature: Family number, Twin number or mom, and BMI category (Le = lean, Ov = overweight, Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean)
Filtered reads: sequences used after removing low quality, duplicate, and human sequences
16S rRNA gene fragments identified in microbiome sequencing reads

sapiens reference genome (BLASTN e-value < 10^-5, % identity > 75, and score > 50).
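
The first two filtering steps described above (removal of reads with degenerate bases and of duplicate reads) can be illustrated with a short Python sketch. The function name `filter_reads`, the in-memory list of sequences, and the choice to keep the first copy of each duplicated read are assumptions made for illustration; the BLASTN screen against the human genome is not reproduced here.

```python
def filter_reads(reads):
    """Drop reads containing degenerate bases ('N') and exact duplicates.

    Assumption: the first occurrence of a duplicated read is kept.
    """
    seen = set()
    kept = []
    for seq in reads:
        seq = seq.upper()
        if "N" in seq:
            continue          # degenerate base present
        if seq in seen:
            continue          # identical length and content: duplicate
        seen.add(seq)
        kept.append(seq)
    return kept


# Hypothetical toy input, not data from the study.
reads = ["ACGT", "ACGT", "ACNT", "GGCA", "acgt"]
kept = filter_reads(reads)
pct_used = 100.0 * len(kept) / len(reads)   # analogous to "% Sequences Used"
```

The percentage computed at the end corresponds in spirit to the "% Sequences Used" column of Table 9, which additionally reflects the removal of low quality and human reads.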

TABLE 9
Microbiome BLAST statistics

Subject ID  Data ID  Raw Reads  Reads Used  % Sequences Used  Nucleotides Used  Mean Read-length  % Hsa  % RDP  % KEGG  % STRING  % NR  % Gut
(2)1        TS1      254,044    217,386     85.6              51,708,794        237.9             0.42   0.21   29.1    34.5      54.9  57.9
(2)2Le1     TS2      514,022    443,640     86.3              78,853,892        177.7             0.08   0.12   20.3    28.7      46.9  51.7
(2)Ov1      TS3      571,301    510,972     89.4              102,717,417       201.0             0.16   0.15   23.8    33.6      56.5  61.2
(2)1Le1     TS4      472,154    414,754     87.8              95,003,113        229.1             0.14   0.15   26.2    44.5      72.3  74.9
(2)2Le1     TS5      553,142    490,776     88.7              100,599,979       205.0             0.22   0.19   23.0    27.8      54.1  62.1
(2)Ob1      TS6      623,027    535,763     86.0              118,207,161       220.6             0.62   0.20   26.9    37.2      58.9  62.1
(2)1Le1     TS7      607,386    555,853     91.5              134,889,015       242.7             0.13   0.22   26.9    34.0      58.4  61.7
(2)2Le1     TS8      468,769    414,497     88.4              100,520,072       242.5             0.20   0.24   28.5    35.7      61.1  64.4
(2)Ov1      TS9      552,870    499,499     90.3              124,768,172       249.8             0.14   0.19   26.8    36.6      63.2  66.3
(2)1Ob1     TS19     583,989    498,880     85.4              82,117,565        164.6             0.06   0.12   19.1    30.6      52.9  57.1
(2)2Ob1     TS20     550,695    495,040     89.9              98,053,098        198.1             0.32   0.17   22.3    29.3      47.2  49.9
(2)Ob1      TS21     451,177    413,772     91.7              88,786,017        214.6             0.09   0.19   25.5    37.6      62.8  66.3
(2)T1Ob1    TS28     399,717    302,780     75.7              101,434,082       335.0             0.06   0.36   24.5    28.4      53.2  55.5
(2)T2Ob1    TS29     672,196    502,399     74.7              173,386,030       345.1             0.11   0.29   27.5    34.8      63.2  63.9
(2)MOv1     TS30     564,184    495,865     87.9              94,405,318        190.4             0.21   0.16   22.4    32.0      54.7  60.7
(2)T1Ob1    TS49     596,149    519,072     87.1              91,987,878        177.2             0.29   0.15   18.6    23.0      43.7  46.4
(2)T2Ob1    TS50     642,191    549,700     85.6              111,999,603       203.7             0.24   0.22   24.6    29.4      51.9  57.9
(2)MOb1     TS51     557,165    434,187     77.9              81,330,211        187.3             0.40   0.14   21.0    26.3      44.2  43.9
Average              535,232    460,824     86.1              101,709,301       223.5             0.22   0.19   24.3    32.5      55.6  59.1
Sum                  9,634,178  8,294,835                     1,830,767,417

Key: % sequences used = percentage of sequences remaining after removing low quality, duplicate, and human sequences; Hsa = reads matching the H. sapiens genome; % RDP = percentage of reads matching the RDP 16S rRNA database; % KEGG, % STRING, % NR = percentage of reads that were assignable to entries in these various databases; % Gut = percentage of reads assigned to the database of 42 reference genomes
(2) indicates text missing or illegible when filed

Database Searches and Metabolic Reconstructions

[0153] The distributions of taxa, genes, orthologs, metabolic pathways, and high-level gene categories were tallied based on the corresponding annotation of the best-BLAST-hit sequence found in each reference database. For KEGG analysis, the closest matching gene with an annotation was used, since many genes in the database remain unannotated, including all KEGG Orthologous groups (KOs) assigned to genes with an identical e-value (commands -e 0.00001 -m 9 -b 100 were used to run NCBI BLASTX). Custom Perl scripts were used for all KEGG, STRING, and NCBI NR analyses. Selected genes from recently sequenced reference genomes were manually annotated using NCBI-BLASTP searches against the KEGG, STRING, and NR database. The 42 reference genome database includes predicted proteins from draft or complete assemblies of Alistipes putredinis, Bacteroides WH2, Bacteroides thetaiotaomicron 3731, Bacteroides thetaiotaomicron 7330, Bacteroides thetaiotaomicron 5482, Bacteroides fragilis, Bacteroides caccae, Bacteroides distasonis, Bacteroides ovatus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Parabacteroides merdae, caccae, Anaerotruncus colihominis, Anaerofustis stercorihominis, Bacteroides capillosus, Clostridium bartlettii, Clostridium bolteae, Clostridium eutactus, Clostridium leptum, Clostridium ramosum, Clostridium scindens, Clostridium sp. L2-50, Clostridium spiroforme, Dorea longicatena, Eubacterium dolichum, Eubacterium eligens, Eubacterium rectale, Eubacterium siraeum, Eubacterium ventriosum, Faecalibacterium prausnitzii M212, Peptostreptococcus micros, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus torques, Collinsella aerofaciens, Bifidobacterium adolescentis, Bifidobacterium longum, Escherichia coli K12, Methanobrevibacter smithii, and Methanobrevibacter stadtmanae (see http://genome.wustl.edu/pub? and NCBI GenBank). Draft assemblies of Clostridium sp. SS2-1 and Clostridium symbiosum were also used for functional clustering and diversity analyses (http://genome.wustl.edu/pub/). Coverage plots (percent identity plots) were generated using nucmer and mummerplot (part of the MUMmer v3.19 package), and default parameters.

[0154] Annotations were validated with simulated datasets (FIG. 8). To do so, the frequency of annotated genes from the KEGG database (v44) was first tallied across the aggregate human gut microbiomes (n=18 datasets). The 1,000 most frequent microbial genes were then used to generate simulated reads between 50 and 500 nt long. The simulated reads were subsequently annotated (BLASTX against the KEGG database), with self-hits excluded. This analysis revealed a low rate of false positives (i.e. high precision), but using very short sequences (e.g. 50-100 nt) increased the rate of false negatives (lower sensitivity) (FIG. 8). Given the increased read-length relative to 454 GS20 pyrosequencing data, simulated reads with an average length comparable to our data (200-250 nt) demonstrated robust assignments with an e-value < 10^-5, % identity > 50, and/or bit-score > 50. Using all three cutoffs, sequences 200 nt in length returned 81.5% of the correct assignments, with a precision of 0.93 and sensitivity of 0.88, similar to what was observed by re-annotating the original full-length gene sequences after ignoring self hits. The KEGG cutoff criteria were also applied to BLASTX analysis results for STRING-based predictions, given the similar size of the databases.

[0155] ABI 3730xl capillary sequencing reads from 9 previously published adult human gut microbiomes were obtained from the NCBI Trace Archive. The full dataset from each sample was annotated by BLASTX comparisons against the KEGG and STRING database (see above; BLASTX e-value < 10^-5, % identity > 50, and score > 50). To allow quantitative comparisons between these datasets and pyrosequencing data, all forward sequencing reads were first extracted and then one simulated pyrosequencer read from each longer capillary read was generated. Nucleotides spanning positions 100 to 322 were used from all capillary reads of suitable length, to avoid low quality regions that commonly occur at the beginning and end of the reads. These simulated reads were then annotated as described above.

[0156] 16S rRNA gene fragments were identified in each microbiome through BLASTN searches of the RDP database (version 9.33; e-value < 10^[...]; Bit-score > 50; % identity > 50; alignment length 100). Putative 16S rRNA gene fragments were then aligned using the NAST multi-aligner with a minimum template length of 100 bases and minimum % identity of 75%. Taxonomy was assessed after insertion into an ARB neighbor-joining tree.

[0157] Microbiomes were clustered based on their profiles after normalizing across all sampled communities (Z-score), using the Pearson's correlation distance metric, followed by single-linkage hierarchical clustering in addition to Principal Components Analysis (Cluster3.0). Results were visualized using the Treeview Java applet. Functional diversity (Shannon index and evenness) was calculated using the number of assignments in each microbiome to each of the 254 pathways present in the KEGG database (EstimateS 8.0). The maximum possible index is the natural log of the total number of pathways: ln(254) or 5.54. Shannon evenness was calculated by dividing the Shannon index for a given microbiome by the maximum possible index (scale of 0 to 1, with 1 representing a microbiome with all pathways found at an equal abundance). Results were compared to simulated metagenomic reads generated from 36 recently sequenced reference human gut-derived Bacteroidetes and Firmicutes genomes (http://genome.wustl.edu/pub/organism/). Reads were produced by Readsim v0.10, using the following options: -n 10000 -modlr normal -meanlr 223 -stdlr 0.3. The mean and standard deviation for length of the simulated reads was based on the observed read-length distribution of the 18 fecal microbiome datasets (Table 9).

Results

[0158] One fundamental parameter that governs the utility of reference genomes is the ability to accurately assign fragmentary reads from metagenomic datasets to these genomes. Therefore, the filtered pyrosequencing reads from the fecal microbiomes of 18 individuals from the 6 different families described in Example 1 (3 lean twin-pairs and their mothers; 3 obese twin pairs and their mothers; Tables 1 and 2) were compared to a custom database of 42 human gut associated bacterial and archaeal genomes (FIG. 7) using BLASTX, and these assignments were validated independently against NCBI's non-redundant protein database. The relative abundance of sequences from the 18 individual microbiome datasets assigned to each reference genome was highly variable (see FIG. 9; R=0.26±0.02 for all pairwise comparisons of taxonomic profiles), consistent with the considerable heterogeneity in microbial community structure among the fecal microbiomes observed from sequencing 16S rRNA gene amplicons.

[0159] The custom database of 42 reference genomes included 23 Firmicutes but only 13 Bacteroidetes. Since the Firmicutes dominate the gut microbiotas of subjects (FIG. 6) and the reference genome database, it might be expected that reads assigned to Firmicutes would match the reference genomes more closely than reads assigned to Bacteroidetes. The opposite was true: on average, 46.3±2.6% of the pyrosequencing reads assigned to Bacteroidetes matched the reference genomes at 100% identity, as compared to only 16.7±1.1% of the reads assigned to Firmicutes (p < 10^[...], Mann-Whitney; FIGS. 10 and 11). This observation underscores the high level of phylogenetic and genomic diversity within the gut-associated Firmicutes, indicates that the readily culturable sequenced gut Firmicutes are not closely related to the abundant gut genomes present in the 18 gut microbiomes, and suggests that future reference microbial genome sequencing efforts should be directed towards representatives of this dominant phylum.

[0160] The effect of technical advances that produce longer reads on improving these assignments was also tested by sequencing fecal community samples from one twin pair using next-generation Titanium pyrosequencing methods (average read length of 341±134 nt (SD) versus 208±68 for the standard FLX platform). FIG. 12 shows that the frequency and quality of sequence assignments is improved as read length increases from 200 to 350 nt.

[0161] FIG. 13 summarizes the relative abundance of the major bacterial phyla present in these 18 microbiomes, as defined by six different approaches (sequencing full-length, V2/3 and V6 amplicons; BLAST comparisons of shotgun pyrosequencer reads with the NCBI non-redundant and the custom 42 gut genome databases, plus analysis of 16S rRNA gene fragments). Pairwise comparisons of relative abundance data from 16S rRNA gene fragments generated from shotgun sequencing reads correlate most closely with V2/3 PCR data (FIG. 13 and Table 7).

Example 4

In Silico Functional Analysis of Gut Microbiomes

[0162] The filtered sequences obtained in Example 3 from the 18 microbiomes were used to conduct a functional analysis of gut microbiomes.

CAZyme Analysis

[0163] Metagenomic sequence reads described in Example 3 were searched against a library of modules derived from all entries in the Carbohydrate-Active enZymes (CAZy) database (www.cazy.org) using FASTY (e-value < 10^[...]). This library consists of ~180,000 previously annotated modules (catalytic modules, carbohydrate binding modules (CBMs) and other non-catalytic modules or domains of unknown function) derived from ~80,000 protein sequences. The number of sequencing reads matching each CAZy family was divided by the number of total sequences assigned to CAZymes and multiplied by 100 to calculate a relative abundance. An R value was calculated for each pair of CAZy profiles. The distribution of glycoside hydrolase similarity scores was then compared to the distribution of glycosyltransferase similarity scores.

Statistical Analyses

[0164] Xipe (version 2.4) was employed for bootstrap analyses of pathway enrichment and depletion, using the parameters sample size=10,000 and confidence level=0.95. Linear regressions were performed in Excel (version 11.0, Microsoft). Mann-Whitney and Student's t-tests were utilized to identify statistically significant differences between two groups (Prism v4.0, GraphPad; Excel version 11.0, Microsoft). The Bonferroni correction was used to correct for multiple hypotheses. The Mantel test was used to compare distance matrices: the matrix of each pairwise comparison of the abundance of each reference genome, and the abundance of each metabolic pathway, were compared (Mantel program in Python using PyCogent; 10,000 replicates). Data are represented as mean±SEM unless otherwise indicated.

[0165] Odds ratios were used to identify commonly-enriched genes in the gut microbiome. In short, all gut microbiome sequences were compared against the custom database of 42 gut genomes (BLASTX e-value < 10^[...], bitscore > 50, and % identity > 50). A gene by sample matrix was then screened to identify genes commonly-enriched in either the obese or lean gut microbiome (defined by an odds ratio greater than 2 or less than 0.5 when comparing the pooled obese twin microbiomes to the pooled lean twin microbiomes and when comparing each individual obese twin microbiome to the aggregate lean twin microbiome, or vice versa). The statistical significance of enriched or depleted genes was then calculated using a modified t-test (q-value < 0.05; calculated with code kindly supplied by Mihai Pop and J. R. White, University of Maryland). To search for genes that were consistently enriched or depleted in all six MZ twin-pairs, a gene-by-sample matrix was generated based on BLASTX comparisons of each microbiome with our custom 42-genome database, and an odds ratio was calculated by directly comparing the frequency of each gene in each twin versus the respective co-twin. The analysis revealed only 49 genes (odds ratio > 2 or < 0.5): they represent a variety of taxonomic groups, including Firmicutes, Bacteroidetes, and Actinobacteria, and did not show any clear functional trends.

Results

[0166] Sequences matching 156 total CAZyme families were found within at least one human gut microbiome, including 77 glycoside hydrolase, 21 carbohydrate-binding module, 35 glycosyltransferase, 12 polysaccharide lyase, and 11 carbohydrate-esterase families (Table 10A and B). On average 2.62±0.13% of the gut microbiome could be assigned to CAZymes (a total of 217,615 sequences), a percentage that is greater than the most abundant KEGG pathway in the gut microbiome (Transporters; 1.20±0.06%), and indicative of the abundant and diverse set of microbial genes in the distal gut microbiome directed towards accessing a wide range of polysaccharides.

[0167] Category-based clustering of the functions from each microbiome was performed using Principal Components Analysis (PCA) and hierarchical clustering. This analysis revealed two distinct clusters of gut microbiomes based on metabolic profile, corresponding to samples with an increased abundance of Firmicutes and Actinobacteria, and samples with a high abundance of Bacteroidetes (FIG. 14A). A linear regression of the first principal component (PC1, explaining 20% of the functional variance) and the relative abundance of the Bacteroidetes showed a highly significant correlation (R=0.96, p < 10^-12; FIG. 14B). Functional profiles stabilized within each individual's microbiome after ~20,000 sequences had been accumulated (FIG. 15). Family members had more similar functional profiles than unrelated individuals (FIG. 14C), suggesting that shared bacterial community structure (who's there based on 16S rRNA analyses) also translates into shared community-wide relative abundance of metabolic pathways. Accordingly, a direct comparison of functional and taxonomic similarity disclosed a significant association: individuals that share similar taxonomic profiles also share similar metabolic profiles (p < 0.001; Mantel test).
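
The Mantel test named above compares two distance matrices over the same samples by permuting sample labels. The following is a minimal pure-Python sketch of that idea, not the PyCogent program used in the study; the function names, the one-sided p-value, and the toy matrices are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle after reordering rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, replicates=10000, seed=0):
    """One-sided permutation Mantel test; returns (r, p)."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    hits = 0
    for _ in range(replicates):
        perm = identity[:]
        rng.shuffle(perm)               # relabel the samples of d2
        if pearson(x, upper_triangle(d2, perm)) >= observed:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)


# Hypothetical toy data: five samples; the second matrix is a scaled copy.
points = [0, 1, 3, 7, 15]
taxonomic = [[abs(a - b) for b in points] for a in points]
functional = [[2.0 * v for v in row] for row in taxonomic]
r, p = mantel(taxonomic, functional, replicates=1000)
```

Permuting rows and columns together, rather than shuffling matrix entries independently, preserves the dependence among distances that share a sample, which is the reason a Mantel test is used instead of an ordinary correlation test.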


TABLE 10B-continued
Relative abundance of CAZymes across 9 gut microbiomes (% of sequence assignments across all identified CAZymes)

Group                         Subject ID
GT28                          0.58  0.94  0.83  1.31  1.00  1.01  1.48  2.12  1.33
GT5                           0.46  0.83  0.65  1.54  1.24  0.96  1.74  1.90  0.96
GT51                          0.68  1.06  0.72  1.82  1.27  0.88  1.06  1.63  1.02
Carbohydrate binding modules  1.90  2.06  2.15  2.66  2.88  2.08  2.22  2.28  1.98
Carbohydrate esterases        5.19  5.19  5.02  5.24  3.94  6.01  4.68  3.84  4.15
CE4                           0.73  0.84  0.92  1.35  0.96  1.04  1.31  1.51  0.91
Polysaccharide lyases         2.78  1.95  3.02  1.55  0.93  2.43  1.43  0.28  0.87

Groups found at an average relative abundance > 1% are shown
ID nomenclature: Family number, Twin number or mother, and BMI category (Le = lean, Ov = overweight, Ob = obese; e.g. F1T1Le stands for family 1, twin 1, lean)
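
The percentages in Table 10 follow the calculation given in the CAZyme Analysis methods: reads matching a family, divided by all reads assigned to CAZymes, multiplied by 100, with an R value then computed for each pair of profiles. A minimal Python sketch is shown below. The function names and the read counts are hypothetical, and treating "R" as the Pearson correlation coefficient is an assumption.

```python
from math import sqrt


def relative_abundance(family_counts):
    """Percent of CAZyme-assigned reads that fall in each family."""
    total = sum(family_counts.values())
    return {fam: 100.0 * n / total for fam, n in family_counts.items()}


def pearson_r(profile_a, profile_b):
    """Pearson correlation between two profiles; absent families count as 0."""
    families = sorted(set(profile_a) | set(profile_b))
    x = [profile_a.get(f, 0.0) for f in families]
    y = [profile_b.get(f, 0.0) for f in families]
    n = len(families)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical read counts for two microbiomes (not values from Table 10).
counts_a = {"GT28": 10, "GT5": 30, "CE4": 60}
counts_b = {"GT28": 20, "GT5": 30, "CE4": 50}
profile_a = relative_abundance(counts_a)
profile_b = relative_abundance(counts_b)
r = pearson_r(profile_a, profile_b)
```

Because each profile is normalized to the reads assigned to CAZymes in that sample, the comparison is insensitive to differences in sequencing depth between microbiomes.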

Example 5

Different Functions for Bacteroides and Firmicutes

[0168] Functional clustering of phylum-wide sequence bins representing reads from the Firmicutes or the Bacteroidetes showed discrete clustering by phylum (FIG. 16A). A direct comparison of the Firmicutes and Bacteroidetes sequence bins to simulated reads generated from 36 reference Bacteroides and Firmicute genomes represented in the 42 member custom database described in Example 3, revealed that the metabolic profile of each microbiome was similar to the average metabolic profile of each phylum (FIG. 17). Bootstrap analyses of the relative abundance of metabolic pathways in the Firmicutes and Bacteroidetes disclosed 26 pathways with a significantly different relative abundance (FIG. 16A). The Bacteroidetes were enriched for a number of carbohydrate metabolism pathways, while the Firmicutes were enriched for transport systems. The finding is consistent with information gleaned from a number of sequenced Bacteroidetes genomes that demonstrate expansive families of genes involved in carbohydrate metabolism, as well as the CAZyme analysis in Example 3, which revealed a significantly higher relative abundance of glycoside hydrolases, carbohydrate-binding modules, glycosyltransferases, polysaccharide lyases, and carbohydrate esterases in the Bacteroidetes sequence bins (FIG. 16B).

Example 6

Identifying a Core Human Gut Microbiome

[0169] One of the major goals of the international human microbiome project is to determine whether there is an identifiable core microbiome of shared organisms, genes, or functional capabilities found in a given body habitat of all or the vast majority of humans. Although all of the 18 gut microbiomes surveyed showed a high level of beta-diversity with respect to the relative abundance of bacterial phyla (FIG. 18A), analysis of the relative abundance of broad functional categories of genes (COG) and metabolic pathways (KEGG) revealed a generally consistent pattern regardless of the sample surveyed (FIG. 18B and Table 11): the pattern is also consistent with results obtained from a meta-analysis of previously published gut microbiome datasets from 9 adult individuals (FIG. 19). This consistency was not simply due to the broad level of these annotations, as a similar analysis of Bacteroidetes and Firmicutes reference genomes revealed substantial variation in the relative abundance of each category (FIG. 20). Furthermore, pair-wise comparisons of metabolic profiles revealed an average R of 0.97±0.0023 (FIG. 14A), indicating a high level of functional similarity between adult human gut microbiomes.

TABLE 11
Relative abundance of metabolic pathways in the gut microbiome (% of KEGG assignments)

KEGG Metabolic Pathway                                 Mean ± SEM across all 18 microbiomes
Transporters                                           4.93 ± 0.21
Other replication, recombination and repair proteins   3.35 ± 0.04
ABC transporters                                       3.24 ± 0.13
General function prediction only                       2.60 ± 0.06
Purine metabolism                                      2.29 ± 0.02
Other enzymes                                          2.16 ± 0.03
Aminoacyl-tRNA biosynthesis                            2.14 ± 0.05
Glutamate metabolism                                   1.98 ± 0.03
Starch and sucrose metabolism                          1.92 ± 0.03
Pyruvate metabolism                                    1.73 ± 0.02
Pyrimidine metabolism                                  1.70 ± 0.02
Peptidases                                             1.69 ± 0.05
Alanine and aspartate metabolism                       1.58 ± 0.02
Glycine, serine and threonine metabolism               1.53 ± 0.02
Other translation proteins                             1.37 ± 0.02
Galactose metabolism                                   1.37 ± 0.03
Glycolysis/Gluconeogenesis                             1.35 ± 0.02
Other ion-coupled transporters                         1.34 ± 0.06
Fructose and mannose metabolism                        1.31 ± 0.03
Two-component system                                   1.31 ± 0.03
Ribosome                                               1.27 ± 0.03
Replication complex                                    1.18 ± 0.02
Phenylalanine, tyrosine and tryptophan biosynthesis    1.17 ± 0.02
Valine, leucine and isoleucine biosynthesis            1.15 ± 0.02
Carbon fixation                                        1.15 ± 0.01
Nitrogen metabolism                                    1.13 ± 0.02
Glycerolipid metabolism                                1.07 ± 0.02
Oxidative phosphorylation                              1.07 ± 0.03
Butanoate metabolism                                   1.05 ± 0.02
Chaperones and folding catalysts                       0.99 ± 0.01
Pentose phosphate pathway                              0.95 ± 0.01
Tyrosine metabolism                                    0.95 ± 0.02
Histidine metabolism                                   0.92 ± 0.02
Cell division                                          0.91 ± 0.01
Aminosugars metabolism                                 0.89 ± 0.03
Arginine and proline metabolism                        0.85 ± 0.01
Citrate cycle (TCA cycle)                              0.84 ± 0.02
Methionine metabolism                                  0.83 ± 0.02
Lysine biosynthesis                                    0.82 ± 0.01
RNA polymerase                                         0.81 ± 0.02
Reductive carboxylate cycle (CO2 fixation)             0.80 ± 0.03

TABLE 11-continued
Relative abundance of metabolic pathways in the gut microbiome (% of KEGG assignments)

KEGG Metabolic Pathway                                 Mean ± SEM across all 18 microbiomes
Propanoate metabolism                                  0.80 ± 0.0[...]
Peptidoglycan biosynthesis                             0.79 ± 0.0[...]
N-Glycan degradation                                   0.78 ± 0.05
Urea cycle and metabolism of amino groups              0.78 ± 0.0[...]
Translation factors                                    0.78 ± 0.02
Selenoamino acid metabolism                            0.77 ± 0.02
Glyoxylate and dicarboxylate metabolism                0.73 ± 0.0[...]
DNA polymerase                                         0.72 ± 0.0[...]
Pentose and glucuronate interconversions               0.70 ± 0.02
Cysteine metabolism                                    0.68 ± 0.02
Pantothenate and CoA biosynthesis                      0.67 ± 0.0[...]
Nucleotide sugars metabolism                           0.67 ± 0.02
Glycosaminoglycan degradation                          0.66 ± 0.04
Function unknown                                       0.66 ± 0.0[...]
One carbon pool by folate                              0.65 ± 0.0[...]
Sphingolipid metabolism                                0.64 ± 0.03
Protein export                                         0.62 ± 0.0[...]

Pathways with an average relative abundance of > 0.6% are shown

[0170] Overall functional diversity was compared using the Shannon index, a measurement that combines diversity (the number of different types of metabolic pathways) and evenness (the relative abundance of each pathway). The human gut microbiomes surveyed had a stable and high Shannon index value (4.63±0.01), close to the maximum possible level of functional diversity (5.54; see Example 4). Despite the presence of a small number of abundant metabolic pathways (listed in Table 11), the overall functional profile of each gut microbiome is quite even (Shannon evenness of 0.84±0.001 on a scale of 0 to 1), demonstrating that most metabolic pathways are found at a similar level of abundance. Interestingly, the level of functional diversity in each microbiome was significantly linked to the relative abundance of the Bacteroidetes (R=0.81, p < 10^[...]); microbiomes enriched for Firmicutes/Actinobacteria had a decreased level of functional diversity. This observation is consistent with an analysis of simulated metagenomic reads generated from each of 36 Bacteroidetes and Firmicutes genomes (FIG. 21): on average, the Bacteroidetes genomes have a significantly higher level of both functional diversity and evenness (Mann-Whitney, p < 0.01).

[0171] At a finer level, 26-53% of enzyme-level functional groups were shared across all 18 microbiomes, while 8-22% of the groups were unique to a single microbiome (FIGS. 22A-C). The core functional groups present in all microbiomes were also highly abundant, representing 93-98% of the sequences found in the gut (fecal) microbiome. Given the higher relative abundance of these core groups, > 95% were found after 26.11±2.02 Mb of sequence was collected from a given microbiome, whereas the variable groups continue to increase substantially with each additional Mb of sequence. Of course, any estimate of the total size of the core microbiome will be dependent upon sequencing effort, especially for functional groups found at a low abundance. On average, this survey achieved greater than 450,000 sequences per fecal sample, which, assuming an even distribution, would allow us to sample groups found at a relative abundance of 10^[...]. In order to estimate the total size of the core microbiome based on the 18 sampled individuals, each microbiome was randomly sub-sampled in 1,000 sequence intervals (FIG. 22D). Based on this analysis, the core microbiome is approaching a total of 2,142 total orthologous groups (one site binding hyperbola curve fit to the resulting rarefaction curve, R=0.9966), indicating that 93% of functional groups (defined by STRING) found within the core microbiome were already identified. Of these core groups, 64% (KEGG) and 56% (STRING) were also found in 9 previously published but much lower coverage datasets generated by capillary sequencing of adult fecal DNA (average of 78,413±2,044 bidirectional reads/sample).

[0172] Metabolic reconstructions of the core microbiome revealed significant enrichment for a number of expected functional categories, including those involved in transcription, translation, and amino acid metabolism (FIG. 23). Metabolic profile-based clustering indicated that the representation of core functional groups was highly consistent across samples (FIG. 24), and includes a number of pathways likely important for life in the gut, such as those for carbohydrate and amino acid metabolism (e.g. fructose/mannose metabolism, aminosugars metabolism, and N-Glycan degradation). Variably represented pathways and categories include cell motility (only a subset of Firmicutes produce flagella), secretion systems, and membrane transport such as phosphotransferase systems involved in the import of nutrients, including sugars (FIGS. 23 and 24).

[0173] CAZyme profiles of glycoside hydrolases and glycosyltransferases were compared by calculating the R value between each pair of microbiomes (see Table 10 for families with a relative abundance > 1%). This analysis revealed that all individuals have a similar profile of glycosyltransferases (mean R=0.96±0.003), while the profiles of glycoside hydrolases were significantly more variable, even between family members (mean R=0.80±0.01; p < 10^-30, paired Student's t-test). This suggests that the number and spectrum of glycoside hydrolases is probably affected by external factors such as diet more than the glycosyltransferases.

Example 7

Obesity Associated Pathways

[0174] To identify metabolic pathways associated with obesity, only non-core associated (variable) functional groups were included in a comparison of the gut microbiomes of lean and obese twin pairs. A bootstrap analysis was used to identify metabolic pathways that were enriched or depleted in the variable obese gut microbiome. For example, similar to a mouse model of diet-induced obesity, the obese human gut microbiome was enriched for phosphotransferase systems involved in microbial processing of carbohydrates (Table 12). To identify specific genes that were significantly associated with obesity, all gut microbiome sequences were compared against the custom database of 42 gut genomes described in Example 3. A gene-by-sample matrix was then screened to identify genes commonly-enriched in either the obese or lean gut microbiome (defined by an odds ratio > 2 or < 0.5 when comparing all obese twin microbiomes to the aggregate lean twin microbiome or vice versa). The analysis yielded 383 genes that were significantly different between the obese and lean gut microbiome (q-value < 0.05; 273 enriched and 110 depleted in the obese microbiome; see Tables 13 and 14). By contrast, only 49 genes were consistently enriched or depleted between all twin-pairs.

[0175] These obesity-associated genes were representative of the taxonomic differences described above: 75% of the obesity-enriched genes were from Actinobacteria (vs. 0% of lean-enriched genes; the other 25% are from Firmicutes) while 42% of the lean-enriched genes were from Bacteroidetes (vs. 0% of the obesity-enriched genes). Their functional annotation indicated that many are involved in carbohydrate, lipid, and amino acid metabolism (Tables 13-14). Together, they comprise an initial set of microbial biomarkers of the obese gut microbiome.
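
The Shannon index and evenness figures quoted for the 18 microbiomes (index of 4.63 against a maximum of ln(254) = 5.54, evenness of 0.84) follow from the definitions given in the methods. A minimal Python sketch of those definitions is shown below; the function names and the toy count vector are assumptions made for illustration, and this is not the EstimateS implementation used in the study.

```python
from math import log


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-empty categories."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def shannon_evenness(counts, n_categories):
    """Shannon index divided by its maximum, ln(number of categories)."""
    return shannon_index(counts) / log(n_categories)


N_PATHWAYS = 254                      # KEGG pathways, as stated in the text
max_index = log(N_PATHWAYS)           # about 5.54

# Hypothetical microbiome with every pathway at equal abundance.
even_counts = [10] * N_PATHWAYS
even_value = shannon_evenness(even_counts, N_PATHWAYS)

# Evenness implied by the reported mean Shannon index of 4.63.
reported_evenness = 4.63 / max_index
```

Dividing by ln(254) rather than by the log of the number of pathways actually observed in a sample matches the description in the text, where the maximum is fixed by the total number of pathways in the database.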
TABLE 12
Pathways enriched or depleted in obese gut microbiomes

Enriched
Fatty acid biosynthesis
Nicotinate and nicotinamide metabolism
Other ion-coupled transporters
Pentose and glucuronate interconversions
Phosphotransferase system (PTS)
Protein folding and associated processing
Signal transduction mechanisms
Transcription factors

Depleted
Bacterial chemotaxis
Bacterial motility proteins
Benzoate degradation via CoA ligation
Butanoate metabolism
Citrate cycle (TCA cycle)
Glycosaminoglycan degradation
Other enzymes
Oxidative phosphorylation
Pyruvate/Oxoglutarate oxidoreductases
Starch and sucrose metabolism
Tryptophan metabolism
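
The gene-level screen behind Tables 13 and 14 is described as an odds-ratio filter (greater than 2 or less than 0.5) applied to a gene-by-sample matrix, both for the pooled obese microbiomes and for each individual obese microbiome against the aggregate lean microbiome. A minimal Python sketch of such a filter follows. The exact odds-ratio formula is not given in the text, so the definition used here, the 0.5 pseudocount, the function names, and all counts are assumptions for illustration; the subsequent significance test (q-values) is not reproduced.

```python
def odds_ratio(hits_a, total_a, hits_b, total_b, pseudo=0.5):
    """Odds of a read hitting the gene in A relative to B.

    Assumption: a 0.5 pseudocount avoids division by zero.
    """
    odds_a = (hits_a + pseudo) / (total_a - hits_a + pseudo)
    odds_b = (hits_b + pseudo) / (total_b - hits_b + pseudo)
    return odds_a / odds_b


def commonly_enriched(gene_counts, totals, group, reference, threshold=2.0):
    """True if the gene exceeds the threshold for the pooled group and for
    every individual member of the group versus the aggregate reference."""
    ref_hits = sum(gene_counts[s] for s in reference)
    ref_total = sum(totals[s] for s in reference)
    grp_hits = sum(gene_counts[s] for s in group)
    grp_total = sum(totals[s] for s in group)
    if odds_ratio(grp_hits, grp_total, ref_hits, ref_total) <= threshold:
        return False
    return all(
        odds_ratio(gene_counts[s], totals[s], ref_hits, ref_total) > threshold
        for s in group
    )


# Hypothetical counts for one gene across four samples.
totals = {"obese1": 1000, "obese2": 1000, "lean1": 1000, "lean2": 1000}
gene = {"obese1": 40, "obese2": 30, "lean1": 10, "lean2": 8}
obese, lean = ["obese1", "obese2"], ["lean1", "lean2"]

obese_enriched = commonly_enriched(gene, totals, obese, lean)
lean_enriched = commonly_enriched(gene, totals, lean, obese)
```

Swapping the `group` and `reference` arguments covers the "or vice versa" case, since an odds ratio below 0.5 in one direction is an odds ratio above 2 in the other.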

TABLE 13
Bacterial genes enriched in the gut microbiomes of obese MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
1 | Bifidobacterium adolescentis 154486403 | tRNA-ribosyltransferase | COG0343 | | K00773
2 | Bifidobacterium longum 23465114 | Transcriptional regulators | COG1609 | K |
3 | Bifidobacterium longum 23466186 | ABC-type sugar transport system, periplasmic component | COG1653 | G |
4 | Bifidobacterium adolescentis 154488903 | Superfamily I DNA and RNA helicases | COG3973 | R |
5 | Bifidobacterium adolescentis 154486727 | DNA polymerase IV | COG0389 | | K02346
6 | Bifidobacterium adolescentis 154488882 | peptide/nickel transport system ATP-binding protein | COG1123 | R | K02031/2
7 | Bifidobacterium adolescentis 154488633 | Trk-type K+ transport systems | COG0168 | C |
8 | Bifidobacterium adolescentis 154488131 | Asp-tRNAAsn/Glu-tRNAGln amidotransferase B subunit | COG0064 | | K02434
9 | Bifidobacterium adolescentis 154487571 | Threonine dehydratase | COG1171 | E | K01754
10 | Bifidobacterium adolescentis 154486641 | Glucose-6-phosphate isomerase | COG0166 | G | K01810
11 | Bifidobacterium adolescentis 154488790 | ATP-dependent helicase Lhr and Lhr-like helicase | COG1201 | R | K03724
12 | Bifidobacterium adolescentis 119025482 | Predicted ATPase involved in cell division | COG2884 | D | K09812
13 | Bifidobacterium adolescentis 154486531 | Predicted phosphohydrolases | COG1409 | R |
14 | Bifidobacterium adolescentis 154486606 | tRNA-(guanine-N1)-methyltransferase | COG0336 | | K00554
15 | Bifidobacterium adolescentis 154486895 | IMP dehydrogenase/GMP reductase | COG0516/7 | FR | K00088
16 | Bifidobacterium adolescentis 154486720 | Aspartate/tyrosine/aromatic aminotransferase | COG0436 | E | K00812
17 | Bifidobacterium adolescentis 119026599 | Cation transport ATPase | COG0474 | C | K01529
18 | Bifidobacterium adolescentis 154486334 | hypothetical protein | | |
19 | Bifidobacterium adolescentis 119025743 | NAD/NADP transhydrogenase alpha subunit | COG3288 | C | K00324
20 | Bifidobacterium longum 23336617 | UspA and related nucleotide-binding proteins | COG0589 | T |
21 | Bifidobacterium adolescentis 154486937 | ABC-type sugar transport system | COG1653 | G | K02027
22 | Bifidobacterium longum 23465912 | hypothetical protein | | |
23 | Bifidobacterium longum 23335963 | K+ transporter | COG3158 | P | K03549
24 | Bifidobacterium adolescentis 119025729 | ABC-type transport system, Fe-S cluster assembly | COG0719 | O |
25 | Bifidobacterium adolescentis 154487396 | Glutamine synthetase adenylyltransferase | COG1391 | OT | K00982
26 | Bifidobacterium adolescentis 154488156 | hypothetical protein | | |
27 | Bifidobacterium adolescentis 154486668 | Acetyl/propionyl-CoA carboxylase | COG4770 | I | K01946
28 | Bifidobacterium adolescentis 154487299 | Nuclease subunit of the excinuclease complex | COG0322 | L | K03703
29 | Bifidobacterium longum 23465540 | Acetate kinase | COG0282 | C | K00925
30 | Clostridium bartlettii 164687465 | putative conjugative transposon protein | NOG13238 | |
31 | Bifidobacterium longum 23465037 | Dipeptidase | COG4690 | E | K08659
32 | Bifidobacterium adolescentis 154488210 | Predicted hydrolase of the metallo-beta-lactamase superfamily | COG0595 | R | K07021
33 | Bifidobacterium adolescentis 154487598 | tRNA/rRNA methyltransferase protein | | | K00599
34 | Bifidobacterium adolescentis 119025149 | hypothetical protein | | |
35 | Bifidobacterium adolescentis 154487052 | hypothetical protein | NOG07592 | |

TABLE 13-continued
Bacterial genes enriched in the gut microbiomes of obese MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
36 | Bifidobacterium adolescentis 154486554 | PTS system, enzyme I | | | K00935
37 | Bifidobacterium longum 23335005 | Selenocysteine lyase | COG0520 | | K01763
38 | Bifidobacterium longum 23465294 | Branched-chain amino acid permeases | COG1114 | | K03311
39 | Bifidobacterium adolescentis 119025432 | Acyl-CoA thioesterase | COG1946 | | K01076
40 | Bifidobacterium adolescentis 154486528 | Aspartate-semialdehyde dehydrogenase | COG0136 | E | K00133
41 | Bifidobacterium adolescentis 154487076 | Predicted ATPase with chaperone activity | COG0606 | | K07391
42 | Bifidobacterium longum 23466221 | Alcohol dehydrogenase, class IV | COG1454 | C | K00048
43 | Bifidobacterium adolescentis 119025541 | Phosphoribosylformylglycinamidine synthase | COG0046/7 | | K01952
44 | Bifidobacterium adolescentis 119026031 | Geranylgeranyl pyrophosphate synthase | COG0142 | |
45 | Bifidobacterium longum 23465502 | Signal transduction histidine kinase | COG4585 | |
46 | Bifidobacterium adolescentis 154486631 | Predicted metal-binding, possibly nucleic acid-binding protein | COG1399 | R |
47 | Bifidobacterium adolescentis 154488013 | Sugar (pentulose and hexulose) kinases | COG1070 | | K00853
48 | Bifidobacterium adolescentis 119025777 | Aspartate carbamoyltransferase | COG0540 | | K00609
49 | Bifidobacterium adolescentis 119025510 | Superfamily II DNA helicase | COG0514 | | K03654
50 | Bifidobacterium adolescentis 119026360 | Protease II | COG1770 | | K01354
51 | Bifidobacterium adolescentis 119025672 | Signal transduction histidine kinase | COG3920 | |
52 | Bifidobacterium adolescentis 154487392 | Orotidine-5'-phosphate decarboxylase | COG0284 | | K01591
53 | Bifidobacterium adolescentis 154487114 | Permeases of the major facilitator superfamily | COG0477 | GEPR |
54 | Bifidobacterium adolescentis 119025804 | Predicted Fe-S-cluster redox enzyme | COG0820 | | K06941
55 | Bifidobacterium longum 23465197 | Permeases of the major facilitator superfamily | COG0477 | GEPR |
56 | Bifidobacterium adolescentis 154487064 | Superfamily II RNA helicase | COG4581 | |
57 | Bifidobacterium longum 23465727 | ABC-type dipeptide transport system | COG0747 | |
58 | Bifidobacterium adolescentis 154486507 | hypothetical protein | | |
59 | Bifidobacterium longum 23465472 | Predicted transcriptional regulator | COG2865 | KE |
60 | Bifidobacterium adolescentis 154486695 | ABC-type phosphate transport system | COG0226 | | K02040
61 | Bifidobacterium longum 23466332 | Dihydroxyacid dehydratase/phosphogluconate dehydratase | COG0129 | G | K01687
62 | Bifidobacterium adolescentis 154489143 | Predicted phosphatase/phosphohexomutase | COG0637 | |
63 | Bifidobacterium adolescentis 154486988 | Phosphoribosylaminoimidazole carboxylase | COG0026 | | K01589
64 | Bifidobacterium adolescentis 154486732 | glycoside hydrolase family 77 | COG1640 | | K00705
65 | Bifidobacterium adolescentis 154487590 | Uncharacterized conserved protein | COG3247 | |
66 | Bifidobacterium adolescentis 154486669 | Acetyl-CoA carboxylase | COG4799 | | K01966
67 | Bifidobacterium adolescentis 154488016 | Homoserine kinase | COG0083 | E | K00872
68 | Bifidobacterium adolescentis 119026221 | glycoside hydrolase family 43 | | |
69 | Bifidobacterium adolescentis 119025727 | CTP synthase (UTP-ammonia lyase) | COG0504 | | K01937
70 | Bifidobacterium adolescentis 154486325 | Uncharacterized protein conserved in bacteria | COG3583 | |
71 | Bifidobacterium adolescentis 119025371 | Transcription elongation factor | COG0195 | | K02600
72 | Bifidobacterium adolescentis 154486867 | Sugar (pentulose and hexulose) kinases | COG1070 | | K00854
73 | Bifidobacterium adolescentis 154487511 | putative cell division protein | | |
74 | Bifidobacterium adolescentis 154487124 | hypothetical protein | | |
75 | Bifidobacterium adolescentis 119025212 | hypothetical protein | | |
76 | Bifidobacterium adolescentis 154487481 | hypothetical protein | | |
77 | Bifidobacterium adolescentis 154488824 | putative two-component sensor kinase | | |
78 | Bifidobacterium adolescentis 154488224 | serine/threonine protein kinase | | |
79 | Bifidobacterium adolescentis 154487149 | carbohydrate esterase family 1 | | |
80 | Bifidobacterium adolescentis 154488135 | rRNA methylases | COG0566 | | K00599
81 | Bifidobacterium adolescentis 154489172 | glycoside hydrolase family 77 | COG1640 | | K00705
82 | Bifidobacterium adolescentis 154487327 | Superfamily II RNA helicase | COG4581 | | K03727
83 | Bifidobacterium adolescentis 119025670 | Transcription elongation factor | COG0782 | | K03624
84 | Bifidobacterium adolescentis 154486326 | Dimethyladenosine transferase | COG0030 | | K02528
85 | Bifidobacterium longum 23465077 | glycosyl-transferase family 51 | COG0744 | | K03693
86 | Bifidobacterium longum 23464647 | hypothetical protein | NOG25707 | |
87 | Bifidobacterium adolescentis 154486363 | hypothetical protein | | |
88 | Bifidobacterium adolescentis 154486438 | Permeases of the major facilitator superfamily | COG0477 | GEPR |

TABLE 13-continued
Bacterial genes enriched in the gut microbiomes of obese MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
89 | Bifidobacterium longum 23335686 | ABC-type antimicrobial peptide transport system | COG0577 | V | K02004
90 | Bifidobacterium adolescentis 154486327 | 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate synthase | COG1947 | | K00919
91 | Bifidobacterium adolescentis 154488959 | twitching motility protein PilT | | | K02669
92 | Bifidobacterium adolescentis 154486273 | Leucyl-tRNA synthetase | COG0495 | | K01869
93 | Bifidobacterium adolescentis 154486329 | tRNA nucleotidyltransferase/poly(A) polymerase | COG0617 | | K00970
94 | Bifidobacterium adolescentis 154487191 | putative phage protein | | |
95 | Bifidobacterium adolescentis 154486270 | DNA polymerase III, delta subunit | COG1466 | | K02340
96 | Bifidobacterium adolescentis 154486380 | hypothetical protein | | |
97 | Anaerostipes caccae 167747544 | Non-ribosomal peptide synthetase modules and related proteins | COG1020 | Q |
98 | Bifidobacterium adolescentis 154486501 | Predicted unusual protein kinase | COG0661 | R |
99 | Bifidobacterium adolescentis 154486855 | LacI-family transcriptional regulator | | |
100 | Bifidobacterium adolescentis 154486358 | Hemolysins and related proteins | COG1253 | R | K03699
101 | Bifidobacterium adolescentis 154486649 | Acetylornithine deacetylase/Succinyl-diaminopimelate desuccinylase | COG0624 | E | K01439
102 | Bifidobacterium adolescentis 119025555 | Orotidine-5'-phosphate decarboxylase | COG0284 | | K01591
103 | Bifidobacterium longum 23465600 | Gamma-glutamyl phosphate reductase | COG0014 | E | K00147
104 | Bifidobacterium adolescentis 154486786 | FAD synthase/riboflavin kinase/FMN adenylyltransferase | COG0196 | H | K00861,0953
105 | Bifidobacterium adolescentis 154488712 | Ribonuclease D | COG0349 | | K03684
106 | Bifidobacterium adolescentis 154488649 | N-acetylglutamate synthase (N-acetylornithine aminotransferase) | COG1364 | E | K00620,0642
107 | Bifidobacterium adolescentis 154489082 | Ribonucleoside-triphosphate reductase | COG1328 | | K00527
108 | Bifidobacterium adolescentis 154487141 | transcriptional regulator, AraC family | | |
109 | Bifidobacterium longum 23335562 | Acetyltransferase (isoleucine patch superfamily) | COG0110 | R | K00680
110 | Bifidobacterium adolescentis 119025600 | ABC-type amino acid transport system, permease component | COG0765 | E |
111 | Bifidobacterium adolescentis 154486349 | Recombinational DNA repair ATPase (RecF pathway) | COG1195 | | K03629
112 | Bifidobacterium adolescentis 154487341 | Succinyl-CoA synthetase | COG0045 | C | K01903
113 | Bifidobacterium adolescentis 154486419 | Adenylosuccinate synthase | COG0104 | | K01939
114 | Bifidobacterium adolescentis 154486323 | transcriptional regulator, AraC family | | |
115 | Bifidobacterium adolescentis 119025197 | 3-isopropylmalate dehydratase large subunit | COG0065 | E | K01702/3
116 | Bifidobacterium adolescentis 154489094 | Predicted dehydrogenases and related proteins | COG0673 | R |
117 | Bifidobacterium longum 23336262 | O-acetylhomoserine sulfhydrylase | COG2873 | E | K01740
118 | Bifidobacterium longum 23465907 | ABC-type dipeptide/oligopeptide/nickel transport systems | COG0601 | EP | K02033
119 | Bifidobacterium adolescentis 154487000 | Threonine aldolase | COG2008 | E | K01620
120 | Bifidobacterium adolescentis 154487167 | Sortase and related acyltransferases | COG1247 | M | K03823
121 | Bifidobacterium longum 23465198 | Thioredoxin reductase | COG0492/0526 | OC | K00384
122 | Bifidobacterium adolescentis 154488926 | Arabinose efflux permease | COG2814 | G |
123 | Bifidobacterium longum 23465931 | ABC-type antimicrobial peptide transport system, ATPase component | COG1136 | V | K02003/4
124 | Bifidobacterium adolescentis 154486352 | Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV) | COG0188 | | K01863,2469
125 | Bifidobacterium adolescentis 119026009 | Pyruvate-formate lyase-activating enzyme | COG1180 | O | K04069
126 | Bifidobacterium adolescentis 154487279 | Methionine synthase II (cobalamin-independent) | COG0620 | E | K00549
127 | Bifidobacterium adolescentis 119025238 | Acetolactate synthase | COG0440 | E | K01653
128 | Bifidobacterium adolescentis 119025129 | Signal recognition particle GTPase | COG0552 | U | K03110
129 | Bifidobacterium adolescentis 154488132 | Asp-tRNAAsn/Glu-tRNAGln amidotransferase | COG0154 | | K02433
130 | Bifidobacterium adolescentis 154486940 | ABC-type dipeptide transport system | COG0747 | E | K02035
131 | Bifidobacterium adolescentis 154488789 | Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV) | COG0188 | | K01863,2469
132 | Bifidobacterium adolescentis 154487377 | Long-chain acyl-CoA synthetases | COG1022 | | K01897
133 | Bifidobacterium adolescentis 154488794 | DNA-directed RNA polymerase, sigma subunit | COG0568 | K | K03086

TABLE 13-continued
Bacterial genes enriched in the gut microbiomes of obese MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
134 | Bifidobacterium adolescentis 154488989 | Superfamily I DNA and RNA helicases | COG0210 | L | K01529
135 | Bifidobacterium adolescentis 154486903 | Prolyl-tRNA synthetase | COG0442 | J | K01881
136 | Bifidobacterium adolescentis 154488684 | putative helicase | | |
137 | Bifidobacterium adolescentis 154486399 | Lysophospholipase | COG2267 | I |
138 | Bifidobacterium adolescentis 119026611 | ABC-type sugar transport systems, ATPase components | COG3839 | G | K05816
139 | Bifidobacterium adolescentis 154486670 | Putative fatty acid synthase/reductase | COG0304/0331/2030/4981/4982 | IQ | K00059,209,665,666,680
140 | Bifidobacterium adolescentis 154488852 | ABC-type oligopeptide transport system | COG4166 | E | K02035
141 | Bifidobacterium adolescentis 154486664 | putative ABC-type sugar transport system | | |
142 | Bifidobacterium adolescentis 119025257 | Ribonucleases G and E | COG1530 | J | K01128
143 | Bifidobacterium adolescentis 154486472 | ABC-type antimicrobial peptide transport system | COG0577 | V | K02004
144 | Bifidobacterium adolescentis 154487036 | hypothetical protein | | |
145 | Bifidobacterium adolescentis 154487636 | glycoside hydrolase family 2 | COG3250 | G | K01190
146 | Eubacterium dolichum 160915695 | glycoside hydrolase family 31 | | |
147 | Bifidobacterium adolescentis 154489092 | Aspartate/tyrosine/aromatic aminotransferase | COG0436 | E | K00812
148 | Bifidobacterium adolescentis 119026440 | hypothetical protein | NOG21350 | |
149 | Bifidobacterium adolescentis 119025397 | Myosin-crossreactive antigen | COG4716 | S |
150 | Bifidobacterium adolescentis 119026143 | Glutamine amidotransferase | COG0118 | E | K02501
151 | Bifidobacterium adolescentis 154487050 | Universal stress protein UspA | COG0589 | T |
152 | Bifidobacterium adolescentis 154486729 | Phosphoglycerate dehydrogenase | COG0111 | HE |
153 | Bifidobacterium adolescentis 154488261 | Predicted hydrolases or acyltransferases | COG0596 | R |
154 | Bifidobacterium adolescentis 154489101 | hypothetical protein | | |
155 | Bifidobacterium adolescentis 154487476 | Phosphotransacetylase | COG0280,0857 | CR | K00625
156 | Bifidobacterium adolescentis 154488788 | Uncharacterized proteins of the AP superfamily | COG1524 | R |
157 | Ruminococcus obeum 153809835 | putative ketose-bisphosphate aldolase | | |
158 | Clostridium leptum 160933115 | hypothetical protein | | |
159 | Bifidobacterium adolescentis 119026429 | Ribulose-5-phosphate 4-epimerase | COG0235 | G | K03080
160 | Bifidobacterium adolescentis 154487579 | glycoside hydrolase family 36 | COG3345 | G | K07407
161 | Bifidobacterium longum 23464678 | hypothetical protein | | |
162 | Bifidobacterium adolescentis 154486391 | Serine/threonine protein phosphatase | COG0631 | T | K01090
163 | Bifidobacterium adolescentis 154486962 | ABC-type amino acid transport/signal transduction systems | COG0834 | ET | K02030
164 | Bifidobacterium adolescentis 154486954 | DNA primase | COG0358 | | K02316
165 | Bifidobacterium adolescentis 154486993 | Glutamine phosphoribosylpyrophosphate amidotransferase | COG0034 | | K00764
166 | Bifidobacterium adolescentis 154488913 | HrpA-like helicases | COG1643 | | K03578
167 | Bifidobacterium adolescentis 154486787 | Predicted ATP-dependent serine protease | COG1066 | O | K04485
168 | Bifidobacterium adolescentis 154486493 | Ammonia permease | COG0004 | C | K03320
169 | Bifidobacterium adolescentis 154487494 | Methenyl tetrahydrofolate cyclohydrolase | COG0190 | H | K00288,1491
170 | Bifidobacterium adolescentis 119025196 | Transcriptional regulator | COG1414 | K |
171 | Dorea longicatena 153853202 | hypothetical protein | | |
172 | Bifidobacterium adolescentis 154487329 | putative transcriptional regulator | | |
173 | Bifidobacterium adolescentis 154487591 | LacI-family transcriptional regulator | | |
174 | Bifidobacterium adolescentis 154486321 | glycoside hydrolase family 3 | | |
175 | Bifidobacterium adolescentis 119025741 | GTPase | COG1159 | R | K03595
176 | Clostridium scindens 167758922 | dUTPase | COG0756 | | K01520
177 | Bifidobacterium adolescentis 119025587 | Signal transduction histidine kinase | COG0642 | T |
178 | Bifidobacterium adolescentis 154486470 | Predicted membrane protein | COG4393 | S |
179 | Clostridium scindens 167760262 | putative sporulation protein | | |
180 | Bacteroides stercoris 167763769 | hypothetical protein | | |
181 | Anaerostipes caccae 167746872 | putative ABC transporter | | |
182 | Bifidobacterium adolescentis 154486920 | ABC-type amino acid transport/signal transduction systems | COG0834 | ET | K02030
183 | Bifidobacterium adolescentis 154487063 | Uncharacterized conserved protein | COG2326 | S |
184 | Bifidobacterium adolescentis 119025989 | glycoside hydrolase family 13 | COG0366 | G | K01187
185 | Clostridium bartlettii 164687864 | Lactoylglutathione lyase | COG0346 | E | K01759

TABLE 13-continued
Bacterial genes enriched in the gut microbiomes of obese MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
186 | Bifidobacterium adolescentis 154486443 | ABC-type antimicrobial peptide transport system | COG0577 | V | K02004
187 | Bifidobacterium adolescentis 154488245 | NADH:flavin oxidoreductases, NADPH2 dehydrogenase | COG1902 | C |
188 | Bifidobacterium longum 23465963 | atypical histidine kinase sensor of two-component system | NOG21560 | |
189 | Bifidobacterium adolescentis 154488949 | hypothetical protein | | |
190 | Bifidobacterium adolescentis 154486865 | maltose O-acetyltransferase | | |
191 | Clostridium scindens 167759009 | cytidylate kinase | | | K00945
192 | Bifidobacterium adolescentis 154486901 | ATP-dependent exoDNAse | COG0507 | |
193 | Ruminococcus torques 153814251 | hypothetical protein | | |
194 | Bifidobacterium adolescentis 119025327 | Ribosomal protein L13 | COG0102 | J | K02871
195 | Bifidobacterium adolescentis 154488916 | ABC-type antimicrobial peptide transport system | COG1136 | |
196 | Bifidobacterium adolescentis 119025389 | putative histidine kinase sensor of two component system | | |
197 | Ruminococcus gnavus 154504598 | Translation elongation factor P (EF-P)/initiation factor 5A (eIF-5A) | COG0231 | | K02356
198 | Bifidobacterium adolescentis 119026648 | ribonuclease P | NOG21633 | |
199 | Clostridium scindens 167760715 | hypothetical protein | | |
200 | Bifidobacterium adolescentis 119026098 | Uncharacterized conserved protein | COG2606 | |
201 | Clostridium scindens 167761320 | ABC-type antimicrobial peptide transport system | COG1136 | | K02003
202 | Bacteroides stercoris 167762249 | hypothetical protein | | |
203 | Anaerostipes caccae 167746530 | putative ion channel | | |
204 | Bifidobacterium adolescentis 119025057 | Serine/threonine protein kinase | COG0515 | RTKL |
205 | Clostridium bartlettii 164686672 | Molybdopterin biosynthesis enzymes | COG0521 | | K03638
206 | Ruminococcus obeum 153811887 | hypothetical protein | | |
207 | Clostridium spiroforme 169349879 | protein-Nπ-phosphohistidine-sugar phosphotransferase | | | K00890
208 | Clostridium ramosum 167756439 | type I restriction enzyme, S subunit | | | K01154
209 | Bifidobacterium adolescentis 119025640 | Short-chain alcohol dehydrogenase of unknown specificity | COG4221 | |
210 | Eubacterium ventriosum 154483925 | Uncharacterized conserved protein | COG2501 | |
211 | Bifidobacterium adolescentis 154487477 | Phosphoketolase | COG3957 | | K01621,32,36
212 | Bifidobacterium adolescentis 154489149 | Putative molecular chaperone | COG0443 | | K01529,4043,8070
213 | Bifidobacterium adolescentis 119025585 | hypothetical protein | | |
214 | Clostridium scindens 167759334 | ABC-type antimicrobial peptide transport system | COG1136 | | K02003
215 | Anaerostipes caccae 167748732 | Serine-pyruvate aminotransferase/archaeal aspartate aminotransferase | COG0075 | | K03430
216 | Ruminococcus gnavus 154505702 | Putative phage replication protein RstA | COG2946 | | K07467
217 | Bifidobacterium adolescentis 154486389 | Cell division protein FtsI | COG0768 | |
218 | Bifidobacterium adolescentis 154488668 | ABC-type cobalt transport system | COG1122 | | K02006
219 | Bifidobacterium adolescentis 154486277 | Fructose-2,6-bisphosphatase/phosphoglycerate mutase | COG0406 | | K01834
220 | Clostridium scindens 167758556 | hypothetical protein | | |
221 | Dorea longicatena 153855715 | putative acetyltransferase | | |
222 | Eubacterium dolichum 160915136 | ABC-type antimicrobial peptide transport system | COG1136 | V | K02003
223 | Bifidobacterium adolescentis 119026205 | Isoleucyl-tRNA synthetase | COG0060 | | K01870
224 | Ruminococcus obeum 153810514 | glycoside hydrolase family 23 | COG0741,91 | |
225 | Eubacterium eligens Contig2011.538 | putative phosphohydrolase | | |
226 | Bifidobacterium adolescentis 154487387 | Transcriptional regulator | COG0583 | |
227 | Ruminococcus obeum 153812199 | putative flavodoxin | | |
228 | Bifidobacterium adolescentis 154486996 | Phosphoribosylformylglycinamidine (FGAM) synthase | COG0046/7 | | K01952
229 | Dorea longicatena 153854194 | Ornithine/acetylornithine aminotransferase | COG4992 | | K00818
230 | Ruminococcus gnavus 154505209 | Predicted GTPases | COG1160 | |
231 | Dorea longicatena 153853531 | Predicted transcriptional regulators | COG1695 | |
232 | Ruminococcus torques 153814203 | Acetyltransferases | COG0456 | | K03826
233 | Clostridium scindens 167761371 | putative ABC-type transport system | | |
234 | Bifidobacterium longum 38906105 | F0F1-type ATP synthase | COG0055 | | K02112
235 | Collinsella aerofaciens 139439837 | hypothetical protein | | |

TABLE 13-continued
Bacterial genes enriched in the gut microbiomes of obese MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
236 | Clostridium leptum 160933570 | ABC-type antimicrobial peptide transport system | COG0577,1136 | V | K02003
237 | Eubacterium rectale 2731 | putative sensor histidine kinase | | |
238 | Bifidobacterium adolescentis 154489126 | ABC-type multidrug transport system | COG1132 | | K06147
239 | Ruminococcus obeum 153812105 | putative conjugative transposon protein | NOG05968 | |
240 | Dorea longicatena 153853999 | hypothetical protein | | |
241 | Clostridium bolteae 160937390 | hypothetical protein | | |
242 | Ruminococcus torques 153814809 | cytidylate kinase | | | K00945
243 | Ruminococcus obeum 153810530 | hypothetical protein | | |
244 | Clostridium scindens 167758273 | putative alanine racemase | | |
245 | Clostridium scindens 167760222 | putative ABC transporter | | |
246 | Dorea longicatena 153854759 | Sporulation protein | COG2088 | | K06412
247 | Bifidobacterium adolescentis 119025414 | glycosyl-transferase family 4 | | |
248 | Ruminococcus obeum 153813075 | hypothetical protein | | |
249 | Eubacterium ventriosum 154482695 | Queuine/archaeosine tRNA-ribosyltransferase | COG0343 | | K00773
250 | Ruminococcus obeum 153811892 | hypothetical protein | | |
251 | Ruminococcus obeum 153810246 | Type IV secretory pathway, VirB4 components | COG3451 | |
252 | Dorea longicatena 153854838 | Ribosomal protein S16 | COG0228 | | K02959
253 | Dorea longicatena 153855241 | putative DNA gyrase, subunit A | | |
254 | Collinsella aerofaciens 139438412 | putative transcriptional regulator | | |
255 | Clostridium leptum 160934853 | putative ribosomal-protein-alanine acetyltransferase | | |
256 | Eubacterium rectale 3602 | Type IV secretory pathway, VirD4 components | COG3505 | |
257 | Bifidobacterium adolescentis 154486460 | ABC-type multidrug transport system | COG1132 | | K06147
258 | Anaerostipes caccae 167746203 | exonuclease SbcC | | | K03546
259 | Ruminococcus obeum 153813732 | hypothetical protein | | |
260 | Eubacterium ventriosum 154484729 | protein-Nπ-phosphohistidine-sugar phosphotransferase | | | K00890
261 | Eubacterium rectale 3363 | putative ABC transporter | | |
262 | Ruminococcus obeum 153809913 | hypothetical protein | | |
263 | Anaerostipes caccae 167748861 | putative arylsulfate sulfotransferase | | |
264 | Eubacterium eligens Contig2011.154 | Uncharacterized conserved protein | COG4283 | |
265 | Clostridium scindens 167759418 | putative competence protein ComEA | | |
266 | Eubacterium rectale 3439 | putative RNA-directed DNA polymerase | | |
267 | Clostridium bolteae 160940954 | SAM-dependent methyltransferases | COG0500 | | K00599
268 | Ruminococcus obeum 153811726 | putative DNA topoisomerase | | |
269 | Ruminococcus obeum 153813044 | putative transposase | | |
270 | Eubacterium rectale 2410 | type I restriction enzyme, R subunit | | | K01152,3
271 | Clostridium bolteae 160941795 | putative recombination protein | | |
272 | Bifidobacterium adolescentis 154486724 | putative esterase | | |
273 | Collinsella aerofaciens 139438485 | putative amidohydrolase | | |

(?) indicates text missing or illegible when filed

TABLE 14
Bacterial genes enriched in gut microbiomes of lean MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
274 | Bacteroides capillosus 154500567 | putative amidohydrolase | | |
275 | Clostridium leptum 160934848 | putative acetyltransferase | | |
276 | Ruminococcus obeum 153810033 | phosphocarrier protein HPr | | | K02784
277 | Eubacterium siraeum 167749283 | putative ABC transporter related protein | | |
278 | Bacteroides capillosus 154497054 | Polyribonucleotide nucleotidyltransferase | COG1185 | J | K00962
279 | Eubacterium siraeum 167749675 | Isoleucyl-tRNA synthetase | COG0060 | J | K01870
280 | Eubacterium rectale 3617 | hypothetical protein | | |
281 | Bacteroides capillosus 154498345 | putative sporulation protein | | |
282 | Parabacteroides merdae 154490921 | hypothetical protein | | |

TABLE 14-continued
Bacterial genes enriched in gut microbiomes of lean MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
283 | Bacteroides capillosus 154500960 | putative chromosome segregation protein | | |
284 | Ruminococcus torques 153814925 | putative sporulation protein | | |
285 | Clostridium scindens 167758815 | glycosyl-transferase family 4 | | |
286 | Clostridium sp. L2-50 160893842 | Protease subunit of ATP-dependent Clp proteases | COG0740 | OU | K01358
287 | B theta WH2 000545 | putative type I restriction enzyme EcoAI specificity protein | | |
288 | Bacteroides capillosus 154500843 | Trk system potassium uptake protein TrkA | | | K03499
289 | Clostridium bolteae 160936948 | putative two-component transcriptional regulator | | |
290 | Bacteroides capillosus 154498005 | ATP-dependent serine protease/cysteine S-methyltransferase | COG1066 | | K00567
291 | Parabacteroides merdae 154492394 | hypothetical protein | | |
292 | Bacteroides capillosus 154498009 | Fructose/tagatose bisphosphate aldolase | COG0191 | | K01622
293 | B theta 3731 000845 | hypothetical protein | | |
294 | Anaerotruncus colihominis 167769594 | Predicted ATPase (AAA+ superfamily) | COG1373 | |
295 | Bacteroides capillosus 154500228 | putative translation protein | | |
296 | Anaerofustis stercorihominis 169334667 | putative DNA recombinase | | |
297 | B theta 3731 003400 | hypothetical protein | | |
298 | Parabacteroides distasonis 150008749 | hypothetical protein | | |
299 | Bacteroides fragilis 19068109 | mobilization protein BmgA | NOG11714 | |
300 | Eubacterium dolichum 160914154 | glycoside hydrolase family 20 | COG3525 | G | K01207
301 | Bacteroides capillosus 154497125 | RNA methyltransferase, TrmH family | | | K03218
302 | Clostridium sp. L2-50 160894658 | NTP pyrophosphohydrolases | COG0494,3323 | LRS | K03574
303 | Parabacteroides merdae 154494925 | Glyceraldehyde-3-phosphate dehydrogenase | COG0057 | | K00134
304 | Bacteroides capillosus 154496139 | Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV) | COG0188 | | K01863,2469
305 | Clostridium ramosum 167755346 | MoxR-like ATPase | | | K03924
306 | Bacteroides uniformis 160888848 | hypothetical protein | | |
307 | Ruminococcus gnavus 154504651 | Putative translation initiation inhibitor | COG0251 | | K07567
308 | Bacteroides uniformis 160890270 | putative phage protein | | |
309 | Bacteroides capillosus 154500164 | putative DNA recombinase | | |
310 | B theta WH2 000807 | sulfotransferase/FAD synthetase | COG0175 | EH | K00957
311 | Bacteroides uniformis 160892052 | carbohydrate esterase family 4 and 2 | | |
312 | Clostridium sp. L2-50 160893671 | hypothetical protein | | |
313 | Bacteroides capillosus 154500952 | hypothetical protein | | | K09710
314 | Clostridium scindens 167759293 | putative ribonucleoside-triphosphate reductase activating protein | | |
315 | Bacteroides capillosus 154498134 | Predicted GTPases | COG1160 | | K03977
316 | Bacteroides capillosus 154500412 | ribosomal protein | | |
317 | Bacteroides fragilis 60683403 | Imidazolonepropionase and related amidohydrolases | COG1228 | | K01468
318 | Peptostreptococcus micros 160946111 | hypothetical protein | NOG15344 | |
319 | B theta 7330 001524 | putative transposase | | |
320 | Bacteroides capillosus 154500229 | putative peptidase | | |
321 | Bacteroides vulgatus 150006208 | integrase | COG0582 | L |
322 | Bacteroides capillosus 154501540 | hypothetical protein | | |
323 | Bacteroides stercoris 167762500 | Site-specific recombinase XerD | COG4974 | L |
324 | Bacteroides fragilis 60679880 | glycoside hydrolase family 38 | COG0383 | G | K01191
325 | Bacteroides capillosus 154497979 | putative replication protein | | |
326 | Bacteroides capillosus 154500160 | putative helicase | | |
327 | Bacteroides stercoris 167752230 | Retron-type reverse transcriptase | COG3344 | L |
328 | B theta WH2 003792 | hypothetical protein | NOG14996 | |
329 | Bacteroides capillosus 154497731 | hypothetical protein | | |
330 | Parabacteroides merdae 154494117 | UDP-N-acetyl-D-mannosaminuronate dehydrogenase | COG0677 | M | K02472
331 | Bacteroides caccae 153807847 | 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase | COG1165 | H |
332 | Anaerotruncus colihominis 167771309 | N-acetylglutamate synthase (N-acetylornithine aminotransferase) | COG1364 | E | K00618
333 | B theta WH2 0 | putative outer membrane protein | | |

TABLE 14-continued
Bacterial genes enriched in gut microbiomes of lean MZ twins

(?) | Genome and NCBI protein ID | Annotation | COG | COG categories | KEGG orthologous groups
334 | Eubacterium dolichum 160914195 | putative copper-translocating P-type ATPase | | | K01529
335 | Bacteroides fragilis 53715551 | Predicted ATPase | COG1373 | R |
336 | Clostridium bolteae 160937654 | putative phage protein | | |
337 | Bacteroides fragilis 53712550 | Alkyl hydroperoxide reductase | COG3634 | O | K03387
338 | Parabacteroides merdae 154492101 | hypothetical protein | | |
339 | Clostridium bolteae 160936352 | Uncharacterized conserved protein | COG2606 | S |
340 | Bacteroides uniformis 160889340 | TraM | | |
341 | B theta 7330 002089 | Adenine-specific DNA methylase | COG0827/4646 | KL |
342 | B theta WH2 003982 | putative outer membrane protein | | |
343 | Bacteroides capillosus 154496743 | hypothetical protein | | |
344 | Clostridium bolteae 160941240 | putative citrate lyase | | |
345 | Bacteroides capillosus 154496327 | putative v-type ATPase | | |
346 | Bacteroides capillosus 154496839 | putative cobalamin biosynthesis protein | | |
347 | Bacteroides fragilis 60683742 | Small-conductance mechanosensitive channel | COG0668 | M |
348 | Eubacterium siraeum 167749611 | putative transcriptional regulator | | |
349 | Parabacteroides distasonis 150007998 | Cobyric acid synthase | COG1492 | H | K02232
350 | Parabacteroides distasonis 150008480 | putative pyruvate formate-lyase 3 activating enzyme | | |
351 | Bacteroides capillosus 154496329 | Na+-transporting two-sector ATPase/ATP synthase | | | K01549,50
352 | Bacteroides capillosus 154496850 | hypothetical protein | | |
353 | Bacteroides capillosus 154496749 | putative spore maturation protein | | |
354 | Bacteroides capillosus 154496148 | putative spore protease | | |
355 | Clostridium bolteae 160937655 | DNA polymerase | | | K00961
356 | Bacteroides fragilis 60683107 | Putative copper/silver efflux pump | COG3696 | P | K07239,7787
357 | Bacteroides capillosus 154496295 | putative short-chain dehydrogenase/reductase | | |
358 | Anaerotruncus colihominis 167771023 | stage V sporulation protein AC | | | K06405
359 | B theta WH2 004992 | ABC-type multidrug transport system | COG0842 | V | K09686
360 | Bacteroides capillosus 154500409 | Transcription antiterminator | COG0250 | K | K02601
361 | B theta 3731 003445 | putative tyrosine type site-specific recombinase | NOG36763 | |
362 | B theta WH2 003671 | putative 3-oxoacyl-acyl-carrier protein synthase | | |
363 | Parabacteroides distasonis 150010457 | hypothetical protein | | |
364 | Bacteroides fragilis 60681723 | putative hydrolase lipoprotein | NOG09493 | |
365 | Clostridium scindens 167758928 | putative transcriptional regulator | | |
366 | Bacteroides capillosus 154498046 | Exonuclease VII small subunit | COG1722 | L | K03602
367 | Ruminococcus gnavus 154504691 | putative phage protein | | |
368 | Anaerotruncus colihominis 167772969 | hypothetical protein | | |
369 | Bacteroides caccae 153808785 | Predicted nucleoside-diphosphate sugar epimerases | COG1086 | MG |
370 | Alistipes putredinis 167751920 | phosphoglycolate phosphatase | | | K01091
371 | Anaerotruncus colihominis 167772790 | hypothetical protein | | |
372 | Parabacteroides merdae 154494124 | putative transcriptional regulator | | |
373 | Bacteroides caccae 153809523 | glycoside hydrolase family 29 | COG3669 | G | K01206
374 | Bacteroides fragilis 46242778 | TraD conjugation protein | | |
375 | Bacteroides capillosus 154499075 | putative site-specific recombinase | | |
376 | Anaerotruncus colihominis 163816273 | putative DNA helicase | | |
377 | Bacteroides capillosus 154495881 | Pentose-5-phosphate-3-epimerase | COG0036 | G | K01783
378 | Bacteroides uniformis 160887913 | hypothetical protein | | |
379 | Dorea longicatena 153853397 | putative phage protein | | |
380 | Bacteroides vulgatus 150003721 | putative outer membrane protein | | |
381 | B theta WH2 002145 | putative outer membrane protein | | |
382 | Bacteroides capillosus 154500525 | hypothetical protein | | |
383 | Alistipes putredinis 167752229 | putative DNA primase | NOG22337 | |

(?) indicates text missing or illegible when filed

Example 8

BMI Categorization by Ethnicity in Participants in Missouri Adolescent Female Twin Study

[0176] BMI category by ethnicity for the entire MOAFTS wave 5 cohort, based on 3326 twins with complete data on height and weight, is summarized in Table 15. Dizygotic (DZ) twins had a significantly higher mean BMI than monozygotic (MZ) twins (25.8±6.5 vs. 24.8±5.9, p<0.001, mean±sd), and a higher prevalence of overweight (22.8 vs 20.9%) and obese (20.7 vs 16.1%; χ2=31.6, p<0.001). This may reflect a higher dizygotic twinning rate among obese women (MZ twinning occurs randomly39). BMI was more highly correlated in MZ twins than in DZ twins, both in EA pairs (rMZ=0.80, rDZ=0.48) and in AA pairs (rMZ=0.73, rDZ=0.26), and this remained true when analysis was restricted to pairs concordant for obesity (EA: rMZ=0.61, rDZ=0.27; AA: rMZ=0.62, rDZ=-0.11) or concordant for leanness (EA: rMZ=0.43, rDZ=0.14; AA: rMZ=0.55, rDZ=0.39). After age-adjustment, quantitative genetic modeling yielded an estimated additive genetic variance for BMI of 68% (95% Confidence Interval [CI]: 57-79%), shared environmental variance of 14% (95% CI: 2-24%), and non-shared environmental variance of 14% (95% CI: 17-21%). Data from the Behavioral Risk Factor Surveillance System for Missouri women of comparable age in 2006 yield higher rates of overweight and obesity in EA women (23.8% overweight and 25% obese) compared to rates observed in MOAFTS (19.6% overweight EA, 14.8% obese EA).
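As a rough cross-check on how twin correlations relate to variance components, the EA and AA correlations reported above can be fed into Falconer's classical approximation. This is only a minimal sketch and not the age-adjusted quantitative genetic model used in the study. The function name `falconer_ace` is hypothetical, and the formulas A = 2(rMZ - rDZ), C = 2rDZ - rMZ, E = 1 - rMZ are the textbook approximation, assumed here for illustration.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Approximate variance components from MZ and DZ twin correlations.

    Falconer's approximation (an assumption of this sketch, not the
    model fitted in the study):
      A (additive genetic)       = 2 * (r_mz - r_dz)
      C (shared environment)     = 2 * r_dz - r_mz
      E (non-shared environment) = 1 - r_mz
    The three components always sum to 1.
    """
    return {
        "A": 2.0 * (r_mz - r_dz),
        "C": 2.0 * r_dz - r_mz,
        "E": 1.0 - r_mz,
    }


# Correlations for BMI taken from the text above.
ea = falconer_ace(r_mz=0.80, r_dz=0.48)  # EA pairs
aa = falconer_ace(r_mz=0.73, r_dz=0.26)  # AA pairs

for label, comps in (("EA", ea), ("AA", aa)):
    print(label, {k: round(v, 2) for k, v in comps.items()})
```

With the EA correlations the approximation gives roughly 64% additive genetic, 16% shared environmental and 20% non-shared environmental variance. For the AA pairs the shared-environment term comes out negative, a known limitation of the approximation when rMZ exceeds twice rDZ, which is one reason formal model fitting is preferred.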

TABLE 15
BMI category in the Missouri Adolescent Female Twin Study

               Underweight  Lean        Overweight  Obese I    Obese II   Obese III
               (n = 138)    (n = 1893)  (n = 711)   (n = 309)  (n = 174)  (n = 113)
EA (n = 2860)  4.79         60.87       19.58       8.08       4.27       2.41
AA (n = 478)   0.21         31.80       31.59       16.32      10.88      9.21

All numbers are percentages. Underweight: <18.5 kg/m². Lean: 18.5-24.9 kg/m². Overweight: 25-29.9 kg/m². Obese I: 30-34.9 kg/m². Obese II: 35-39.9 kg/m². Obese III: ≥40 kg/m².
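The category cut-offs in the table footnote can be expressed as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and treating the ranges as half-open intervals (so that a BMI of 24.95 counts as Lean) is an assumption, since the footnote lists ranges to one decimal place.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / (height_m ** 2)


def bmi_category(value):
    """Map a BMI value to the categories used in Table 15.

    Assumption: boundaries are half-open, e.g. Lean is 18.5 <= BMI < 25.
    """
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Lean"
    if value < 30:
        return "Overweight"
    if value < 35:
        return "Obese I"
    if value < 40:
        return "Obese II"
    return "Obese III"


# Mean BMI values quoted in the text for obese and lean EA women.
print(bmi_category(36.9))  # Obese II
print(bmi_category(21.4))  # Lean
```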

[0177] Lean and obese women selected for inclusion in the biospecimen collection project were representative of the entire cohort of lean and obese MOAFTS twins in terms of parity (nulliparous/parous), educational attainment (more than high school education/high school education or less) and marital status (married or living with someone as married/not married; p>0.05 for all comparisons). Obese EA women providing biospecimens had a mean BMI at wave 5 of 36.9±4.7 compared with a mean among EA lean women of 21.4±1.5 (mean±sd). EA twins were selected as being stably lean across all waves of data collection (i.e., baseline at median age 15, one-year follow-up, 5-year follow-up and 7-year follow-up), with a self-reported BMI of 18.5-24.9 kg/m².

Example 9

Comparison of Amplification Methods in Taxonomic Assignments

[0178] A frequently reported result from any 16S rRNA gene sequence-based survey is the relative abundance of bacterial phyla. Given the broad nature of these phyla and the fact that relatively few phyla dominate the human distal gut microbiota, it might be expected that the relative abundance of each phylum would be consistent regardless of the amplification and sequencing methods used. However, differences were observed between methods in this study (FIGS. 13 A-E). Relative to the sampled gut microbiomes (defined by pyrosequencing of total community DNA), the full-length, V2/3, and V6 16S rRNA gene datasets were all significantly depleted for Bacteroidetes (paired Student's t-test, p<0.001), and significantly enriched for Firmicutes (p<0.01). One possible explanation for these differences is that the Bacteroidetes reference genomes are more closely related to those in the microbiomes than the Firmicutes reference genomes, thereby inflating estimates of the relative abundance of this phylum (FIG. 10). To address this potential confounding factor, 16S rRNA gene fragments from all 18 microbiome datasets were identified and classified taxonomically. The results of this analysis confirmed that the three PCR-based methods underestimate the relative abundance of the Bacteroidetes (FIG. 13F). Moreover, results obtained from shotgun sequencing of 16S rRNA gene fragments and PCR amplification of the V2/3 region showed the strongest correlation (FIG. 13G).
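The method comparison above relies on a paired Student's t-test, in which each sample contributes one abundance estimate per method. The sketch below shows how the test statistic is formed from per-sample differences. The function name is hypothetical and the abundance values are made-up placeholders, not data from the study; obtaining a p-value would additionally require the t distribution with n-1 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired Student's t-test.

    a and b hold matched measurements (same samples, two methods).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with >= 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# HYPOTHETICAL Bacteroidetes relative abundances for four samples,
# invented purely to exercise the function.
shotgun = [0.40, 0.35, 0.50, 0.45]
pcr_v2 = [0.20, 0.25, 0.30, 0.15]

t_value, dof = paired_t_statistic(shotgun, pcr_v2)
print(f"t = {t_value:.3f}, df = {dof}")
```

A large positive t here would correspond to the PCR-based dataset reporting lower Bacteroidetes abundance than the shotgun dataset for the same samples.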

SEQUENCE LISTING

The patent application contains a lengthy “Sequence Listing” section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20110177976A1). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

1. An array comprising a substrate, the substrate having disposed thereon
(a) at least one nucleic acid indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome, or
(b) at least one nucleic acid indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

2. The array of claim 1, wherein the nucleic acid comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table 13 or Table 14, or a nucleic acid sequence capable of hybridizing to a nucleic acid sequence listed in Table 13 or 14.

3. The array of claim 1, wherein the nucleic acid or nucleic acids are located at a spatially defined address of the array.

4. The array of claim 3, wherein the array has no more than 500 spatially defined addresses.

5. The array of claim 3, wherein the array has at least 500 spatially defined addresses.

6. The array of claim 1, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:1-273.

7. The array of claim 1, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:274-383.

8. An array comprising a substrate, the substrate having disposed thereon
(a) at least one polypeptide indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome, or
(b) at least one polypeptide indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.

9. The array of claim 8, wherein the polypeptide is encoded by a nucleic acid sequence selected from the nucleic acid sequences listed in Table 13 or Table 14.

10. The array of claim 8, wherein the polypeptide or polypeptides are located at a spatially defined address of the array.

11. The array of claim 10, wherein the array has no more than 500 spatially defined addresses.

12. The array of claim 10, wherein the array has at least 500 spatially defined addresses.

13. The array of claim 9, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:1-273.

14. The array of claim 9, wherein the nucleic acid sequence is selected from the group consisting of sequences encoded by SEQ ID NO:274-383.

15-32. (canceled)