A metagenomic β-glucuronidase uncovers a core adaptive function of the human intestinal microbiome

Karine Gloux, Olivier Berteau, Hanane El oumami, Fabienne Béguet, Marion Leclerc, and Joël Doré1

Institut National de la Recherche Agronomique, Unité Mixte de Recherche 1319 Micalis, F-78352 Jouy en Josas, France

Edited by Todd R. Klaenhammer, North Carolina State University, Raleigh, NC, and approved June 1, 2010 (received for review February 4, 2010)

In the human gastrointestinal tract, bacterial β-D-glucuronidases (BG; XIVa), major reservoirs of BG-positive (11), are dys- E.C. 3.2.1.31) are involved both in xenobiotic metabolism and in some biosis markers in Crohn’s disease (12–15) and in intestinal carci- of the beneficial effects of dietary compounds. Despite their biolog- nogenesis (16). Nevertheless, the relative contribution of bacterial ical significance, investigations are hampered by the fact that only strains to the global intestinal BG activity remains unclear, and a few BGs have so far been studied. A functional metagenomic the molecular bases are essentially unknown. approach was therefore performed on intestinal metagenomic In the human digestive tract, bacterial BGs are known to be libraries using chromogenic glucuronides as probes. Using this strat- distributed among the Enterobacteriaceae family in some Firmi- egy, 19 positive metagenomic clones were identified but only one cutes genera (Lactobacillus, Streptococcus, , Rumino- exhibited strong β-D-glucuronidase activity when subcloned into an coccus, Roseburia, and Faecalibacterium) and in a specific expression vector. The cloned gene encoded a β-D-glucuronidase of Actinobacteria (Bifidobacterium dentium) (11, 17, 18). On (called H11G11-BG) that had distant amino acid sequence homolo- the basis of their sequence, BGs are highly homologous to gies and an additional C terminus domain compared with known β-galactosidases and only a few of them have been investigated β fi -D-glucuronidases. Fifteen homologs were identi ed in public and clearly annotated as β-D-glucuronidases in sequence data- – bacterial genome databases (38 57% identity with H11G11-BG) in bases. The most notorious prokaryotic BGs are those encoded by fi the phylum. The genomes identi ed derived from strains uidA homologs (also named gusA) from Escherichia coli, Lacto- from Ruminococcaceae, Lachnospiraceae, and . The bacillus gasseri, and Ruminococcus gnavus (17, 19, 20). genetic context diversity, with closely related symporters and In the present study, we report the identification of a BG from gene duplication, argued for functional diversity and contribution intestinal bacteria by a functional metagenomic approach. The to adaptive mechanisms. In contrast to the previously known fi β identi ed enzyme is unrelated to the previous ones and expands -D-glucuronidases, this previously undescribed type was present the number of bacterial enzymes responsible for glucuronidase in the published microbiome of each healthy adult/child investigated activity. Furthermore, its sequence and distribution among bac- (n = 11) and was specific to the human gut ecosystem. In conclusion, terial genomes and metagenomes highlights the relevance and our functional metagenomic approach revealed a class of BGs that the diversity of BGs within the intestinal microbiome. may be part of a functional core specifically evolved to adapt to the human gut environment with major health implications. We propose Results consensus motifs for this unique Firmicutes β-D-glucuronidase sub- High Frequency and Diversity of BG Activities in Metagenomic family and for the glycosyl hydrolase family 2. Libraries. A strategy was developed to assay BG activity from met- agenomic inserts in spite of the constitutive activity of the E. coli functional core | intestinal microbiota | functional metagenomics | glycosyl E. coli hydrolase | Firmicutes strain used to built the metagenomic libraries ( DH10B) (Fig. 1). The method consisted of a two-step screen of clones β β containing large metagenomic inserts. Using p-nitrophenyl -D- acterial -D-glucuronidases (BG; E.C. 3.2.1.31) are part of the glucuronide as chromogenic substrate (PNP-G deglucuronidation Bextensive human intestinal microbiome and are involved in test), the clones’ activity was first compared to the basal activity of the metabolism and bioavailability of food and drug compounds the receiving strain. Fosmids born by positive clones were further in the human body. They catalyze the hydrolysis of exogenous β transferred into an E. coli strain deprived of BG activity (E. coli -glucuronides naturally occurring in diet and drugs as well as L90 ΔuidA). The clones were then grown with 5-bromo-4-chloro- endogenous β-glucuronides produced in the liver by glucur- 3-indolyl-β-D-glucuronide in the medium (X-GlcA deglucuroni- onosyltransferases, a major xenobiotic detoxication pathway. In- dation test) to confirm BG activity. deed, numerous compounds including metabolites, vitamins, With this strategy, two libraries from human gut microbiota steroid hormones, xenobiotics, and drugs are excreted via the bile were screened: a metagenomic library of 4,608 clones combining and the digestive tract after conversion to a more hydrophilic glucuronidated form. Nevertheless, secondary deglucuronidation, primarily due to intestinal bacteria (1), promotes recycling of the This paper results from the Arthur M. Sackler Colloquium of the National Academy of aglycone forms through the enterohepatic cycle, which prevents Sciences, "Microbes and Health," held November 2–3, 2009, at the Arnold and Mabel their removal from the human body. This phenomenon is par- Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. ticularly well described in the case of xenobiotics (2, 3) and is The complete program and audio files of most presentations are available on the NAS suggested for circulating hormones (4, 5). Despite their detri- Web site at http://www.nasonline.org/SACKLER_Microbes_and_Health. mental effects, bacterial BGs are also thought to have beneficial Author contributions: K.G., O.B., and J.D. designed research; K.G. performed research; K.G., H.E.o., and F.B. contributed new reagents/analytic tools; K.G., O.B., and J.D. analyzed effects, notably on the bioavailability of active metabolites derived data; and K.G., O.B., M.L., and J.D. wrote the paper. fl from dietary compounds, including lignans, avonoids, sphingo- The authors declare no conflict of interest. lipids, and glycyrrhizin (6–8). This article is a PNAS Direct Submission. Finally, observations argue in favor of the involvement of Data depostion: BG sequences from C7D2 and H11G11 metagenomic inserts have been bacterial BGs in several pathologies. Indeed, a lower BG activity deposited at EMBL/GenBank/DDBJ databases under accession nos. FN666674 and was detected in the feces of Crohn’s disease patients compared to FN666673, respectively. healthy subjects (9). Conversely, a high BG activity is recognized 1To whom correspondence should be addressed. E-mail: [email protected]. as a prognosis marker for colon cancer (10). Furthermore, the This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. Clostridium leptum group (cluster IV) or Lachnospiraceae (cluster 1073/pnas.1000066107/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1000066107 PNAS | March 15, 2011 | vol. 108 | suppl. 1 | 4539–4546 Downloaded by guest on September 29, 2021 from 0.02 to 1.30 U (units) with, as expected, a lower global level of BG activity when metagenomic inserts were expressed in the ΔuidA E. coli strain (Fig. 2).

A Unique Class of BGs in the Firmicutes Phylum. An initial attempt was made to sequence the potential BGs present in the 19 BG- A positive clones on the basis of degenerate primers. The sequence of the primers used derived from the sequences of known pro- karyotic BG genes (17, 19). As this approach proved to be un- successful, the BG-positive fosmids were subcloned and subse- B quently sequenced. The 40-kbp inserts were fully sequenced, and six candidate genes were identified as potentially encoding BG activity. The predicted proteins encoded by these genes were the following: β-galactosidase, antimicrobial peptide ABC transport system, serine/threonine kinase, sporulation protein, 2C-methyl- D-erythritol 2,4-cyclodiphosphate synthase, and aspartyl-tRNA synthetase (Table S1). The corresponding genes were subcloned into an expression plasmid (pRSF plasmid) and further trans- C formed into E. coli BL21(DE3) or E. coli L90 T7+ (uidA+ and E.coli DH10B (b-glucuronidase+) ΔuidA strains, respectively). Only the gene encoding the putative β-galactosidase was able to promote BG over-expression in the uidA+ strain (5.3 versus 0.4 U -glucuronidase activity

b for the control deprived of insert) and to induce a BG activity in the ΔuidA strain (4.9 U). This gene originated from a meta- E. coli L90 (b-glucuronidase-) genomic insert (H11G11 insert) in which BG activity was in the - + fosmid b-glucuronidase + fosmid b-glucuronidase+ Screening of range of the 19 fosmids (Fig. 2). This insert was from a pool of fecal samples from healthy individuals. Remarkably, the identified gly- cosidase had low homology with known BGs (<33% identity, Fig. 3) and signatures that diverged from the two consensus sequences of glycosyl hydrolase family 2 (82 and 93% similarity with Prosite D patterns PS00719 and PS00608, respectively; Table S2). This gly- cosidase also had low homology with putative BGs recently iden- tified in the gut dominant species Roseburia intestinalis and Bacteroides ovatus using degenerate PCR primers (27% and 33% identity, respectively) (11). However, the two glutamate residues involved in the catalytic mechanism of other members of family 2 Fig. 1. Screening of BG activity from metagenomic libraries. The meta- BGs (21) were conserved in this BG, and a search in the Conserved genomic clones originated from different libraries derived from healthy and Domains Database (22) revealed the highest matches to the TIM Crohn’s disease subjects’ feces and from a distal ileum biopsy. Metagenomic + barrel domain of glycosyl hydrolase family 2 (pfam02836) and BGs libraries were screened for over-expressing clones (compared with the uidA (PRK10150) and lower matches to β-galactosidases (COG3250, E. coli receiving strain) using PNP-G as substrate with a two-step method: PRK09525, PRK10340). An additional C-terminal domain com- a visual detection and a quantitative validation. Fosmid activity was further fi assessed by transfer into a BG-negative strain (E. coli L90 ΔuidA). Subclones pared to known BGs was identi ed, but no putative function could (3-kbp fragments inserted into pcDNA 2.1 plasmids) were further screened be proposed (expected value 0.01). for PNP-G deglucuronidation, and candidate genes were cloned into an A search of homologous proteins in the National Center for expression vector (pRSFDuet-1) to definitively confirm BG activity. Random Biotechnology Information (NCBI) nr database revealed a set of screening of 4,608 metagenomic clones from fecal and ileal libraries (E. coli 15 H11G11-BG homologs with more than 50% coverage and DH10B as receiving strain) involved the following: (A) a qualitative screen similarity (Fig. 4). On the basis of amino acid sequence alignment, with precocity of β-glucuronidase activity detection that yielded 1.79% of clones over-expressing the activity; (B) a quantitative validation of clones over-expressing the activity (E. coli DH10B deprived of insert as control) that yielded 0.73% of positive clones; (C) a selection of fosmids generating E. coli DH10B (uidA+ background) β-glucuronidase activity (following transformation of a β-glucuronidase- negative strain) yielding 0.41% of positive clones among all clones tested; 2 E. coli L95 (uidA- background) and (D) the identification of the genes involved (subcloning, sequencing, and bioinformatic analysis). 1,5

1 (U) uncharacterized inserts (Materials and Methods) and a sublibrary 0,5 of 1,536 clones previously selected for the presence of 16S rRNA 0 genes (14). An initial visual screen was performed and revealed H1D1 b -glucuronidase activity I32E4 I32E4 H3D4 C7D2 H2G8 H3C6 H3C6 I40A2 I24A5 I40C8 C12F1 H11B1 C11H2 C11H2 C12H1 C11D3

that 1.79% of clones over-expressed BG activity in the unchar- I53H12 H51F12 C208E9 H21B12 acterized library. The quantitative validation step confirmed an H11G11 over-expression for 40% of them (Fig. 1B), and transfer of the Metagenomic clones Δ corresponding metagenomic inserts into the uidA strain allowed fosmid DH10B+ Control the determination of a hit rate of 0.41% of positive fosmids. In- Fig. 2. BG activity of positive metagenomic fosmids in E. coli DH10B and E. cluding the second screen performed on the 16S rRNA gene coli L90 ΔuidA. Activity was assayed using PNP-G as substrate and expressed in sublibrary, 19 positive fosmids were identified of a total of 6,144 units (U) of BG activity (ΔOD 405 nm·min−1 · mg−1). Means and SD values are metagenomic clones. BG activity in the positive clones ranged determined from experiments performed in quadruplicates for each clone.

4540 | www.pnas.org/cgi/doi/10.1073/pnas.1000066107 Gloux et al. Downloaded by guest on September 29, 2021 10 20 30 40 50 60 70 80 .... | .... | . ...| .... | .... | ....| .... | . ... | ....| .... | . ... | .... | .... | . ...| .... | .... | H11G11-BG ------MR E VININKNW L F S KKEQPVPKTL------PEDWESVNLP H T WNGTDGQDGGNDYY R G K CC YV K LLK gi 12802352 Lactobacillus gasseri --MESALY P I Q NKY R F NTLMNG T W Q F E TD P NSV G L D E-GW N KEL P DPEEMP V PG TF AE LTTKRD-RKYY T G D FW YQK DFF gi 584839 E. coli -----MLR P V E TPT R E IKKLDG L W A F S LD R ENC G I D QRWW E SAL Q ESRAIA V PG SF ND QFADAD-IRNY A G N VW YQR EVF gi 34581788 Ruminococcus gnavus MLEYSELY P I Q NEY R M MQSLDG M W K F Q FD P EEI G K K S-GW E NGL P APVSMP V P S SF AD FFTDHK-ERDY C G D FW YE T EFY

90 100 110 120 130 140 15 01 60 .... | .... | . ...| .... | .... | ....| .... | . ... | ....| .... | . ... | .... | .... | . ...| .... | .... | KADLGEKPVHYIQFD GV N SSAE V W WN GE K IGSH D G G Y SAF R VRI P EIS---DEN I LT V Y ADN S PN D TVY P Q VADFTFYGG IPS FLKKKELYIR FG SVT H R AK V F IN GHE V GQH E G GF LPFQ VKIS NYINYDQTN R VT VL VNNE LS E KAI P C G T EEILDN G IPK GWAGQRIVLR FD AVTH YGK V W VN NQE V MEH Q G GY TPFE ADVTPYVIAGKSV RI TVC VNN ELN W QTI P P G M VITDEN G LPA EWRNKKIWLR F G SI T H R GT V Y CN GM E I TSH E G G F LP V LADI S TVAKPGQVN Q VV V K IN N E L N E TSL P C G A TKILNN G

170 180 190 200 210 220 23 02 40 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | ------IYRDVTVIG V DE S H FDLEFYGS--SGIMITPKVSGLSAAVNITAR V T NPQDCSV R F VVTD A D K KPV G EKNVDAS QKL AQPYFD F F N Y S G I MR N V W L LAL P Q SQITNFKLNYQLAN--NKA T ITYN I EANNNAEF KV T L F D N QKEVA C ATSKNTS KKK QSYFHD FFNYA G I HR S VM L YTT P N TWVDDITVVTHVAQDCNHA S VDWQ V V ANG--DV S V E L R D A D Q QVV A TGQGTSG RKL AKPYFD F F N Y S G L QR S V W V IAL P E ESVKDYSVDYELCG--TDA L VKYE V V TTGEHPV I V R L L D A E G ELV A ETEGKEG

250 260 270 280 290 300 31 03 20 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | DGKTVIEIENA H L W N GTQDPY L Y S L TAELLKDGEKT D E I S VRFG C R S FSI D PQK G FIL N G KP C PL R G V SR HQ D R PGI G N A ----SLT IKN P H LW S -PNDPY S Y K I KIEMLEDGKTV D EY T DKIG I RT V K I V NDK - IL LN NHP I Y LK G F G K H E DF NVLG K A ----TLQ VVN P H LW Q -PGEGY L YE L CVTAKS-QTEC D I Y P LRVG IRS V A V K GEQ -F LI N H KP F Y F TG FGR H ED A DLR G K G ----ILQ VAN A R L W E -VRNAY L Y Q I VILITDGNGVL D E Y R EKIG I R T V R I E GTK - IL L N D RP V Y L K G F G K H E D F PIL G R G

330 340 350 360 370 380 3904 0 0 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | LTEKEHREDM D LICELG A N T IR L AH Y Q HSRVFY D LC D E C G M AVWAE I P Y ISRHMPGG------R VNESIIKRDY E CMK W I G AN CFR S S HY PY A E E W Y Q YA DK YG F LII D EV P A V G L NRS I TNFLNVTNSNQSHFFASKTVP E L K FDNVLMVHDH A LMD W I G AN SYR T S H YPYAEEM L DWA DE H G I VVI DET A A VG F NLS L GIGFEAGNKPKELYSEEAVNG E T Q FHWGIVKRDF E CLK W T N A N C FR T S H Y P Y A E E W Y Q FA D E E G F LII D E V P A V G M MRS T RNFVAAGSGNYTYFFEALTVP E L L

410 420 430 440 450 460 47 04 80 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | ENTVSQMK E L I Y QNIN H P S I I V W G L S N E I TMNGASDSSLI E N HRMLNDLVH K I D P - TR P T T I AVLSMCD P G EEYVR-IPD KVHE QEI K EM I D R D Q R H PS VIA W S LF NEP E S---TTQESY D Y F K DIFAFAR K L D PQ NRPY T G TLVMGSG P K VDKLHP L C D QAHL QAI K EL I A R DK NHPSVV MW SIAN EP D T---RPQGAR E YFA PLAEATR K L D P- TRPI T C VNVMFCD A HTDTISD L F D KSHI ADT E E M I T R D K N H P S V I A W S L F N E P E T---ITDYAY E Y F K EVFAAAE T YD F QSR P M T G AFEKNSK P E LCKCYP L C D

490 500 510 520 530 540 55 05 60 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | VLSYNH Y F G W Y G GKT------DMYGPWF D K FHKKYPDRAVGMS E Y G C EAL N WHTSDP Q QGDYTE E Y Q A KY H E DVIR Q IA FVCL NR YY GWYV AG G P EIVNA K KMLEDE L D G W Q NLK L N KP F VFT EF G A D T LS SSHR LP - DEM W S Q E YQN EY Y QM YFD IF K VLCLNRYYGWY V QS G - DLETA E KVLEKE L L AW Q -EK L H QP I IIT EY G V D TL A GLH S MY -TDM WSE E YQC AW LD M Y HR V F D FICL N R Y Y G W Y I SG G P EIEEA E ELFRDE M D R W K AKE L N VP F VFT E F G T D T M AGLH K LP - SIM W S E E Y Q K EY L EM NFR V F D

570 580 590 600 610 620 63 06 40 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | VRPWLWSTHVW N M FD FA A DARSEGGENG M N H K G L VT F D R K YK K DS FY AYK A W L SDEPFVHICGKRYIDRPESMTSVTVYT KYPFICGE L V WN FA DFK T S EG I M R - -V GG ND K GI FTR DREP KD IA FTLKK R WQ QL N ------RVSAVVGEQ V W NFADFA T S QG IL R- -V GGNK K GI FT RDRK P KS AA FLLQK RW TGM N F G EKPQQGGKQ------SYEFVQGE L A W N F A D F Q T T EG I M R - -V D G N H K G V F T R D R Q P K A AA VV FKD R W E KK N E L F------

650 660 670 680 690 700 71 07 20 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | NEPSVELFANGKSLGVQKRGEFPFFYFSVPNEGETVLTAKAGDCTDESRIRKVDKANPDYVLQEEGAVINWFEIETPPGY ------

730 740 750 760 770 780 79 080 0 .... | .... | . ...| .... | .... | ....| .... | . ...| ....| .... | . ... | .... | .... | . ...| .... | .... | MSVNDTIGDILATAKGKLLALKILKMVRANMKKNKGKSTGGMADMAKGMKINKSIIEMGKGFSVKRVCMMAGGLFTKEQI ------

810 .... | .... | . ...| LEINASLNKIKKKQK ------

Fig. 3. Amino acid sequence alignment of the unique BG (H11G11-BG) and known BGs from the gut microbiota. Alignment was performed using the ClustalW multiple sequence alignment program. Framed amino acids were conserved in at least three sequences.

all homologs constituted a biologically relevant group (COFFEE in gut bacteria from the Firmicutes phylum including Bacteroides score: from 89 to 95) with 38–57% identity with H11G11-BG. capillosus, recently reassigned to Gram-positive bacteria (23). The These proteins were all identified in prokaryotic genomes under only exception was Shuttleworthia satelles DSM 14600, which is sequencing and were annotated as “uncharacterized protein” ex- a Firmicutes isolated from the oral cavity (24). The closest neigh- cept the one from Paenibacillus sp. JDR-2, which was annotated as bors were functionally uncharacterized genes in gut Bacteroidales a β-galactosidase. The H11G11-BG homologs were identified only and represented a likely subgroup of Bacteroidetes BGs (Fig. 1B).

Gloux et al. PNAS | March 15, 2011 | vol. 108 | suppl. 1 | 4541 Downloaded by guest on September 29, 2021 Fig. 4. Identification of a unique class of β-D-glucuronidase, phylogenetic affiliation, and association with potential symporters. Detailed tree view of the H11G11BG-like homologs identified in GenBank genomic databases (Fig. S1). The unique BGs (>50% of both coverage and similarity with H11G11-BG) have been identified only in Firmicutes species, but a potential BG subgroup has also been identified in Bacteroidetes constituting an independent subclass of BGs. Strains known to generate a BG activity are F. prausnitzii M21/2, R. gnavus ATCC 29149, and B. capillosus ATCC 29799. Strains demonstrated as BG positive in this study are S. variabile DSM 15176, B. formatexigens DSM 14469, C. bartlettii DSM 16795, B. ovatus ATCC 8483, P. merdae ATCC 43184, and P. johnsonii DSM 18315. aClassification according to Carlier et al. (23). bPhylogenetic assignation of metagenomic inserts are as described in Materials and Methods (H11G11 insert: 41% Clostridium genus. C7D2 insert: 81% R. gnavus species.).

The unique Firmicutes BGs had amino acid sequences distantly A–B) and more broadly for the glycosyl hydrolase family 2 (Fig. related to known BGs, even from the ones previously character- 5C). Interestingly, a clear distinction inside the unique BG family ized in this phylum (Table S2 and Fig. S1), and had specific was observed. Indeed, the invariant histidine/tyrosine/glutamine uncharacterized and additional domains (Fig. S2). Most of the motif was conserved in the Bacteroidetes subgroup but was identified genes were restricted to the Clostridiales order, in- replaced by a histidine/tyrosine/proline motif in the Firmicutes cluding members of the Ruminococcaceae, Lachnospiraceae, and subgroup (Fig. 5C and Fig. S3). Clostridiaceae families (Fig. 4). Three strains were previously known to exhibit BG activity (i.e., Faecalibacterium prausnitzii Is Functional Diversity of Unique BGs Based on Associated Symporters? M21/2, R. gnavus ATCC 29149, and Bacteroides capillosus ATCC Insert sequence alignments and genetic environment exploration 29799). Seven strains of Firmicutes and Bacteroidetes containing of 15 of the 17 Firmicutes BGs identified revealed that the con- H11G11-BG homologs in their genomes were tested for their served genetic environments close to the BG genes were restricted ability to hydrolyze PNP-G (i.e., B. ovatus ATCC 8483, Bryantella to symporters. Fifteen BGs were associated with putative sym- formatexigens DSM 14469, Clostridium bartlettii DSM 16795, porters that, by sequence homology, clustered into four types (Fig. Roseburia inulinivorans DSM 16841, Subdoligranulum variabile S4). The symporter homologies correlate well to the BG subclades DSM 15176, Parabacteroides johnsonii DSM 18315, and Para- (Fig. 4). Despite the diversity of the genetic organizations ob- bacteroides merdae ATCC 43184). Except for R. inulinivorans served (direct vicinity, intermediate sequences, lengths) (Figs. S4 DSM 16841 (BG activity <0.025 U), all strains tested exhibited and S5), most of these putative symporters had best hits for high BG activity. The search of known uidA homologs in their conserved domains corresponding to Na+/melibiose symporter genomes and in the three Firmicutes strains with known BG ac- tivity led to different situations (Table S3). For most of the and related transporters (COG2211) or the sugar transporter Gph genomes tested (7/10), the homologs found were more closely (TIGR00792), indicating that these BGs are involved in the me- related to the H11G11 protein sequence than to UidA. The tabolism of extracellular glucuronides. presence of the two classes of BG (uidA and H11G11-like) within The Unique BG Within the Phylogenetic Core. Among healthy adults the same genome remained possible for F. prausnitzii M21/2 and B. formatexigens DSM 14469, and even a third class was suggested the human intestinal microbiota phylogenetic core consists of in B. ovatus ATCC 8483. Nevertheless, these data suggest that in a restricted set of dominant species (25). To determine the – most cases H11G11-BG was responsible for the observed BG presence of the H11G11-BG positive species within this phylo- activity. Furthermore, paralogs were found in the genome of S. genetic core, sequence alignments between the 16S rRNA of variabile DSM 15176, B. formatexigens DSM 14469, and R. gnavus H11G11-BG-positive strains and the 66 prevalent operational ATCC 29149. taxonomic units (OTUs) (25) were performed. Almost half of the Because the two existing Prosite consensus signatures (PS00719 BG-positive strains (i.e., F. prausnitzii M21/2, R. inulinivorans and PS00608) for glycosyl hydrolase family 2 could not include DSM 16841, S. variabile DSM 15176) belonged to the phyloge- these unique BGs (Table S2), we propose different consensus netic core (>99% 16S rRNA identity). Because the phylogenetic motifs and a signature for the unique Firmicutes BG group (Fig. 5 core represents only a very few highly prevalent and dominant

4542 | www.pnas.org/cgi/doi/10.1073/pnas.1000066107 Gloux et al. Downloaded by guest on September 29, 2021 A 1FITOM tis05=htdiw e 016-e2.1=eulav-E5112=rll71=s

OM T 2FI 5=htdiw tis0 e e6.1=eulav-E4802=rll71=s 595-

3FITOM 334-e3.1=eulav-E4261=rll71=setis14=htdiw

SfarP(puorgGB-11G11Hehtroferutangi t:onrettaptseB-t )GBsetucimriF B G-x( R-)4,2 -L A- Y-H- Q- -H- [ 2(x-]SDA )- Y-F x-]ED[- -[A -]C D-[E RQ x-] G- MLIF[- -] A[ VI [-] ILV A-W-] - I-E P- [- FY -] -I TS[ (] f ssenti 001 )

C Revised desoporpserutangis2ylimaFsesalordyhlysocylg -H-)2(]NCATS[-R-]DWYFMVIL[-x-N:)91700SP(1nrettapetisorP- Y -]ND[-)3(x-)2(]SWYFMVIL[-)4(x-P- x [-G-)2( L )4(]WYFMVI - eN w SP 0 170 9 tap t nre : N[ T x-] - IL[ MV WYF D R-] - S[ TA NC L 2(] ) Y-H- [- PQ x-] )4( - IL[ V WYFM S 2(] ) 3(x- ) D[- N x-] ( -)2 G IL[- MV WYF A](4)

-P sor ti pe a tt (2nre L[-]ASFMVIL[-]SFMVIL[-]CAS[-]VPATS[-]YRH[-N-]WVRK[-]FLQNED[:)80600SP I FMV -]S [-W SG ]V x- -)3,2( -N E - ttap80600SPweN e nr : [ FLQNED H WVRK[-] YI HF N[-] H PATS[-]YRH[-] S[-]V MVIL[-]CA F ASFMVIL[-]S C [-] FMVIL S ]CT W- SG[- -)3,2(x-]AV N E-

spuorgGB-11G11H

orP- s eti pa (1nrett :)91700SP F(puorgGB-11G11H- imri c etu s :) N[ T -] T[ SA -] [ VIL ] R- - Y-H-A-L - -Q H [- DA S )2(x-] - [YF D [-x-]E AC] D- -[EQR –] -x -G [ MIL F [-] A ]VI - LI[ V]-W 1G11H- 1 uorgGB- p oretcaB( i ted e :)s N - W-]TV[-]VI[-I-G-]NY[-K-D-M-L-DYF-Y-T-A-Q-P-Y-H-A-L-R-V--A-

- rP :)80600SP(2nrettapetiso 1H- 1G11 gGB- r puo ( riF m uci t )se : N-[ FHYI ]-[ ]HN H- - A[ ]P - CS[ ] -I- [ VI AC]-[VFTC W-] - L[-]AG[ I -]V [ ]QS - E-N puorgGB-11G11H- ( :)setedioretcaB L-G-W-]FV[-C-I-S-P-H--N-Y-H - F E-N-

Fig. 5. Conserved motifs within the H11G11-BG group. (A) Conserved motifs within the 17 proteins identified in this study (>50% coverage and similarity with H11G11-BG). (B) Signatures proposed for the unique Firmicutes BGs. (C) Previously undescribed glycosyl hydrolase family 2 patterns, including the 17 unique Firmicutes BGs identified in this study and the three Bacteroidetes BGs. Amino acids added compared to the actual patterns are in blue.

OTUs, this result underlined the importance of this BG function times less frequently represented in the human gut metagenomes. in the human gut. Remarkably, the H11G11-BGs subgroup was almost not repre- sented or was totally absent in microbial metagenomes arising A BG Highly Prevalent in and Specific to the Human Gut. The fre- from other environments whereas the uidA BGs subgroup was quency of the unique Firmicutes or Bacteroidetes BGs reached an more systematically found in soil ecosystems (Waseca County −8 average of 9.5 ± 3.0 × 10 hit/bp within the human gut meta- Farm Soil Metagenome ID 13699). genomes (Fig. 6). At least one homolog can therefore be found To determine the distribution among the human population, every 107 bp, which is approximately equivalent to 104 bacterial homologs of H11G11-BGs were searched in gut metagenomic genes. Previously known BG genes (uidA homologs from Clos- databases from different healthy individuals (26) (Fig. 7). tridiales, Lactobacillales, and E. coli K12) were at least three Homologs of the unique Firmicutes or Bacteroidetes BGs

11G11H)%14(muidirtsolC 70-E04,1 70501220_PZiitteltrab.C S. av ri ba il PZe _03 77 6 51 8 1G11H 1 sGB- iiztinsuarp.F Z 48609020_P sgolomohsetucimriF 4943573_PZsnaroviniluni.R 70-E02,1 43374820_PZ.pssullicabineaP 43374820_PZ.pssullicabineaP 21556020_PZsutavo.B sGB-11G11H 05387430_PZiinosnhojsedioretcabaraP sgolomohsetedioretcaB 98413020_PZeadremsedioretcabaraP 70-E00,1 98413020_PZeadremsedioretcabaraP Ana ore co psucc er ov tii 1C Q1E5 07TF8BesneinfahmuiretcabotifluseD 7J7W6Qsuvang..R 80-E00,8 3HLM9Bmulihpomreht.A 9VQR1Bsnegnirfrep.C 35GC2CsuidartetsuccocoreanA sGBnwonK suidartetsuccocoreanA C 35GC2 sgolomohAdiU 4USD3QeaitcalagasuccocotpertS 80-E00,6 0WAC1CeainomuenpsuccocotpertS susonmahrsullicabotcaL C 9STJ2 otcaL ab sullic r nmah so 9STJ2Csu 40850P21Kiloc.E Frequency in metagenomes (hits/bp) 0,4 0 -E 80

80-E00,2

0 0, 0E+00 Human Fish gut Gut from Gut from Deep-sea lean mice distal gut obese and and obese whale-fall whale-fall Termite gut Human gut County Farm Global Ocean Global Ocean Soil of Waseca

Fig. 6. Frequency of the unique β-glucuronidases and known β-glucuronidases in environmental and human gut metagenomes. Homologs of unique and known BG proteins were searched in NCBI metagenomic project datasets (Materials and Methods). The hit threshold was at least 50% similarity with 50% sequence coverage. Results of frequencies were expressed as hits per base pair to correct the different sizes of the metagenomic datasets investigated.

Gloux et al. PNAS | March 15, 2011 | vol. 108 | suppl. 1 | 4543 Downloaded by guest on September 29, 2021 2, 5 0 E - 07 F i rm H1 1 G1 1 li ke (7 st r a i ns ) sgolomohsetucimriFGB-11G11H

NERDLIHCdnaSTLUDA B a ct H1 1 G1 1 BGB-11G11H li ke (3 st r a i ns ) olomohsetedioretca sg

2, 0 0 E - 07 un i p ro t C lo st (6 st r a i ns ) rpinU(selaidirtsolCGB )sniarts6,to

p i n un i p ro t L a ct o (3 s tr a i ns ) orpinU(selallicabotcaLGB iarts3,t )sn sgolomohAdiU 1, 5 0 E - 07 . EB . cGo li K 1 2iloc.E P 0 5 8 040850P 4

1, 0 0 E - 07 Frequence (hits/bp)

5, 0 0 E - 08 Frequency in metagenomes (hits/bp)

0, 0 0 E + 0 0 W 2 F V 2 F T 1 F S 1 F R n I D n I A n In A In D In R F 1 S F 1 T F 2 V F 2 W F 2 X F 2 Y su b 7 su b 8

0 5 , 2, 5 0 E - 07 NAFNI TS 0 0 , 2, 0 0 E - 07

0 5 , 1, 5 0 E - 07

0 0 , 1, 0 0 E - 07

0 0 , 5, 0 0 E - 08

0, 0 0 E + 0 0 n In B In E In M F1 U

Fig. 7. Distribution of the unique and known BGs among adult, child, and infant gut metagenomes. Homologs of unique and known BG proteins were searched in the two projects Human Gut Metagenome (13 healthy individuals) (ID 28117) and Human Distal Gut Biome (ID16729) as described in Fig. 6. Results are pre- sented per BG group (i.e., Firmicutes-BGs, Bacteroidetes-BGs, UidA homologs) and expressed as means of hits per base pair for each group (details per BG protein, Fig. S6).

were identified in all 11 adults and children but only in one infant already reported to exhibit BG activity (i.e., F. prausnitzii M21/2, among the four explored (Fig. 7 and Fig. S6). In contrast, uidA R. gnavus ATCC 29149, and B. capillosus ATCC 29799) (11, 17, BGs were less abundant among the “adults/children” group (6/11, 29) and strains that we demonstrated here as BG positive (i.e., S. 8/11, and 7/11 for E. coli, Clostridiales, and Lactobacillales BGs, variabile DSM 15176, B. formatexigens DSM 14469, C. bartlettii respectively) but were present in all infants. Furthermore, in DSM 16795). In the same way, BG-positive strains from the adults and children, the H11G11-BG frequencies were always Bacteroidetes phylum were evidenced (i.e., B. ovatus ATCC 8483, higher than those of the gusA/uidA BGs (P ≤ 0.01). They ranged P. merdae ATCC 43184, and P. johnsonii DSM 18315). Thus this − − from 2.3 × 10 8 to 2.2 × 10 7 hit/bp, and the frequency in each class of BGs, which is present in many dominant Firmicutes and individual was not correlated to the previously determined level of Bacteroidetes species, might represent a major deglucuronidation COG3250 (β-galactosidase/β-glucuronidase) (26). pathway in the human gut. The unique BGs exhibit conserved domains that diverged from the Prosite consensus sequences Discussion (PS00719 and PS00608) of glycosyl hydrolase (GHase) family 2. Using a functional metagenomic approach, we revealed a high We thus proposed consensus signatures to include this unique frequency of BG activity (0.41%) in the human gut microbiota. class of BGs. Interestingly, the invariant “HYP” amino acid motif In the high range of hits usually obtained during functional found in the PS00719 signature was maintained in Bacteroidetes screens of glycosyl hydrolases from environmental or gut meta- BGs but specifically replaced by a “HYQ” motif in Firmicutes genomic libraries (from 0.00002% to 0.8%) (27), this result is BGs. These Firmicutes BGs may therefore have originated from coherent with the particular enrichment of the gut microbiome a common ancestor and evolved in the context of the intestinal with β-galactosidase/β-glucuronidase family (28). This result pro- environment. Considering the importance of GHase family 2, bably reflects the abundance and the diversity of glucuronide which is the third most abundant family of GHases in the Human compounds that reach the intestine and the critical role of BGs gut (27), the functional significance of these sequence specificities in several bacterial metabolic pathways. However, this high score deserves further investigation. could also be due to the secondary deglucuronidation potenti- Our observations support an important physiological role for alities of proteins that exhibit no obvious sequence homologies these unique BGs. Indeed, in contrast to previously identified BGs with BGs as suggested in this study. We were unable to find (UidA/GusA homologs), these BGs are found with high fre- evidence of BG activity among these cloned genes. Further quency and high specificity in the human gut microbiome and are investigations of these genes in their genetic context will be well represented and distributed among adults and children but required to definitely demonstrate their functionality. absent in most of the very young infants tested despite the relative Our functional screen led to the identification of a unique β-D- known abundance of Firmicutes in their microbiota (26, 30). The glucuronidase from the human gut microbiota that possesses absence of correlation between frequencies of H11G11-BG several key features, including an additional C terminus domain homologs and the galactosidase/glucuronidase COG family and distant sequence homologies with known BGs. Mining of within human gut microbiomes demonstrates the necessity of genome sequence databases demonstrated that this unique BG is improving this family classification and annotation. Furthermore, specifically present into several Firmicutes species of the gastro- among the bacterial species harboring this enzyme family, three intestinal tract or the oral cavity including C. bartlettii, S. variabile, Firmicutes species belong to the restricted phylogenetic core of B. formatexigens, R. gnavus, Penibacillus sp., R. inulinivorans, S. the human intestinal microbiota (25), suggesting an ecological satelles, and F. prausnitzii. This group includes bacterial strains drive that ensures the presence of the activity in spite of micro-

4544 | www.pnas.org/cgi/doi/10.1073/pnas.1000066107 Gloux et al. Downloaded by guest on September 29, 2021 biota variability among humans. This unique class of BGs might estimated by HindIII digestion followed by pulsed-field gel electrophoresis. therefore play a critical role in the establishment of a definitive Fosmid DNA was subjected to mechanical shearing, and 3-kbp fragments were and stabilized adult microbiota (31). Results also revealed du- inserted into the high-copy plasmid pcDNA2.1. A total of 384 subclones origi- plication of unique BG-encoding genes in several strains and as- nating from each metagenomic BG-positive clone were screened for BG activity, and positive subclones were further sequenced(MWG Biotech). A homolog search sociation with diverse permeases. These organizations are was made using the Blastp algorithm available on the NCBI web site (http://www. biologically highly relevant because they are important in gener- ncbi.nlm.nih.gov/BLAST/). Domain architectures, potential functions, and asso- ating genetic variability and in facilitating adaptive evolution and ciated patterns were analyzed using the InterProScan (http://www.ebi.ac.uk/ functional diversity. Interestingly, most of the species harboring InterProScan/), MyHits (http://www.isb-sib.ch/), PROSITE (http://www.expasy.org/ the unique BG were from the C. leptum group (cluster IV) or prosite/), SMART (http://smart.embl-heidelberg.de/), PFAM (http://www.sanger. Lachnospiraceae (cluster XIVa), both subject to microbiota dis- ac.uk/Software/Pfam/), Motif Scan (http://myhits.isb-sib.ch/cgi-bin/motif_scan), or turbances in the physiopathological contexts of Crohn’s disease SUPERFAMILY (http://supfam.org/SUPERFAMILY/)softwares. (12, 32–34), ulcerative colitis (35), familial Mediterranean fever (36), and intestinal carcinogenesis in Apc (Min) mice (16). Recent β-Glucuronidase Candidates, Cloning, and Expression. Genes encoding putative fi studies revealed a functional link between xenobiotic neutraliza- BG were ampli ed by PCR with metagenomic fosmid DNA as templates and appropriate primers (Table S4). PCR products were purified by agarose gel elec- tion, steroid metabolism, and inflammation via mutual repression trophoresis and the Wizard SV Gel and PCR clean up system kit (Promega). PCR of xenobiotic and hormone nuclear receptors and the NF-κB– – products were subcloned into pGEM-T plasmid (Promega) and fully sequenced. signalling pathways in the intestine (37 39). Determination of pGEM-T plasmids with exact insert sequence were digested by BamHI and PstI potential xenobiotics or hormone substrates for BGs may be restriction enzymes (except for the putative serine/threonine protein kinase critical to determining their physiological function(s) and potential cloned with the EcoRI and PstI restriction site). Inserts were further purified by contribution to the modulation of eukaryotic regulatory networks. agarose gel electrophoresis and the Wizard SV Gel and PCR clean up system kit In conclusion, the functional metagenomic approach reported (Promega). Purified inserts were ligated into pRSFDuet-1 plasmid (Novagen) here allows the identification of a unique BG type, which is pre- previously digested with the same restriction enzymes. All cloned genes were dominant in the gut microbiota of human adults and children. This sequenced to ensure that no errors were introduced. The plasmids obtained were BG family, identified in several Firmicutes, may be part of the transformed into both E. coli BL21(DE3) (Stratagene) and the L90 T7+ E. coli strain. The L90 T7+ E. coli strain was obtained by transforming the L90 ΔuidA functional core of the human microbiome recently proposed E. coli strain with the pAR1219 plasmid, which expresses the T7 polymerase under for the GIT ecosystem (40) and may represent an adaptive the control of an isopropyl 1-thio-β-D-galactopyranoside (IPTG)-inducible pro- evolution. Further investigations should address the impact of moter. The resulting clones were analyzed for PNP-G deglucuronidation as de- functional BGs in individuals‛ gut physiology and their involve- scribed above, after gene expression induction with IPTG. ment in the crosstalk with host cells, especially with respect to immune homeostasis. Sequence Analysis of the Unique BG Group and Associated Symporters. Homologs of H11G11-BG were searched on May 2009 by Blastp method Materials and Methods (limit expect E-value ≤2e-45) on the NCBI nr database, including all non- Metagenomic Libraries. Fosmidic metagenomic libraries were constructed as redundant GenBank coding sequences. Radial tree of Blastp results was previously described using the EpiFOS Library production kit (Epicentre Tech- computed by the fast minimum evolution method (threshold of maximum fi nologies) and E. coli DH10B as a host (14). Three fosmidic metagenomic li- sequence difference: 0.85). De nitive presentation and phylogenetic symbols braries were used in this study; namely the “Healthy,”“Crohn,” and “Ileum” were realized using Molecular Evolutionary Genetics Analysis (MEGA) soft- libraries. The first two libraries were previously constructed from bacterial DNA ware version 4.0 after conversion in Newick file format. Proteins showing at extracted from pools of feces of six healthy individuals and six inflammatory least 50% length coverage and 50% similarity of aligned sequences were bowel disease patients, respectively (14), and the third from a colorectal cancer retained to constitute the group of H11G11-BG homologs. Sequence simi- patient biopsy obtained from the healthy distal part of the ileum. From these larities with the current conserved patterns of the glycosyl hydrolases family 2 we used a set of 4,600 clones that contained random genomic bacterial DNA (BG and galactosidases, PROSITE documentation PDOC00531, http://www. and a set of 1,536 clones selected for the presence of 16S rRNA genes by DNA expasy.org/prosite/) were determined using PATTINPROT (http://npsa-pbil. hybridization (14). ibcp.fr/). Conserved motifs were determined by using the motif-based se- quence analysis tool MEME (http://meme.nbcr.net/meme4_1/). The patterns Screening of BG-Positive Fosmid Clones. Because the E. coli strain used to build proposed were obtained by using the discover patterns tool PRATT (http:// the metagenomic libraries (E. coli DH10B) constitutively expressed a β- www.expasy.ch/tools/pratt/). Comparative analysis of codon usage was per- fi glucuronidase encoded by the uidA gene, an initial visual screening was formed with known Firmicutes BGs and H11G11-like BG proteins identi ed by performed to detect metagenomic clones over-expressing BG activity. Met- using the Blastp software. agenomic clones were grown overnight in 2YT medium with chloramphenicol The genetic environments of the H11G11 and C7D2 BGs were analyzed − (12.5 μg · ml 1), and the activity was measured on permeabilized cells after using GeneMark (Version 2.4) for Prokaryotes (http://opal.biology.gatech. toluene treatment (10%, 10 min at 4 °C). BG activity was measured by adding edu/GeneMark/gmhmm2_prok.cgi). The different types of putative sym- 2 mM of PNP-G (Sigma-Aldrich) (41). Positive clones were submitted to a sec- porters and their identities/similarities within types were determined after ond quantitative screen using a kinetic measurement with E. coli DH10B homology search and ClustalW multiple alignment. transformed with an empty fosmid as control. Eight replicates were per- formed for each clone. Activity and bacterial density (E. coli dry weight) were Phylogenetic Assignment. The phyla and family assignments of H11G11-BG

estimated by measuring ΔOD405nm and OD600nm, respectively. The BG activity homologs from genomic databases were from the reports auto- −1 −1 fi was expressed in units of ΔOD405nm.min ·mg of dry weight. matically generated by NCBI (completed with the classi cation of R. gnavus As E. coli DH10B possesses a β-glucuronidase activity encoded by the uidA ATCC 29149 using the Ribosomal Database Project). gene, the ΔuidA E.coli L90 strain was used to unambiguously identify fos- The C7D2 insert was isolated from the 16S rDNA sublibrary and thus harbor- mids expressing β-glucuronidase activity. This strain was derived from the ed a 16S rRNA gene (NCBI accession no. AY850499) allowing phylogenetic as- TG1 ΔuidA::Kanr strain that was previously constructed by gene homolo- signmentbythe Ribosomal Database Projectclassifier(http://rdp.cme.msu.edu/). gous recombination (3). The kanamycine resistance cassette was removed However, for C7D2 and H11G11 inserts, potential ORFs were determined using using FLP recombinase borne by the pCP20 plasmid. Purified fosmids MetaGene (http://metagene.cb.k.u-tokyo.ac.jp/), and sequences were further (NucleoSpin 96 Flash, extraction kit, Marchery Nagel) were then electro- phylogenetically assigned using the best blast hits (Blastn or Blastx software) on transferred (25 μFd, 200 Ω,and2.5Kv)intheE. coli L90 strain. Transformants NCBI databases. The final phylogenetic assignment was expressed in percentage were grown on LB agar containing the appropriate antibiotic with 5-bromo- of insert length coverage. −1 4-chloro-3-indolyl-β-D-glucuronide (25 μg·ml ) (X-gluc, VWR) to reveal Phylogenetic relationships with the human gut bacterial phylogenetic core BG activity. (25) were analyzed in retrieving the 16s rRNA gene sequences of the H11G11- BG–containing strains and analyzing them with the RapidOTU pipeline (25) Fosmids Subcloning and Sequence Analysis. Metagenomic fosmids were purified (0.02% identity cut-off, Kimura-2 parameter distance method). The per- using the NucleoSpin 96 Flash extraction kit (Macherey-Nagel). Insert size was centage of identity between the 16s rRNA sequences of BG-positive strains

Gloux et al. PNAS | March 15, 2011 | vol. 108 | suppl. 1 | 4545 Downloaded by guest on September 29, 2021 and their closest phylogenetic core homologs was determined using pairwise Comparative Metagenomic Analysis. Comparative metagenomic analysis was alignment with Jalview 2.4.0b2. performed on May 2009 using the tBlastn software to search for H11G11-BG homologous proteins in the following NCBI metagenomic project datasets (http:// BG Activity Assay in Gut Bacteria. Strains encoding a H11G11 homolog in their metasystems.riken.jp/metabiome/metagenome.php): Human Gut Metagenome genomes were grown for 24 h at 37 °C under anaerobic conditions using the (13 healthy individuals) (ID 28117), Human Distal Gut Biome (ID16729), Com- bined Gut Metagenome from Obese and Lean Mice (ID17401), Termite Gut Hungate method in appropriate media. The Cooked Meat Medium (Difco) Metagenome (ID19107), Fish Metagenome (ID28955), Whale Fall Metagenome supplemented with yeast extract (5 g/L, Sigma), KH PO (5 g/L, Sigma), and 2 4 (ID13700), Microbial Mat Isolate (ID29795), Waseca County Farm Soil Meta- cysteine (0.5 g/L, Sigma) was used for C. bartlettii DSM 16795. B. ovatus ATCC genome (ID13699), and Global Ocean Sampling (ID19733). The same approach – 8483 and R. gnavus ATCC 29149 were grown in LYHBHI medium [brain heart was performed with representative known or putative BGs (UniProtKB data- infusion medium with 0.5% yeast extract (Difco) and 5 mg/L hemin] supple- bases, http://www.uniprot.org/, June 2009). The similarity hit threshold was at mented with cysteine (0.25 mg/mL; Sigma). The same medium was supple- least 50% similarity with 50% sequence coverage. To compare frequencies be- mented with cellobiose (0.5 mg/mL; Sigma–Aldrich) and maltose (0.5 mg/mL; tween databases, results are expressed as hits per base-pair unit. Sigma) for S. variabile DSM 15176 and F. prausnitzii M21/2. Soluble starch (0.5 mg/mL; Sigma) was added to this medium to grow P. merdae ATCC 43184 ACKNOWLEDGMENTS. We thank Dr. J. Tap for helpful discussions, M. Serezat for technical assistance, and C. Bridonneau for advice about gut bacteria and P. johnsonii DSM 18315, and 0.4% of clarified rumen juice (42) was also cultivation. This research was supported by the French Ministry of Research necessary to grow B. formatexigens DSM 14469 and R. inulinivorans DSM under project GenoTube, and insert sequences were performed by the French 16841. PNP-G activity was tested as described above for E. coli. National Sequencing Center (Genoscope, Evry, France).

1. Rod TO, Midtvedt T (1977) Origin of intestinal beta-glucuronidase in germfree, 22. Marchler-Bauer A, Bryant SH (2004) CD-Search: Protein domain annotations on the monocontaminated and conventional rats. Acta Pathol Microbiol Scand [B] 85:271–276. fly. Nucleic Acids Res 32(Web Server issue):W327–W331. 2. Knasmuller S, et al. (2001) Impact of bacteria in dairy products and of the intestinal 23. Carlier JP, Bedora-Faure M, K’Ouas G, Alauzet C, Mory F (2010) Proposal to unify microflora on the genotoxic and carcinogenic effects of heterocyclic aromatic amines. Clostridium orbiscindens (Winter et al. 1991) and Eubacterium plautii (Seguin 1928; Mutat Res 480–481:129–138. Hofstad and Aasjord 1982) with description of Flavonifractor plautii gen. nov., comb. 3. Humblot C, et al. (2007) Beta-glucuronidase in human intestinal microbiota is nov. and reassignment of Bacteroides capillosus to Pseudoflavonifractor capillosus necessary for the colonic genotoxicity of the food-borne carcinogen 2-amino-3- gen. nov., comb. nov. Int J Syst Evol Microbiol 60:585–590. methylimidazo[4,5-f]quinoline in rats. Carcinogenesis 28:2419–2425. 24. Downes J, Munson MA, Radford DR, Spratt DA, Wade WG (2002) Shuttleworthia 4. de Herder WW, Hazenberg MP, Pennock-Schroder AM, Hennemann G, Visser TJ satelles gen. nov., sp. nov., isolated from the human oral cavity. Int J Syst Evol (1986) Rapid and bacteria-dependent in vitro hydrolysis of iodothyronine-conjugates Microbiol 52:1469–1475. by intestinal contents of humans and rats. Med Biol 64:31–35. 25. Tap J, et al. (2009) Towards the human intestinal microbiota phylogenetic core. 5. Graef V, Furuya E, Nishikaze O (1977) Hydrolysis of steroid glucuronides with beta- Environ Microbiol 11:2574–2584. glucuronidase preparations from bovine liver, Helix pomatia, and E. coli. Clin Chem 26. Kurokawa K, et al. (2007) Comparative metagenomics revealed commonly enriched 23:532–535. gene sets in human gut microbiomes. DNA Res 14:169–181. 6. Kim DH, et al. (1998) Intestinal bacterial metabolism of flavonoids and its relation to 27. Li LL, McCorkle SR, Monchy S, Taghavi S, van der Lelie D (2009) Bioprospecting some biological activities. Arch Pharm Res 21:17–23. metagenomes: Glycosyl hydrolases for converting biomass. Biotechnol Biofuels 2:10. 7. Kim DH, et al. (2000) Biotransformation of glycyrrhizin by human intestinal bacteria 28. Gill SR, et al. (2006) Metagenomic analysis of the human distal gut microbiome. and its relation to biological activities. Arch Pharm Res 23:172–177. Science 312:1355–1359. 8. Schmelz EM, et al. (1999) Ceramide-beta-D-glucuronide: Synthesis, digestion, and 29. Knivett VA, Shah HN, McKee AS, Hardie JM (1983) Numerical taxonomy of some non- suppression of early markers of colon carcinogenesis. Cancer Res 59:5768–5772. saccharolytic and saccharolytic Bacteroides species. J Appl Bacteriol 55:71–80. 9. Carrette O, et al. (1995) Bacterial enzymes used for colon-specific drug delivery are 30. Palmer C, Bik EM, DiGiulio DB, Relman DA, Brown PO (2007) Development of the decreased in active Crohn’s disease. Dig Dis Sci 40:2641–2646. human infant intestinal microbiota. PLoS Biol 5:e177. 10. Geier MS, Butler RN, Howarth GS (2006) Probiotics, prebiotics and synbiotics: A role in 31. Paliy O, Kenche H, Abernathy F, Michail S (2009) High-throughput quantitative chemoprevention for colorectal cancer? Cancer Biol Ther 5:1265–1269. analysis of the human intestinal microbiota with a phylogenetic microarray. Appl 11. Dabek M, McCrae SI, Stevens VJ, Duncan SH, Louis P (2008) Distribution of beta- Environ Microbiol 75:3572–3579. glucosidase and beta-glucuronidase activity and of beta-glucuronidase gene gus in 32. Jansson J, et al. (2009) Metabolomics reveals metabolic biomarkers of Crohn’s disease. human colonic bacteria. FEMS Microbiol Ecol 66:487–495. PLoS One 4:e6386. 12. Baumgart M, et al. (2007) Culture independent analysis of ileal mucosa reveals 33. Sokol H, et al. (2009) Low counts of Faecalibacterium prausnitzii in colitis microbiota. a selective increase in invasive Escherichia coli of novel phylogeny relative to Inflamm Bowel Dis 15:1183–1189. depletion of Clostridiales in Crohn’s disease involving the ileum. ISME J 1:403–418. 34. Willing B, et al. (2009) Twin studies reveal specific imbalances in the mucosa- 13. Frank DN, et al. (2007) Molecular-phylogenetic characterization of microbial associated microbiota of patients with ileal Crohn’s disease. Inflamm Bowel Dis 15: community imbalances in human inflammatory bowel diseases. Proc Natl Acad Sci 653–660. USA 104:13780–13785. 35. Zhang M, et al. (2007) Structural shifts of mucosa-associated lactobacilli and 14. Manichanh C, et al. (2006) Reduced diversity of faecal microbiota in Crohn’s disease Clostridium leptum subgroup in patients with ulcerative colitis. J Clin Microbiol 45: revealed by a metagenomic approach. Gut 55:205–211. 496–500. 15. Scanlan PD, Shanahan F, O’Mahony C, Marchesi JR (2006) Culture-independent 36. Khachatryan ZA, et al. (2008) Predominant role of host genetics in controlling the analyses of temporal variation of the dominant fecal microbiota and targeted composition of gut microbiota. PLoS One 3:e3064. bacterial subgroups in Crohn’s disease. J Clin Microbiol 44:3980–3988. 37. Wahli W (2008) A gut feeling of the PXR, PPAR and NF-kappaB connection. J Intern 16. Mai V, Colbert LH, Perkins SN, Schatzkin A, Hursting SD (2007) Intestinal microbiota: A Med 263:613–619. potential diet-responsive prevention target in ApcMin mice. Mol Carcinog 46:42–48. 38. Zhou C, et al. (2006) Mutual repression between steroid and xenobiotic receptor and 17. Beaud D, Tailliez P, Anba-Mondoloni J (2005) Genetic characterization of the beta- NF-kappaB signaling pathways links xenobiotic metabolism and inflammation. J Clin glucuronidase enzyme from a human intestinal bacterium, Ruminococcus gnavus. Invest 116:2280–2289. Microbiology 151:2323–2330. 39. Zhou C, Verma S, Blumberg B (2009) The steroid and xenobiotic receptor (SXR), beyond 18. Roy D, Ward P (1992) Rapid detection of Bifidobacterium dentium by enzymatic xenobiotic metabolism. Nucl Recept Signal 7:e001. hydrolysis of beta-glucoronide substrates. J Food Prot 55:291–295. 40. Turnbaugh PJ, et al. (2009) A core gut microbiome in obese and lean twins. Nature 19. Russell WM, Klaenhammer TR (2001) Identification and cloning of gusA, encoding 457:480–484. a new beta-glucuronidase from Lactobacillus gasseri ADH. Appl Environ Microbiol 67: 41. Bardonnet N, Blanco C (1992) uidA-antibiotic-resistance cassettes for insertion 1253–1261. mutagenesis, gene fusions and genetic constructions. FEMS Microbiol Lett 72: 20. Blanco C (1987) Transcriptional and translational signals of the uidA gene in 243–247. Escherichia coli K12. Mol Gen Genet 208:490–498. 42. Leedle JA, Hespell RB (1980) Differential carbohydrate media and anaerobic replica 21. Salleh HM, et al. (2006) Cloning and characterization of Thermotoga maritima beta- plating techniques in delineating carbohydrate-utilizing subgroups in rumen glucuronidase. Carbohydr Res 341:49–59. bacterial populations. Appl Environ Microbiol 39:709–719.

4546 | www.pnas.org/cgi/doi/10.1073/pnas.1000066107 Gloux et al. Downloaded by guest on September 29, 2021