Supplementary Materials for

The Genetic Basis for the Cooperative Bioactivation of Plant Lignans by a Human Gut Bacterial Consortium

Elizabeth N. Bess1, Jordan E. Bisanz1, Peter Spanogiannopoulos1, Qi Yan Ang1, Annamarie Bustion1, Seiya Kitamura2, Diana L. Alba3, Dennis W. Wolan2, Suneil K. Koliwad3, Peter J. Turnbaugh1,4*

*Correspondence to: [email protected]

This PDF file includes:

Materials and Methods
Figures S1–S3
Tables S1–S11

Materials and Methods

Bacterial culturing conditions

All bacterial strains were cultured at 37 °C in an anaerobic chamber (Coy Laboratory Products) with an atmosphere composed of 2-3% H2, 20% CO2, and the balance N2. The culture medium was composed of Bacto Brain Heart Infusion (BD Biosciences, 37 g/L) supplemented with L-cysteine-HCl (0.05%, w/v), L-arginine (1%, w/v), menadione (1 µg/mL), and hemin (5 µg/mL) (referred to herein as BHI++) and was allowed to equilibrate in the anaerobic environment prior to use.

Bacterial strains

The bacterial strains in our collection have previously been described extensively (1). Each of these strains has been deposited in the Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ) culture collection; draft genomes are available in NCBI’s BioSample database.

Organism                                BioSample accession
Eggerthella lenta 32-6-I 6 NA           SAMN08365966
Eggerthella lenta DSM 2243              SAMN08365978
Eggerthella lenta 11C                   SAMN08365960
Eggerthella lenta RC4/6F                SAMN08365982
Eggerthella lenta FAA1-1-60AUCSF        SAMN08365980
Eggerthella lenta AN51LG                SAMN08365970
Eggerthella lenta 14A                   SAMN08365961
Eggerthella lenta AB12 #2               SAMN08365968
Eggerthella lenta 28B                   SAMN08365965
Eggerthella lenta Valencia              SAMN08365983
Eggerthella lenta DSM 15644             SAMN08365977
Eggerthella lenta DSM 11863             SAMN08365976
Eggerthella lenta CC7/5 D5 2            SAMN08365972
Eggerthella lenta CC8/2 BHI2            SAMN08365973
Eggerthella sinensis DSM 16107          SAMN08365985
Eggerthella lenta W1 BHI 6              SAMN08365984
Eggerthella lenta 22C                   SAMN08365964
Eggerthella lenta AB8 #2                SAMN08365969
Eggerthella lenta CC8/6 D5 4            SAMN08365974
Eggerthella lenta DSM 11767             SAMN08365975
Eggerthella lenta 1-3-56                SAMN02463797
Eggerthella lenta MR1 #12               SAMN08365981
Gordonibacter pamelaeae 3C              SAMN08365986
Gordonibacter sp. 28C                   SAMN08365987
Paraeggerthella hongkongensis RC2/2 A   SAMN08365988

Blautia producta DSM 3507 and Lactonifactor longoviformis DSM 17459 were purchased from DSMZ; draft genomes are available in NCBI’s BioSample database.

Organism                                BioSample accession
Blautia producta DSM 3507               SAMN08397823
Lactonifactor longoviformis DSM 17459   SAMN08397824

Chemicals

Pinoresinol, lariciresinol, and secoisolariciresinol were obtained from Separation Research Ltd. Enterodiol, enterolactone, and corticosterone were obtained from Sigma Aldrich.

Culturing the Coriobacteriia strain collection with PINO

Wells of 96-well plates containing 99 µL of BHI++ media supplemented with PINO (510 µM) were inoculated (1:100), in triplicate, from dense cultures of each of the 25 strains of our Coriobacteriia collection. Wells were also prepared to serve as sterile controls. Plates were sealed with tape to minimize evaporation and incubated at 37 °C for 48 hours. Next, the plates were centrifuged at 2000 rpm for 10 min at 4 °C, and the supernatant was harvested and stored at -20 °C. Samples were thawed prior to performing analyte measurements via HPLC, using HPLC Method A (described below). Results are summarized in Table S2.

Culturing bacteria with lignans for time-course, RNA-Seq, and qRT-PCR experiments

Culturing E. lenta DSM 2243 with PINO

Hungate tubes containing 5 mL of BHI++ were inoculated (1:100), in pairs, from dense cultures of E. lenta DSM 2243. Sterile controls were also prepared. Culture tubes were loosely capped and incubated at 37 °C until mid-log-phase growth was achieved, at which time pairs of cultures were exposed to either 50 µL of pinoresinol (PINO, 50 mM in methanol) or 50 µL of vehicle (methanol). Then, 1.5 hours following PINO (n=3 cultures) or vehicle (n=3 cultures) exposure, cultures were removed from the anaerobic chamber and centrifuged at 2000 rpm for 10 min at 4 °C. The supernatant was decanted, and the remaining cell pellet was either flash-frozen in liquid nitrogen and stored in a -80 °C freezer until RNA was extracted for qRT-PCR analysis, or treated with 1 mL of TRI Reagent (Sigma Aldrich) for immediate extraction of RNA for RNAseq (see below for details of this procedure).

Additional cultures differentially exposed to PINO (n=3) or vehicle (n=3) were maintained in the anaerobic chamber. Throughout the course of incubating these cultures, OD600 was periodically measured using a Hach spectrophotometer (model DR1900) that was housed in an anaerobic chamber. Following addition of PINO or vehicle, each culture was periodically sampled (100 µL aliquots); supernatant was collected and stored at -20 °C. Samples were thawed prior to HPLC analysis, which was performed according to HPLC Method A (described below).
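The nominal lignan exposure in these tube cultures follows from a simple dilution calculation. The sketch below is illustrative only: the function name and the decision to account for the added spike volume are our assumptions, not part of the reported protocol.

```python
def spike_dilution(stock_mm: float, spike_ul: float, culture_ml: float):
    """Return (final concentration in µM, vehicle % v/v) after spiking a culture.

    Assumes volumes are additive, i.e. total volume = culture + spike.
    """
    spike_ml = spike_ul / 1000.0
    total_ml = culture_ml + spike_ml
    final_um = stock_mm * 1000.0 * spike_ml / total_ml  # mM -> µM, then dilute
    vehicle_percent = 100.0 * spike_ml / total_ml
    return final_um, vehicle_percent


# 50 µL of a 50 mM methanol stock added to a 5 mL culture
pino_um, methanol_percent = spike_dilution(stock_mm=50.0, spike_ul=50.0, culture_ml=5.0)
print(f"PINO: {pino_um:.0f} µM; methanol: {methanol_percent:.2f}% v/v")
```

Under these assumptions the exposure is approximately 500 µM lignan in roughly 1% (v/v) methanol, which matches the concentrations used in the plate-based assays.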

Culturing B. producta DSM 3507 with SECO

Hungate tubes containing 5 mL of BHI++ were inoculated (1:100), in pairs, from dense cultures of B. producta DSM 3507. Sterile controls were also prepared. Culture tubes were loosely capped and incubated at 37 °C until mid-log-phase growth was achieved, at which time pairs of cultures were exposed to either 50 µL of secoisolariciresinol (SECO, 50 mM in methanol) or 50 µL of vehicle (methanol). Then, ~1.5 hours following SECO (n=3 cultures) or vehicle (n=3 cultures) exposure, cultures were removed from the anaerobic chamber and centrifuged at 2000 rpm for 10 min at 4 °C. The supernatant was decanted, and the remaining cell pellet was flash-frozen in liquid nitrogen and subsequently stored in a -80 °C freezer until RNA was extracted for analysis via qRT-PCR or RNAseq (see below for details of this procedure).

To assess consumption of SECO over time, cultures (100 µL of BHI++) were prepared in a 96-well plate. Five sets of three cultures and one sterile control, each exposed to SECO (500 µM), were prepared. Following 2, 5, 7, 10, and 25 hrs of incubation at 37 °C, one set of cultures was harvested; supernatant was collected and stored at -20 °C. Samples were thawed prior to HPLC analysis, which was performed according to HPLC Method B (described below).

Culturing G. pamelaeae 3C with dmSECO

Hungate tubes containing 5 mL of BHI++ and 500 µM of SECO (50 µL from a stock solution of 50 mM SECO in methanol) were inoculated (1:100), in pairs, from dense cultures of B. producta DSM 3507. Sterile controls were also prepared. Culture tubes were loosely capped and incubated at 37 °C. After 22.5 hours of incubation, HPLC analysis of supernatant (using HPLC Method B) demonstrated that SECO had been fully consumed.

While still in the anaerobic chamber, the cultures were partitioned into microcentrifuge tubes and centrifuged to pellet the cells. The supernatant was passed through a 0.2 µm filter to sterilize the spent media, which contained the metabolic byproduct of SECO, presumably didemethylsecoisolariciresinol (dmSECO).

Concurrent with preparation of these cultures of B. producta DSM 3507, Hungate tubes containing 3 mL of BHI++ were inoculated (1:100), in pairs, from dense cultures of G. pamelaeae 3C. Sterile controls were also prepared. Culture tubes were loosely capped and incubated at 37 °C until mid-log-phase growth was achieved, at which time pairs of cultures were exposed to 2 mL of the sterilized, spent media that was harvested from the incubation of B. producta DSM 3507 with either SECO (n=3 cultures) or vehicle (n=3 cultures). The spent, filter-sterilized media from B. producta DSM 3507 was also added to sterile controls to monitor sterility of the spent media.

After 1.5 hours, the cultures of G. pamelaeae 3C were removed from an anaerobic chamber and centrifuged at 2000 rpm for 10 min at 4 °C. The supernatant was decanted, and the remaining cell pellet was flash-frozen in liquid nitrogen and subsequently stored in a -80 °C freezer until RNA was extracted for RNAseq (see below for details of this procedure).

Prior to removing the cultures from the anaerobic chamber, a 400 µL aliquot was taken from each of the cultures of G. pamelaeae 3C that had been exposed to spent, filter-sterilized media generated from the incubation of B. producta DSM 3507 with SECO. These aliquots, maintained in the anaerobic chamber, were incubated for an additional 21.5 hours. Supernatant was evaluated by HPLC (HPLC Method B) to assess production of END by G. pamelaeae 3C. No END was detected in the sterile control.

During the course of the work reported herein, we obtained a pure sample of dmSECO (synthetic methods reported below). This material was used to validate by qRT-PCR the results of the RNAseq analysis from the incubation of G. pamelaeae 3C with spent, filter-sterilized media that was generated from the incubation of B. producta DSM 3507 with SECO or vehicle. Hungate tubes containing 5 mL of BHI++ were inoculated (1:100), in pairs, from dense cultures of G. pamelaeae 3C. Sterile controls were also prepared. Culture tubes were loosely capped and incubated at 37 °C until mid-log-phase growth was achieved, at which time pairs of cultures were exposed to either 50 µL of dmSECO (50 mM in methanol) or 50 µL of vehicle (methanol). Then, 1.5 hours following dmSECO (n=3 cultures) or vehicle (n=3 cultures) exposure, cultures were removed from an anaerobic chamber and centrifuged at 2000 rpm for 10 min at 4 °C. The supernatant was decanted, and 1 mL of TRI Reagent was added to the remaining cell pellet for immediate extraction of RNA (see below for details of this procedure), which was subjected to a qRT-PCR assay.

Additional cultures differentially exposed to dmSECO (n=3) or vehicle (n=3) were maintained in the anaerobic chamber. During incubation, OD600 was periodically measured using a Hach spectrophotometer (model DR1900) that was housed in the anaerobic chamber. Following addition of dmSECO or vehicle, each culture was periodically sampled (100 µL aliquots); supernatant was collected and stored at -20 °C. Samples were thawed prior to HPLC analysis, which was performed according to HPLC Method A (described below).

Culturing L. longoviformis DSM 17459 with END

Hungate tubes containing 5 mL of BHI++ were inoculated (1:100), in pairs, from dense cultures of L. longoviformis DSM 17459. Sterile controls were also prepared. Culture tubes were loosely capped and incubated at 37 °C until mid-log-phase growth was achieved, at which time pairs of cultures were exposed to either 50 µL of enterodiol (END, 50 mM in methanol) or 50 µL of vehicle (methanol). Then, ~1.7 hours following END (n=3 cultures) or vehicle (n=3 cultures) exposure, cultures were removed from an anaerobic chamber and centrifuged at 2000 rpm for 10 min at 4 °C. The supernatant was decanted, and the remaining cell pellet was flash-frozen in liquid nitrogen and subsequently stored in a -80 °C freezer until RNA was extracted for analysis via qRT-PCR or RNAseq (see below for details of this procedure).

Additional cultures differentially exposed to END (n=3) or vehicle (n=3) were maintained in an anaerobic chamber. During incubation, OD600 was periodically measured using a Hach spectrophotometer (model DR1900) that was housed in an anaerobic chamber. Following addition of END or vehicle, each culture was periodically sampled (100 µL aliquots); supernatant was collected and stored at -20 °C. Samples were thawed prior to HPLC analysis, which was performed according to HPLC Method A (described below).

Culturing Coriobacteriia strain collection with SECO for assessing END production

Wells of 96-well plates containing 98 µL of BHI++ media supplemented with SECO (505 µM) were each inoculated (1:100), in triplicate, from a dense culture of B. producta DSM 3507 and from a dense culture of each of the 25 strains of our Coriobacteriia collection. Wells were also prepared to serve as sterile controls. Plates were sealed with tape to minimize evaporation and incubated at 37 °C for 40 hours. Next, the plates were centrifuged at 2000 rpm for 10 min at 4 °C, and the supernatant was harvested and stored at -20 °C. Samples were thawed prior to HPLC analysis, which was performed according to HPLC Method A (described below). Results are summarized in Table S1.

Cloning ber

The ber sequence was PCR-amplified using the primers shown in Table S9, which introduced KpnI and BamHI restriction sites into the forward and reverse primers, respectively. The PCR product was cloned into a pET19bTEV plasmid, affording a plasmid designated pET19bTEV-ber, which was then introduced into E. coli Rosetta 2(DE3) cells.

Heterologous expression of Ber

E. coli Rosetta 2(DE3) cells bearing pET19bTEV with or without ber were anaerobically cultured at 37 °C for 1 day in LB that was supplemented with ampicillin (50 µg/mL) and chloramphenicol (15 µg/mL). The resulting cultures were used to inoculate (1:100) LB that was supplemented with 250 µM PINO (from a stock solution of 50 mM in MeOH). Cultures were aerobically incubated for 48 hours. Then, the bacterial cells were pelleted, and the supernatant was harvested and stored at -20 °C. Samples were thawed prior to HPLC analysis, which was performed according to HPLC Method B (described below).

HPLC methods

HPLC measurements of analyte concentrations were performed using an Agilent HPLC (1220 Infinity) equipped with a pump, degasser, autosampler, column oven, and diode array detector. Data were acquired using OpenLAB CDS (Agilent Technologies).

Stock solutions of PINO, LAR, SECO, END, and ENL were prepared at 50 mM in DMSO. These solutions were used to generate a standard containing PINO, LAR, SECO, END, and ENL (HPLC Method A, each 750 µM in BHI++; HPLC Method B, each 1000 µM in BHI++), which was then serially diluted [2-fold dilutions to 5.9 µM (Method A) and 7.8 µM (Method B)]. These standards were each diluted 2-fold in a solution of the internal standard corticosterone (COR, 200 µM in water). The resulting samples were injected to an HPLC using conditions given below. Calibration curves were generated by measuring peak areas and performing linear regression against known concentrations.
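The standard series described above can be reproduced with a short calculation. This is a minimal sketch: the number of dilution steps (seven) is inferred from the stated end points and is not given explicitly in the protocol, and the function name is ours.

```python
def twofold_series(top_um: float, n_dilutions: int) -> list[float]:
    """Concentrations (µM) of a 2-fold serial dilution, starting at top_um."""
    return [top_um / 2**i for i in range(n_dilutions + 1)]


# Seven 2-fold steps take 750 µM to ~5.9 µM and 1000 µM to ~7.8 µM.
method_a = twofold_series(750.0, 7)
method_b = twofold_series(1000.0, 7)

# Each standard is then diluted 2-fold in 200 µM corticosterone (COR),
# so the injected analyte concentration is halved and COR is 100 µM.
injected_a = [c / 2 for c in method_a]
cor_injected_um = 200.0 / 2

print([round(c, 1) for c in method_a])
print([round(c, 1) for c in method_b])
```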

Supernatant samples from the incubation of bacteria with lignan were prepared by diluting 2-fold in a solution of the internal standard corticosterone (200 µM in water). The resulting samples were injected to an HPLC using conditions given below. Retention times and UV absorption traces of analytes were compared to lignan standards to identify lignans. Lignan concentrations were calculated from calibration curves.
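The quantification step can be sketched as an ordinary least-squares fit followed by back-calculation. The peak areas below are synthetic values chosen for illustration, not measured data, and the helper names are hypothetical; whether responses were normalized to the corticosterone peak before regression is not stated, so the sketch regresses raw peak area.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_um(peak_area, slope, intercept, dilution_factor=2.0):
    """Back-calculate the concentration in the undiluted supernatant."""
    return dilution_factor * (peak_area - intercept) / slope


# Synthetic calibration data: injected concentrations (µM) and peak areas.
std_conc = [375.0, 187.5, 93.75, 46.875, 23.4375]
std_area = [2.0 * c + 1.0 for c in std_conc]

slope, intercept = fit_line(std_conc, std_area)
# A sample peak area of 101 corresponds to 50 µM injected, i.e. 100 µM
# in the supernatant before the 2-fold dilution in internal standard.
sample_um = concentration_um(101.0, slope, intercept)
print(round(sample_um, 1))
```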

Method A

Solvent A: 0.1% formic acid (aq)
Solvent B: 100% methanol
Gradient:
  0-7.5 min, 30%->48% B
  7.5-12 min, 48%->90% B
  12.5 min, 90% B
  12.5-12.6 min, 90%->30% B
  12.6-14 min, 30% B
Flow rate: 1 mL/min
Temperature: 37 °C
Column: C18 column (Kinetex 2.6 µm 100Å, 15 cm x 0.46 cm; Phenomenex: 00F-4462-E0)
Guard column: SecurityGuard ULTRA cartridge (Phenomenex part #: AJ0-8768)
Injection volume: 30 µL

Analyte  Wavelength Measured (nm)  Retention Time (min)
PINO     275                       9.35
LAR      275                       7.49
SECO     275                       7.73
END      275                       9.97
ENL      275                       10.65
COR      240                       12.24

Representative spectrum

Method B

Solvent A: 0.1% formic acid (aq)
Solvent B: 100% acetonitrile
Gradient:
  0-1 min, 30% B
  1-5 min, 30%->80% B
  5-6 min, 80%->30% B
  6-10 min, 30% B
Flow rate: 1 mL/min
Temperature: 37 °C
Column: C18 column (Kinetex 2.6 µm 100Å, 15 cm x 0.46 cm; Phenomenex: 00F-4462-E0)
Guard column: SecurityGuard ULTRA cartridge (Phenomenex part #: AJ0-8768)
Injection volume: 30 µL

Analyte  Wavelength Measured (nm)  Retention Time (min)
PINO     285                       4.28
LAR      285                       3.17
SECO     285                       2.76
END      285                       3.53
ENL      285                       4.81
COR      240                       4.91
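For readers transferring Method B to another instrument, the gradient can be written as a list of time points and interpolated. The sketch assumes linear ramps between the listed time points, which is conventional but not stated explicitly; the names `METHOD_B` and `percent_b` are ours.

```python
# (time in min, % solvent B) break points of the Method B gradient
METHOD_B = [(0.0, 30.0), (1.0, 30.0), (5.0, 80.0), (6.0, 30.0), (10.0, 30.0)]


def percent_b(t_min: float, program) -> float:
    """Percent solvent B at time t_min, assuming linear ramps between points."""
    if t_min < program[0][0] or t_min > program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient program has a gap")


for t in (0.5, 3.0, 5.0, 5.5, 8.0):
    print(t, percent_b(t, METHOD_B))
```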

Representative spectrum

Comparative genomics

Comparative genomics was performed using ElenMatchR, the development of which we have reported (1). This tool can be found at https://jbisanz.shinyapps.io/elenmatchr/.

This tool was applied using the parameters below.

Phenotype: Custom_Phenotype
Clustering Parameters: Min Percent Identity, 60%; Min Query Coverage, 80%
Random Forest Parameters: mtry, 0; mtree, 10,000
Phenotype Table: given below
______

#To add custom phenotype, type phenotypes in the Custom_Phenotype column and save as a tab separated variable file.

Strain                                   Genome_ID                               ShortName       Custom_Phenotype
Adlercreutzia equolifaciens DSM 19450    Adlercreutzia_equolifaciens_DSM19450    Ae DSM 19450
Asaccharobacter celatus AP38TSA          Asaccharobacter_celatus_AP38TSA         Ac AP38TSA
Asaccharobacter celatus OB21 GAM 11      Asaccharobacter_celatus_OB21GAM11       Ac OB21 GAM 11
Atopobium rimae ATCC 49626               Atopobium_rimae_ATCC49626               Ar ATCC 49626
Collinsella aerofaciens ATCC 25986       Collinsella_aerofaciens_ATCC25986       Ca ATCC 25986
Coriobacterium glomerans PW2             Coriobacterium_glomerans_PW2            Cg PW2
Cryptobacterium curtum DSM 15641         Cryptobacterium_curtum_DSM15641         Cc DSM 15641
Denitrobacterium detoxificans DSM 21843  Denitrobacterium_detoxificans_DSM21843  Dd DSM 21843
Eggerthella lenta 11C                    Eggerthella_lenta_11C                   El 11C          Metabolizer
Eggerthella lenta 14A                    Eggerthella_lenta_14A                   El 14A          Metabolizer
Eggerthella lenta 16A                    Eggerthella_lenta_16A                   El 16A
Eggerthella lenta 19C                    Eggerthella_lenta_19C                   El 19C
Eggerthella lenta 22C                    Eggerthella_lenta_22C                   El 22C          Non Metabolizer
Eggerthella lenta 28B                    Eggerthella_lenta_28B                   El 28B          Metabolizer
Eggerthella lenta 32-6-I 6 NA            Eggerthella_lenta_326I6NA               El 32-6-I 6 NA  Metabolizer
Eggerthella lenta A2                     Eggerthella_lenta_A2                    El A2
Eggerthella lenta AB12 #2                Eggerthella_lenta_AB12n2                El AB12 #2      Metabolizer
Eggerthella lenta AB8 #2                 Eggerthella_lenta_AB8n2                 El AB8 #2       Non Metabolizer
Eggerthella lenta AN51LG                 Eggerthella_lenta_AN51LG                El AN51LG       Metabolizer
Eggerthella lenta ATCC 25559             Eggerthella_lenta_ATCC25559             El ATCC 25559
Eggerthella lenta C592                   Eggerthella_lenta_C592                  El C592
Eggerthella lenta CC7/5 D5 2             Eggerthella_lenta_CC75D52               El CC7/5 D5 2   Metabolizer
Eggerthella lenta CC8/2 BHI2             Eggerthella_lenta_CC82BHI2              El CC8/2 BHI2   Metabolizer
Eggerthella lenta CC8/6 D5 4             Eggerthella_lenta_CC86D54               El CC8/6 D5 4   Non Metabolizer
Eggerthella lenta DSM 11767              Eggerthella_lenta_DSM11767              El DSM 11767    Non Metabolizer
Eggerthella lenta DSM 11863              Eggerthella_lenta_DSM11863              El DSM 11863    Metabolizer
Eggerthella lenta DSM 15644              Eggerthella_lenta_DSM15644              El DSM 15644    Metabolizer
Eggerthella lenta DSM 2243               Eggerthella_lenta_DSM2243REF            El DSM 2243     Metabolizer
Eggerthella lenta DSM 2243DSM            Eggerthella_lenta_DSM2243DSMZ           El DSM 2243D
Eggerthella lenta FAA1-1-60ABroad        Eggerthella_lenta_1160AFAABroad         El FAA1-1-60AB
Eggerthella lenta FAA1-1-60AUCSF         Eggerthella_lenta_1160AFAAUCSF          El FAA1-1-60AU  Metabolizer
Eggerthella lenta FAA1-3-56              Eggerthella_lenta_1356FAA               El FAA1-3-56    Non Metabolizer
Eggerthella lenta HGA1                   Eggerthella_lenta_HGA1                  El HGA1
Eggerthella lenta MR1 #12                Eggerthella_lenta_MR1n12                El MR1 #12      Non Metabolizer
Eggerthella lenta RC4/6F                 Eggerthella_lenta_RC46F                 El RC4/6F       Metabolizer
Eggerthella lenta UCSF 2243              Eggerthella_lenta_DSM2243UCSF           El UCSF 2243
Eggerthella lenta Valencia               Eggerthella_lenta_Valencia              El Valencia     Metabolizer
Eggerthella lenta W1 BHI 6               Eggerthella_lenta_W1BHI6                El W1 BHI 6     Metabolizer
Eggerthella sinensis DSM 16107           Eggerthella_sinensis_DSM16107           Es DSM 16107    Metabolizer
Eggerthella species YY7918               Eggerthella_species_YY7918              Es YY7918
Enterorhabdus mucosicola DSM 19490       Enterorhabdus_mucosicola_DSM19490       Em DSM 19490
Gordonibacter pamelaeae 3C               Gordonibacter_pamelaeae_3C              Gp 3C           Non Metabolizer
Gordonibacter species 28C                Gordonibacter_species_28C               Gs 28C          Non Metabolizer
Olsenella uli DSM 7084                   Olsenella_uli_DSM7084                   Ou DSM 7084
Paraeggerthella hongkongensis RC2/2 A    Paraeggerthella_hongkongensis_RC22A     Ph RC2/2 A      Non Metabolizer
Parvibacter caecicola DSM22242           Parvibacter_caecicola_DSM22242          Pc DSM 22242
Senegalimassilia anaerobia AP69FAA       Senegalimassilia_anaerobia_AP69FAA      Sa AP69FAA
Senegalimassilia anaerobia JC110         Senegalimassilia_anaerobia_JC110        Sa JC110
Slackia heliotrinireducens DSM 20476     Slackia_heliotrinireducens_DSM20476     Sh DSM 20476
Slackia isoflavoniconvertens OB21 GAM31  Slackia_isoflavoniconvertens_OB21GAM31  Si OB21 GAM31
______
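As an illustration of the tab-separated layout referred to in the instructions above, the following sketch writes a few rows of the phenotype table with Python's standard csv module. Only three rows are shown; the helper name is ours, and we have not verified that ElenMatchR imposes no further formatting requirements.

```python
import csv
import io

HEADER = ["Strain", "Genome_ID", "ShortName", "Custom_Phenotype"]

# Three example rows; strains without a tested phenotype keep an empty cell.
ROWS = [
    ("Eggerthella lenta DSM 2243", "Eggerthella_lenta_DSM2243REF", "El DSM 2243", "Metabolizer"),
    ("Eggerthella lenta 22C", "Eggerthella_lenta_22C", "El 22C", "Non Metabolizer"),
    ("Eggerthella lenta 16A", "Eggerthella_lenta_16A", "El 16A", ""),
]


def to_tsv(rows) -> str:
    """Serialize the header and rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = to_tsv(ROWS)
print(tsv_text)
```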

Didemethylsecoisolariciresinol (dmSECO) synthesis

General

All reagents and solvents were purchased from commercial suppliers and were used without further purification. Arctigenin was purchased from Ark Pharm (purity ≥ 98%). All reactions were performed in an inert atmosphere of argon. 1H and 13C NMR spectra were collected using a Bruker 600 MHz spectrometer with chemical shifts reported relative to residual deuterated solvent peaks or a tetramethylsilane internal standard. Accurate masses were measured using an ESI-TOF (HRMS, Agilent MSD) or MSQ Plus mass spectrometer (LRMS, Thermo Scientific). Reactions were monitored on TLC plates (silica gel 60, F254 coating, EMD Millipore, 1057150001), and spots were either visualized under UV light (254 nm) or stained with phosphomolybdic acid. The same TLC system was used to test purity, and the final product showed a single spot on TLC with both phosphomolybdic acid and UV absorbance. The purity of the final product was >95% based on 1H NMR and reverse-phase HPLC-UV with absorption monitored at 254 nm.

Analytical LC method to determine the purity of synthetic compounds

Purity determination of the final product was performed on a Thermo Scientific Accela HPLC system using an Accela 1250 pump and a Thermo Accucore C18 RP HPLC column (150 mm × 2.1 mm, particle size 2.6 µm). The UV absorption between 190 nm and 400 nm was monitored, and the purity was determined by the peak area at 254 nm. The gradient method is given in the table below.

Time (min)  Aqueous phase (a) (%)  Organic phase (b) (%)  Flow rate (mL/min)
0.00        99                     1                      0.5
1.00        99                     1                      0.5
16.00       0                      100                    0.5
19.00       0                      100                    0.5
19.01       99                     1                      0.5
22.00       99                     1                      0.5

(a) Milli-Q water 99.9, formic acid 0.1, volume %; (b) acetonitrile 99.9, formic acid 0.1, volume %.

Synthesis

Didemethylsecoisolariciresinol was synthesized by the method described previously (2) with slight modifications.

(3R,4R)-3,4-bis(3,4-dihydroxybenzyl)dihydrofuran-2(3H)-one

To a dichloromethane (DCM) solution of arctigenin (100 mg, 269 μmol) was added a DCM solution of BBr3 (2M, 1.0 mL, 2 mmol), and the mixture was stirred overnight. The reaction mixture was slowly added to ice-cold water and stirred vigorously for 60 min at room temperature. This solution was extracted with diethyl ether three times. The organic layers were combined, dried over MgSO4, filtered, and concentrated in vacuo to give the title compound, in fairly pure form, as an off-white solid (90 mg, 1.7 mmol, quant.). 1H NMR (600 MHz, Acetone-d6) δ 6.79 – 6.75 (m, 2H), 6.73 (d, J = 8.0 Hz, 1H), 6.63 (d, J = 2.1 Hz, 1H), 6.58 (dd, J = 8.0, 2.1 Hz, 1H), 6.46 (dd, J = 8.0, 2.1 Hz, 1H), 4.01 (dd, J = 8.9, 7.6 Hz, 1H), 3.84 (t, J = 8.8 Hz, 1H), 2.90 – 2.79 (m, 2H), 2.64 – 2.57 (m, 2H), 2.53 – 2.44 (m, 1H), 2.41 (dd, J = 13.3, 9.3 Hz, 1H). 13C NMR (151 MHz, Acetone) δ 179.1, 145.8, 145.8, 144.6, 144.4, 131.3, 130.7, 121.7, 120.8, 117.2, 116.5, 116.1, 116.0, 71.6, 57.8, 47.0, 42.1, 38.1, 34.7, 18.7. LRMS (-) calcd for (M-H)- 329.1. Found 329.2.

4,4'-((2R,3R)-2,3-bis(hydroxymethyl)butane-1,4-diyl)bis(benzene-1,2-diol)

To a tetrahydrofuran (THF) solution of (3R,4R)-3,4-bis(3,4-dihydroxybenzyl)dihydrofuran-2(3H)-one (50 mg, 152 μmol) was slowly added a THF solution of lithium aluminum hydride (2M, 0.2 mL, 400 μmol), and the mixture was stirred for 2 hours. To this mixture was slowly added an additional THF solution of lithium aluminum hydride (2M, 0.2 mL, 400 μmol), and the mixture was stirred for a further 2 hours. To this solution was slowly added aqueous H2SO4 solution (2N, 10 mL), and the mixture was extracted with diethyl ether three times. The organic layers were combined, dried over MgSO4, filtered, and concentrated in vacuo. Purification by reverse-phase HPLC using an XBridge Prep C18 column (Waters, 5 μm, 19 x 150 mm) was repeated until the purity of the target compound exceeded 95%. The HPLC fraction containing the target compound was lyophilized to give the title compound as an off-white solid (7.1 mg, 21 μmol, 14%). 1H NMR (600 MHz, Acetone-d6) δ 7.70 (s, 2H), 7.63 (s, 2H), 6.71 – 6.68 (m, 4H), 6.51 (dd, J = 8.0, 2.1 Hz, 2H), 4.09 (br s, 2H), 3.66 (d, J = 11.2 Hz, 2H), 3.52 – 3.44 (m, 2H), 2.62 – 2.59 (m, 4H), 1.91 – 1.85 (m, 2H). 13C NMR (151 MHz, Acetone) δ 145.6, 143.9, 134.0, 121.2, 117.0, 115.8, 61.0, 45.1, 35.7. HRMS (+) calcd for (M+H)+ 335.1489. Found 335.1491.
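The calculated m/z values quoted for the two compounds can be checked from monoisotopic atomic masses. In the sketch below, the molecular formulas (C18H18O6 for the lactone intermediate and C18H22O6 for dmSECO) are our derivation from the compound names rather than values stated in the text, and the function name is hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647


def monoisotopic_mass(formula: dict) -> float:
    """Sum of monoisotopic atomic masses for an element-count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


lactone = {"C": 18, "H": 18, "O": 6}  # assumed formula of the lactone intermediate
dmseco = {"C": 18, "H": 22, "O": 6}   # assumed formula of dmSECO

mz_neg = monoisotopic_mass(lactone) - PROTON  # (M-H)- ion
mz_pos = monoisotopic_mass(dmseco) + PROTON   # (M+H)+ ion

print(f"(M-H)- lactone: {mz_neg:.1f}")
print(f"(M+H)+ dmSECO:  {mz_pos:.4f}")
```

Both values agree with the calculated masses reported above (329.1 and 335.1489).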

Extraction of total RNA

To a bacterial pellet, 1 mL of TRI reagent was added. The sample was vortexed to resuspend the cells and incubated at room temperature for 10 minutes. The suspension was then transferred to a 2 mL screw-cap tube containing glass beads and lysing matrix (MP Biomedicals, catalog #: 116914050). Cells were lysed and the suspension homogenized for 5 minutes in a bead-beater (BioSpec Products, Mini-Beadbeater-96, catalog #: 1001). Chloroform (200 µL) was added to the sample, and then the sample was vortexed for 15 seconds before incubating at room temperature for 10 minutes. Next, the mixture was centrifuged at 16,000 x g for 15 minutes at 4 °C.

The upper aqueous phase (500 µL) was transferred to a new RNase-free microcentrifuge tube, and 500 µL of 100% ethanol was added, followed by vortexing to mix well. This mixture was transferred to a spin column (PureLink RNA Mini Kit, Life Technologies catalog #: 12183025) and centrifuged at ≥12,000 x g for 30 seconds, discarding flow-through, until all of the material had been added to the column. To the spin column, 350 µL of Wash Buffer I (PureLink RNA Mini Kit, Life Technologies, catalog #: 12183025) was added, and then the column was centrifuged at ≥12,000 x g for 30 seconds, discarding flow-through.

To the column, 80 µL of DNase mix (PureLink DNase Set, Life Technologies catalog #: 12185010; 8 µL 10x reaction buffer, 10 µL DNase, 62 µL RNase-free water) was added, and the column was incubated at room temperature for 15 minutes. Next, 350 µL of Wash Buffer I (PureLink RNA Mini Kit, Life Technologies, catalog #: 12183025) was added, and then the column was centrifuged at ≥12,000 x g for 30 seconds. The column was transferred to a new collection tube, and 500 µL of Wash Buffer II was added, followed by centrifugation at ≥12,000 x g for 30 seconds, discarding flow-through. The column was centrifuged at ≥12,000 x g for 60 seconds, drying the column, and then moved to a collection tube. Then, 50 µL of RNase-free water was added, and the column was incubated at room temperature for 1 minute. Finally, the column was centrifuged for 1 minute at ≥12,000 x g, retaining the flow-through, which contains total RNA.

A second, solution-phase DNase treatment was performed using TURBO-DNase (Ambion, ThermoFisher, catalog #: AM2238), adding 5 µL TURBO-DNase buffer and 1 µL TURBO-DNase to the 50 µL solution of total RNA. This solution was incubated at 37 °C for 30 minutes, after which 56 µL of Lysis Buffer (PureLink RNA Mini Kit, Life Technologies, catalog #: 12183025) and 56 µL of 100% ethanol were added. The solution was vortexed, and then the sample was transferred to a spin cartridge with collection tube and centrifuged at ≥12,000 x g for 30 seconds, discarding the flow-through. Wash Buffer I (350 µL; PureLink RNA Mini Kit, Life Technologies, catalog #: 12183025) was added, followed by centrifugation of the spin column for 30 seconds at ≥12,000 x g. The spin column was moved to a new collection tube, and 500 µL of Wash Buffer II (PureLink RNA Mini Kit, Life Technologies, catalog #: 12183025) was added. The sample was centrifuged (≥12,000 x g for 30 seconds), discarding flow-through. The column was centrifuged at ≥12,000 x g for 1 minute, drying the column, and then moved to a collection tube. Then, 30 µL of RNase-free water was added, and the column was incubated at room temperature for 1 minute. Finally, the column was centrifuged for 1 minute at ≥12,000 x g, retaining the flow-through, which contains purified total RNA.
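When several pellets are processed in parallel, the on-column DNase mix described above (80 µL per column) is conveniently prepared as a master mix. The sketch below scales the per-column volumes; the 10% overage and the function name are our assumptions and are not part of the reported protocol.

```python
# Per-column volumes (µL) of the on-column DNase mix
PER_COLUMN_UL = {
    "10x reaction buffer": 8.0,
    "DNase": 10.0,
    "RNase-free water": 62.0,
}


def master_mix(n_columns: int, overage: float = 0.10) -> dict:
    """Scale per-column volumes to n_columns plus a fractional overage."""
    factor = n_columns * (1.0 + overage)
    return {reagent: volume * factor for reagent, volume in PER_COLUMN_UL.items()}


per_column_total = sum(PER_COLUMN_UL.values())  # should equal 80 µL
mix_for_six = master_mix(6)

for reagent, volume in mix_for_six.items():
    print(f"{reagent}: {volume:.1f} µL")
```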

rRNA depletion and library generation for RNAseq

Total RNA was subjected to rRNA depletion using RiboZero (Illumina Ribo-Zero Bacterial rRNA Depletion, catalog #: MRZB12424), following the manufacturer’s protocol. RNA fragmentation, cDNA synthesis, and library preparation proceeded using NEBNext Ultra RNA Library Prep Kit for Illumina (New England BioLabs, catalog #: E7530) and NEBNext Multiplex Oligos for Illumina, Dual Index Primers (New England BioLabs, catalog #: E7600), following the manufacturer’s protocol.

RNAseq

All samples, except Gordonibacter pamelaeae 3C samples, were single-end sequenced (1x50bp) using an Illumina HiSeq2500 platform (High Output, v4 chemistry). Gordonibacter pamelaeae 3C samples were single-end sequenced (1x100bp) using an Illumina HiSeq2500 platform (Rapid Run, v2 chemistry). For each sample, fastq files are available in NCBI’s Sequence Read Archive (SRA), accession number SRP140684.

RNAseq analysis

The code for analyzing the RNAseq data is in the Supplementary Material file CodeS1.html. Briefly, reads were mapped to reference genomes using Bowtie2 (1). HTSeq was used to count the number of transcripts mapping to each gene (3). Finally, differential expression was assessed using DESeq (4).
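The mapping and counting steps can be outlined as two command lines. The sketch below only assembles the commands and does not run them; it is not the analysis code in CodeS1.html. All file names are placeholders, and the feature type, ID attribute, and unstranded setting are our assumptions for a single-end bacterial library.

```python
def bowtie2_command(index_prefix: str, fastq: str, sam_out: str) -> list[str]:
    """Single-end alignment: -x index prefix, -U unpaired reads, -S SAM output."""
    return ["bowtie2", "-x", index_prefix, "-U", fastq, "-S", sam_out]


def htseq_count_command(sam: str, gff: str, feature_type: str = "CDS",
                        id_attribute: str = "locus_tag") -> list[str]:
    """Per-gene read counting; htseq-count writes the counts to stdout."""
    return [
        "htseq-count",
        "-f", "sam",         # input alignment format
        "-s", "no",          # assumption: library treated as unstranded
        "-t", feature_type,  # assumption: count reads over CDS features
        "-i", id_attribute,  # assumption: genes identified by locus tag
        sam, gff,
    ]


# Placeholder file names, for illustration only.
align = bowtie2_command("reference_index", "sample.fastq", "sample.sam")
count = htseq_count_command("sample.sam", "reference.gff")
print(" ".join(align))
print(" ".join(count))
```

The resulting commands could be passed to `subprocess.run` once Bowtie2 and HTSeq are installed; the count tables then serve as input to the differential expression step.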

qRT-PCR analysis

From total RNA, prepared as described above, cDNA was synthesized using iScript Reverse Transcriptase (BioRad) and then diluted 20-fold in nuclease-free water. The qRT-PCR assays were performed using SYBR Select Master Mix for CFX (Applied Biosystems) using Thermocycler C1000 CFX384 Real-Time System (BioRad). The primers used for amplification are listed in Table S9. Fold changes in transcript levels were calculated as lignan exposure relative to vehicle exposure using the ΔΔCT method.
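The ΔΔCT calculation can be written out as follows. This is a minimal sketch that assumes normalization to a reference gene and a doubling of product per cycle (100% amplification efficiency); the CT values are invented for illustration, and the reference gene is not specified here.

```python
def fold_change(ct_target_lignan: float, ct_reference_lignan: float,
                ct_target_vehicle: float, ct_reference_vehicle: float) -> float:
    """Fold change (lignan relative to vehicle) by the ΔΔCT method."""
    delta_lignan = ct_target_lignan - ct_reference_lignan     # ΔCT, lignan-exposed
    delta_vehicle = ct_target_vehicle - ct_reference_vehicle  # ΔCT, vehicle-exposed
    delta_delta = delta_lignan - delta_vehicle                # ΔΔCT
    return 2.0 ** (-delta_delta)


# Illustrative CT values: the target transcript crosses threshold 5 cycles
# earlier after lignan exposure, while the reference gene is unchanged.
example = fold_change(20.0, 15.0, 25.0, 15.0)
print(example)
```

A ΔΔCT of -5 therefore corresponds to a 32-fold increase in transcript level under these assumptions.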

Culture-independent analysis of gene prevalence in the human gut microbiome

Stool sample collection and DNA extraction

The included human stool samples were collected as part of a multi-ethnic clinical cohort study, termed Inflammation, Diabetes, Ethnicity, and Obesity (IDEO), consisting of 25–65-year-old men and women residing in Northern California and recruited from medical and surgical clinics at the University of California San Francisco (UCSF) and the Zuckerberg San Francisco General Hospital, or through local public advertisements. The host phenotypic data from this cohort have been described in detail (5). All subjects consented to participate in the study, which was approved by the UCSF Institutional Review Board.

To extract DNA, stool samples were homogenized by bead beating for 5 min (Mini-Beadbeater-96, BioSpec, catalog #: 1001) with beads of mixed size and material (Lysing Matrix E 2mL Tube, MP Biomedicals, catalog #: 116914500) in the digestion solution and lysis buffer of a Wizard SV 96 DNA kit (Promega, catalog #: 2371). The samples were then centrifuged for 10 min at 16,000 x g, and the supernatant was transferred to the binding plate. The DNA was then purified according to the manufacturer's instructions. DNA was further diluted 10-fold in water.

Extraction of genomic DNA from mono-cultured lignan-metabolizing bacteria

Eggerthella lenta DSM 2243, Blautia producta DSM 3507, Gordonibacter pamelaeae 3C, and Lactonifactor longoviformis DSM 17459 were each cultured in BHI++ in an anaerobic chamber, as described above. Dense cultures were centrifuged to pellet bacterial cells. Subsequently, cells were lysed and genomic DNA was extracted using UltraClean Microbial DNA Isolation Kit (MO BIO Laboratories, catalog # 12224) according to the manufacturer’s instructions.

PCR assay for detection of genomic loci associated with lignan metabolism

DNA extractions from IDEO stool samples were diluted 20-fold in a mixture of nuclease-free water, AmpliTaq Gold 360 Master Mix (Applied Biosystems), and forward and reverse primers (0.5 μM). Forward and reverse primers used for amplification are listed in Table S10. Reactions were performed using either a C1000 Touch™ Thermal Cycler or a S1000™ Thermal Cycler (BioRad). Reactions were amplified over 30 cycles (40 cycles for edl1 and edl2) with the following parameters: denaturation at 95 °C for 30 sec, annealing at 59.7 °C for 30 sec, and extension at 72 °C for 1 min, with a final extension at 72 °C for 7 min. Genomic DNA extracted from mono-cultures of each bacterium served as positive-control templates, and nuclease-free water served as a negative control. PCR amplification products were subjected to gel electrophoresis with agarose gels [1.5% agarose with SYBR Safe DNA gel stain (Invitrogen)]. Gels were visualized using ChemiDoc™ Imaging System (BioRad). In the PCRs in which DNA amplified, only one amplicon was visualized per reaction. Amplicons matching positive controls in length were classified as having the target gene present; a subset of amplicons were Sanger sequenced to confirm their identity (described below). Genes detected for each individual are tabulated in Table S8. Across 68 individuals, five genes were detected in 67.6% of the cohort, four genes in 23.5%, three genes in 5.9%, two genes in 1.5%, and one gene in 1.5%. We did not detect any significant associations between the presence or absence of each gene and the following host factors: sex, ethnicity, diabetes status, or body mass index (obese versus lean) (p-value>0.05, chi-square test of independence).
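The prevalence figures above can be related back to numbers of individuals. In the sketch below, only the cohort size (68) and the count of individuals carrying all five genes (46, stated in the following section) come from the text; the remaining counts are back-calculated from the reported percentages and should be read as inferred values.

```python
# Number of detected genes -> number of individuals.
# 46 is stated in the text; 16, 4, 1, and 1 are inferred from the percentages.
INDIVIDUALS_BY_GENE_COUNT = {5: 46, 4: 16, 3: 4, 2: 1, 1: 1}

cohort_size = sum(INDIVIDUALS_BY_GENE_COUNT.values())
prevalence_percent = {
    genes: round(100.0 * n / cohort_size, 1)
    for genes, n in INDIVIDUALS_BY_GENE_COUNT.items()
}

print(cohort_size)
print(prevalence_percent)
```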

Sanger sequencing of PCR amplicon products
Of the 46 individuals in whom all five genes associated with lignan metabolism were detected, all resultant amplicons from a subset of five individuals (designated in Table S8) and from positive controls (PCR templated with genomic DNA extracted from mono-cultured organisms) were purified using the Agencourt AMPure PCR Purification kit (Beckman Coulter) according to the manufacturer’s instructions for a 96-well format, using nuclease-free water as the DNA elution buffer. After purification, each amplicon was subjected to Sanger sequencing (GENEWIZ) using the same forward primer that was used for PCR amplification. The resulting sequences were manually evaluated for correct base-calling and trimmed of unassignable bases (“N”s) at both the 5’ and 3’ ends of the reads. The trimmed reads were then aligned to gene sequences obtained from draft genomes (NCBI RefSeq locus tags: ber, ELEN_RS01850; glm, C3R19_12985; cld2, C1877_07250; edl1, BUA56_RS20795; edl2, BUA56_RS20800) via NCBI’s nucleotide BLAST web tool. Percent identity and coverage of query sequences relative to subject sequences are reported in Table S11.
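The end-trimming step was performed manually in the study. As a minimal sketch of the same operation in code, the function below removes unassignable bases only from the two ends of a read and leaves any internal "N" untouched. The function name and the example read are hypothetical and are not taken from the study data.

```python
def trim_terminal_ns(read):
    """Remove unassignable bases (N or n) from the 5' and 3' ends only."""
    return read.strip("Nn")


# Hypothetical Sanger read with low-quality ends (not from the study).
raw_read = "NNNNACGTNACGTTGCANN"
trimmed = trim_terminal_ns(raw_read)
print(trimmed)
```

The trimmed read could then be submitted to nucleotide BLAST as the query sequence.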


Fig. S1. PINO-metabolizing Coriobacteriia strains are not monophyletic. PhyloPhlAn-based phylogenetic tree produced using ElenMatchR: Comparative Genomics Tool v0.3 (1, 6). The tree shows that PINO metabolism is not monophyletic across the strain collection, indicating that this phenotypic trait is decoupled from bacterial evolutionary history and consistent with repeated gain or loss of the responsible genes.


Fig. S2. Domain maps for gut bacterial genes implicated in the lignan metabolism pathway. Domain annotations, assigned by homology, for the putative lignan-metabolizing enzymes are presented and support the inferred biochemical functions. All proteins are predicted to be cytoplasmic except Ber, which has an N-terminal TAT signal sequence that targets it for secretion.


Fig. S3. The putative enzymes mediating bacterial metabolism of dietary lignans. A working model of the bacterial lignan metabolism pathway is presented. Several transporters, which traffic small molecules (ABC transporters) or ions (MFS transporters) across bacterial membranes, were significantly up-regulated in response to lignan doses and may be responsible for funneling substrates or products across cell membranes. Ber: benzyl ether reductase; Glm: guaiacol lignan methyltransferase; Cld: catechol lignan dehydroxylase; Edl: enterodiol lactonizing enzyme; ABC: ATP-binding cassette; MFS: major facilitator superfamily.


Table S1. Production of END by Coriobacteriia strain collection

| Strain | END (µM), n=3 |
|---|---|
| Gordonibacter pamelaeae 3C | 573 ± 52 |
| Gordonibacter pamelaeae 28C | 32 ± 9 |
| Eggerthella lenta FAA1−1−60AU | ND |
| Eggerthella lenta FAA1−3−56 | ND |
| Eggerthella lenta 11C | ND |
| Eggerthella lenta 14A | ND |
| Eggerthella lenta 22C | ND |
| Eggerthella lenta 28B | ND |
| Eggerthella lenta DSM 11767 | ND |
| Eggerthella lenta DSM 11863 | ND |
| Eggerthella lenta DSM 15644 | ND |
| Eggerthella sinensis DSM 16107 | ND |
| Eggerthella lenta DSM 2243 | ND* |
| Eggerthella lenta Valencia | ND |
| Eggerthella lenta AN51LG | ND |
| Eggerthella lenta MR1 #12 | ND |
| Eggerthella lenta 32−6−I 6 NA | ND |
| Eggerthella lenta AB8 #2 | ND |
| Eggerthella lenta AB12 #2 | ND |
| Eggerthella lenta CC8/2 BHI2 | ND |
| Eggerthella lenta RC4/6F | ND |
| Eggerthella lenta CC7/5 D5 2 | ND |
| Eggerthella lenta CC8/6 D5 4 | ND |
| Eggerthella lenta W1 BHI 6 | ND |
| Paraeggerthella hongkongensis RC2/2 A | ND |
| Sterile Control | ND** |

ND: not detected by HPLC
*: n=2
**: n=6


Table S2. PINO metabolism by Coriobacteriia strains

Concentrations are in µM; columns #1 to #3 are replicate cultures.

| Strain | PINO #1 | PINO #2 | PINO #3 | LAR #1 | LAR #2 | LAR #3 | SECO #1 | SECO #2 | SECO #3 |
|---|---|---|---|---|---|---|---|---|---|
| E. lenta AB8#2 | 573 | 570 | 537 | ND | ND | ND | ND | ND | ND |
| E. lenta DSM11767 | 564 | 545 | 539 | ND | ND | ND | ND | ND | ND |
| E. lenta MR#12 | 536 | 535 | 558 | ND | ND | ND | ND | ND | ND |
| E. lenta 22C | 584 | 530 | 575 | ND | ND | ND | ND | ND | ND |
| G. pamelaeae 3C | 588 | 597 | 585 | ND | ND | ND | ND | ND | ND |
| Gordonibacter sp. 28C | 550 | 563 | 544 | ND | ND | ND | ND | ND | ND |
| E. hongkongensis RC22A | 637 | 631 | 703 | ND | ND | ND | ND | ND | ND |
| E. lenta CC86D54 | 584 | 583 | 600 | ND | ND | ND | ND | ND | ND |
| E. lenta 1-3-56FAA | 275 | 272 | 272 | ND | ND | ND | ND | ND | ND |
| E. lenta W1BHI6 | 589 | 576 | 576 | 29 | 27 | 24 | ND | ND | ND |
| E. sinensis DSM16107 | 517 | 474 | 523 | 67 | 77 | 26 | 13 | 16 | ND |
| E. lenta CC82BHI2 | 396 | 415 | 426 | 114 | 121 | 100 | 33 | 34 | 26 |
| E. lenta CC75D52 | 359 | 375 | 428 | 201 | 180 | 155 | 104 | 81 | 55 |
| E. lenta DSM11863 | 309 | 283 | 45 | 175 | 195 | 139 | 91 | 123 | 402 |
| E. lenta DSM15644 | ND | 356 | 37 | 48 | 159 | 132 | 553 | 67 | 459 |
| E. lenta Valencia | ND | 338 | 18 | 50 | 183 | 98 | 559 | 93 | 526 |
| E. lenta 28B | ND | 50 | 162 | 49 | 154 | 198 | 558 | 437 | 219 |
| E. lenta AB12#2 | 298 | ND | ND | 192 | 38 | 32 | 114 | 587 | 556 |
| E. lenta 14A | 18 | 20 | 45 | 94 | 101 | 139 | 510 | 508 | 404 |
| E. lenta AN51LG | 17 | 52 | 9 | 99 | 161 | 63 | 563 | 456 | 547 |
| E. lenta 1-1-60FAA | 21 | 49 | 8 | 111 | 156 | 60 | 554 | 453 | 597 |
| E. lenta RC46F | 52 | 17 | ND | 157 | 91 | 46 | 450 | 533 | 654 |
| E. lenta 11C | 11 | 10 | 8 | 74 | 69 | 53 | 566 | 535 | 552 |
| E. lenta 326IFAA | 7 | 9 | 10 | 53 | 61 | 66 | 574 | 556 | 584 |
| E. lenta DSM2243 | ND | ND | 11 | 34 | 41 | 71 | 585 | 596 | 515 |

ND: not detected by HPLC


Table S3. Gene annotation for the top 10 gene candidates from our comparative genomic analysis of PINO metabolism across our Coriobacteriia strain collection

| Orthologous gene (OG) cluster | Gene annotation | Metabolizers with gene | Non-metabolizers with gene |
|---|---|---|---|
| OG6080_191 | Fumarate/Urocanate reductase (ber) | 16 | 0 |
| OG6080_190 | Transcriptional regulatory protein LiaR, LuxR family (berR) | 16 | 0 |
| OG6080_358 | Cinnamate reductase | 14 | 1 |
| OG6080_330 | Fumarate reductase flavoprotein subunit | 14 | 1 |
| OG6080_332 | Protein-ADP-ribose hydrolase | 14 | 1 |
| OG6080_3483 | Hypothetical protein | 11 | 0 |
| OG6080_310 | Hypothetical protein | 15 | 2 |
| OG6080_2552 | Hypothetical protein | 13 | 1 |
| OG6080_1936 | Hypothetical protein | 14 | 2 |
| OG6080_1937 | Cysteine desulfurase IscS | 14 | 2 |


Table S4. Differentially expressed genes upon exposure of Eggerthella lenta DSM 2243 to PINO relative to vehicle (>|4|-fold difference, FDR<0.01)

| locus_tag | log2(fold change) | FDR | Contig | Protein ID | Product |
|---|---|---|---|---|---|
| ELEN_RS01850 | 11.38 | 0.00E+00 | NC_013204 | WP_015759938.1 | FAD-binding dehydrogenase (Ber) |
| ELEN_RS15955 | 2.44 | 1.83E-03 | NC_013204 | WP_057385171.1 | hypothetical protein |
| ELEN_RS03540 | 2.17 | 2.03E-10 | NC_013204 | WP_015760107.1 | acetate--CoA ligase |
| ELEN_RS03545 | 2.11 | 1.21E-09 | NC_013204 | WP_009306457.1 | hypothetical protein |
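The ">|4|-fold" cutoff in the caption of Table S4 corresponds to |log2(fold change)| > 2, since 2^2 = 4. The sketch below applies the two stated cutoffs to the rows of Table S4 and converts each log2 value back to a fold change. The function name and the tuple layout are assumptions made for illustration; this is not the authors' analysis code.

```python
def passes_threshold(log2_fc, fdr, min_fold=4.0, max_fdr=0.01):
    """True if the absolute fold change exceeds min_fold and FDR is below max_fdr."""
    return 2 ** abs(log2_fc) > min_fold and fdr < max_fdr


# (locus_tag, log2 fold change, FDR) copied from Table S4.
table_s4 = [
    ("ELEN_RS01850", 11.38, 0.0),
    ("ELEN_RS15955", 2.44, 1.83e-03),
    ("ELEN_RS03540", 2.17, 2.03e-10),
    ("ELEN_RS03545", 2.11, 1.21e-09),
]

for locus_tag, log2_fc, fdr in table_s4:
    fold = 2 ** log2_fc  # undo the log2 transform
    print(f"{locus_tag}: {fold:.1f}-fold, passes = {passes_threshold(log2_fc, fdr)}")
```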


Table S5. Differentially expressed genes upon exposure of Blautia producta DSM 3507 to SECO relative to vehicle (>|4|-fold difference, FDR<0.01)

| locus_tag | log2(fold change) | FDR | Contig | Protein ID | Product |
|---|---|---|---|---|---|
| C3R19_12985 | 3.65 | 1.27E-28 | NODE_139 | PRJNA431659:C3R19_12985 | uroporphyrinogen decarboxylase (Glm) |
| C3R19_08885 | 3.41 | 1.66E-53 | NODE_80 | PRJNA431659:C3R19_08885 | hypothetical protein |
| C3R19_08890 | 3.30 | 3.34E-89 | NODE_80 | PRJNA431659:C3R19_08890 | ABC transporter ATP-binding protein |
| C3R19_12980 | 3.28 | 1.03E-23 | NODE_139 | PRJNA431659:C3R19_12980 | Na+/melibiose symporter and related transporter |
| C3R19_08895 | 3.15 | 3.79E-80 | NODE_80 | PRJNA431659:C3R19_08895 | ABC transporter ATP-binding protein |
| C3R19_02795 | 2.20 | 1.00E-25 | NODE_17 | PRJNA431659:C3R19_02795 | radical SAM protein |
| C3R19_04525 | -3.20 | 5.88E-03 | NODE_30 | PRJNA431659:C3R19_04525 | MFS transporter |
| C3R19_04530 | -3.46 | 3.29E-05 | NODE_30 | PRJNA431659:C3R19_04530 | uroporphyrinogen decarboxylase |


Table S6. Differentially expressed genes upon exposure of Gordonibacter pamelaeae 3C to dmSECO relative to vehicle (>|4|-fold difference, FDR<0.01)

| locus_tag | log2(fold change) | FDR | Contig | Protein ID | Product |
|---|---|---|---|---|---|
| C1877_07245 | 12.94 | 1.55E-68 | NODE_4 | PRJNA412637:C1877_07245 | Oxidoreductase (Cld1) |
| C1877_07255 | 12.19 | 4.43E-102 | NODE_4 | PRJNA412637:C1877_07255 | hypothetical protein |
| C1877_07250 | 11.93 | 0.00E+00 | NODE_4 | PRJNA412637:C1877_07250 | Dehydrogenase (Cld2) |
| C1877_07260 | 11.35 | 0.00E+00 | NODE_4 | PRJNA412637:C1877_07260 | hypothetical protein |
| C1877_07200 | 8.26 | 0.00E+00 | NODE_4 | PRJNA412637:C1877_07200 | DUF3641 domain-containing protein |
| C1877_07190 | 8.19 | 2.82E-161 | NODE_4 | PRJNA412637:C1877_07190 | ABC transporter ATP-binding protein |
| C1877_07205 | 8.16 | 0.00E+00 | NODE_4 | PRJNA412637:C1877_07205 | GTPase (G3E family) |
| C1877_07195 | 7.84 | 7.50E-269 | NODE_4 | PRJNA412637:C1877_07195 | ABC transporter permease |
| C1877_07185 | 7.67 | 0.00E+00 | NODE_4 | PRJNA412637:C1877_07185 | ABC transporter substrate-binding protein |
| C1877_07240 | 7.57 | 0.00E+00 | NODE_4 | PRJNA412637:C1877_07240 | hypothetical protein |
| C1877_07210 | 6.84 | 0.00E+00 | NODE_4 | PRJNA412637:C1877_07210 | methyltransferase type 11 |
| C1877_07265 | 6.25 | 9.81E-19 | NODE_4 | PRJNA412637:C1877_07265 | hypothetical protein |
| C1877_07180 | 5.72 | 1.42E-279 | NODE_4 | PRJNA412637:C1877_07180 | DUF1847 domain-containing protein |
| C1877_07270 | 4.24 | 2.49E-39 | NODE_4 | PRJNA412637:C1877_07270 | peptidylprolyl isomerase |
| C1877_07235 | 3.79 | 7.81E-104 | NODE_4 | PRJNA412637:C1877_07235 | hypothetical protein |
| C1877_00920 | 10.10 | 0.00E+00 | NODE_1 | PRJNA412637:C1877_00920 | nitrate ABC transporter substrate-binding protein |
| C1877_00925 | 9.85 | 1.25E-291 | NODE_1 | PRJNA412637:C1877_00925 | ABC transporter permease |
| C1877_00915 | 9.69 | 9.49E-245 | NODE_1 | PRJNA412637:C1877_00915 | ABC transporter ATP-binding protein |
| C1877_00905 | 8.55 | 0.00E+00 | NODE_1 | PRJNA412637:C1877_00905 | radical SAM protein |
| C1877_00910 | 8.54 | 0.00E+00 | NODE_1 | PRJNA412637:C1877_00910 | ABC transporter permease |
| C1877_00930 | 6.93 | 0.00E+00 | NODE_1 | PRJNA412637:C1877_00930 | ABC transporter ATP-binding protein |
| C1877_00900 | 5.06 | 2.17E-167 | NODE_1 | PRJNA412637:C1877_00900 | hypothetical protein |
| C1877_00945 | 3.17 | 1.68E-103 | NODE_1 | PRJNA412637:C1877_00945 | hypothetical protein |
| C1877_00940 | 2.92 | 1.27E-80 | NODE_1 | PRJNA412637:C1877_00940 | hypothetical protein |
| C1877_00895 | 2.88 | 3.04E-81 | NODE_1 | PRJNA412637:C1877_00895 | MFS transporter |
| C1877_00935 | 2.11 | 2.82E-113 | NODE_1 | PRJNA412637:C1877_00935 | hypothetical protein |
| C1877_04445 | 5.73 | 0.00E+00 | NODE_2 | | sodium:solute symporter |
| C1877_04450 | 2.91 | 1.21E-168 | NODE_2 | PRJNA412637:C1877_04450 | hypothetical protein |
| C1877_09910 | 5.00 | 0.00E+00 | NODE_6 | PRJNA412637:C1877_09910 | hypothetical protein |
| C1877_09915 | 4.06 | 0.00E+00 | NODE_6 | PRJNA412637:C1877_09915 | radical SAM protein |
| C1877_09905 | 3.88 | 0.00E+00 | NODE_6 | PRJNA412637:C1877_09905 | ABC transporter ATP-binding protein |
| C1877_09900 | 2.37 | 3.99E-166 | NODE_6 | PRJNA412637:C1877_09900 | hypothetical protein |
| C1877_13835 | 4.66 | 0.00E+00 | NODE_10 | PRJNA412637:C1877_13835 | DUF3641 domain-containing protein |
| C1877_13840 | 4.21 | 2.80E-274 | NODE_10 | PRJNA412637:C1877_13840 | 4-carboxymuconolactone decarboxylase |
| C1877_13970 | 2.15 | 2.28E-152 | NODE_10 | PRJNA412637:C1877_13970 | amidohydrolase |
| C1877_13780 | 2.11 | 9.45E-90 | NODE_10 | PRJNA412637:C1877_13780 | hypothetical protein |
| C1877_13785 | 2.10 | 6.91E-82 | NODE_10 | PRJNA412637:C1877_13785 | 4Fe-4S ferredoxin |


Table S7. Differentially expressed genes upon exposure of Lactonifactor longoviformis DSM 17459 to END relative to vehicle (>|4|-fold difference, FDR<0.01)

| locus_tag | log2(fold change) | FDR | Contig | Protein ID | Product |
|---|---|---|---|---|---|
| BUA56_RS20800 | 8.88 | 0.00E+00 | NZ_FQVI01000041 | WP_072854690.1 | SDR family NAD(P)-dependent oxidoreductase (Edl2) |
| BUA56_RS20805 | 8.79 | 0.00E+00 | NZ_FQVI01000041 | WP_072854691.1 | MFS transporter |
| BUA56_RS20795 | 8.49 | 0.00E+00 | NZ_FQVI01000041 | WP_084068179.1 | Fe-S oxidoreductase (Edl1) |
| BUA56_RS11690 | 2.40 | 1.42E-02 | NZ_FQVI01000011 | WP_084067899.1 | Fur family transcriptional regulator |
| BUA56_RS20810 | 2.31 | 2.13E-158 | NZ_FQVI01000041 | WP_072854692.1 | LysR family transcriptional regulator |
| BUA56_RS11695 | 2.02 | 1.94E-02 | NZ_FQVI01000011 | WP_072851898.1 | FeoB-associated Cys-rich membrane protein |
| BUA56_RS08390 | -2.10 | 2.34E-07 | NZ_FQVI01000007 | WP_072850830.1 | tryptophan synthase subunit alpha |
| BUA56_RS08395 | -2.20 | 7.56E-10 | NZ_FQVI01000007 | WP_072850832.1 | TetR/AcrR family transcriptional regulator |


Table S8. Stool sample metadata and tabulation of genes detected/not detected, therein

| Subject ID | Gender | Age | Ethnicity | BMI | Weight (kg) | Diabetes | ber | glm | cld2 | edl1 | edl2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OB001 | Female | 46 | Hispanic | 34.0 | 78.8 | Y | Detected | Detected | Detected | Detected | Detected |
| OB002 | Female | 44 | Hispanic | 45.0 | 110.9 | Y | Detected | Detected | Detected | Detected | Detected |
| OB003 | Female | 34 | Hispanic | 22.5 | 65.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB004 | Male | 42 | Hispanic | 29.1 | 73.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB005 | Male | 57 | Hispanic | 31.4 | 105.0 | Y | Detected | Detected | Detected | Detected | Detected |
| OB006 | Male | 62 | Caucasian | 30.2 | 71.2 | N | Detected | Detected | Detected | Detected | Detected |
| OB008 | Female | 57 | Caucasian | 41.3 | 105.6 | N | Detected | Detected | Detected | Detected | Detected |
| OB009 | Female | 51 | Hispanic | 34.5 | 78.6 | Y | Detected | Detected | Detected | Detected | Detected |
| OB010 | Female | 39 | Hispanic | 38.1 | 96.8 | N | Detected | Detected | Detected | Detected | Detected |
| OB011 | Male | 58 | Caucasian | 39.9 | 121.6 | Y | Detected | Detected | Detected | Detected | Detected |
| OB012 | Female | 42 | Chinese | 34.1 | 79.4 | Y | Detected | Detected | Detected | Detected | Detected |
| OB014 | Female | 30 | Caucasian | 48.5 | 136.0 | Y | Detected | Detected | Detected | Detected | Detected |
| OB016 | Female | 57 | Hispanic | 39.8 | 101.4 | N | Detected | Not Detected | Not Detected | Detected | Detected |
| OB018 | Female | 61 | Caucasian | 50.7 | 118.6 | N | Detected | Detected | Detected | Detected | Detected |
| OB020 | Male | 62 | Caucasian | 19.2 | 58.2 | N | Detected | Detected | Detected | Detected | Detected |
| OB021 | Male | 44 | Chinese | 36.5 | 109.6 | Y | Detected | Detected | Detected | Detected | Detected |
| OB022 | Female | 47 | Chinese | 26.8 | 72.8 | N | Detected | Detected | Detected | Detected | Detected |
| OB024 | Male | 25 | Chinese | 32.8 | 104.2 | N | Detected | Detected | Detected | Detected | Detected |
| OB026 | Male | 40 | Chinese | 22.4 | 61.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB027 | Female | 45 | Chinese | 20.1 | 52.8 | N | Detected | Not Detected | Detected | Detected | Detected |
| OB028 | Male | 52 | Caucasian | 47.1 | 138.4 | Y | Detected | Detected | Detected | Detected | Detected |
| OB030 | Male | 51 | Chinese | 24.5 | 79.8 | Y | Detected | Detected | Detected | Detected | Detected |
| OB031 | Male | 44 | Hispanic | 29.8 | 84.0 | N | Detected | Detected | Detected | Not Detected | Detected |
| OB032 | Male | 54 | Hispanic | 48.0 | 130.8 | Y | Detected | Detected | Detected | Detected | Detected |
| OB034 | Male | 62 | Hispanic | 30.5 | 81.2 | Y | Detected | Detected | Detected | Detected | Detected |
| OB035 | Female | NA | Chinese | 19.8 | 45.2 | N | Detected | Detected | Detected | Not Detected | Detected |
| OB036 | Female | 53 | Caucasian | 37.4 | 86.4 | Y | Detected | Detected | Detected | Detected | Detected |
| OB037 | Male | 33 | Hispanic | 42.9 | 126.6 | N | Detected | Not Detected | Not Detected | Detected | Detected |
| OB039 | Female | 58 | Caucasian | 35.0 | 100.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB040† | Female | 64 | Chinese | 21.8 | 47.4 | Prediabetic | Detected | Detected | Detected | Detected | Detected |
| OB041† | Female | 59 | Chinese | 23.4 | 63.2 | N | Detected | Detected | Detected | Detected | Detected |
| OB042 | Male | 63 | Caucasian | 25.2 | 74.2 | N | Detected | Detected | Detected | Detected | Detected |
| OB043 | Female | 37 | Chinese | 40.3 | 94.0 | NA | Detected | Detected | Detected | Detected | Detected |
| OB044 | Male | 55 | Chinese | 24.4 | 65.3 | Y | Detected | Detected | Detected | Detected | Detected |
| OB045 | Female | 37 | Caucasian | 22.6 | 63.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB046 | Female | 52 | Caucasian | 30.1 | 85.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB047 | Female | 63 | Hispanic | 41.2 | 90.8 | N | Not Detected | Detected | Detected | Detected | Detected |
| OB050 | Male | 29 | Caucasian | 36.8 | 110.0 | N | Detected | Detected | Not Detected | Detected | Detected |
| OB051 | Female | 52 | Chinese | 19.7 | 45.2 | N | Detected | Detected | Not Detected | Detected | Detected |
| OB053 | Female | 54 | Hispanic | 42.5 | 83.8 | N | Detected | Detected | Not Detected | Detected | Detected |
| OB054 | Female | 28 | Hispanic | 32.9 | 80.6 | N | Detected | Detected | Not Detected | Detected | Detected |
| OB055 | Male | 37 | Chinese | 28.3 | 78.0 | N | Detected | Detected | Not Detected | Detected | Detected |
| OB056 | Female | 57 | Caucasian | 43.0 | 124.6 | Y | Detected | Detected | Not Detected | Detected | Detected |
| OB057 | Female | 50 | Chinese | 20.2 | 57.8 | Gestational | Detected | Detected | Detected | Detected | Detected |
| OB058† | Female | 27 | Hispanic | 15.1 | 48.2 | N | Detected | Detected | Detected | Detected | Detected |
| OB059 | Female | 57 | Caucasian | 20.4 | 45.0 | N | Not Detected | Not Detected | Not Detected | Detected | Detected |
| OB063† | Male | 33 | Caucasian | 24.3 | 87.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB064 | Female | 30 | Caucasian | 20.5 | 55.8 | N | Detected | Detected | Detected | Detected | Detected |
| OB065 | Female | 58 | Hispanic | 24.2 | 63.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB066 | Female | 25 | Caucasian | 19.2 | 55.2 | N | Detected | Not Detected | Detected | Detected | Detected |
| OB067 | Male | 52 | Caucasian | 37.8 | 128.2 | N | Not Detected | Detected | Detected | Not Detected | Detected |
| OB068 | Female | 26 | Caucasian | 20.6 | 63.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB069 | Male | 32 | Caucasian | 20.7 | 76.2 | N | Detected | Detected | Detected | Detected | Detected |
| OB070 | Female | 31 | Chinese | 22.2 | 57.6 | N | Detected | Detected | Detected | Detected | Detected |
| OB071† | Female | 29 | Caucasian | 42.9 | 121.8 | N | Detected | Detected | Detected | Detected | Detected |
| OB072 | Female | 59 | Chinese | 26.8 | 68.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB073 | Female | 27 | Hispanic | 18.7 | 47.6 | N | Detected | Detected | Not Detected | Detected | Detected |
| OB076 | Male | 46 | Hispanic | 33.7 | 93.4 | N | Detected | Not Detected | Detected | Detected | Detected |
| OB077 | NA | NA | NA | NA | NA | NA | Detected | Detected | Detected | Detected | Detected |
| OB078 | Female | 32 | Chinese | 24.4 | 62.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB079 | Male | 53 | Chinese | 29.0 | 77.8 | Prediabetic | Not Detected | Detected | Not Detected | Detected | Detected |
| OB080 | Female | 31 | Hispanic | 24.3 | 73.0 | N | Not Detected | Not Detected | Not Detected | Detected | Not Detected |
| OB081 | Female | 31 | Caucasian | 24.1 | 65.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB082 | Female | 53 | Caucasian | 21.8 | 56.8 | N | Detected | Detected | Detected | Detected | Detected |
| OB084 | Male | 27 | Chinese | 24.3 | 67.0 | N | Detected | Detected | Detected | Detected | Detected |
| OB085 | Female | 30 | Chinese | 28.7 | 73.2 | N | Detected | Not Detected | Detected | Detected | Detected |
| OB086 | Male | 41 | Hispanic | 23.4 | 68.6 | NA | Detected | Not Detected | Detected | Detected | Detected |
| OB088 | Male | 64 | Chinese | 46.8 | 140.9 | N | Detected | Detected | Not Detected | Detected | Detected |

†PCR products for each target gene were Sanger sequenced.
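As an illustration of the chi-square test of independence named in the Methods, the sketch below tests ber detection against sex using counts tallied from Table S8 (female: 38 detected, 3 not detected; male: 24 detected, 2 not detected; OB077, whose sex is NA, is excluded). The function name, the absence of a continuity correction, and the handling of the missing value are assumptions of this example and may differ from the authors' analysis.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table, no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: female, male. Columns: ber detected, ber not detected (tallied from Table S8).
ber_by_sex = ((38, 3), (24, 2))
statistic, p_value = chi_square_2x2(ber_by_sex)
print(f"chi-square = {statistic:.4f}, p = {p_value:.3f}")
```

The resulting p-value is well above 0.05, in line with the reported absence of an association. Because two expected cell counts are below 5, an exact test may be more appropriate for this particular table.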


Table S9. Primers for qRT-PCR and cloning experiments

| Experiment | Gene | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3') |
|---|---|---|---|
| qRT-PCR | Eggerthella lenta DSM2243 ber | GGC AAC TAC AAT TCC GCC AC | GCA TCG GTG TCG TAG GTC AT |
| qRT-PCR | Eggerthella lenta DSM2243 ffh* | AGG AGG AAG TGG AGC AGG AG | TCT TGC GGA TCT GTC GGT TC |
| qRT-PCR | Eggerthella lenta DSM2243 gapdh* | ATC GTG GAT AAC CCG CAC TC | ACC AGG AGA TGC ACT TGA CG |
| qRT-PCR | Eggerthella lenta DSM2243 gyrB* | ACT GCG ACG AGA TCA AGG TG | TCT TTC GGG TGC TTG TCG AT |
| qRT-PCR | Blautia producta DSM3507 glm | TTC GGT ACC CCT TAC TCC GT | CCG GGA AGG AAA TGG TGA CA |
| qRT-PCR | Blautia producta DSM3507 ffh* | AAA AGC AGG GCG TGG AAG TA | CAT GCT CGT ATG CTG CCT TG |
| qRT-PCR | Blautia producta DSM3507 gapdh* | ATC ATC TCC GCA GCT TCC TG | ATT TGA TCG GAG CCA GGT CG |
| qRT-PCR | Blautia producta DSM3507 gyrB* | CTT TCC CAG CGT GCA AGA GA | GTT TGC CCG GAA GGG ACA TT |
| qRT-PCR | Gordonibacter pamelaeae 3C cld2 | GTG GAT CCA GTC CTC CAA CC | ACG ATG AAC GGC GTC TTG AA |
| qRT-PCR | Gordonibacter pamelaeae 3C ffh* | TGG ACT TCG ACG GCG TAA TC | GCC GGA CGA GAT GAA CTT GA |
| qRT-PCR | Lactonifactor longoviformis DSM17459 edl2 | ACC CCA AAA AGG GAA ACC GT | TGG CGA CAT TAA GAG GAC CG |
| qRT-PCR | Lactonifactor longoviformis DSM17459 edl1 | TAA AAG TAT GCC CCC AGC GG | CTG CGC CTC CGG ATT ATA GG |
| qRT-PCR | Lactonifactor longoviformis DSM17459 ffh* | ACA GGA TGT TAT GAC GGG GC | TCA TTG CCC GGT TTG AAT GC |
| qRT-PCR | Lactonifactor longoviformis DSM17459 gapdh* | ATG ACT CTG GAC GGA CCT CA | ATT GCC TTA GCA GCA CCT GT |
| qRT-PCR | Lactonifactor longoviformis DSM17459 gyrB* | TGT ATC TGG CTC AGC CTC CT | TAT CGC GGC CAA TCT CTG AC |
| cloning | Eggerthella lenta DSM2243 ber | ATG CAT GGT ACC GCA ACC GCA TCC GCC CAG GGC G** | ATG CAT GGA TCC TTA CGC TTT GTC CA** |

*Designates housekeeping gene
**Nucleotides in bold and underlined designate the restriction site


Table S10. PCR primers for culture-independent detection of genomic loci associated with lignan metabolism

| Gene | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3') | Amplicon size (bp) |
|---|---|---|---|
| Eggerthella lenta DSM2243 ber | GGTACATCAGCCACTTCGAC | GATGGTCATCATCATCTCGC | 866 |
| Blautia producta DSM3507 glm | ATTAGAAACCATCCGCGGAGG | TCCCAGTCTGTGATGTAGTCGA | 453 |
| Gordonibacter pamelaeae 3C cld2 | CCACGACTTCATCGACAAG | GCTGATGTAGAACTCGTAGC | 967 |
| Lactonifactor longoviformis DSM17459 edl1 | CGTCTGCCCATCTCAGATCC | ACCGGTCTACCAGTGCTTTG | 166 |
| Lactonifactor longoviformis DSM17459 edl2 | ACCCCAAAAAGGGAAACCGT | GGGTTAAAGGCTCCGTTGGA | 331 |


Table S11. Alignment of PCR amplicons to gene sequences from draft genomes

| Gene | Genomic DNA (positive controls): % Identity (n=1) | Genomic DNA (positive controls): % Coverage (n=1) | Human samples (average ± SEM): % Identity (n=5) | Human samples (average ± SEM): % Coverage (n=5) |
|---|---|---|---|---|
| ber | 100 | 100 | 94.8 ± 0.6 | 99.6 ± 0.4 |
| glm | 99 | 99 | 96.8 ± 1.5 | 99.6 ± 0.2 |
| cld2 | 99 | 100 | 98.5 ± 0.5* | 99.3 ± 0.5* |
| edl1 | 99 | 100 | 93.8 ± 0.7 | 95.4 ± 2.9 |
| edl2 | 98 | 99 | 91.6 ± 0.2 | 99.6 ± 0.2 |

*n=4
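The "average ± SEM" entries in Table S11 summarize per-individual BLAST results. The sketch below shows how such a summary could be computed, assuming the SEM is the sample standard deviation (n − 1 denominator) divided by the square root of n. The five percent-identity values are hypothetical placeholders, not data from the study, and the function name is illustrative.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical percent identities for one gene across five individuals.
percent_identity = [94.0, 95.0, 96.0, 93.0, 97.0]
mean, sem = mean_and_sem(percent_identity)
print(f"{mean:.1f} ± {sem:.1f}")
```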


References
1. J. E. Bisanz et al., Illuminating the microbiome's dark matter: a functional genomic toolkit for the study of human gut . bioRxiv, (2018).
2. T. H. Mäkelä, K. T. Wähälä, T. A. Hase, Synthesis of enterolactone and enterodiol precursors as potential inhibitors of human estrogen synthetase (aromatase). Steroids 65, 437-441 (2000).
3. S. Anders, P. T. Pyl, W. Huber, HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166-169 (2015).
4. S. Anders, W. Huber, Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
5. D. L. Alba et al., Subcutaneous fat fibrosis links obesity to insulin resistance in Chinese-Americans. J. Clin. Endocrinol. Metab., jc.2017-02301 (2018).
6. N. Segata, D. Börnigen, X. C. Morgan, C. Huttenhower, PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304 (2013).
