Discovery and Characterization of Glycyl Radical Enzymes Found in the Human Gut Microbiota and Other Environments


Citation Levin, Benjamin Joseph. 2019. Discovery and Characterization of Glycyl Radical Enzymes Found in the Human Gut Microbiota and Other Environments. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:41121264

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Discovery and Characterization of Glycyl Radical Enzymes Found in the Human Gut Microbiota and Other Environments

A dissertation presented

by

Benjamin Joseph Levin

to

The Department of Chemistry and Chemical Biology

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in the subject of

Chemistry

Harvard University

Cambridge, Massachusetts

January 2019

© 2019 – Benjamin Joseph Levin

All rights reserved.

Dissertation advisor: Professor Emily P. Balskus

Abstract

The human gastrointestinal tract is home to trillions of microorganisms, and these microbes have important impacts on human health. Despite the many known links between gut microbiota composition and host biology, the molecular mechanisms underlying these interactions are largely unknown and there are no general strategies to elucidate them. Glycyl radical enzymes (GREs) had been previously identified in the human gut microbiota, but their functions in this environment are largely unknown. This thesis presents our work towards characterizing the chemistry performed by GREs and their impacts on human health. Our results illustrate how integrating a biochemical and mechanistic understanding of enzymes and metagenomic sequencing data can lead to insight into host-gut microbial symbioses.

Chapter 2 details the development and application of a metagenomic analysis workflow called chemically guided functional profiling to identify and quantify the GREs present in healthy human microbiomes. Our methodology combined biochemical knowledge of this protein superfamily with metagenomic sequencing data, and with this approach we computed abundances of specific GREs in these environments. We prioritized targets for biochemical characterization based on their abundance in the human gut and characterized two enzymes previously of unknown function, propanediol dehydratase (PD) and trans-4-hydroxy-L-proline dehydratase, demonstrating how chemically guided functional profiling can reveal novel biochemistry in complex microbial communities.

Chapter 3 describes the mechanistic characterization of PD. Intriguingly, a B12-dependent propanediol dehydratase (B12-PD) catalyzes the same overall reaction as PD but requires a different cofactor. To probe the mechanisms of these enzymes, we synthesized all four possible 18O-labeled 1,2-propanediol stereoisomers in high enantiomeric excess and used these compounds as substrates for PD and B12-PD. The results suggest PD catalyzes direct elimination of the C2 hydroxyl group from an initial substrate-based radical, while B12-PD instead mediates a 1,2-hydroxyl group migration. These experiments clarify how PD and other GREs important to human health perform challenging transformations and reveal mechanistic differences between GREs and B12-dependent enzymes that have evolved identical functions.

Chapter 4 details our efforts to further study protein families in metagenomic datasets. Complementing our work quantifying GREs in metagenomic reads, we searched for additional GREs in metagenomic assemblies and identified genes encoding GREs not previously found in any sequenced microbial genome. We also applied chemically guided functional profiling to additional protein families. Our results highlight how existing metagenomics datasets can be mined to find both previously sequenced and novel enzymes.

Chapter 5 presents our discovery and initial characterization of a new GRE catalyzing the decarboxylation of indole-3-acetate to skatole (3-methylindole). The production of skatole by boar and cattle gut microbes is important to the agricultural industry, but the enzymes involved in this pathway have not been characterized. Using comparative genomics and knowledge of GRE biochemistry, we discovered two GRE indole-3-acetate decarboxylases (IADs) and validated that they catalyze the expected reactions. These findings are important for developing methods to inhibit skatole biosynthesis and for further characterizing the scope of reactions performed by GREs.

Table of Contents

Abstract

Table of Contents

Acknowledgments

List of Figures

List of Tables

Chapter 1: Introduction to the human gut microbiota and to the biochemistry of glycyl radical enzymes

1.1: General introduction to the human gut microbiota

1.2: Advances in sequencing technology have empowered studies of microbial communities

1.3: Enzyme discovery in microbial genomes and metagenomes

1.4: Introduction to glycyl radical enzymes (GREs): discovery and characterization of pyruvate formate-lyase

1.5: Glycyl radical enzymes use similar mechanisms to catalyze reactions

1.6: Discovery and roles of glycyl radical enzymes in microbiotas and metagenomes

1.7: Microbial metabolism of L-fucose

1.8: References

Chapter 2: Quantification of glycyl radical enzymes in healthy human gut metagenomes and characterization of abundant dehydratases

2.1: Introduction

2.2: Results and discussion

2.2.1: GRE sequence similarity network construction

2.2.2: Integrating the GRE SSN with quantitative metagenomics

2.2.3: Abundances and distributions of characterized GREs in healthy human microbiomes

2.2.4: Previously uncharacterized GREs were profiled using CGFP

2.2.5: Analysis of metatranscriptomes for GREs

2.2.6: Biochemical characterization of propanediol dehydratase

2.2.7: Identification of conserved residues in dehydrating GREs

2.2.8: Comparing abundances of PD and B12-PD in the healthy human gut microbiome

2.2.9: Discovery and characterization of trans-4-hydroxy-L-proline dehydratase

2.2.10: Conclusions

2.3: Materials and methods

2.3.1: General materials and methods

2.3.2: Construction of GRE SSNs

2.3.3: Determination of enzyme abundances in metagenomes and metatranscriptomes

2.3.4: Statistical analysis

2.3.5: Cloning of expression plasmids for PD and PD-AE

2.3.6: Site-directed mutagenesis of PD

2.3.7: Overexpression and purification of PD

2.3.8: Overexpression and purification of PD-AE

2.3.9: Construction of PD homology model

2.3.10: Construction of a t4LHypD homology model

2.3.11: Compilation of sequenced prokaryotes encoding t4LHypD

2.3.12: Cloning of expression plasmids for t4LHypD, t4LHypD-AE, and P5C reductase

2.3.13: Generation of a P5CR deletion mutant in E. coli

2.3.14: Overexpression and purification of t4LHypD

2.3.15: Overexpression and purification of t4LHypD-AE

2.3.16: Overexpression and purification of P5CR

2.3.17: Glycyl radical detection and quantification by EPR spectroscopy

2.3.18: Synthesis of 5-deazariboflavin and enantioenriched propanediols

2.3.19: GC–MS assay for PD activity

2.3.20: Coupled spectrophotometric assay for PD kinetics

2.3.21: Spectrophotometric assay for detection of P5CR activity

2.3.22: Coupled spectrophotometric assay for t4LHypD kinetics

2.3.23: LC–MS/MS assays for t4LHypD enzymatic activity

2.4: References

Chapter 3: Biochemical characterization of propanediol dehydratases reveals distinct mechanisms of 1,2-diol dehydration by glycyl radical and B12-dependent enzymes

3.1: Introduction

3.2: Results and discussion

3.2.1: PD-AE has one [4Fe–4S] cluster that is similar to other GRE activating enzymes

3.2.2: PD-AE catalyzes the formation of 5ʹ-deoxyadenosine from S-adenosylmethionine

3.2.3: PD forms a dimer in solution

3.2.4: PD catalyzes 1,2-diol dehydration of (S)-1,2-propanediol and other similar substrates

3.2.5: Stable isotope labeling experiments support a direct elimination mechanism for PD

3.2.6: Stable isotope labeling experiments confirm a hydroxyl group migration mechanism is operative for B12-PD

3.2.7: Thermodynamic analysis of intermediates generated during 1,2-propanediol dehydration

3.2.8: Mechanistic proposal for PD

3.2.9: Conclusions

3.3: Materials and methods

3.3.1: Materials and general methods

3.3.2: UV–Vis spectroscopy of PD-AE

3.3.3: Quantification of iron content of PD-AE

3.3.4: Quantification of sulfide content of PD-AE

3.3.5: HPLC assay for detection of S-adenosylmethionine cleavage products

3.3.6: LC–MS assays for detection of S-adenosylmethionine cleavage products

3.3.7: Gel filtration chromatography for analysis of oligomeric state of PD

3.3.8: GC–MS assays for assessing the substrate scope of PD

3.3.9: GC–MS assays for determination of 18O-enrichment of 1-propanol

3.3.10: Cloning of pduCDE for heterologous overexpression

3.3.11: Heterologous overexpression and purification of B12-PD

3.3.12: Preparation of cell-free extracts from Klebsiella oxytoca ATCC 8724

3.3.13: Synthetic methods and characterization data

3.4: References

Chapter 4: Profiling glycyl radical enzymes and other enzyme families in metagenomes

4.1: Introduction

4.2: Results and discussion

4.2.1: Quantifying glycyl radical enzymes in gut metagenomes from healthy and cirrhotic patient cohorts

4.2.2: Quantifying glycyl radical enzymes in pig gut metagenomes

4.2.3: Comparing folate biosynthetic gene abundance in infant gut metagenomes

4.2.4: Mining assembled metagenomes for new glycyl radical enzymes

4.2.5: Conclusions

4.3: Materials and methods

4.3.1: General materials and methods

4.3.2: Bioinformatics methods for comparative metagenomic profiling

4.3.3: Quantifying GREs in pig gut metagenomes

4.3.4: Quantifying genes in metagenomes using USEARCH

4.3.5: Identifying genes in metagenomic assemblies

4.4: References

Chapter 5: Discovery of a glycyl radical enzyme catalyzing indole-3-acetate decarboxylation

5.1: Introduction

5.2: Results and discussion

5.2.1: Identification of a putative indole-3-acetate decarboxylase (IAD) in Olsenella uli

5.2.2: Identification of a putative IAD in Clostridium scatologenes

5.2.3: Construction of expression hosts constitutively expressing genes involved in iron-sulfur cluster assembly

5.2.4: Confirmation of the activity of O. uli IAD (OuIAD)

5.2.5: Confirmation of the activity of C. scatologenes IAD (CsIAD)

5.2.6: Conclusions

5.3: Materials and methods

5.3.1: General materials and methods

5.3.2: Cloning of IAD expression vectors

5.3.3: Construction of expression hosts constitutively expressing the isc operon

5.3.4: Heterologous overexpression and purification of IADs

5.3.5: Heterologous overexpression and purification of IAD-AEs and rSAM2

5.3.6: Detection and quantification of glycyl radicals by EPR spectroscopy

5.3.7: HPLC assay for detection of indole-3-acetate and skatole from enzyme assays

5.4: References

Acknowledgments

I must first acknowledge the role my advisor Professor Emily P. Balskus has played during my graduate studies. When I joined her research group, she suggested I study the mechanism of an unusual enzyme. I had no idea just how important, abundant, and diverse this enzyme and its homologs would turn out to be, and I am very lucky that I was able to spend my graduate studies researching them. Professor Balskus’s scientific expertise in all fields related to chemistry and the multidisciplinary research group she has built have made me a well-rounded scientist. Her mentorship and her support of my scientific and professional development were essential for my completing the work presented in this dissertation. I am very thankful for the time I have spent in her group.

I also thank the other members of my graduate advising committee, Professor Eric N. Jacobsen and Professor Catherine L. Drennan. Their scientific advice and professional support over the past few years have been invaluable. In addition, Professor E. Neil G. Marsh and Professor Brian P. Coppola, my mentors during my undergraduate years at the University of Michigan, encouraged my pursuit of a career in scientific research, and I thank them for their guidance and support.

The training provided by every other member of the Balskus group, and particularly Dr. Smaranda Bodea, Dr. Spencer C. Peck, and Dr. Hitomi Nakamura, has been essential for my scientific development, and I thank them, particularly for their patience. I am also appreciative of the people I formally or informally collaborated with, including Dr. Yolanda Huang, Dr. Ana Martínez-del Campo, Dr. Lauren Rajakovich, Carina Chittim, Beverly Fu, the team at Harvard University Research Computing, Dr. Eric Franzosa, Professor John A. Gerlt, and Jennifer X. Wang.

I thank my friends from Harvard University, from the University of Michigan, from Berkley High School, and from everywhere else. There are too many of you to list here, but your understanding, encouragement, and knowledge (scientific and otherwise) have made me a better person, and I truly appreciate it.

I sincerely thank my brother Aaron Levin and my parents Jan and Steve Levin. Their everlasting support has been essential to my success, and I am eternally grateful to them.

And a special thank you to Samantha J. Cassell, who has made my graduate school experience and all other aspects of my life better in every way.


List of Figures

Figure 1.1: Overview of the human gut microbiota

Figure 1.2: Proposed mechanism of PFL

Figure 1.3: General mechanism of GREs

Figure 1.4: Examples of GRE-catalyzed reactions

Figure 1.5: Reaction catalyzed by and genomic context of CutC

Figure 1.6: Discovery of GRE decarboxylases

Figure 1.7: Bacterial L-fucose metabolism

Figure 2.1: Outline of chemically guided functional profiling

Figure 2.2: GRE SSNs constructed with varying percent identity cutoffs

Figure 2.3: Multiple sequence alignment of selected GREs

Figure 2.4: Subsets of SSNs showing clusters containing similar diol dehydratases

Figure 2.5: SSN of the GRE superfamily

Figure 2.6: Integration of SSNs and ShortBRED for CGFP

Figure 2.7: CGFP of GREs in healthy human microbiomes

Figure 2.8: Abundance of PFL in metagenomes

Figure 2.9: Abundances of characterized GREs in metagenomes

Figure 2.10: Per-site abundances of previously uncharacterized GREs

Figure 2.11: Detection of GREs in paired meta-omics data

Figure 2.12: Metabolism of L-fucose to propionate by gut microbes

Figure 2.13: EPR spectra for wild-type PD and mutants activated by PD-AE

Figure 2.14: Validation of activity of PD

Figure 2.15: Identifying active site residues conserved in dehydrating GREs

Figure 2.16: Overlays of the PD homology model with GD and PD crystal structures

Figure 2.17: Comparison of the abundances of PD and B12-PD in stool metagenomes

Figure 2.18: Gene cluster context and biological pathway for t4LHypD

Figure 2.19: Spectrophotometric assay to detect P5CR activity

Figure 2.20: Verification of t4LHypD activity

Figure 2.21: Kinetic analysis of t4LHypD

Figure 2.22: SDS-PAGE of purified PD, PD mutants, and PD-AE

Figure 2.23: SDS-PAGE of purified t4LHypD, t4LHypD-AE, and P5CR

Figure 3.1: Overview of 18O labeling experiments

Figure 3.2: Consensus mechanism of B12-PD

Figure 3.3: UV–Vis spectrum of PD-AE (blue) and PD-AE (red) reduced with sodium dithionite

Figure 3.4: HPLC assays for determination of SAM cleavage products by PD-AE

Figure 3.5: LC–MS assays for determination of SAM cleavage products by PD-AE

Figure 3.6: Purification of PD and determination of native molecular mass

Figure 3.7: GC–MS assays for PD activity with alternative potential diol substrates

Figure 3.8: Design of 18O-labeling experiments to probe PD mechanism

Figure 3.9: EI spectra of 1-propanol derived from the reaction of activated PD with different enantioenriched 1,2-propanediol isotopologues

Figure 3.10: EI spectra of 1-propanol derived from the reaction of activated PD with additional enantioenriched 1,2-propanediol isotopologues

Figure 3.11: EI spectra of 1-propanol derived from the reaction of B12-PD with different enantioenriched 1,2-propanediol isotopologues

Figure 3.12: EI spectra of 1-propanol derived from the reaction of B12-PD with additional enantioenriched 1,2-propanediol isotopologues

Figure 3.13: EI spectra of 1-propanol derived from the reaction of cell-free extracts of K. oxytoca with different enantioenriched 1,2-propanediol isotopologues

Figure 3.14: EI spectra of 1-propanol derived from the reaction of cell-free extracts of K. oxytoca with additional enantioenriched 1,2-propanediol isotopologues

Figure 3.15: Proposed mechanism for PD

Figure 3.16: Crystal structure of PD with substrate bound (PDB 5I2G)

Figure 3.17: SDS-PAGE analysis of heterologously expressed B12-PD

Figure 3.18: Representative small scale hydrolytic kinetic resolution to synthesize 1,2-(18O)propanediols

Figure 3.19: Synthesis of 1,2-(2-18O)propanediols

Figure 3.20: GC-MS (CI) of 1,2-(1-18O)propanediols to determine oxygen-18 enrichment

Figure 3.21: GC-MS (EI) of 1,2-(2-18O)propanediols to determine oxygen-18 enrichment

Figure 3.22: GC-FID analysis to measure enantioenrichment of 1,2-propanediols

Figure 3.23: GC-FID analysis to measure enantioenrichment of 1,2-(1-18O)propanediols

Figure 3.24: GC-FID analysis to measure enantioenrichment of 1,2-propanediols

Figure 4.1: Principal component analysis (PCA) of GREs encoded in human gut metagenomes from a previous study of cirrhosis

Figure 4.2: Abundance of CutC in metagenomes from healthy and cirrhosis cohorts

Figure 4.3: Isethionate sulfite-lyase (IslA) in gut metagenomes from healthy and cirrhotic patients

Figure 4.4: Abundances of characterized GREs in pig gut metagenomes

Figure 4.5: Abundances of pabBC from Bifidobacterium and other organisms in infant gut metagenomes

Figure 4.6: Comparative analyses of pabBC abundances in infant gut metagenomes

Figure 4.7: Example contig from a metagenomic assembly encoding a GRE

Figure 4.8: Second contig encoding a GRE of unknown function

Figure 5.1: Biological relevance of skatole production in livestock

Figure 5.2: Similarities between I3A and 4-hydroxyphenylacetate decarboxylation

Figure 5.3: Genomic context for the putative IAD in C. scatologenes

Figure 5.4: EPR spectrum of activated OuIAD

Figure 5.5: Skatole production by OuIAD

Figure 5.6: Skatole production by CsIAD

Figure 5.7: Colony PCR for characterization of ΔiscR expression hosts


List of Tables

Table 2.1: Primers used for cloning and site directed mutagenesis

Table 3.1: Dehydration of 18O-labeled substrates

Table 3.2: LC–MS analysis of standards used for assays

Table 3.3: Primers used for cloning and sequencing

Table 4.1: Predicted ORFs in a GRE-encoding contig from an HMP metagenome assembly

Table 4.2: Predicted ORFs in a second GRE-encoding contig

Table 5.1: Primers used for cloning IADs, IAD-AE, and rSAM2

Table 5.2: Primers used for colony PCR of ΔiscR expression hosts

Chapter 1: Introduction to the human gut microbiota and to the biochemistry of glycyl radical enzymes [i]

1.1: General introduction to the human gut microbiota

The human gastrointestinal (GI) tract is home to trillions of microorganisms (~30 trillion microbial cells) that have important impacts on host health.1-2 Microbes have resided in the GI tracts of humans since the beginning of mammalian evolution, co-evolving with their hosts for millions of years, leading to an extraordinarily complex host-gut microbial symbiosis (Figure 1.1).3 Gut microbes ferment otherwise indigestible dietary components and process other xenobiotic substances that reach the GI tract either via the diet or other routes like biliary excretion.4 In turn, metabolites produced by gut microbes are taken up by host colonic epithelial cells and can reach systemic circulation. For example, the short-chain fatty acids (SCFAs), particularly butyrate, are the end products of gut microbial fermentation and are a key energy source for host colonic epithelial cells.5 Butyrate and other gut microbial metabolites are also critical for immune system development.6-8

On the other hand, dysbiosis of the gut microbiota, defined as a compositional imbalance of microbes in the GI tract, is associated with neurologic,9-10 respiratory,11-12 metabolic,13-16 hepatic,17 and cardiovascular illnesses,18-19 as well as more localized gastrointestinal disorders in human hosts.20 Perhaps the most well-characterized gut microbial dysbiosis is infection by Clostridium difficile.21-22 Overuse of antibiotics and loss of colonization resistance lead to overgrowth of this microorganism and extensive colitis.23 Nosocomial C. difficile infections have traditionally been treated with antibiotics, but recurrence is frequent. A promising treatment for these infections is fecal microbial transplantation, and this approach for reintroduction of commensal gut organisms has been successful in treating >90% of infected individuals.24 This example highlights the role of colonization resistance as a mechanism to limit pathogen expansion and prevent infections in the human gut.25-26

[i] Parts of this chapter are adapted from the following publication: Levin, B. J.; Balskus, E. P. Discovering radical-dependent enzymes in the human gut microbiota. Curr. Opin. Chem. Biol. 2018, 47, 86-93.

Figure 1.1: Overview of the human gut microbiota.

Despite the links between gut microbiota composition and human health, the molecular mechanisms underlying how gut microbes affect host biology are still largely uncharacterized.27-28 A few examples of characterized host-gut microbial metabolic interactions illustrate their diversity and complexity. Irinotecan is a chemotherapeutic agent used to treat colorectal and pancreatic cancers.29 Irinotecan is a prodrug, and its hydrolysis to the active agent SN-38 by esterases in the host liver is necessary for its anticancer activity.30-32 SN-38 is glucuronidated by the host to yield the inactive metabolite SN-38G, which then enters the GI tract via biliary excretion. However, gut microbial β-glucuronidases hydrolyze SN-38G back to SN-38, which enters gut epithelial cells and causes severe GI toxicity. Selectively inhibiting these gut microbial enzymes with small molecules prevents dose-limiting side effects and allows higher doses of irinotecan to be administered in animals.33-34 Other medications are metabolized in a similar fashion, and inhibition of β-glucuronidases may be a useful strategy to improve the effectiveness of other drugs.35-37 Another example from xenobiotic metabolism involves digoxin, a drug used to treat various cardiovascular conditions. Reduction of the α,β-unsaturated lactone motif of this compound to the inactive metabolite (20R)-dihydrodigoxin by certain strains of Eggerthella lenta in the gut limits the efficacy of this pharmaceutical in individuals harboring these organisms.38 Understanding the organisms and enzymes responsible for this chemistry could guide more effective use of this drug.27,39 Finally, colibactin is a genotoxin produced by certain Escherichia coli strains in the human gut. This metabolite induces DNA double-strand breaks in HeLa cells40 and stimulates the development of colitis-associated colorectal cancer in animal models,41 and the presence of the colibactin biosynthetic gene cluster is correlated with increased rates of inflammatory bowel disease and colorectal cancer in humans.41 Although the structure of colibactin remains unknown, efforts to identify this natural product have led to insight into its mechanism of action.42-44 Ultimately, characterization of the underlying molecular mechanisms of these and other host-gut microbe metabolic interactions is essential for understanding and manipulating the impact of gut microbes on human health.

1.2: Advances in sequencing technology have empowered studies of microbial communities

The microbial community in the human gut is highly diverse, and characterizing the microorganisms in this environment has historically been exceptionally challenging. Classically, culture-based methods have been used to study gut microorganisms and their metabolic activities.45 However, these methods are time-consuming and can be difficult.46 Furthermore, cultured microbes typically represent only a small fraction of the microbial diversity in a given environment; some estimates suggest that <1% of bacteria can be readily cultivated.47-48 Even if all microbes were cultivatable, culturing every microbe in the human gut microbiota would be an impossible task, particularly given the extensive variability of gut microbiota composition between different individuals.49 Cultured isolates have provided and will continue to provide insight into the gut microbiota, but methods to study this environment without culturing microorganisms are also needed.50-52

Fortunately, recent advances in DNA sequencing technology have provided new approaches to study the gut microbiota.53-55 The development of second generation sequencing methods, also known as massively parallel or next-generation sequencing, has enabled researchers to determine the sequences of complex mixtures of DNA. The first commercial instrument for next-generation sequencing was released in 2005, and this technology was rapidly applied to sequence the human gut microbiome, or the collection of all microbial genomes in this environment.56-57 With reference genomes available for many gut microbes, taxonomic profiling with 16S rRNA sequencing has become a standard tool for identifying the microbial taxa present in gut microbiota samples.58 The 16S rRNA genes encoded in a microbial community can be amplified, sequenced, and compared to 16S rRNA sequences in reference databases in order to identify which microorganisms are present in that environment.59 Although powerful, this taxonomic profiling method cannot reveal strain and species level differences, and given the extent of horizontal gene transfer in this environment,50 taxonomic profiling is only of limited value for functional profiling of microbial communities.
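As a rough illustration of the comparison step described above, the following Python sketch assigns reads to the most similar reference sequence by shared k-mer content. It is a toy under stated assumptions, not the method used in the cited studies: the reference sequences are made up (they are not real 16S rRNA genes), and the function names, the choice of k, and the use of Jaccard similarity are all hypothetical. Real pipelines rely on curated reference databases and dedicated classifiers.

```python
# Toy sketch of reference-based taxonomic assignment (illustrative only).

def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_taxon(read, references, k=4):
    """Return the reference label whose k-mer set is most similar to the read."""
    read_kmers = kmers(read, k)
    return max(
        references,
        key=lambda name: jaccard(read_kmers, kmers(references[name], k)),
    )


# Hypothetical reference fragments (made-up sequences, not real 16S genes).
REFERENCES = {
    "taxon_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTTAACCGGATATCCGGAA",
}


def profile(reads, references=REFERENCES, k=4):
    """Count how many reads are assigned to each reference taxon."""
    counts = {name: 0 for name in references}
    for read in reads:
        counts[assign_taxon(read, references, k)] += 1
    return counts
```

Calling `profile` on a list of reads returns a dictionary of per-taxon read counts. Because every read is forced onto its nearest reference, a sketch like this also shows why marker-gene profiling reports taxa and not the functions those taxa encode.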

A significant development to arise from next-generation sequencing was shotgun metagenomics, the sequencing and analysis of all environmental DNA, for characterization of gut microbial communities. Pioneering large-scale metagenomics projects including Metagenomics of the Human Intestinal Tract (MetaHIT)60 and the Human Microbiome Project (HMP)61 involved metagenomic sequencing and analysis of hundreds of human samples. In addition to operational taxonomic unit (OTU) analysis to quantify microbial taxa in these samples, computational assembly of short sequencing reads was performed to combine overlapping reads into longer, continuous contigs and scaffolds.54,62 With access to these longer stretches of sequencing data, gene catalogs for these microbiomes could be constructed, and sequences for all of the predicted open reading frames (ORFs) in these samples could be generated. The HMP, MetaHIT, and other metagenome-wide association studies have yielded a wealth of data about the connections between host health and gut microbiota composition.63
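The idea of combining overlapping reads into longer contigs can be sketched with a greedy overlap merge. This is a minimal toy, assuming error-free reads from a single strand. It is not the algorithm used by the assemblers applied in MetaHIT or the HMP, and the function names and the `min_len` parameter are hypothetical.

```python
# Toy greedy overlap assembly (illustrative only; assumes error-free,
# same-strand reads and ignores repeats, coverage, and sequencing errors).

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b.

    Returns 0 if no overlap of at least min_len exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)
```

For example, the three reads `ATGGCGT`, `GCGTACC`, and `TACCTGA` collapse into the single contig `ATGGCGTACCTGA`, while a read with no sufficient overlap remains a separate contig. The all-against-all comparison in each round hints at why assembly becomes computationally expensive for real metagenomes.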

1.3: Enzyme discovery in microbial genomes and metagenomes

To fully understand the impact of gut microbes on human biology, the microbial enzymes and biochemical pathways that affect host biology must be identified and well-characterized. However, our understanding of gut microbial enzymes is currently in its infancy. The scope of this problem is vast; most individuals’ gut microbiomes contain around 500,000 genes, of which nearly 300,000 are shared between many individuals.60 The remaining genes are highly variable between individuals and are drawn from among a set of ~3,000,000 genes. Compared to the human genome, which contains ~25,000 genes,64 the gut microbiome encodes a much wider range of biochemical functions.

It should not come as a surprise that the gut microbiota is such a large repository for novel enzymes. The huge assortment of metabolites generated and modified by host and microbial processes, and the range of molecules present in the human diet, provide many different opportunities for gut microbes to survive, and these organisms have evolved enzymes capable of unique reactivity to capitalize on these resources.28,65 For example, dietary and host-derived complex carbohydrates are an important resource for gut microbes.66 Accessing simple sugars from complex polysaccharides requires specific enzymes to cleave glycosidic linkages and catabolize the resulting sugars. Detailed characterization of these enzymes has already revealed interesting links between the gut microbiota and host diet.67 Other gut microbial enzymes that metabolize amino acids have been linked to changes in the host immune and nervous systems.10,68-70

Radical-dependent enzymes are particularly abundant in the human gut microbiota, as in other anoxic microbial habitats,71-73 because their unique reactivity enables the catalysis of chemically challenging transformations that facilitate anaerobic metabolism.74 Some of these enzymes utilize metallocofactors to perform this difficult chemistry.75 For example, formylglycine-dependent sulfatases and the activating metalloenzymes required for installing the formylglycine cofactor are commonly encoded by gut microbes, enabling them to access sulfated glycans as carbon and energy sources.76-77 Metalloenzymes are involved in producing ribosomally synthesized and posttranslationally modified peptides (RiPPs), and these natural products are thought to mediate colonization resistance by inhibiting pathogen growth.78-82 Finally, gut microbial metalloenzymes play important roles in the biosynthesis of vitamins including thiamine and biotin.83-85 This subset of metalloenzymes illustrates the breadth of reactivity utilized by gut microbes to survive in the human gut.

Despite these examples, efforts to analyze metagenomic sequencing data are hindered by the difficulty of connecting DNA sequences and enzymatic functions. It can be challenging to identify the genes present in these sequencing datasets. In large-scale metagenome-wide association studies, automated programs must be used to assemble reads into longer contigs and scaffolds, identify ORFs in these assemblies, and then functionally annotate these genes. Each step is fraught with challenges, and these problems do not account for difficulties with sample collection and sequencing.86 Metagenomic assembly, even with powerful tools like SOAPdenovo, is computationally expensive, and construction of ideal assemblies, where all raw reads can be joined, is not yet feasible.62 Identifying genes from metagenomes is also only ~95% accurate, and this number is lower for genes encoded by gut microbes that have not had their genomes sequenced.87

Strategies that rely on reference gut microbial genomes and taxonomic profiling are invaluable for identifying microbes in microbiomes, but they have difficulty translating this knowledge into detailed functional profiling.88-89 Even when gene identification is successful, strategies to functionally annotate these genes are limited. The current standard approach is to rely on similar genes represented in databases including KEGG (Kyoto Encyclopedia of Genes and Genomes), COG (Clusters of Orthologous Groups of proteins), and eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups).90-93 However, these databases are only of limited utility for annotating genes that diverge from well-characterized genes. For example, approximately 80% of genes in HMP metagenomes could not be assigned a metabolic function, and around half could not be given any annotation at all.61,94 Worsening matters, genes with annotations are often mapped to large enzyme superfamilies without consideration of the many different reactions an enzyme superfamily may perform.95 As many as 80% of enzymes within a superfamily can be uncharacterized or misannotated as a result of enzymatic functional diversity.96-97 Functional profiling strategies that accurately identify both characterized enzymes and enzymes of unknown functions in microbiomes are needed to characterize the important but unrecognized roles they play in microbial communities.

1.4: Introduction to glycyl radical enzymes (GREs): discovery and characterization of pyruvate formate-lyase

A group of enzymes known to be particularly abundant in the human gut microbiota is the glycyl radical enzyme (GRE) superfamily. As the first GRE to be discovered, pyruvate formate-lyase (PFL) is representative in many ways of the entire superfamily. It was reported in 1943 that E. coli metabolizes pyruvate and free Coenzyme A (CoA) to formate and acetyl-CoA, and that this process is the primary mode of anaerobic pyruvate dissimilation in E. coli.98 Despite this reaction’s importance, identification of the responsible enzyme took decades, in part because of its extreme oxygen sensitivity. The Knappe laboratory at Heidelberg University spent many years characterizing PFL, and they discovered the critical role of S-adenosylmethionine (SAM) in activating PFL,99 which other enzymes are required to activate PFL,100 and how pyruvate is converted to an acetyl-enzyme intermediate.101 After identifying how the Fe(II)-dependent, radical SAM activating enzyme (PFL-AE) cleaves SAM to 5ʹ-deoxyadenosine,102-104 the Knappe group crucially discovered that PFL-AE installs a free radical into PFL.105-107 Advances in DNA and protein sequencing enabled the radical species to be identified as glycine-734, and this persistent glycyl radical was determined to be essential for catalytic activity.108-111 Further biochemical characterization revealed how PFL-AE selectively abstracts a hydrogen atom from PFL.112-113 Following the subsequent structural characterization of PFL, a mechanism was proposed for this enzyme that was consistent with the prior biochemical work.114-115

Before it can perform catalysis, PFL must first be posttranslationally modified by PFL-AE, yielding a persistent glycyl radical on the PFL protein backbone (Figure 1.2A). PFL-AE utilizes a [4Fe–4S]+ cluster to reductively cleave SAM, producing a 5ʹ-deoxyadenosyl radical and L-methionine. PFL-AE is able to direct the 5ʹ-deoxyadenosyl radical intermediate to stereoselectively abstract a hydrogen atom from the α-carbon of a single glycine residue on PFL.116-118 Activated PFL can then react with pyruvate and CoA (Figure 1.2B). Upon pyruvate binding to PFL, hydrogen atom transfers between G734, C419, and C418 occur, and the C418 thiyl radical attacks the C2 carbonyl of pyruvate. Collapse of this tetrahedral intermediate leads to acetylation of C418 and a formyl radical. Hydrogen atom abstraction by the formyl radical results in formate formation and regenerates a thiyl radical on C419. CoA can now enter the PFL active site, and another hydrogen atom transfer followed by radical-mediated thioester exchange leads to the acetyl-CoA product and regeneration of the glycyl radical intermediate. Although the substrates and dual cysteine motif are specific to PFL and other α-ketoacid lyases,119 the overarching mechanism of GRE activation by GRE activating enzymes (GRE-AEs) and catalytic use of the protein-based radical species unites the GRE superfamily.
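As a compact summary of the catalytic cycle described above, the net PFL reaction can be written as two half-reactions that sum to the overall transformation. This is only a bookkeeping sketch: the shorthand E–Cys418 for the enzyme-bound cysteine is introduced here for illustration, and the radical intermediates and hydrogen atom transfers discussed in the text are deliberately omitted.

```latex
\begin{align*}
\text{E--Cys418--SH} + \text{pyruvate} &\longrightarrow \text{E--Cys418--S--acetyl} + \text{formate} \\
\text{E--Cys418--S--acetyl} + \text{CoA--SH} &\longrightarrow \text{E--Cys418--SH} + \text{acetyl-CoA} \\[4pt]
\text{Net:}\quad \text{pyruvate} + \text{CoA--SH} &\longrightarrow \text{acetyl-CoA} + \text{formate}
\end{align*}
```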


Figure 1.2: Proposed mechanism of PFL. (A) Mechanism of activation of PFL by PFL-AE. (B) Mechanism by which PFL converts pyruvate into formate and acetyl-CoA. Ade: Adenine, HS-CoA: Coenzyme A.

The use of radical chemistry hints at why GREs are used by microorganisms solely in anoxic environments. Exposure of activated GREs, including PFL, to oxygen causes cleavage of the protein backbone, yielding two peptide fragments.111 The unpaired electrons of dioxygen react rapidly with the glycyl radical species, irreversibly oxidizing it and leading to Cα–N bond cleavage in the conserved, essential glycine residue.120 The resulting fragments cannot be reactivated on their own. However, E. coli encodes an autonomous glycyl radical cofactor that can associate with the larger PFL-derived fragment and, upon reactivation by PFL-AE, recover the lost catalytic activity.121 If E. coli is exposed to oxidative stress, this glycyl radical “spare part” would allow for rapid restoration of PFL activity. Although this is an interesting feature of PFL, no autonomous glycyl radical cofactors have been identified for other GREs.

The challenging chemistry performed by GRE-AEs has made them a focus of study as well.122 These enzymes are members of the “radical SAM” enzyme superfamily, a large and functionally diverse family that utilizes [4Fe–4S] clusters and SAM to perform reactions utilizing radical chemistry.122-123 Members of this family contain a CX3CX2C motif (some with slight variations), and the three cysteines coordinate to three of the irons in a [4Fe–4S] cluster; the remaining iron is not linked to any protein side chains and is more labile.124 The cluster typically exists in the [4Fe–4S]2+ state, and reduction of this cluster to the [4Fe–4S]+ state renders it catalytically active.125 SAM binds to this [4Fe–4S] cluster,126-128 and the radical SAM enzyme then reductively cleaves SAM to yield a 5ʹ-deoxyadenosyl radical, which is responsible for the catalytic activity of the radical SAM enzymes. Over 100,000 different radical SAM enzymes have been identified, and although they utilize a conserved mechanism for radical generation, subclasses of these enzymes contain additional domains and [4Fe–4S] clusters, endowing them with even greater chemical potential.129 Aside from their unique biochemistry, their roles in human health and disease have made them an intense subject of research.130-131
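The CX3CX2C motif mentioned above can be located in a protein sequence with a simple pattern search. The following is a minimal Python sketch rather than a method used in this work: the function name `find_cx3cx2c` and the toy sequence are hypothetical, and the pattern covers only the canonical spacing, not the slight variations noted in the text.

```python
import re

# Canonical radical SAM motif: Cys, any 3 residues, Cys, any 2 residues, Cys.
# A lookahead is used so that overlapping hits are all reported.
# Variant spacings of the motif are NOT covered by this pattern.
CX3CX2C = re.compile(r"(?=(C.{3}C.{2}C))")


def find_cx3cx2c(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each CX3CX2C hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CX3CX2C.finditer(seq)]


# Hypothetical toy sequence, invented for illustration only.
toy_sequence = "MKTLACGHNCRYCHNPDTW"
print(find_cx3cx2c(toy_sequence))
```

A motif hit of this kind only suggests a candidate radical SAM enzyme; it does not establish that a protein binds a [4Fe–4S] cluster or cleaves SAM.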

1.5: Glycyl radical enzymes use similar mechanisms to catalyze reactions

In addition to PFL, the study of other GREs has revealed that this large protein superfamily uses conserved chemistry to catalyze a diverse set of reactions (Figure 1.3).132-133 In all GREs, an oxygen-sensitive glycyl radical species is installed in the active site by a cognate GRE-AE.122,134 The conserved glycine is located near the C-terminus of GREs and is part of a conserved sequence of residues (RVXG). GRE-AEs are specific for a single GRE; a GRE-AE acting on multiple, functionally distinct GREs has never been observed. This glycyl radical is catalytically essential in all characterized GREs. Most GREs form homodimers, but installation of an average of only one glycyl radical per homodimer is the maximum that has been reported for GREs.132 This potential half-of-sites reactivity suggests that either only one glycyl radical can be formed per homodimer or that current strategies for GRE activation cannot achieve complete activation of these enzymes. In addition, all GREs have a conserved cysteine residue positioned between the glycyl radical and the substrate binding site. An initial reaction of the glycyl radical with this cysteine is thought to generate a thiyl radical intermediate.135-136 This species can react with the substrate to generate a substrate-centered radical, which can react in varying ways to generate a product-centered radical. Consecutive hydrogen atom transfers between this species, the conserved cysteine, and then the conserved glycine lead to product formation and glycyl radical regeneration.
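The conserved RVXG sequence near the C-terminus suggests a simple way to flag candidate glycyl radical sites in a protein sequence. The sketch below is illustrative only: the function name `find_rvxg`, the default window of 100 residues used to approximate "near the C-terminus", and the toy sequence are all assumptions introduced here, not values taken from the literature.

```python
import re

# RVXG: Arg, Val, any residue, Gly. A lookahead allows overlapping hits.
RVXG = re.compile(r"(?=(RV.G))")


def find_rvxg(sequence: str, c_terminal_window: int = 100) -> list[int]:
    """Return 1-based positions of the Gly in each RVXG hit that lies
    entirely within the last `c_terminal_window` residues.

    The window size is an arbitrary, hypothetical cutoff.
    """
    seq = sequence.upper()
    offset = max(0, len(seq) - c_terminal_window)
    # m.start() is the 0-based index of Arg within the window;
    # the Gly is three residues later, hence +4 for a 1-based position.
    return [offset + m.start() + 4 for m in RVXG.finditer(seq[offset:])]


# Hypothetical toy sequence, invented for illustration only.
toy_sequence = "M" + "A" * 20 + "RVSGY" + "A" * 10
print(find_rvxg(toy_sequence))
```

Because a four-residue pattern will occur by chance in many proteins, a hit is at most a starting point; confirming a GRE requires additional evidence such as overall sequence similarity and a cognate GRE-AE.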


Figure 1.3: General mechanism of GREs. All GREs contain a conserved glycine residue that is converted to a glycyl radical by a GRE-AE via hydrogen atom abstraction. The resulting glycyl radical species is essential for catalysis in all GREs, and it abstracts a hydrogen atom from a conserved cysteine, which in turn interacts with the substrate to form a substrate-based radical. This species reacts or rearranges to yield a product-based radical. Hydrogen atom abstractions by the conserved cysteine and glycine regenerate the glycyl radical species.

GREs catalyze many different reactions, including C–C bond formation and cleavage, nucleotide reduction, decarboxylation, dehydration, and deamination (Figure 1.4). Several GREs have been biochemically characterized; some of these GREs are described in this section, while GREs discovered in or particularly relevant to gut microbes are described in the following section.

Glycerol dehydratase (GD) was first identified in 2003.137 Enzymatic glycerol dehydration was initially thought to be catalyzed solely by B12-dependent enzymes due to the radical chemistry required to mediate this challenging reaction,138 but Clostridium butyricum was found to produce 1,3-propanediol from glycerol in a B12-independent manner,139 and investigations of this pathway led to the discovery of GD. Biochemical and structural characterization of GD demonstrated its similarity to PFL and other GREs.140 A glycyl radical is formed on GD by GD activating enzyme (GD-AE), and both sequence and structural similarities between GD and PFL highlight how a conserved cysteine in GD engages in hydrogen atom transfers between the glycyl radical and the substrate. The dehydration of the unactivated C2 hydroxyl group is chemically challenging, and computational results have highlighted how the use of radical chemistry can make this reaction possible under biological conditions.141-142 Mechanistic details of GRE-mediated dehydration are discussed further in Chapter 3.

Figure 1.4: Examples of GRE-catalyzed reactions and their roles in microbial metabolism.

Benzylsuccinate synthase (BSS) and other GREs catalyzing the addition of substrates containing unactivated carbon atoms to fumarate are important for the anaerobic microbial degradation of aromatic and aliphatic hydrocarbons.143 Anaerobic microbial metabolism of hydrocarbons was first reported in the 1980s, and the enzyme responsible for toluene degradation was first characterized in 1998.144 Biochemical,145 computational,146 and structural147-148 studies with BSS revealed how this enzyme catabolizes toluene. Abstraction of a benzylic hydrogen atom from toluene by the thiyl radical intermediate of the GRE leads to a benzylic radical, which then adds to the double bond in fumarate to form a benzylsuccinyl radical. Back-abstraction of a hydrogen atom from the conserved cysteine yields the product and regenerates the glycyl radical on the enzyme. Unusually, BSS contains multiple subunits and purifies as an (αβγ)2 heterohexamer.147 The large α-subunit contains the conserved glycine and cysteine residues and the substrate binding pocket, the β-subunit binds a [4Fe–4S] cluster and may be involved in glycyl radical formation, and the γ-subunit, although similar in fold to the β-subunit, binds no clusters and its function is unknown.147 Other aryl- and alkylsuccinate synthases have been studied less frequently, but phylogenetic analyses and computational modelling suggest they resemble BSS but have modified substrate binding pockets.132

Ribonucleotide reductase (class III) (RNR) catalyzes the reduction of ribonucleotides to deoxynucleotides.149 Deoxynucleotides are the monomers of DNA and are essential building blocks of life. In addition to the glycyl radical analog, there are two other classes of ribonucleotide reductases. Class I RNRs use metals (Fe and/or Mn) and require oxygen (although a metal-independent Class I RNR was recently discovered150-151), while class II RNRs utilize an adenosylcobalamin cofactor.152 The former are present only in aerobic organisms, while the latter are found in both aerobes and anaerobes. Class III RNR’s oxygen sensitivity limits it to facultative and obligate anaerobes. However, similarities between these enzymes suggest they share a common evolutionary ancestor, with class III RNR being the first enzyme to perform this reaction.153-154 Compared to other GREs, RNR is phylogenetically distinct, and does not have the same level of sequence homology that the other GREs share with each other.132 Mechanistically, it is also distinct in that it mediates a reduction and requires an electron donor (formate or thioredoxin) for catalysis.155-156 However, in other aspects, RNR resembles other GREs, utilizing a glycyl radical and conserved cysteines to facilitate hydrogen atom transfers, and existing as a homodimer in solution.


1.6: Discovery and roles of glycyl radical enzymes in microbiotas and metagenomes

Although GREs were originally identified in diverse environments and microbes, it was eventually established that they are especially prominent in the human gut microbiota. Genes mapping to COG1882, which represents PFL, were found to be highly enriched in the human gut microbiome relative to other microbial habitats.157 Subsequent work revealed that PFL and other GREs were among the protein superfamilies encoded most frequently by gut microbes but less commonly by microbes from other environments.158 Metaproteomics work established that not only are these genes present, but they are also expressed in this environment; COG1882 was the most abundant protein group in the analyzed human gut metaproteomes.159 However, these studies were limited in that they could not reveal the functional roles of these GREs. Although the detected genes and proteins were annotated as PFL, all three of these studies relied on protein family databases (COG or Pfam) and are therefore actually reporting an aggregate abundance of many GREs with different catalytic activities.160-161 The abundance of individual GREs cannot be inferred solely from these metagenomic and metaproteomic experiments, and other methods were required to discover new GREs and to determine the roles of GREs in the gut microbiota.

The identification of choline trimethylamine-lyase (CutC) was enabled by an understanding of GRE chemistry and the chemical logic of microbial metabolism (Figure 1.5). Gut microbes have long been known to metabolize choline into trimethylamine (TMA) under anaerobic conditions,162 and many links between TMA, its oxidized derivative trimethylamine-N-oxide (TMAO), and human disease have been reported.163-164 With this impetus, the Balskus laboratory undertook a search for putative enzymes involved in anaerobic choline metabolism.165 The first step in choline fermentation is the deamination of choline to TMA and acetaldehyde. The chemical logic of this C–N bond cleavage reaction resembles the first step in ethanolamine metabolism, a C–N bond cleavage carried out by a vitamin B12-dependent enzyme, ethanolamine ammonia-lyase.166 Recognizing this parallel, the Balskus group searched for homologs of the enzymes from this pathway in the genome of a choline-metabolizing organism. They identified a gene cluster encoding homologs of the acetaldehyde-metabolizing enzymes and microcompartment structural proteins from ethanolamine metabolism, as well as a GRE and a GRE-AE. As GREs were known to catalyze dehydration of 1,2-diols,140 it seemed reasonable that a GRE could catalyze choline deamination. Genetic167 and in vitro biochemical168-169 experiments verified the activity of CutC, enabling further studies exploring the role of this enzyme in the human gut microbiota.170

Figure 1.5: Reaction catalyzed by and genomic context of CutC.

In addition to CutC, another GRE was functionally annotated based on its position in the genome of Rhodopseudomonas palustris BisB18 near genes encoding microcompartment structural proteins.171 Bacterial microcompartments are proteinaceous shells encapsulating other proteins and small molecules, and their purpose is to sequester reactions and compounds from the cytosol. Although functionally diverse, microcompartments are often involved in pathways that generate volatile aldehyde intermediates. Genome surveys performed by Kerfeld et al. indicate that genes encoding microcompartments are frequently associated with GREs, and that many of these gene clusters are uncharacterized.172-173 The hypothesis that the R. palustris GRE might generate a volatile aldehyde product guided functional characterization of this 1,2-propanediol dehydrating GRE.174 Although this enzyme is functionally identical to other GREs performing 1,2-propanediol dehydration,175-176 the encoding gene clusters contain different ancillary genes and the microcompartments have distinct compositions.171

The discovery of two GRE decarboxylases highlights the importance of these enzymes in anaerobic microbial habitats, including the human gut, and how advances in sequencing have enabled enzyme discovery. p-Cresol production from tyrosine was first reported in 1949,177 and in 1976 4-hydroxyphenylacetate was found to be the immediate precursor of p-cresol.178 In 2001, Selmer and Andrei isolated 4-hydroxyphenylacetate decarboxylase (4HPAD) by activity-guided fractionation and discovered that this enzyme is a GRE (Figure 1.6A).179-180 The extreme oxygen sensitivity of the glycyl radical species makes GREs challenging to work with in vitro and complicates standard activity-guided purification. 4HPAD was found in Clostridium difficile, an important opportunistic pathogen. Notably, C. difficile tolerates significantly higher concentrations of p-cresol than other microbes (35 mM vs. 1 mM) and may decarboxylate 4-hydroxyphenylacetate to inhibit the growth of other gut microbes.179 p-Cresol is also known to affect host drug metabolism.181 Human enzymes catalyze sulfation of p-cresol to facilitate elimination in the urine, and high levels of p-cresol limit the host’s ability to sulfate drugs and other xenobiotics, including the widely used drug acetaminophen. More recently, p-cresol and its derivatives have been linked to the brain-gut axis, and p-cresol production has been connected to depression and autism spectrum disorders in mice.182-183 This molecule has also been shown to affect gene expression and differentiation of oligodendrocyte progenitors into myelin-forming cells.184


Figure 1.6: Discovery of GRE decarboxylases. (A) 4HPAD was isolated by activity-guided purification from an organism known to possess this activity. (B) Activity-guided purification of phenylacetate decarboxylase from an enrichment culture and proteomic profiling of the active fraction led to the identification of this GRE.

The discovery of a similar GRE, phenylacetate decarboxylase, also relied upon activity-guided fractionation, but with a modern twist (Figure 1.6B). Zargar et al. performed enrichment culturing of sewage sludge from a wastewater treatment plant in an attempt to identify enzymes that decarboxylate phenylacetate to generate toluene. In their initial study,185 experiments with cell-free extracts suggested that this activity was catalyzed by a GRE. They then compared the proteomes of active fractions from activity-guided purification. Despite the most active fraction containing >650 different proteins, limiting their search to solely GREs reduced the possibilities to just three sequences, of which one was a clear target for characterization.186 Subsequent heterologous expression and in vitro biochemical characterization verified that this GRE is indeed responsible for the observed activity. Although phenylacetate decarboxylase is not a gut microbial enzyme, this strategy of combining activity-guided fractionation with metaproteomics and metagenomics is applicable to identifying enzymes in this microbial community. Crucially, this approach can benefit from a chemical understanding of the transformation of interest.

1.7: Microbial metabolism of L-fucose

In Chapters 2 and 3 I describe the role of a GRE in L-fucose metabolism by gut microbes, a process known to be important for the development and maintenance of the gut microbiota even before the discovery of the involved GRE.187-189 The deoxysugar L-fucose is a component of glycans lining the surfaces of colonic epithelial cells, and the ~20% of humans unable to transfer L-fucose to glycans have an increased risk of developing Crohn’s disease.190-192 This result is linked to the gut microbiota, as gut microbes are known to use host-derived L-fucose as a source of energy.193 The ubiquitous gut microbe Bacteroides thetaiotaomicron, in particular, is able to upregulate fucosylation of glycans and then use α-L-fucosidases to access the freed sugar.194 The host can also manipulate the composition of the gut microbiota by rapidly fucosylating the intestinal epithelium.195 In order to support the growth of commensal organisms in the gut, during periods of sickness host epithelial cells provide L-fucose to the gut microbiota, and in mouse models this improved host tolerance to pathogens. However, in other cases pathogens metabolize L-fucose to gain a competitive advantage.196 When gnotobiotic mice monoassociated with B. thetaiotaomicron were orally infected with Salmonella enterica serovar Typhimurium (S. typhimurium), a situation approximating microbiota disruption caused by antibiotic use,196 S. typhimurium was able to catabolize L-fucose freed by B. thetaiotaomicron. However, when the commensal organism was absent, S. typhimurium could not access L-fucose and gained no advantage, illustrating how this pathogen can still utilize L-fucose as a resource despite being unable to cleave it from host glycans. Similar results were reported for the pathogenic enterohaemorrhagic E. coli.197 Other research has established that adherent-invasive E. coli are enriched in genes involved in L-fucose metabolism and that the pathogenicity and virulence of these strains depend on their ability to metabolize (S)-1,2-propanediol, a compound derived from L-fucose.198-199 Finally, during periods of colitis and inflammation, S. typhimurium can utilize (S)-1,2-propanediol and tetrathionate to obtain a growth advantage compared to other gut microbes.200-202 Understanding the underlying pathways through which these pathogens metabolize L-fucose is essential for inhibiting their growth.

The microbial metabolic pathway for L-fucose degradation is well-characterized (Figure 1.7).203-204 After cleavage of the deoxysugar from glycans by α-fucosidases,187 the fucose utilization (fuc) pathway is the most common route for catabolizing this compound. Enzyme-catalyzed isomerization, phosphorylation, and aldol cleavage yield dihydroxyacetone phosphate, which can be redirected to glycolysis for further catabolism, and (S)-lactaldehyde, which is used as an electron acceptor and reduced to (S)-1,2-propanediol. Some organisms cannot metabolize this compound and excrete it. However, organisms encoding the propanediol utilization (pdu) operon can further process (S)-1,2-propanediol.138 The pdu pathway relies on a B12-dependent propanediol dehydratase (B12-PD) to convert (S)-1,2-propanediol into propionaldehyde.205-206 The propionaldehyde produced can either be used as an electron acceptor and reduced to 1-propanol, or it can be oxidized to propionate, yielding ATP in the process.206


Microbes in the human gut differ in their abilities to metabolize L-fucose, with some organisms encoding α-fucosidases, others encoding the fuc pathway, and others encoding the pdu pathway. Because of these differences, understanding how commensal and pathogenic organisms can process this molecule is crucial for leveraging our knowledge of this pathway to develop tool compounds, inhibitors, and antibiotics.

Figure 1.7: Bacterial L-fucose metabolism. The fuc pathway converts L-fucose to dihydroxyacetone phosphate and (S)-1,2-propanediol, and the pdu pathway further processes (S)-1,2-propanediol to 1-propanol or propionate.

1.8: References

(1) Lynch, S. V.; Pedersen, O. The Human Intestinal Microbiome in Health and Disease. N. Engl. J. Med. 2016, 375, 2369-2379.

(2) Sender, R.; Fuchs, S.; Milo, R. Revised Estimates for the Number of Human and Bacteria Cells in the Body. PLoS Biol. 2016, 14, e1002533.

(3) Davenport, E. R.; Sanders, J. G.; Song, S. J.; Amato, K. R.; Clark, A. G.; Knight, R. The human microbiome in evolution. BMC Biol. 2017, 15, 127.

(4) Levine, W. G. Biliary Excretion of Drugs and Other Xenobiotics. Annu. Rev. Pharmacol. Toxicol. 1978, 18, 81-96.

(5) den Besten, G.; van Eunen, K.; Groen, A. K.; Venema, K.; Reijngoud, D.-J.; Bakker, B. M. The role of short-chain fatty acids in the interplay between diet, gut microbiota, and host energy metabolism. J. Lipid Res. 2013, 54, 2325-2340.


(6) Louis, P.; Flint, H. J. Formation of propionate and butyrate by the human colonic microbiota. Environ. Microbiol. 2017, 19, 29-41.

(7) Hosseini, E.; Grootaert, C.; Verstraete, W.; Van de Wiele, T. Propionate as a health-promoting microbial metabolite in the human gut. Nutr. Rev. 2011, 69, 245-258.

(8) Reichardt, N.; Duncan, S. H.; Young, P.; Belenguer, A.; McWilliam Leitch, C.; Scott, K. P.; Flint, H. J.; Louis, P. Phylogenetic distribution of three pathways for propionate production within the human gut microbiota. ISME J. 2014, 8, 1323-1335.

(9) Vuong, H. E.; Yano, J. M.; Fung, T. C.; Hsiao, E. Y. The Microbiome and Host Behavior. Annu. Rev. Neurosci. 2017, 40, 21-49.

(10) Yoo, B. B.; Mazmanian, S. K. The Enteric Network: Interactions between the Immune and Nervous Systems of the Gut. Immunity 2017, 46, 910-926.

(11) Fujimura, K. E.; Sitarik, A. R.; Havstad, S.; Lin, D. L.; Levan, S.; Fadrosh, D.; Panzer, A. R.; LaMere, B.; Rackaityte, E.; Lukacs, N. W.; Wegienka, G.; Boushey, H. A.; Ownby, D. R.; Zoratti, E. M.; Levin, A. M.; Johnson, C. C.; Lynch, S. V. Neonatal gut microbiota associates with childhood multisensitized atopy and T cell differentiation. Nat. Med. 2016, 22, 1187.

(12) Arrieta, M.-C.; Stiemsma, L. T.; Dimitriu, P. A.; Thorson, L.; Russell, S.; Yurist-Doutsch, S.; Kuzeljevic, B.; Gold, M. J.; Britton, H. M.; Lefebvre, D. L.; Subbarao, P.; Mandhane, P.; Becker, A.; McNagny, K. M.; Sears, M. R.; Kollmann, T.; Mohn, W. W.; Turvey, S. E.; Brett Finlay, B. Early infancy microbial and metabolic alterations affect risk of childhood asthma. Sci. Transl. Med. 2015, 7, 307ra152.

(13) Turnbaugh, P. J.; Hamady, M.; Yatsunenko, T.; Cantarel, B. L.; Duncan, A.; Ley, R. E.; Sogin, M. L.; Jones, W. J.; Roe, B. A.; Affourtit, J. P.; Egholm, M.; Henrissat, B.; Heath, A. C.; Knight, R.; Gordon, J. I. A core gut microbiome in obese and lean twins. Nature 2008, 457, 480-484.

(14) Le Chatelier, E.; Nielsen, T.; Qin, J.; Prifti, E.; Hildebrand, F.; Falony, G.; Almeida, M.; Arumugam, M.; Batto, J.-M.; Kennedy, S.; Leonard, P.; Li, J.; Burgdorf, K.; Grarup, N.; Jørgensen, T.; Brandslund, I.; Nielsen, H. B.; Juncker, A. S.; Bertalan, M.; Levenez, F.; Pons, N.; Rasmussen, S.; Sunagawa, S.; Tap, J.; Tims, S.; Zoetendal, E. G.; Brunak, S.; Clément, K.; Doré, J.; Kleerebezem, M.; Kristiansen, K.; Renault, P.; Sicheritz-Ponten, T.; de Vos, W. M.; Zucker, J.-D.; Raes, J.; Hansen, T.; MetaHIT consortium; Guedon, E.; Delorme, C.; Layec, S.; Khaci, G.; van de Guchte, M.; Vandemeulebrouck, G.; Jamet, A.; Dervyn, R.; Sanchez, N.; Maguin, E.; Haimet, F.; Winogradski, Y.; Cultrone, A.; Leclerc, M.; Juste, C.; Blottière, H.; Pelletier, E.; LePaslier, D.; Artiguenave, F.; Bruls, T.; Weissenbach, J.; Turner, K.; Parkhill, J.; Antolin, M.; Manichanh, C.; Casellas, F.; Boruel, N.; Varela, E.; Torrejon, A.; Guarner, F.; Denariaz, G.; Derrien, M.; van Hylckama Vlieg, J. E. T.; Veiga, P.; Oozeer, R.; Knol, J.; Rescigno, M.; Brechot, C.; M’Rini, C.; Mérieux, A.; Yamada, T.; Bork, P.; Wang, J.; Ehrlich, S. D.; Pedersen, O. Richness of human gut microbiome correlates with metabolic markers. Nature 2013, 500, 541-546.

(15) Qin, J.; Li, Y.; Cai, Z.; Li, S.; Zhu, J.; Zhang, F.; Liang, S.; Zhang, W.; Guan, Y.; Shen, D.; Peng, Y.; Zhang, D.; Jie, Z.; Wu, W.; Qin, Y.; Xue, W.; Li, J.; Han, L.; Lu, D.; Wu, P.; Dai, Y.; Sun, X.; Li, Z.; Tang, A.; Zhong, S.; Li, X.; Chen, W.; Xu, R.; Wang, M.; Feng, Q.; Gong, M.; Yu, J.; Zhang, Y.; Zhang, M.; Hansen, T.; Sanchez, G.; Raes, J.; Falony, G.; Okuda, S.; Almeida, M.; LeChatelier, E.; Renault, P.; Pons, N.; Batto, J.-M.; Zhang, Z.; Chen, H.; Yang, R.; Zheng, W.; Li, S.; Yang, H.; Wang, J.; Ehrlich, S. D.; Nielsen, R.; Pedersen, O.; Kristiansen, K.; Wang, J. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 2012, 490, 55-60.

(16) Karlsson, F. H.; Tremaroli, V.; Nookaew, I.; Bergström, G.; Behre, C. J.; Fagerberg, B.; Nielsen, J.; Bäckhed, F. Gut metagenome in European women with normal, impaired and diabetic control. Nature 2013, 498, 99-103.

(17) Qin, N.; Yang, F.; Li, A.; Prifti, E.; Chen, Y.; Shao, L.; Guo, J.; Le Chatelier, E.; Yao, J.; Wu, L.; Zhou, J.; Ni, S.; Liu, L.; Pons, N.; Batto, J. M.; Kennedy, S. P.; Leonard, P.; Yuan, C.; Ding, W.; Chen, Y.; Hu, X.; Zheng, B.; Qian, G.; Xu, W.; Ehrlich, S. D.; Zheng, S.; Li, L. Alterations of the human gut microbiome in liver cirrhosis. Nature 2014, 513, 59-64.

(18) Wang, Z.; Klipfell, E.; Bennett, B. J.; Koeth, R.; Levison, B. S.; DuGar, B.; Feldstein, A. E.; Britt, E. B.; Fu, X.; Chung, Y.-M.; Wu, Y.; Schauer, P.; Smith, J. D.; Allayee, H.; Tang, W. H. W.; DiDonato, J. A.; Lusis, A. J.; Hazen, S. L. Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature 2011, 472, 57.

(19) Brown, J. M.; Hazen, S. L. Microbial modulation of cardiovascular disease. Nat. Rev. Microbiol. 2018, 16, 171-181.

(20) Miyoshi, J.; Chang, E. B. The gut microbiota and inflammatory bowel diseases. Transl. Res. 2017, 179, 38-48.

(21) Leffler, D. A.; Lamont, J. T. Clostridium difficile Infection. N. Engl. J. Med. 2015, 372, 1539-1548.

(22) Kouhsari, E.; Abbasian, S.; Sedighi, M.; Yaseri, H. F.; Nazari, S.; Bialvaei, A. Z.; Dahim, P.; Mirzaei, E. Z.; Rahbar, M. Clostridium difficile infection: a review. Rev. Med. Microbiol. 2018, 29, 103-109.

(23) Chow, J.; Tang, H.; Mazmanian, S. K. Pathobionts of the gastrointestinal microbiota and inflammatory disease. Curr. Opin. Immunol. 2011, 23, 473-480.

(24) Kassam, Z.; Lee, C. H.; Yuan, Y.; Hunt, R. H. Fecal Microbiota Transplantation for Clostridium difficile Infection: Systematic Review and Meta-Analysis. Am. J. Gastroenterol. 2013, 108, 500-508.

(25) Buffie, C. G.; Pamer, E. G. Microbiota-mediated colonization resistance against intestinal pathogens. Nat. Rev. Immunol. 2013, 13, 790-801.

(26) Kamada, N.; Chen, G. Y.; Inohara, N.; Núñez, G. Control of pathogens and pathobionts by the gut microbiota. Nat. Immunol. 2013, 14, 685-690.

(27) Koppel, N.; Maini Rekdal, V.; Balskus, E. P. Chemical transformation of xenobiotics by the human gut microbiota. Science 2017, 356, eaag2770.

(28) Chittim, C. L.; Irwin, S. M.; Balskus, E. P. Deciphering Human Gut Microbiota–Nutrient Interactions: A Role for Biochemistry. Biochemistry 2018, 57, 2567-2577.

(29) Pellock, S. J.; Redinbo, M. R. Glucuronides in the gut: Sugar-driven symbioses between microbe and host. J. Biol. Chem. 2017, 292, 8569-8576.

(30) Wiseman, L. R.; Markham, A. Irinotecan. Drugs 1996, 52, 606-623.

(31) Pommier, Y. Topoisomerase I inhibitors: camptothecins and beyond. Nat. Rev. Cancer 2006, 6, 789-802.

(32) Mathijssen, R. H. J.; van Alphen, R. J.; Verweij, J.; Loos, W. J.; Nooter, K.; Stoter, G.; Sparreboom, A. Clinical Pharmacokinetics and Metabolism of Irinotecan (CPT-11). Clin. Cancer Res. 2001, 7, 2182-2194.

(33) Wallace, B. D.; Wang, H.; Lane, K. T.; Scott, J. E.; Orans, J.; Koo, J. S.; Venkatesh, M.; Jobin, C.; Yeh, L.-A.; Mani, S.; Redinbo, M. R. Alleviating Cancer Drug Toxicity by Inhibiting a Bacterial Enzyme. Science 2010, 330, 831-835.

(34) Wallace, Bret D.; Roberts, Adam B.; Pollet, Rebecca M.; Ingle, James D.; Biernat, Kristen A.; Pellock, Samuel J.; Venkatesh, Madhu K.; Guthrie, L.; O’Neal, Sara K.; Robinson, Sara J.; Dollinger, M.; Figueroa, E.; McShane, Sarah R.; Cohen, Rachel D.; Jin, J.; Frye, Stephen V.; Zamboni, William C.; Pepe-Ranney, C.; Mani, S.; Kelly, L.; Redinbo, Matthew R. Structure and Inhibition of Microbiome β-Glucuronidases Essential to the Alleviation of Cancer Drug Toxicity. Chem. Biol. 2015, 22, 1238-1249.

(35) Wang, L.-Z.; Ramírez, J.; Yeo, W.; Chan, M.-Y. M.; Thuya, W.-L.; Lau, J.-Y. A.; Wan, S.-C.; Wong, A. L.-A.; Zee, Y.-K.; Lim, R.; Lee, S.-C.; Ho, P. C.; Lee, H.-S.; Chan, A.; Ansher, S.; Ratain, M. J.; Goh, B.-C. Glucuronidation by UGT1A1 Is the Dominant Pathway of the Metabolic Disposition of Belinostat in Liver Cancer Patients. PLoS One 2013, 8, e54522.

(36) Anne, M.; Sammartino, D.; Barginear, M. F.; Budman, D. Profile of panobinostat and its potential for treatment in solid tumors: an update. OncoTargets Ther. 2013, 6, 1613-1624.

(37) Castellino, S.; O'Mara, M.; Koch, K.; Borts, D. J.; Bowers, G. D.; MacLauchlin, C. Human Metabolism of Lapatinib, a Dual Kinase Inhibitor: Implications for Hepatotoxicity. Drug Metab. Dispos. 2012, 40, 139-150.

(38) Haiser, H. J.; Gootenberg, D. B.; Chatman, K.; Sirasani, G.; Balskus, E. P.; Turnbaugh, P. J. Predicting and Manipulating Cardiac Drug Inactivation by the Human Gut Bacterium Eggerthella lenta. Science 2013, 341, 295-298.

(39) Koppel, N.; Bisanz, J. E.; Pandelia, M.-E.; Turnbaugh, P. J.; Balskus, E. P. Discovery and characterization of a prevalent human gut bacterial enzyme sufficient for the inactivation of a family of plant toxins. eLife 2018, 7, e33953.

(40) Nougayrède, J.-P.; Homburg, S.; Taieb, F.; Boury, M.; Brzuszkiewicz, E.; Gottschalk, G.; Buchrieser, C.; Hacker, J.; Dobrindt, U.; Oswald, E. Escherichia coli Induces DNA Double-Strand Breaks in Eukaryotic Cells. Science 2006, 313, 848-851.

(41) Arthur, J. C.; Perez-Chanona, E.; Mühlbauer, M.; Tomkovich, S.; Uronis, J. M.; Fan, T.-J.; Campbell, B. J.; Abujamel, T.; Dogan, B.; Rogers, A. B.; Rhodes, J. M.; Stintzi, A.; Simpson, K. W.; Hansen, J. J.; Keku, T. O.; Fodor, A. A.; Jobin, C. Intestinal Inflammation Targets Cancer-Inducing Activity of the Microbiota. Science 2012, 338, 120-123.

(42) Zha, L.; Jiang, Y.; Henke, M. T.; Wilson, M. R.; Wang, J. X.; Kelleher, N. L.; Balskus, E. P. Colibactin assembly line enzymes use S-adenosylmethionine to build a cyclopropane ring. Nat. Chem. Biol. 2017, 13, 1063.

(43) Zha, L.; Wilson, M. R.; Brotherton, C. A.; Balskus, E. P. Characterization of Polyketide Synthase Machinery from the pks Island Facilitates Isolation of a Candidate Precolibactin. ACS Chem. Biol. 2016, 11, 1287-1295.

(44) Wilson, M. R.; Zha, L.; Balskus, E. P. Natural product discovery from the human microbiome. J. Biol. Chem. 2017, 292, 8546-8552.

(45) Lagier, J.-C.; Hugon, P.; Khelaifia, S.; Fournier, P.-E.; La Scola, B.; Raoult, D. The Rebirth of Culture in Microbiology through the Example of Culturomics To Study Human Gut Microbiota. Clin. Microbiol. Rev. 2015, 28, 237-264.

(46) Lagier, J.-C.; Million, M.; Hugon, P.; Armougom, F.; Raoult, D. Human Gut Microbiota: Repertoire and Variations. Front. Cell. Infect. Microbiol. 2012, 2, 136.

(47) Stewart, E. J. Growing Unculturable Bacteria. J. Bacteriol. 2012, 194, 4151-4160.

(48) Vartoukian, S. R.; Palmer, R. M.; Wade, W. G. Strategies for culture of ‘unculturable’ bacteria. FEMS Microbiol. Lett. 2010, 309, 1-7.

(49) Eckburg, P. B.; Bik, E. M.; Bernstein, C. N.; Purdom, E.; Dethlefsen, L.; Sargent, M.; Gill, S. R.; Nelson, K. E.; Relman, D. A. Diversity of the Human Intestinal Microbial Flora. Science 2005, 308, 1635-1638.

(50) Smillie, C. S.; Smith, M. B.; Friedman, J.; Cordero, O. X.; David, L. A.; Alm, E. J. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 2011, 480, 241-244.

(51) Truong, D. T.; Tett, A.; Pasolli, E.; Huttenhower, C.; Segata, N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 2017, 27, 626-638.

(52) Spanogiannopoulos, P.; Bess, E. N.; Carmody, R. N.; Turnbaugh, P. J. The microbial pharmacists within us: a metagenomic view of xenobiotic metabolism. Nat. Rev. Microbiol. 2016, 14, 273-287.

(53) Shendure, J.; Balasubramanian, S.; Church, G. M.; Gilbert, W.; Rogers, J.; Schloss, J. A.; Waterston, R. H. DNA sequencing at 40: past, present and future. Nature 2017, 550, 345-353.

(54) Quince, C.; Walker, A. W.; Simpson, J. T.; Loman, N. J.; Segata, N. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 2017, 35, 833-844.

(55) Goodwin, S.; McPherson, J. D.; McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 2016, 17, 333.

(56) Riesenfeld, C. S.; Schloss, P. D.; Handelsman, J. Metagenomics: Genomic Analysis of Microbial Communities. Annu. Rev. Genet. 2004, 38, 525-552.

(57) Cénit, M. C.; Matzaraki, V.; Tigchelaar, E. F.; Zhernakova, A. Rapidly expanding knowledge on the role of the gut microbiome in health and disease. Biochim. Biophys. Acta, Mol. Basis Dis. 2014, 1842, 1981-1992.

(58) Jovel, J.; Patterson, J.; Wang, W.; Hotte, N.; O'Keefe, S.; Mitchel, T.; Perry, T.; Kao, D.; Mason, A. L.; Madsen, K. L.; Wong, G. K.-S. Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics. Front. Microbiol. 2016, 7, 459.

(59) Langille, M. G. I.; Zaneveld, J.; Caporaso, J. G.; McDonald, D.; Knights, D.; Reyes, J. A.; Clemente, J. C.; Burkepile, D. E.; Vega Thurber, R. L.; Knight, R.; Beiko, R. G.; Huttenhower, C. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat. Biotechnol. 2013, 31, 814-821.

(60) Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, K. S.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F.; Yamada, T.; Mende, D. R.; Li, J.; Xu, J.; Li, S.; Li, D.; Cao, J.; Wang, B.; Liang, H.; Zheng, H.; Xie, Y.; Tap, J.; Lepage, P.; Bertalan, M.; Batto, J.-M.; Hansen, T.; Le Paslier, D.; Linneberg, A.; Nielsen, H. B.; Pelletier, E.; Renault, P.; Sicheritz-Ponten, T.; Turner, K.; Zhu, H.; Yu, C.; Li, S.; Jian, M.; Zhou, Y.; Li, Y.; Zhang, X.; Li, S.; Qin, N.; Yang, H.; Wang, J.; Brunak, S.; Doré, J.; Guarner, F.; Kristiansen, K.; Pedersen, O.; Parkhill, J.; Weissenbach, J.; MetaHIT Consortium; Antolin, M.; Artiguenave, F.; Blottiere, H.; Borruel, N.; Bruls, T.; Casellas, F.; Chervaux, C.; Cultrone, A.; Delorme, C.; Denariaz, G.; Dervyn, R.; Forte, M.; Friss, C.; van de Guchte, M.; Guedon, E.; Haimet, F.; Jamet, A.; Juste, C.; Kaci, G.; Kleerebezem, M.; Knol, J.; Kristensen, M.; Layec, S.; Le Roux, K.; Leclerc, M.; Maguin, E.; Melo Minardi, R.; Oozeer, R.; Rescigno, M.; Sanchez, N.; Tims, S.; Torrejon, T.; Varela, E.; de Vos, W.; Winogradsky, Y.; Zoetendal, E.; Bork, P.; Ehrlich, S. D.; Wang, J. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464, 59-65.

(61) The Human Microbiome Project Consortium; Huttenhower, C. Structure, function and diversity of the healthy human microbiome. Nature 2012, 486, 207-214.

(62) Li, R.; Zhu, H.; Ruan, J.; Qian, W.; Fang, X.; Shi, Z.; Li, Y.; Li, S.; Shan, G.; Kristiansen, K.; Li, S.; Yang, H.; Wang, J.; Wang, J. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20, 265-272.

(63) Wang, J.; Jia, H. Metagenome-wide association studies: fine-mining the microbiome. Nat. Rev. Microbiol. 2016, 14, 508-522.

(64) Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; Sutton, G. G.; Smith, H. O.; Yandell, M.; Evans, C. A.; Holt, R. A.; Gocayne, J. D.; Amanatides, P.; Ballew, R. M.; Huson, D. H.; Wortman, J. R.; Zhang, Q.; Kodira, C. D.; Zheng, X. H.; Chen, L.; Skupski, M.; Subramanian, G.; Thomas, P. D.; Zhang, J.; Gabor Miklos, G. L.; Nelson, C.; Broder, S.; Clark, A. G.; Nadeau, J.; McKusick, V. A.; Zinder, N.; Levine, A. J.; Roberts, R. J.; Simon, M.; Slayman, C.; Hunkapiller, M.; Bolanos, R.; Delcher, A.; Dew, I.; Fasulo, D.; Flanigan, M.; Florea, L.; Halpern, A.; Hannenhalli, S.; Kravitz, S.; Levy, S.; Mobarry, C.; Reinert, K.; Remington, K.; Abu-Threideh, J.; Beasley, E.; Biddick, K.; Bonazzi, V.; Brandon, R.; Cargill, M.; Chandramouliswaran, I.; Charlab, R.; Chaturvedi, K.; Deng, Z.; Francesco, V. D.; Dunn, P.; Eilbeck, K.; Evangelista, C.; Gabrielian, A. E.; Gan, W.; Ge, W.; Gong, F.; Gu, Z.; Guan, P.; Heiman, T. J.; Higgins, M. E.; Ji, R.-R.; Ke, Z.; Ketchum, K. A.; Lai, Z.; Lei, Y.; Li, Z.; Li, J.; Liang, Y.; Lin, X.; Lu, F.; Merkulov, G. V.; Milshina, N.; Moore, H. M.; Naik, A. K.; Narayan, V. A.; Neelam, B.; Nusskern, D.; Rusch, D. B.; Salzberg, S.; Shao, W.; Shue, B.; Sun, J.; Wang, Z. Y.; Wang, A.; Wang, X.; Wang, J.; Wei, M.-H.; Wides, R.; Xiao, C.; Yan, C.; Yao, A.; Ye, J.; Zhan, M.; Zhang, W.; Zhang, H.; Zhao, Q.; Zheng, L.; Zhong, F.; Zhong, W.; Zhu, S. C.; Zhao, S.; Gilbert, D.; Baumhueter, S.; Spier, G.; Carter, C.; Cravchik, A.; Woodage, T.; Ali, F.; An, H.; Awe, A.; Baldwin, D.; Baden, H.; Barnstead, M.; Barrow, I.; Beeson, K.; Busam, D.; Carver, A.; Center, A.; Cheng, M. 
L.; Curry, L.; Danaher, S.; Davenport, L.; Desilets, R.; Dietz, S.; Dodson, K.; Doup, L.; Ferriera, S.; Garg, N.; Gluecksmann, A.; Hart, B.; Haynes, J.; Haynes, C.; Heiner, C.; Hladun, S.; Hostin, D.; Houck, J.; Howland, T.; Ibegwam, C.; Johnson, J.; Kalush, F.; Kline, L.; Koduru, S.; Love, A.; Mann, F.; May, D.; McCawley, S.; McIntosh, T.; McMullen, I.; Moy, M.; Moy, L.; Murphy, B.; Nelson, K.; Pfannkoch, C.; Pratts, E.; Puri, V.; Qureshi, H.; Reardon, M.; Rodriguez, R.; Rogers, Y.-H.; Romblad, D.; Ruhfel, B.; Scott, R.; Sitter, C.; Smallwood, M.; Stewart, E.; Strong, R.; Suh, E.; Thomas, R.; Tint, N. N.; Tse, S.; Vech, C.; Wang, G.; Wetter, J.; Williams, S.; Williams, M.; Windsor, S.; Winn-Deen, E.; Wolfe, K.; Zaveri, J.; Zaveri, K.; Abril, J. F.; Guigó, R.; Campbell, M. J.; Sjolander, K. V.; Karlak, B.; Kejariwal, A.; Mi, H.; Lazareva, B.; Hatton, T.; Narechania, A.; Diemer, K.; Muruganujan, A.; Guo, N.; Sato, S.; Bafna, V.; Istrail, S.; Lippert, R.; Schwartz, R.; Walenz, B.; Yooseph, S.; Allen, D.; Basu, A.; Baxendale, J.; Blick, L.; Caminha, M.; Carnes-Stine, J.; Caulk, P.; Chiang, Y.-H.; Coyne, M.; Dahlke, C.; Mays, A. D.; Dombroski, M.; Donnelly, M.; Ely, D.; Esparham, S.; Fosler, C.; Gire, H.; Glanowski, S.; Glasser, K.; Glodek, A.; Gorokhov, M.; Graham, K.; Gropman, B.; Harris, M.; Heil, J.; Henderson, S.; Hoover, J.; Jennings, D.; Jordan, C.; Jordan, J.; Kasha, J.; Kagan, L.; Kraft, C.; Levitsky, A.; Lewis, M.; Liu, X.; Lopez, J.; Ma, D.; Majoros, W.; McDaniel, J.; Murphy, S.; Newman, M.; Nguyen, T.; Nguyen, N.; Nodell, M.; Pan, S.; Peck, J.; Peterson, M.; Rowe, W.; Sanders, R.; Scott, J.; Simpson, M.; Smith, T.; Sprague, A.; Stockwell, T.; Turner, R.; Venter, E.; Wang, M.; Wen, M.; Wu, D.; Wu, M.; Xia, A.; Zandieh, A.; Zhu, X. The Sequence of the Human Genome. Science 2001, 291, 1304-1351.

(65) Thursby, E.; Juge, N. Introduction to the human gut microbiota. Biochem. J. 2017, 474, 1823-1836.

(66) Porter, N. T.; Martens, E. C. The Critical Roles of Polysaccharides in Gut Microbial Ecology and Physiology. Annu. Rev. Microbiol. 2017, 71, 349-369.

(67) Hehemann, J.-H.; Correc, G.; Barbeyron, T.; Helbert, W.; Czjzek, M.; Michel, G. Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota. Nature 2010, 464, 908-912.

(68) Devlin, A. S.; Marcobal, A.; Dodd, D.; Nayfach, S.; Plummer, N.; Meyer, T.; Pollard, K. S.; Sonnenburg, J. L.; Fischbach, M. A. Modulation of a Circulating Uremic Solute via Rational Genetic Manipulation of the Gut Microbiota. Cell Host Microbe 2016, 20, 709-715.

(69) Hsiao, Elaine Y.; McBride, Sara W.; Hsien, S.; Sharon, G.; Hyde, Embriette R.; McCue, T.; Codelli, Julian A.; Chow, J.; Reisman, Sarah E.; Petrosino, Joseph F.; Patterson, Paul H.; Mazmanian, Sarkis K. Microbiota Modulate Behavioral and Physiological Abnormalities Associated with Neurodevelopmental Disorders. Cell 2013, 155, 1451-1463.

(70) Dodd, D.; Spitzer, M. H.; Van Treuren, W.; Merrill, B. D.; Hryckowian, A. J.; Higginbottom, S. K.; Le, A.; Cowan, T. M.; Nolan, G. P.; Fischbach, M. A.; Sonnenburg, J. L. A gut bacterial pathway metabolizes aromatic amino acids into nine circulating metabolites. Nature 2017, 551, 648-652.

(71) Albenberg, L.; Esipova, T. V.; Judge, C. P.; Bittinger, K.; Chen, J.; Laughlin, A.; Grunberg, S.; Baldassano, R. N.; Lewis, J. D.; Li, H.; Thom, S. R.; Bushman, F. D.; Vinogradov, S. A.; Wu, G. D. Correlation Between Intraluminal Oxygen Gradient and Radial Partitioning of Intestinal Microbiota. Gastroenterology 2014, 147, 1055-1063.

(72) Donaldson, G. P.; Lee, S. M.; Mazmanian, S. K. Gut biogeography of the bacterial microbiota. Nat. Rev. Microbiol. 2016, 14, 20-32.

(73) Friedman, E. S.; Bittinger, K.; Esipova, T. V.; Hou, L.; Chau, L.; Jiang, J.; Mesaros, C.; Lund, P. J.; Liang, X.; FitzGerald, G. A.; Goulian, M.; Lee, D.; Garcia, B. A.; Blair, I. A.; Vinogradov, S. A.; Wu, G. D. Microbes vs. chemistry in the origin of the anaerobic gut lumen. Proc. Natl. Acad. Sci. U. S. A. 2018, 115, 4170-4175.

(74) Buckel, W.; Golding, B. T. Radical Enzymes in Anaerobes. Annu. Rev. Microbiol. 2006, 60, 27-49.

(75) Rajakovich, L. J.; Balskus, E. P. Metabolic functions of the human gut microbiota: the role of metalloenzymes. Nat. Prod. Rep. 2019.

(76) Hickey, Christina A.; Kuhn, Kristine A.; Donermeyer, David L.; Porter, Nathan T.; Jin, C.; Cameron, Elizabeth A.; Jung, H.; Kaiko, Gerard E.; Wegorzewska, M.; Malvin, Nicole P.; Glowacki, Robert W. P.; Hansson, Gunnar C.; Allen, Paul M.; Martens, Eric C.; Stappenbeck, Thaddeus S. Colitogenic Bacteroides thetaiotaomicron Antigens Access Host Immune Cells in a Sulfatase-Dependent Manner via Outer Membrane Vesicles. Cell Host Microbe 2015, 17, 672-680.

(77) Tsai, H. H.; Dwarakanath, A. D.; Hart, C. A.; Milton, J. D.; Rhodes, J. M. Increased faecal mucin sulphatase activity in ulcerative colitis: a potential target for treatment. Gut 1995, 36, 570-576.

(78) Arnison, P. G.; Bibb, M. J.; Bierbaum, G.; Bowers, A. A.; Bugni, T. S.; Bulaj, G.; Camarero, J. A.; Campopiano, D. J.; Challis, G. L.; Clardy, J.; Cotter, P. D.; Craik, D. J.; Dawson, M.; Dittmann, E.; Donadio, S.; Dorrestein, P. C.; Entian, K.-D.; Fischbach, M. A.; Garavelli, J. S.; Göransson, U.; Gruber, C. W.; Haft, D. H.; Hemscheidt, T. K.; Hertweck, C.; Hill, C.; Horswill, A. R.; Jaspars, M.; Kelly, W. L.; Klinman, J. P.; Kuipers, O. P.; Link, A. J.; Liu, W.; Marahiel, M. A.; Mitchell, D. A.; Moll, G. N.; Moore, B. S.; Müller, R.; Nair, S. K.; Nes, I. F.; Norris, G. E.; Olivera, B. M.; Onaka, H.; Patchett, M. L.; Piel, J.; Reaney, M. J. T.; Rebuffat, S.; Ross, R. P.; Sahl, H.-G.; Schmidt, E. W.; Selsted, M. E.; Severinov, K.; Shen, B.; Sivonen, K.; Smith, L.; Stein, T.; Süssmuth, R. D.; Tagg, J. R.; Tang, G.-L.; Truman, A. W.; Vederas, J. C.; Walsh, C. T.; Walton, J. D.; Wenzel, S. C.; Willey, J. M.; van der Donk, W. A. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 2013, 30, 108-160.

(79) Shelburne, C. E.; An, F. Y.; Dholpe, V.; Ramamoorthy, A.; Lopatin, D. E.; Lantz, M. S. The spectrum of antimicrobial activity of the bacteriocin subtilosin A. J. Antimicrob. Chemother. 2007, 59, 297-300.

(80) Sutyak Noll, K.; Sinko, P. J.; Chikindas, M. L. Elucidation of the Molecular Mechanisms of Action of the Natural Antimicrobial Peptide Subtilosin Against the Bacterial Vaginosis-associated Pathogen Gardnerella vaginalis. Probiotics Antimicrob. Proteins 2011, 3, 41-47.

(81) Rea, M. C.; Sit, C. S.; Clayton, E.; O'Connor, P. M.; Whittal, R. M.; Zheng, J.; Vederas, J. C.; Ross, R. P.; Hill, C. Thuricin CD, a posttranslationally modified bacteriocin with a narrow spectrum of activity against Clostridium difficile. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 9352-9357.

(82) Rea, M. C.; Dobson, A.; O'Sullivan, O.; Crispie, F.; Fouhy, F.; Cotter, P. D.; Shanahan, F.; Kiely, B.; Hill, C.; Ross, R. P. Effect of broad- and narrow-spectrum antimicrobials on Clostridium difficile and microbial diversity in a model of the distal colon. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 4639-4644.

(83) Lanz, N. D.; Booker, S. J. Auxiliary iron–sulfur cofactors in radical SAM enzymes. Biochim. Biophys. Acta, Mol. Cell Res. 2015, 1853, 1316-1334.

(84) Sannino, D. R.; Dobson, A. J.; Edwards, K.; Angert, E. R.; Buchon, N. The Drosophila melanogaster Gut Microbiota Provisions Thiamine to Its Host. mBio 2018, 9, e00155-18.

(85) Hayashi, A.; Mikami, Y.; Miyamoto, K.; Kamada, N.; Sato, T.; Mizuno, S.; Naganuma, M.; Teratani, T.; Aoki, R.; Fukuda, S.; Suda, W.; Hattori, M.; Amagai, M.; Ohyama, M.; Kanai, T. Intestinal Dysbiosis and Biotin Deprivation Induce Alopecia through Overgrowth of Lactobacillus murinus in Mice. Cell Rep. 2017, 20, 1513-1524.

(86) Knight, R.; Vrbanac, A.; Taylor, B. C.; Aksenov, A.; Callewaert, C.; Debelius, J.; Gonzalez, A.; Kosciolek, T.; McCall, L.-I.; McDonald, D.; Melnik, A. V.; Morton, J. T.; Navas, J.; Quinn, R. A.; Sanders, J. G.; Swafford, A. D.; Thompson, L. R.; Tripathi, A.; Xu, Z. Z.; Zaneveld, J. R.; Zhu, Q.; Caporaso, J. G.; Dorrestein, P. C. Best practices for analysing microbiomes. Nat. Rev. Microbiol. 2018, 16, 410-422.

(87) Noguchi, H.; Park, J.; Takagi, T. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. 2006, 34, 5623-5630.

(88) Caporaso, J. G.; Kuczynski, J.; Stombaugh, J.; Bittinger, K.; Bushman, F. D.; Costello, E. K.; Fierer, N.; Peña, A. G.; Goodrich, J. K.; Gordon, J. I.; Huttley, G. A.; Kelley, S. T.; Knights, D.; Koenig, J. E.; Ley, R. E.; Lozupone, C. A.; McDonald, D.; Muegge, B. D.; Pirrung, M.; Reeder, J.; Sevinsky, J. R.; Turnbaugh, P. J.; Walters, W. A.; Widmann, J.; Yatsunenko, T.; Zaneveld, J.; Knight, R. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 2010, 7, 335-336.

(89) Segata, N.; Waldron, L.; Ballarini, A.; Narasimhan, V.; Jousson, O.; Huttenhower, C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 2012, 9, 811-814.

(90) Kanehisa, M.; Goto, S.; Furumichi, M.; Tanabe, M.; Hirakawa, M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010, 38, D355-D360.

(91) Kanehisa, M.; Sato, Y.; Furumichi, M.; Morishima, K.; Tanabe, M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 2018, gky962-gky962.

(92) Tatusov, R. L.; Galperin, M. Y.; Natale, D. A.; Koonin, E. V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000, 28, 33-36.

(93) Huerta-Cepas, J.; Szklarczyk, D.; Forslund, K.; Cook, H.; Heller, D.; Walter, M. C.; Rattei, T.; Mende, D. R.; Sunagawa, S.; Kuhn, M.; Jensen, L. J.; von Mering, C.; Bork, P. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016, 44, D286-D293.

(94) Joice, R.; Yasuda, K.; Shafquat, A.; Morgan, Xochitl C.; Huttenhower, C. Determining Microbial Products and Identifying Molecular Targets in the Human Microbiome. Cell Metab. 2014, 20, 731-741.

(95) Gerlt, J. A.; Babbitt, P. C. Divergent Evolution of Enzymatic Function: Mechanistically Diverse Superfamilies and Functionally Distinct Suprafamilies. Annu. Rev. Biochem. 2001, 70, 209-246.

(96) Franzosa, E. A.; Hsu, T.; Sirota-Madi, A.; Shafquat, A.; Abu-Ali, G.; Morgan, X. C.; Huttenhower, C. Sequencing and beyond: integrating molecular 'omics' for microbial community profiling. Nat. Rev. Microbiol. 2015, 13, 360-372.

(97) Schnoes, A. M.; Brown, S. D.; Dodevski, I.; Babbitt, P. C. Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies. PLoS Comput. Biol. 2009, 5, e1000605.

(98) Kalnitsky, G.; Werkman, C. H. The Anaerobic Dissimilation of Pyruvate by a Cell-free Extract of Escherichia coli. Arch. Biochem. 1943, 2, 113-124.

(99) Knappe, J.; Bohnert, E.; Brümmer, W. S-adenosyl-L-methionine, a component of the clastic dissimilation of pyruvate in Escherichia coli. Biochim. Biophys. Acta, Gen. Subj. 1965, 107, 603-605.

(100) Knappe, J.; Schacht, J.; Möckel, W.; Höpner, T.; Vetter, H.; Edenharder, R. Pyruvate Formate-Lyase Reaction in Escherichia coli. Eur. J. Biochem. 1969, 11, 316-327.

(101) Knappe, J.; Blaschkowski, H. P.; Gröbner, P.; Schmitt, T. Pyruvate Formate-Lyase of Escherichia coli: the Acetyl-Enzyme Intermediate. Eur. J. Biochem. 1974, 50, 253-263.

(102) Knappe, J.; Schmitt, T. A novel reaction of S-adenosyl-L-methionine correlated with the activation of pyruvate formate-lyase. Biochem. Biophys. Res. Commun. 1976, 71, 1110-1117.

(103) Blaschkowski, H. P.; Knappe, J.; Ludwig-Festl, M.; Neuer, G. Routes of Flavodoxin and Ferredoxin Reduction in Escherichia coli. Eur. J. Biochem. 1982, 123, 563-569.

(104) Conradt, H.; Hohmann-Berger, M.; Hohmann, H.-P.; Blaschkowski, H. P.; Knappe, J. Pyruvate formate-lyase (inactive form) and pyruvate formate-lyase activating enzyme of Escherichia coli: Isolation and structural properties. Arch. Biochem. Biophys. 1984, 228, 133-142.

(105) Knappe, J.; Neugebauer, F. A.; Blaschkowski, H. P.; Gänzler, M. Post-translational activation introduces a free radical into pyruvate formate-lyase. Proc. Natl. Acad. Sci. U. S. A. 1984, 81, 1332-1335.

(106) Knappe, J.; Sawers, G. A radical-chemical route to acetyl-CoA: the anaerobically induced pyruvate formate-lyase system of Escherichia coli. FEMS Microbiol. Rev. 1990, 6, 383-398.

(107) Knappe, J.; Volker Wagner, A. F., Stable glycyl radical from pyruvate formate-lyase and ribonucleotide reductase (III). In Adv. Protein Chem., Academic Press: 2001; Vol. 58, pp 277-315.

(108) Rödel, W.; Plaga, W.; Frank, R.; Knappe, J. Primary structures of Escherichia coli pyruvate formate-lyase and pyruvate-formate-lyase-activating enzyme deduced from the DNA nucleotide sequences. Eur. J. Biochem. 1988, 177, 153-158.

(109) Plaga, W.; Frank, R.; Knappe, J. Catalytic-site mapping of pyruvate formate lyase. Eur. J. Biochem. 1988, 178, 445-450.

(110) Unkrig, V.; Neugebauer, F. A.; Knappe, J. The free radical of pyruvate formate-lyase. Eur. J. Biochem. 1989, 184, 723-728.

(111) Wagner, A. F.; Frey, M.; Neugebauer, F. A.; Schäfer, W.; Knappe, J. The free radical in pyruvate formate-lyase is located on glycine-734. Proc. Natl. Acad. Sci. U. S. A. 1992, 89, 996-1000.

(112) Frey, M.; Rothe, M.; Wagner, A. F.; Knappe, J. Adenosylmethionine-dependent synthesis of the glycyl radical in pyruvate formate-lyase by abstraction of the glycine C-2 pro-S hydrogen atom. Studies of [2H]glycine-substituted enzyme and peptides homologous to the glycine 734 site. J. Biol. Chem. 1994, 269, 12432-7.

(113) Wagner, A. F. V.; Demand, J.; Schilling, G.; Pils, T.; Knappe, J. A Dehydroalanyl Residue Can Capture the 5′-Deoxyadenosyl Radical Generated from S-Adenosylmethionine by Pyruvate Formate-Lyase-Activating Enzyme. Biochem. Biophys. Res. Commun. 1999, 254, 306-310.

(114) Becker, A.; Fritz-Wolf, K.; Kabsch, W.; Knappe, J.; Schultz, S.; Volker Wagner, A. F. Structure and mechanism of the glycyl radical enzyme pyruvate formate-lyase. Nat. Struct. Biol. 1999, 6, 969.

(115) Becker, A.; Kabsch, W. X-ray Structure of Pyruvate Formate-Lyase in Complex with Pyruvate and CoA: How the Enzyme Uses the Cys-418 Thiyl Radical for Pyruvate Cleavage. J. Biol. Chem. 2002, 277, 40036-40042.

(116) Vey, J. L.; Yang, J.; Li, M.; Broderick, W. E.; Broderick, J. B.; Drennan, C. L. Structural basis for glycyl radical formation by pyruvate formate-lyase activating enzyme. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 16137-16141.

(117) Horitani, M.; Shisler, K.; Broderick, W. E.; Hutcheson, R. U.; Duschene, K. S.; Marts, A. R.; Hoffman, B. M.; Broderick, J. B. Radical SAM catalysis via an organometallic intermediate with an Fe–[5′-C]-deoxyadenosyl bond. Science 2016, 352, 822-825.

(118) Broderick, W. E.; Hoffman, B. M.; Broderick, J. B. Mechanism of Radical Initiation in the Radical S-Adenosyl-L-methionine Superfamily. Acc. Chem. Res. 2018, 51, 2611-2619.

(119) Heßlinger, C.; Fairhurst, S. A.; Sawers, G. Novel keto acid formate-lyase and propionate kinase enzymes are components of an anaerobic pathway in Escherichia coli that degrades L-threonine to propionate. Mol. Microbiol. 1998, 27, 477-492.

(120) Reddy, S. G.; Wong, K. K.; Parast, C. V.; Peisach, J.; Magliozzo, R. S.; Kozarich, J. W. Dioxygen Inactivation of Pyruvate Formate-Lyase: EPR Evidence for the Formation of Protein-Based Sulfinyl and Peroxyl Radicals. Biochemistry 1998, 37, 558-563.

(121) Wagner, A. F. V.; Schultz, S.; Bomke, J.; Pils, T.; Lehmann, W. D.; Knappe, J. YfiD of Escherichia coli and Y06I of Bacteriophage T4 as Autonomous Glycyl Radical Cofactors Reconstituting the Catalytic Center of Oxygen-Fragmented Pyruvate Formate-Lyase. Biochem. Biophys. Res. Commun. 2001, 285, 456-462.

(122) Broderick, J. B.; Duffus, B. R.; Duschene, K. S.; Shepard, E. M. Radical S-Adenosylmethionine Enzymes. Chem. Rev. 2014, 114, 4229-4317.

(123) Sofia, H. J.; Chen, G.; Hetzler, B. G.; Reyes-Spindola, J. F.; Miller, N. E. Radical SAM, a novel protein superfamily linking unresolved steps in familiar biosynthetic pathways with radical mechanisms: functional characterization using new analysis and information visualization methods. Nucleic Acids Res. 2001, 29, 1097-1106.

(124) Broderick, J. B.; Henshaw, T. F.; Cheek, J.; Wojtuszewski, K.; Smith, S. R.; Trojan, M. R.; McGhan, R. M.; Kopf, A.; Kibbey, M.; Broderick, W. E. Pyruvate Formate-Lyase-Activating Enzyme: Strictly Anaerobic Isolation Yields Active Enzyme Containing a [3Fe–4S]+ Cluster. Biochem. Biophys. Res. Commun. 2000, 269, 451-456.

(125) Henshaw, T. F.; Cheek, J.; Broderick, J. B. The [4Fe-4S]1+ Cluster of Pyruvate Formate-Lyase Activating Enzyme Generates the Glycyl Radical on Pyruvate Formate-Lyase: EPR-Detected Single Turnover. J. Am. Chem. Soc. 2000, 122, 8331-8332.

(126) Walsby, C. J.; Ortillo, D.; Broderick, W. E.; Broderick, J. B.; Hoffman, B. M. An Anchoring Role for FeS Clusters: Chelation of the Amino Acid Moiety of S-Adenosylmethionine to the Unique Iron Site of the [4Fe−4S] Cluster of Pyruvate Formate-Lyase Activating Enzyme. J. Am. Chem. Soc. 2002, 124, 11270-11271.

(127) Krebs, C.; Broderick, W. E.; Henshaw, T. F.; Broderick, J. B.; Huynh, B. H. Coordination of Adenosylmethionine to a Unique Iron Site of the [4Fe-4S] of Pyruvate Formate-Lyase Activating Enzyme: A Mössbauer Spectroscopic Study. J. Am. Chem. Soc. 2002, 124, 912-913.

(128) Walsby, C. J.; Hong, W.; Broderick, W. E.; Cheek, J.; Ortillo, D.; Broderick, J. B.; Hoffman, B. M. Electron-Nuclear Double Resonance Spectroscopic Evidence That S-Adenosylmethionine Binds in Contact with the Catalytically Active [4Fe−4S]+ Cluster of Pyruvate Formate-Lyase Activating Enzyme. J. Am. Chem. Soc. 2002, 124, 3143-3151.

(129) Grell, T. A. J.; Goldman, P. J.; Drennan, C. L. SPASM and Twitch Domains in S-Adenosylmethionine (SAM) Radical Enzymes. J. Biol. Chem. 2015, 290, 3964-3971.

(130) Landgraf, B. J.; McCarthy, E. L.; Booker, S. J. Radical S-Adenosylmethionine Enzymes in Human Health and Disease. Annu. Rev. Biochem. 2016, 85, 485-514.

(131) Gizzi, A. S.; Grove, T. L.; Arnold, J. J.; Jose, J.; Jangra, R. K.; Garforth, S. J.; Du, Q.; Cahill, S. M.; Dulyaninova, N. G.; Love, J. D.; Chandran, K.; Bresnick, A. R.; Cameron, C. E.; Almo, S. C. A naturally occurring antiviral ribonucleotide encoded by the human genome. Nature 2018, 558, 610-614.

(132) Backman, L. R. F.; Funk, M. A.; Dawson, C. D.; Drennan, C. L. New tricks for the glycyl radical enzyme family. Crit. Rev. Biochem. Mol. Biol. 2017, 52, 674-695.

(133) Selmer, T.; Pierik, A. J.; Heider, J. New glycyl radical enzymes catalysing key metabolic steps in anaerobic bacteria. Biol. Chem. 2005, 386, 981-988.

(134) Shisler, K. A.; Broderick, J. B. Glycyl radical activating enzymes: Structure, mechanism, and substrate interactions. Arch. Biochem. Biophys. 2014, 546, 64-71.

(135) Andersson, J.; Westman, M.; Sahlin, M.; Sjöberg, B.-M. Cysteines Involved in Radical Generation and Catalysis of Class III Anaerobic Ribonucleotide Reductase: A Protein Engineering Study of Bacteriophage T4 NrdD. J. Biol. Chem. 2000, 275, 19449-19455.

(136) Wei, Y.; Mathies, G.; Yokoyama, K.; Chen, J.; Griffin, R. G.; Stubbe, J. A Chemically Competent Thiosulfuranyl Radical on the Escherichia coli Class III Ribonucleotide Reductase. J. Am. Chem. Soc. 2014, 136, 9001-9013.

(137) Raynaud, C.; Sarçabal, P.; Meynial-Salles, I.; Croux, C.; Soucaille, P. Molecular characterization of the 1,3-propanediol (1,3-PD) operon of Clostridium butyricum. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 5010-5015.

(138) Daniel, R.; Bobik, T. A.; Gottschalk, G. Biochemistry of coenzyme B12-dependent glycerol and diol dehydratases and organization of the encoding genes. FEMS Microbiol. Rev. 1998, 22, 553-566.

(139) Saint-Amans, S.; Girbal, L.; Andrade, J.; Ahrens, K.; Soucaille, P. Regulation of Carbon and Electron Flow in Clostridium butyricum VPI 3266 Grown on Glucose-Glycerol Mixtures. J. Bacteriol. 2001, 183, 1748-1754.

(140) O'Brien, J. R.; Raynaud, C.; Croux, C.; Girbal, L.; Soucaille, P.; Lanzilotta, W. N. Insight into the Mechanism of the B12-Independent Glycerol Dehydratase from Clostridium butyricum: Preliminary Biochemical and Structural Characterization. Biochemistry 2004, 43, 4635-4645.

(141) Feliks, M.; Ullmann, G. M. Glycerol Dehydratation by the B12-Independent Enzyme May Not Involve the Migration of a Hydroxyl Group: A Computational Study. J. Phys. Chem. B 2012, 116, 7076-7087.

(142) Kovačević, B.; Barić, D.; Babić, D.; Bilić, L.; Hanževački, M.; Sandala, G. M.; Radom, L.; Smith, D. M. Computational Tale of Two Enzymes: Glycerol Dehydration With or Without B12. J. Am. Chem. Soc. 2018, 140, 8487-8496.

(143) Heider, J.; Szaleniec, M.; Martins, B. M.; Seyhan, D.; Buckel, W.; Golding, B. T. Structure and Function of Benzylsuccinate Synthase and Related Fumarate-Adding Glycyl Radical Enzymes. J. Mol. Microbiol. Biotechnol. 2016, 26, 29-44.

(144) Leuthner, B.; Leutwein, C.; Schulz, H.; Hörth, P.; Haehnel, W.; Schiltz, E.; Schägger, H.; Heider, J. Biochemical and genetic characterization of benzylsuccinate synthase from Thauera aromatica: a new glycyl radical enzyme catalysing the first step in anaerobic toluene metabolism. Mol. Microbiol. 1998, 28, 615-628.

(145) Li, L.; Marsh, E. N. G. Mechanism of Benzylsuccinate Synthase Probed by Substrate and Isotope Exchange. J. Am. Chem. Soc. 2006, 128, 16056-16057.

(146) Bharadwaj, V. S.; Dean, A. M.; Maupin, C. M. Insights into the Glycyl Radical Enzyme Active Site of Benzylsuccinate Synthase: A Computational Study. J. Am. Chem. Soc. 2013, 135, 12279-12288.

(147) Funk, M. A.; Judd, E. T.; Marsh, E. N. G.; Elliott, S. J.; Drennan, C. L. Structures of benzylsuccinate synthase elucidate roles of accessory subunits in glycyl radical enzyme activation and activity. Proc. Natl. Acad. Sci. U. S. A. 2014, 111, 10161-10166.

(148) Funk, M. A.; Marsh, E. N. G.; Drennan, C. L. Substrate-bound Structures of Benzylsuccinate Synthase Reveal How Toluene Is Activated in Anaerobic Hydrocarbon Degradation. J. Biol. Chem. 2015, 290, 22398-22408.

(149) Nordlund, P.; Reichard, P. Ribonucleotide Reductases. Annu. Rev. Biochem. 2006, 75, 681-706.

(150) Blaesi, E. J.; Palowitch, G. M.; Hu, K.; Kim, A. J.; Rose, H. R.; Alapati, R.; Lougee, M. G.; Kim, H. J.; Taguchi, A. T.; Tan, K. O.; Laremore, T. N.; Griffin, R. G.; Krebs, C.; Matthews, M. L.; Silakov, A.; Bollinger, J. M.; Allen, B. D.; Boal, A. K. Metal-free class Ie ribonucleotide reductase from pathogens initiates catalysis with a tyrosine-derived dihydroxyphenylalanine radical. Proc. Natl. Acad. Sci. U. S. A. 2018, 115, 10022-10027.

(151) Srinivas, V.; Lebrette, H.; Lundin, D.; Kutin, Y.; Sahlin, M.; Lerche, M.; Eirich, J.; Branca, R. M. M.; Cox, N.; Sjöberg, B.-M.; Högbom, M. Metal-free ribonucleotide reduction powered by a DOPA radical in Mycoplasma pathogens. Nature 2018, 563, 416- 420.

(152) Stubbe, J. Ribonucleotide reductases in the twenty-first century. Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 2723-2724.

(153) Stubbe, J.; Ge, J.; Yee, C. S. The evolution of ribonucleotide reduction revisited. Trends Biochem. Sci. 2001, 26, 93-99.

36

(154) Reichard, P. The evolution of ribonucleotide reduction. Trends Biochem. Sci. 1997, 22, 81-85.

(155) Wei, Y.; Li, B.; Prakash, D.; Ferry, J. G.; Elliott, S. J.; Stubbe, J. A Ferredoxin Disulfide Reductase Delivers Electrons to the Methanosarcina barkeri Class III Ribonucleotide Reductase. Biochemistry 2015, 54, 7019-7028.

(156) Wei, Y.; Funk, M. A.; Rosado, L. A.; Baek, J.; Drennan, C. L.; Stubbe, J. The class III ribonucleotide reductase from bacilliformis can utilize thioredoxin as a reductant. Proc. Natl. Acad. Sci. U. S. A. 2014, 111, E3756-E3765.

(157) Kurokawa, K.; Itoh, T.; Kuwahara, T.; Oshima, K.; Toh, H.; Toyoda, A.; Takami, H.; Morita, H.; Sharma, V. K.; Srivastava, T. P.; Taylor, T. D.; Noguchi, H.; Mori, H.; Ogura, Y.; Ehrlich, D. S.; Itoh, K.; Takagi, T.; Sakaki, Y.; Hayashi, T.; Hattori, M. Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes. DNA Res. 2007, 14, 169-181.

(158) Ellrott, K.; Jaroszewski, L.; Li, W.; Wooley, J. C.; Godzik, A. Expansion of the Protein Repertoire in Newly Explored Environments: Human Gut Microbiome Specific Protein Families. PLoS Comput. Biol. 2010, 6, e1000798.

(159) Kolmeder, C. A.; de Been, M.; Nikkilä, J.; Ritamo, I.; Mättö, J.; Valmu, L.; Salojärvi, J.; Palva, A.; Salonen, A.; de Vos, W. M. Comparative Metaproteomics and Diversity Analysis of Human Intestinal Microbiota Testifies for Its Temporal Stability and Expression of Core Functions. PLoS One 2012, 7, e29913.

(160) Tatusov, R. L.; Koonin, E. V.; Lipman, D. J. A Genomic Perspective on Protein Families. Science 1997, 278, 631-637.

(161) Finn, R. D.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Mistry, J.; Mitchell, A. L.; Potter, S. C.; Punta, M.; Qureshi, M.; Sangrador-Vegas, A.; Salazar, G. A.; Tate, J.; Bateman, A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016, 44, D279-D285.

(162) Ackermann, V. D.; Schütze, H. Über die Bildung von Trimethylamin durch Bacterium prodigiosum. Zentralbl. Physiol. 1910, 24, 210-211.

(163) Zeisel, S. H.; Warrier, M. Trimethylamine N-Oxide, the Microbiome, and Heart and Kidney Disease. Annu. Rev. Nutr. 2017, 37, 157-181.

(164) Tang, W. H. W.; Hazen, S. L. Microbiome, trimethylamine N-oxide, and cardiometabolic disease. Transl. Res. 2017, 179, 108-115.

(165) Craciun, S.; Balskus, E. P. Microbial conversion of choline to trimethylamine requires a glycyl radical enzyme. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 21307-21312.

(166) Kaval, K. G.; Garsin, D. A. Ethanolamine Utilization in Bacteria. mBio 2018, 9, 00066- 18.

37

(167) Martínez-del Campo, A.; Bodea, S.; Hamer, H. A.; Marks, J. A.; Haiser, H. J.; Turnbaugh, P. J.; Balskus, E. P. Characterization and Detection of a Widely Distributed Gene Cluster That Predicts Anaerobic Choline Utilization by Human Gut Bacteria. mBio 2015, 6, e00042-15.

(168) Craciun, S.; Marks, J. A.; Balskus, E. P. Characterization of Choline Trimethylamine- Lyase Expands the Chemistry of Glycyl Radical Enzymes. ACS Chem. Biol. 2014, 9, 1408-1413.

(169) Bodea, S.; Funk, M. A.; Balskus, E. P.; Drennan, C. L. Molecular Basis of C–N Bond Cleavage by the Glycyl Radical Enzyme Choline Trimethylamine-Lyase. Cell Chem. Biol. 2016, 23, 1206-1216.

(170) Romano, K. A.; Martinez-del Campo, A.; Kasahara, K.; Chittim, C. L.; Vivas, E. I.; Amador-Noguez, D.; Balskus, E. P.; Rey, F. E. Metabolic, Epigenetic, and Transgenerational Effects of Gut Bacterial Choline Consumption. Cell Host Microbe 2017, 22, 279-290.

(171) Kerfeld, C. A.; Aussignargues, C.; Zarzycki, J.; Cai, F.; Sutter, M. Bacterial microcompartments. Nat. Rev. Microbiol. 2018, 16, 277-290.

(172) Axen, S. D.; Erbilgin, O.; Kerfeld, C. A. A of Bacterial Microcompartment Loci Constructed by a Novel Scoring Method. PLoS Comput. Biol. 2014, 10, e1003898.

(173) Zarzycki, J.; Erbilgin, O.; Kerfeld, C. A. Bioinformatic Characterization of Glycyl Radical Enzyme-Associated Bacterial Microcompartments. Appl. Environ. Microbiol. 2015, 81, 8315-8329.

(174) Zarzycki, J.; Sutter, M.; Cortina, N. S.; Erb, T. J.; Kerfeld, C. A. In Vitro Characterization and Concerted Function of Three Core Enzymes of a Glycyl Radical Enzyme - Associated Bacterial Microcompartment. Sci. Rep. 2017, 7, 42757.

(175) Scott, K. P.; Martin, J. C.; Campbell, G.; Mayer, C.-D.; Flint, H. J. Whole-Genome Transcription Profiling Reveals Genes Up-Regulated by Growth on Fucose in the Human Gut Bacterium “Roseburia inulinivorans”. J. Bacteriol. 2006, 188, 4340-4349.

(176) Petit, E.; LaTouf, W. G.; Coppi, M. V.; Warnick, T. A.; Currie, D.; Romashko, I.; Deshpande, S.; Haas, K.; Alvelo-Maurosa, J. G.; Wardman, C.; Schnell, D. J.; Leschine, S. B.; Blanchard, J. L. Involvement of a Bacterial Microcompartment in the Metabolism of Fucose and Rhamnose by Clostridium phytofermentans. PLoS One 2013, 8, e54337.

(177) Stone, R. W.; Machamer, H. E.; McAleer, W. J.; Oakwood, T. S. Fermentation of tyrosine by marine bacteria. Arch. Biochem. 1949, 21, 217-223.

(178) Elsden, S. R.; Hilton, M. G.; Waller, J. M. The End Products of the Metabolism of Aromatic Amino Acids by Clostridia. Arch. Microbiol. 1976, 107, 283-288.

38

(179) Selmer, T.; Andrei, P. I. p-Hydroxyphenylacetate decarboxylase from Clostridium difficile. Eur. J. Biochem. 2001, 268, 1363-1372.

(180) Selvaraj, B.; Buckel, W.; Golding, B. T.; Ullmann, G. M.; Martins, B. M. Structure and Function of 4-Hydroxyphenylacetate Decarboxylase and Its Cognate Activating Enzyme. J. Mol. Microbiol. Biotechnol. 2016, 26, 76-91.

(181) Clayton, T. A.; Baker, D.; Lindon, J. C.; Everett, J. R.; Nicholson, J. K. Pharmacometabonomic identification of a significant host-microbiome metabolic interaction affecting human drug metabolism. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 14728-14733.

(182) Persico, A. M.; Napolioni, V. Urinary p-cresol in autism spectrum disorder. Neurotoxicol. Teratol. 2013, 36, 82-90.

(183) Kang, D.-W.; Ilhan, Z. E.; Isern, N. G.; Hoyt, D. W.; Howsmon, D. P.; Shaffer, M.; Lozupone, C. A.; Hahn, J.; Adams, J. B.; Krajmalnik-Brown, R. Differences in fecal microbial metabolites and microbiota of children with autism spectrum disorders. Anaerobe 2018, 49, 121-131.

(184) Gacias, M.; Gaspari, S.; Santos, P.-M. G.; Tamburini, S.; Andrade, M.; Zhang, F.; Shen, N.; Tolstikov, V.; Kiebish, M. A.; Dupree, J. L.; Zachariou, V.; Clemente, J. C.; Casaccia, P. Microbiota-driven transcriptional changes in prefrontal cortex override genetic differences in social behavior. eLife 2016, 5, e13442.

(185) Zargar, K.; Saville, R.; Phelan, R. M.; Tringe, S. G.; Petzold, C. J.; Keasling, J. D.; Beller, H. R. In vitro Characterization of Phenylacetate Decarboxylase, a Novel Enzyme Catalyzing Toluene Biosynthesis in an Anaerobic Microbial Community. Sci. Rep. 2016, 6, 31362.

(186) Beller, H. R.; Rodrigues, A. V.; Zargar, K.; Wu, Y.-W.; Saini, A. K.; Saville, R. M.; Pereira, J. H.; Adams, P. D.; Tringe, S. G.; Petzold, C. J.; Keasling, J. D. Discovery of enzymes for toluene synthesis from anoxic microbial communities. Nat. Chem. Biol. 2018, 14, 451-457.

(187) Tailford, L. E.; Crost, E. H.; Kavanaugh, D.; Juge, N. Mucin glycan foraging in the human gut microbiome. Front. Genet. 2015, 6, 81.

(188) Becker, D. J.; Lowe, J. B. Fucose: biosynthesis and biological function in mammals. Glycobiology 2003, 13, 41R-53R.

(189) Pickard, J. M.; Chervonsky, A. V. Intestinal Fucose as a Mediator of Host–Microbe Symbiosis. J. Immunol. 2015, 194, 5588-5593.

(190) Kelly, R. J.; Rouquier, S.; Giorgi, D.; Lennon, G. G.; Lowe, J. B. Sequence and Expression of a Candidate for the Human Secretor Blood Group α(1,2)Fucosyltransferase Gene (FUT2): Homozygosity for an Enzyme-Inactivating Nonsense Mutation Commonly Correlates with the Non-Secretor Phenotype. J. Biol. Chem. 1995, 270, 4640-4649.

39

(191) McGovern, D. P. B.; Jones, M. R.; Taylor, K. D.; Marciante, K.; Yan, X.; Dubinsky, M.; Ippoliti, A.; Vasiliauskas, E.; Berel, D.; Derkowski, C.; Dutridge, D.; International, I. B. D. G. C.; Fleshner, P.; Shih, D. Q.; Melmed, G.; Mengesha, E.; King, L.; Pressman, S.; Haritunians, T.; Guo, X.; Targan, S. R.; Rotter, J. I. Fucosyltransferase 2 (FUT2) non- secretor status is associated with Crohn's disease. Hum. Mol. Genet. 2010, 19, 3468-3476.

(192) Rausch, P.; Rehman, A.; Künzel, S.; Häsler, R.; Ott, S. J.; Schreiber, S.; Rosenstiel, P.; Franke, A.; Baines, J. F. Colonic mucosa-associated microbiota is influenced by an interaction of Crohn disease and FUT2 (Secretor) genotype. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 19030-19035.

(193) Hooper, L. V.; Xu, J.; Falk, P. G.; Midtvedt, T.; Gordon, J. I. A molecular sensor that allows a gut commensal to control its nutrient foundation in a competitive ecosystem. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 9833-9838.

(194) Goto, Y.; Obata, T.; Kunisawa, J.; Sato, S.; Ivanov, I. I.; Lamichhane, A.; Takeyama, N.; Kamioka, M.; Sakamoto, M.; Matsuki, T.; Setoyama, H.; Imaoka, A.; Uematsu, S.; Akira, S.; Domino, S. E.; Kulig, P.; Becher, B.; Renauld, J.-C.; Sasakawa, C.; Umesaki, Y.; Benno, Y.; Kiyono, H. Innate lymphoid cells regulate intestinal epithelial cell glycosylation. Science 2014, 345, 1254009.

(195) Pickard, J. M.; Maurice, C. F.; Kinnebrew, M. A.; Abt, M. C.; Schenten, D.; Golovkina, T. V.; Bogatyrev, S. R.; Ismagilov, R. F.; Pamer, E. G.; Turnbaugh, P. J.; Chervonsky, A. V. Rapid fucosylation of intestinal epithelium sustains host–commensal symbiosis in sickness. Nature 2014, 514, 638-641.

(196) Ng, K. M.; Ferreyra, J. A.; Higginbottom, S. K.; Lynch, J. B.; Kashyap, P. C.; Gopinath, S.; Naidu, N.; Choudhury, B.; Weimer, B. C.; Monack, D. M.; Sonnenburg, J. L. Microbiota-liberated host sugars facilitate post-antibiotic expansion of enteric pathogens. Nature 2013, 502, 96-99.

(197) Pacheco, A. R.; Curtis, M. M.; Ritchie, J. M.; Munera, D.; Waldor, M. K.; Moreira, C. G.; Sperandio, V. Fucose sensing regulates bacterial intestinal colonization. Nature 2012, 492, 113-117.

(198) Dogan, B.; Suzuki, H.; Herlekar, D.; Sartor, R. B.; Campbell, B. J.; Roberts, C. L.; Stewart, K.; Scherl, E. J.; Araz, Y.; Bitar, P. P.; Lefébure, T.; Chandler, B.; Schukken, Y. H.; Stanhope, M. J.; Simpson, K. W. Inflammation-associated Adherent-invasive Escherichia coli Are Enriched in Pathways for Use of Propanediol and Iron and M-cell Translocation. Inflamm. Bowel Dis. 2014, 20, 1919-1932.

(199) Viladomiu, M.; Kivolowitz, C.; Abdulhamid, A.; Dogan, B.; Victorio, D.; Castellanos, J. G.; Woo, V.; Teng, F.; Tran, N. L.; Sczesnak, A.; Chai, C.; Kim, M.; Diehl, G. E.; Ajami, N. J.; Petrosino, J. F.; Zhou, X. K.; Schwartzman, S.; Mandl, L. A.; Abramowitz, M.; Jacob, V.; Bosworth, B.; Steinlauf, A.; Scherl, E. J.; Wu, H.-J. J.; Simpson, K. W.; Longman, R. S. IgA-coated E. coli enriched in Crohn’s disease spondyloarthritis promote TH17-dependent inflammation. Sci. Transl. Med. 2017, 9, eaaf9655.

40

(200) Winter, S. E.; Thiennimitr, P.; Winter, M. G.; Butler, B. P.; Huseby, D. L.; Crawford, R. W.; Russell, J. M.; Bevins, C. L.; Adams, L. G.; Tsolis, R. M.; Roth, J. R.; Bäumler, A. J. Gut inflammation provides a respiratory electron acceptor for Salmonella. Nature 2010, 467, 426-429.

(201) Faber, F.; Thiennimitr, P.; Spiga, L.; Byndloss, M. X.; Litvak, Y.; Lawhon, S.; Andrews- Polymenis, H. L.; Winter, S. E.; Bäumler, A. J. Respiration of Microbiota-Derived 1,2- propanediol Drives Salmonella Expansion during Colitis. PLoS Pathog. 2017, 13, e1006129.

(202) Price-Carter, M.; Tingey, J.; Bobik, T. A.; Roth, J. R. The Alternative Electron Acceptor Tetrathionate Supports B12-Dependent Anaerobic Growth of Salmonella enterica Serovar Typhimurium on Ethanolamine or 1,2-Propanediol. J. Bacteriol. 2001, 183, 2463-2475.

(203) Badía, J.; Ros, J.; Aguilar, J. Fermentation mechanism of fucose and rhamnose in Salmonella typhimurium and Klebsiella pneumoniae. J. Bacteriol. 1985, 161, 435-437.

(204) Chen, Y.-M.; Zhu, Y.; Lin, E. C. C. The organization of the fuc regulon specifying L- fucose dissimilation in Escherichia coli K12 as determined by gene cloning. Mol. Gen. Genet. 1987, 210, 331-337.

(205) Toraya, T. Cobalamin-dependent dehydratases and a deaminase: Radical catalysis and reactivating chaperones. Arch. Biochem. Biophys. 2014, 544, 40-57.

(206) Bobik, T. A.; Havemann, G. D.; Busch, R. J.; Williams, D. S.; Aldrich, H. C. The Propanediol Utilization (pdu) Operon of Salmonella enterica Serovar Typhimurium LT2 Includes Genes Necessary for Formation of Polyhedral Organelles Involved in Coenzyme B12-Dependent 1,2-Propanediol Degradation. J. Bacteriol. 1999, 181, 5967-5975.

41

Chapter 2: Quantification of glycyl radical enzymes in healthy human gut metagenomes and characterization of abundant dehydratases[i]

2.1: Introduction

The human gut microbiota, the organisms collectively residing in the human gastrointestinal tract, has a profound impact on host health.1-2 As described in Chapter 1, the gut microbiota consists of trillions of microorganisms and these microbes influence host physiology in a myriad of ways.3-4 Despite the known importance of these microbial activities, we have extremely limited knowledge of the precise biochemical reactions performed in these environments and an even poorer understanding of the molecular mechanisms through which this chemistry impacts host health.5 Tremendous advances in sequencing of both gut microbial genomes and metagenomes have led to a much greater understanding of the microbes present in these complex communities,6-8 but the vast number of uncharacterized enzymes encoded by these organisms complicates efforts to predict their metabolic activities in these environments.

Despite our improved ability to obtain genomic and metagenomic sequencing data, ~80% of genes in the Human Microbiome Project (HMP) metagenomes cannot be assigned a metabolic function, and ~50% cannot be given any annotation at all.5,9 Furthermore, genes that can be assigned a predicted function are typically mapped to large enzyme superfamilies capable of catalyzing many different reactions, and as many as 80% of enzymes within a superfamily can be uncharacterized or misannotated.10-11 Functional profiling approaches that accurately identify enzymes in microbiomes are needed to determine the underlying molecular mechanisms for how gut microbes impact host health.

[i] Parts of this chapter are adapted from the following publication: Levin, B. J.; Huang, Y. Y.; Peck, S. C.; Wei, Y.; Martínez-del Campo, A.; Marks, J. A.; Franzosa, E. A.; Huttenhower, C.; Balskus, E. P. A prominent glycyl radical enzyme in human gut microbiomes metabolizes trans-4-hydroxy-L-proline. Science 2017, 355, eaai8386.

In this Chapter, I describe a bioinformatics method developed to quantify functionally distinct members of large protein families in metagenomes and metatranscriptomes. This method, termed chemically guided functional profiling (CGFP), can distinguish between and provide abundance data for highly similar enzymes (Figure 2.1). Quantifying the enzymes encoded by microbiomes directly allows their impact on host biology to be determined more precisely. Traditional techniques to study the gut microbiota rely on 16S rRNA sequencing and microbial culturing to determine which organisms are present,12 but variability in the metabolic capabilities of closely related microbes limits the value of these data.13 Metagenomics-centered approaches, like CGFP, do not rely on species identification or the availability of gut microbial reference genomes. Additionally, most bioinformatics methods for analyzing metagenomes rely on assembling short sequencing reads into longer contigs and scaffolds,14 yet these methods lose valuable quantification data that CGFP retains.

Figure 2.1: Outline of chemically guided functional profiling.

To accurately distinguish between different enzymes in metagenomes, my collaborators and I needed two different bioinformatics tools: one to group enzymes with the same function and the other to quantify genes encoding these proteins in metagenomes. We integrated the Enzyme Function Initiative’s Enzyme Similarity Tool (EFI-EST), which generates protein sequence similarity networks (SSNs),15-17 with ShortBRED (Short, Better Representative Extract Dataset), a computational suite developed to quantify genes in metagenomes.18 SSNs aid the study of large protein families by clustering together sequences that share a user-defined level of similarity.19 Compared with other methods of grouping sequences, such as phylogenetic analyses, SSNs can be generated more rapidly and can incorporate many more sequences. If constructed with an understanding of the biochemistry of characterized family members, an SSN can group sequences according to their function.20-22 ShortBRED identifies unique amino acid sequence markers for input protein sequences and then quantifies the abundance of these characteristic sequences in raw metagenomic sequencing reads. As described in this Chapter, combining these tools allows for the identification of characterized and uncharacterized proteins encoded in metagenomes.

In addition to describing the development of CGFP, in this Chapter I also detail the use of CGFP in studying the glycyl radical enzyme (GRE) superfamily in healthy human gut microbiomes, as well as report the biochemical characterization of the GRE propanediol dehydratase (PD).23 As described in Chapter 1, the GRE superfamily participates in evolutionarily ancient, anaerobic primary metabolism, and previous metagenomic and metaproteomic studies suggest that GREs are one of the most abundant protein superfamilies in the human gut microbiome.24-26 Despite this connection, little is known about the abundance and distribution of different types of GREs in human microbiomes, and previous attempts to profile these enzymes have been complicated by the high sequence similarity shared by GREs with distinct functions.27 By applying CGFP to the GRE family and validating the bioinformatics results by performing in vitro biochemical characterization of abundant GREs, new insights have been gained into this protein superfamily’s prominent role in the human gut. I have also explored how this workflow can be applied to study other protein families present in gut metagenomes and other microbiomes.

Parts of the work in this Chapter were performed by and with other lab members and co-workers; each person’s contributions are recognized in the Results and Discussion section and the Materials and Methods section.

2.2: Results and discussion

2.2.1: GRE sequence similarity network construction

Using the web-based EFI-EST,15 my collaborators and I began our analysis by building an SSN from 6,343 sequences in InterPro family IPR004184,28 which includes enzymes containing a “PFL domain” and encompasses all functionally characterized GREs (as of 2017) except for the phylogenetically distinct ribonucleotide reductases. SSN construction was performed primarily by Dr. Spencer Peck (Harvard). The initial network was constructed so that nodes, each representing sequences sharing ≥95% amino acid identity, were connected if the sequences they represent share an alignment score ≤10^−300. This network was iteratively refined with additional percent identity filters, removing edges that did not meet each threshold to generate multiple SSNs (Figure 2.2). Each biochemically characterized GRE was mapped onto these networks, using conserved active site amino acid residues known to be involved in substrate binding and catalysis to confirm these functional assignments (Figure 2.3). For example, sequences containing an active site Cys-Cys motif were annotated as pyruvate formate-lyases, as this motif is found in all GREs with this activity (>20 different proteins).29

Ultimately, 62% amino acid identity was used as the minimum edge threshold for the SSN (Figure 2.4 and Figure 2.5). At this level all biochemically verified GRE activities are divided into different clusters. In particular, a characterized glycerol dehydratase (GD),30-31 a predicted 1,2-propanediol dehydrating GRE identified in gene clusters responsible for L-fucose metabolism (PD),32 and a then-unknown GRE with a distinct genomic context (UD)33-34 are separated into unique clusters only when the edge threshold is raised to 62% (Figure 2.4). We cannot know for certain that all of the GRE clusters in this SSN are isofunctional, but the separation of these highly similar GREs indicates that this edge threshold is high enough to divide GREs with similar but distinct activities into different clusters. The presence of an uncharacterized GRE among these dehydratases reflects the large number of uncharacterized family members. Of the 241 clusters in the final SSN, 195 have no assignable biochemical function, hinting that there is considerable unexplored chemical potential in the GRE superfamily.
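The effect of raising the edge threshold can be illustrated with a small sketch. This is not the EFI-EST implementation; it assumes a hypothetical list of pairwise percent identities between representative nodes and uses a simple union-find to report the connected components that remain at each cutoff. The node names and identity values are invented for illustration only.

```python
from collections import defaultdict


def connected_components(nodes, edges, min_identity):
    """Cluster nodes, keeping only edges with identity >= min_identity."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in edges:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    # Return clusters as sorted lists, ordered by their first member.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical nodes and pairwise percent identities (not real data).
nodes = ["GD_1", "GD_2", "PD_1", "PD_2", "UD_1"]
edges = [
    ("GD_1", "GD_2", 88.0),
    ("PD_1", "PD_2", 91.0),
    ("GD_1", "PD_1", 58.0),
    ("PD_2", "UD_1", 61.0),
]

for cutoff in (55, 60, 62):
    print(cutoff, connected_components(nodes, edges, cutoff))
```

With these invented values, all five nodes form one component at 55%, two components remain at 60%, and the three groups separate fully only at 62%, mirroring the qualitative behavior described for the dehydratase clusters.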

Figure 2.2: GRE SSNs constructed with varying percent identity cutoffs. Nodes represent groups of GRE sequences sharing ≥95% amino acid sequence identity. All networks were generated with an initial BLAST alignment edge threshold of ≤10^−300. An additional filter removes edges between nodes sharing less than (A) 55%, (B) 60%, (C) 62%, and (D) 65% amino acid identity. This figure was created by Dr. Spencer Peck (Harvard).

Figure 2.3: Multiple sequence alignment of selected GREs. Regions shown contain residues occupying the active sites of structurally characterized GREs and homology models of uncharacterized GREs. Residues marked with asterisks are conserved and are involved in substrate binding or catalysis. Numbering corresponds to PD from R. inulinivorans (uncharacterized cluster 16). Accession codes are from UniProt.

Figure 2.4: Subsets of SSNs showing clusters containing similar diol dehydratases. SSNs were constructed as described in Figure 2.2. With (A) 55% and (B) 60% identity edge thresholds, clusters containing distinct GREs are still connected; only at the (C) 62% identity edge threshold are all of these distinct GREs divided into different connected components.

Figure 2.5: SSN of the GRE superfamily. Nodes are connected if they share ≥62% sequence identity and have a BLAST alignment score of <10^−300. Each of the 1,843 nodes represents sequences with ≥95% amino acid identity.

2.2.2: Integrating the GRE SSN with quantitative metagenomics

I then used ShortBRED to profile the abundance of the entire GRE superfamily in 378 high-quality, first-visit metagenomes from healthy participants sequenced during the first phase of the Human Microbiome Project (Figure 2.6 and Figure 2.7).9 Metagenomes from six body sites were obtained in this effort: stool, buccal mucosa, supragingival plaque, tongue dorsum, anterior nares, and posterior fornix. These body sites include aerobic (skin, nasal, and vaginal), microaerobic (oral), and anaerobic (gastrointestinal tract, represented by stool samples) environments.

ShortBRED is a bioinformatics tool developed to quantify genes in metagenomic data.18 Accurate methods to measure the abundance of proteins in metagenomes are essential for studying the functions of complex microbial communities, but the short length of sequencing reads makes this a challenging task. Profiling of protein families in metagenomes typically involves mapping translated open reading frames to protein sequence databases or assembling entire metagenomes and annotating them using a reference database.35-37 However, these homology-based searches produce many false positives because of their local nature.38 Furthermore, methods involving metagenome assembly are computationally costly and require significant sequencing depth. To circumvent these challenges, the Huttenhower laboratory developed a tool to identify unique peptide markers within proteins of interest. Searching for these markers in large sequencing datasets is much more rapid than searches relying on local alignments,39 and the Huttenhower laboratory leveraged that result to create ShortBRED, which can quantify specific protein markers, and the proteins they derive from, in metagenomes faster and more accurately than previous methods.18
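The marker concept can be sketched in a few lines. The example below is a deliberately simplified stand-in, not ShortBRED's algorithm: it assumes exact k-mer matching instead of alignment-based overlap detection, and the function names and toy peptide sequences are hypothetical. It keeps, for each protein of interest, only the k-mers that appear in no other protein of interest and in no background sequence.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_markers(targets, background, k):
    """Map each target name to the k-mers found only in that target."""
    target_kmers = {name: kmers(seq, k) for name, seq in targets.items()}

    # Pool every k-mer seen in the background (reference) sequences.
    background_kmers = set()
    for seq in background:
        background_kmers |= kmers(seq, k)

    markers = {}
    for name, own in target_kmers.items():
        # k-mers shared with any other target are not diagnostic.
        others = set()
        for other_name, other in target_kmers.items():
            if other_name != name:
                others |= other
        markers[name] = own - others - background_kmers
    return markers


# Toy sequences invented for illustration (not real enzymes).
targets = {"enzyme_A": "MKTAYIAKQR", "enzyme_B": "MKTAYLGCQR"}
background = ["GGYIAKGG"]

markers = unique_markers(targets, background, k=4)
for name in sorted(markers):
    print(name, sorted(markers[name]))
```

The shared N-terminal k-mers drop out because both toy proteins contain them, and one k-mer of the first protein drops out because it also occurs in the background sequence. The markers that remain are what a quantification step would then search for in reads.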

Figure 2.6: Integration of SSNs and ShortBRED for CGFP. (A) The sequences in an SSN are extracted and ShortBRED-Identify is used to find short markers unique to these sequences yet absent from other proteins. (B) ShortBRED-Quantify is used to calculate the abundances of these markers in metagenomes, and values for proteins from the same SSN cluster are summed.

The first component of ShortBRED is ShortBRED-Identify (Figure 2.6A). This program creates short, unique markers for the input proteins of interest. ShortBRED-Identify constructs markers that are present in proteins of interest, but absent from a larger reference set (typically UniRef90).40 These markers are considerably smaller than and highly specific for the proteins they represent, and this feature accelerates searching and results in fewer false positives than other methods.18 I used ShortBRED-Identify to construct markers for GREs that are highly similar (85% amino acid identity). With these markers, I used ShortBRED-Quantify, the second component of ShortBRED, to measure their abundance in the unassembled metagenomic reads in each of the 378 samples (Figure 2.6B). By combining the abundances of sequence markers belonging to the same connected components on our SSN, I determined the total abundance of each group of GREs. Finally, I normalized these abundance values using previously calculated average microbial genome sizes.41 This normalization step converted the output to gene abundance per microbial genome, allowing for easier interpretation and comparisons between samples (Figure 2.7).
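The summation and normalization steps can be sketched as follows. The text does not give the exact conversion formula, so this example rests on two labeled assumptions: that per-protein abundances are expressed in reads per kilobase per million reads (RPKM), and that reads are sampled uniformly across genomes, in which case a single-copy gene in a genome of G base pairs has an expected RPKM of 10^9 / G. All function names, protein and cluster labels, and numbers are hypothetical.

```python
from collections import defaultdict


def cluster_abundance(protein_rpkm, protein_to_cluster):
    """Sum per-protein abundances over the SSN cluster each protein belongs to."""
    totals = defaultdict(float)
    for protein, rpkm in protein_rpkm.items():
        totals[protein_to_cluster[protein]] += rpkm
    return dict(totals)


def copies_per_genome(rpkm, average_genome_size_bp):
    """Convert RPKM to gene copies per genome (assumed uniform-coverage model).

    A single-copy gene is expected to give 1e9 / genome_size RPKM,
    so copies = RPKM * genome_size / 1e9.
    """
    return rpkm * average_genome_size_bp / 1e9


# Hypothetical values for one sample (not real data).
protein_rpkm = {"P1": 120.0, "P2": 80.0, "P3": 50.0}
protein_to_cluster = {"P1": "cluster_X", "P2": "cluster_X", "P3": "cluster_Y"}
average_genome_size_bp = 3_000_000  # assumed average genome size for the sample

per_cluster = cluster_abundance(protein_rpkm, protein_to_cluster)
normalized = {
    cluster: copies_per_genome(rpkm, average_genome_size_bp)
    for cluster, rpkm in per_cluster.items()
}
print(normalized)
```

Expressing abundance per genome makes samples with different sequencing depths and different average genome sizes directly comparable, which is the stated purpose of the normalization.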

Figure 2.7: CGFP of GREs in healthy human microbiomes. The heatmap shows the abundance and distribution of the 50 most abundant GRE clusters from the SSN in the sampled metagenomes as quantified using ShortBRED. Biochemically characterized GRE clusters are listed in bold type.

2.2.3: Abundances and distributions of characterized GREs in healthy human microbiomes

I detected sequences belonging to 75 of the 241 GRE clusters from our SSN, suggesting that the human host supports a wide range of GRE-mediated chemistry. GREs were identified in all of the oral and stool metagenomes profiled, but only a subset of samples from other body sites contained any GREs. These enzymes are inactivated by oxygen, and it was expected that the anaerobic environment of the human gut and the microaerobic oral body sites would contain more GREs than the other, aerobic body sites. Pyruvate formate-lyase (PFL) was the most abundant family member in all GRE-containing samples (Figure 2.8), consistent with its role in anaerobic glucose metabolism.42-43 The presence of PFL in many facultative anaerobes and existence of mechanisms to repair oxygen-damaged PFL may explain its occurrence in both anaerobic and aerobic environments.44 I observed a unique set of GREs in stool samples compared to the other body sites and identified significantly more GREs per microbial genome in this body site [P < 10^−58, Kruskal-Wallis (KW); all Ps < 10^−8, Dunn’s multiple comparisons (DMC) test]. Additionally, more distinct GRE clusters were present in the gut than in other body sites (75 vs. ≤15), indicating that the gastrointestinal tract harbors a wider range of anaerobic metabolic processes than other human body microbiomes.
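For readers unfamiliar with the Kruskal-Wallis test used here, the sketch below computes its H statistic from ranked abundances, including the standard correction for ties. It is a minimal pure-Python illustration, not the analysis code behind the reported P values; the body-site abundance values are invented, and the P value (obtained by comparing H to a chi-squared distribution with one fewer degree of freedom than the number of groups) is not computed.

```python
def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and tie group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Compute the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)

    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Tie correction; equals 1 when there are no ties.
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction


# Invented per-genome abundances for three body sites (not real data).
stool = [2.1, 2.5, 3.0]
oral = [1.0, 1.2, 1.4]
nares = [0.0, 0.1, 0.2]

h_statistic = kruskal_wallis_h([stool, oral, nares])
print(round(h_statistic, 3))
```

A rank-based test of this kind is a natural fit for abundance data with many zeros and skewed distributions, since it makes no assumption of normality.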

Figure 2.8: Abundance of PFL in metagenomes. Per-site abundance values across the six body sites are presented as Tukey boxplots.

In addition to PFL, other biochemically characterized GREs were identified in this dataset. 4-Hydroxyphenylacetate decarboxylase (HPAD) was found almost exclusively in stool samples, while choline trimethylamine-lyase (CutC) was present in stool, supragingival plaque, buccal mucosa, and tongue dorsum samples at similar abundance levels (Figure 2.9). Identifying the disease-linked CutC in oral microbiomes is intriguing, as periodontal disease and invasion of the GI tract by oral bacteria are associated with heart and liver diseases.45-46 This finding suggests that the oral microbiome may be a reservoir for trimethylamine-producing bacteria.

Unlike PFL, HPAD and CutC were detected in only a subset of stool metagenomes, which is consistent with the observed variability in the production and amounts of downstream metabolites p-cresol sulfate and trimethylamine-N-oxide in humans.47-48 This variation could contribute to inter-individual differences in drug metabolism and disease susceptibility.

Figure 2.9: Abundances of characterized GREs in metagenomes. Per-site abundances of (A) HPAD and (B) CutC across the 6 body sites are presented as Tukey boxplots.

2.2.4: Previously uncharacterized GREs were profiled using CGFP

I also obtained information about the abundance of uncharacterized GREs in human microbiomes, and the data suggest that many as-yet unappreciated GRE-mediated activities exist in the human gut. GREs of unknown function represent 9 of the 10 most abundant GRE clusters in stool metagenomes and, excluding PFL, outnumber characterized family members 63-fold. The 9th and 10th most abundant unknown GRE clusters were widely distributed in stool metagenomes (>50% of samples) but were each represented by a single node in the SSN. The GREs in the former cluster were found in Bacteroides fragilis and related microbes, while the latter cluster was composed of GREs from various microbes in the phylum Firmicutes. This observation underscores how proteins that are poorly represented in sequence databases may be widespread in biological habitats. This analysis also led to our prioritizing two specific GREs for further study. In particular, my collaborators and I focused on the two most broadly distributed and abundant uncharacterized GREs in the human gut microbiome. Cluster 16 was found in 96% of stool samples, is the third most abundant GRE in stool metagenomes, and is enriched in this habitat relative to other body sites (P < 10^−72, KW; all Ps < 10^−15, DMC; Figure 2.10A). Cluster 15 was identified in every stool sample and is the second most abundant GRE in these samples, aside from PFL (Figure 2.10B). Cluster 15 was also significantly enriched in the stool metagenomes relative to the other body sites (P < 10^−60, KW; all Ps < 10^−11, DMC).

Figure 2.10: Per-site abundances of previously uncharacterized GREs. Abundances of (A) Cluster 16 (PD) and (B) Cluster 15 (t4LHypD) across the 6 body sites are presented as Tukey boxplots.

2.2.5: Analysis of metatranscriptomes for GREs

The high abundance and wide distribution of these two previously uncharacterized GREs suggested that they might perform prominent metabolic functions in the healthy human gut. To determine whether the genes encoding for these GREs were expressed in gut microbiomes, I applied the CGFP method to analyze paired stool metagenomes and metatranscriptomes from eight healthy human subjects.49 The genes encoding for the Cluster 15 and 16 GREs were found in all samples, and transcripts for these genes were present as well (Figure 2.11), indicating that these GREs are expressed and likely active in the human gut. These results support the hypothesis that these GREs perform core functions within the healthy human gut and are distinctive of this habitat.

Figure 2.11: Detection of GREs in paired meta-omics data. The abundances of the genes and transcripts encoding for Cluster 16 (PD) and Cluster 15 (t4LHypD) were quantified in eight paired stool (A) metagenomes and (B) metatranscriptomes. All samples encoded for and expressed these GREs.

2.2.6: Biochemical characterization of propanediol dehydratase

We were able to connect the GREs in Cluster 16 to anaerobic L-fucose utilization, a microbial metabolic activity that plays an important role in maintaining gut microbial-host symbiosis. Gut bacteria consume this host-derived deoxysugar to produce beneficial short-chain fatty acids like propionate as end products (Figure 2.12).50-53 A key transformation in the bacterial production of propionate from L-fucose is the dehydration of (S)-1,2-propanediol to propionaldehyde by a B12-dependent propanediol dehydratase.54 However, the L-fucose-metabolizing human gut bacterium Roseburia inulinivorans A2–194 lacks this B12-dependent enzyme but instead encodes a GRE from Cluster 16. This R. inulinivorans GRE was previously hypothesized to be a B12-independent propanediol dehydratase (PD) based on its position in an L-fucose utilization (fuc) gene cluster and its upregulation during growth on L-fucose.32

Figure 2.12: Metabolism of L-fucose to propionate by gut microbes. (A) Overview of L-fucose metabolism. Host intestinal epithelial cells produce L-fucosylated glycans, which gut microbes can hydrolyze to access free L-fucose. In turn, these gut microbes excrete propionate, which is beneficial for the host. (B) Overview of fuc and pdu pathways. Gut microbes use the fuc pathway to catabolize L-fucose to (S)-1,2-propanediol and dihydroxyacetone phosphate (DHAP). Some microbes can also further metabolize (S)-1,2-propanediol to propionaldehyde, which is further oxidized or reduced.

I verified this proposal by characterizing the R. inulinivorans PD and its activating enzyme (PD-AE) in vitro. The genes encoding for PD and PD-AE were cloned into pET-29b and pET-28a expression vectors, respectively, and heterologously expressed in Escherichia coli BL21(DE3). The C-His6-tagged construct of PD was overexpressed and purified aerobically and sparged with argon before use to remove oxygen, while the N-His6-tagged construct of PD-AE was overexpressed and purified under anaerobic conditions. Electron paramagnetic resonance (EPR) spectroscopy showed that PD-AE can generate a glycine-centered radical on PD (Figure 2.13). As described in Chapter 1, all GREs contain a catalytically essential glycyl radical, and the unpaired electron in this species can be detected spectroscopically.55-56 The EPR spectrum I obtained for activated PD is entirely consistent with spectra obtained with other GREs. The glycine residue G817 in PD is conserved in all GREs and is thought to be the site of the glycyl radical, and my inability to activate the G817A mutant of PD supports this prediction.

Figure 2.13: EPR spectra for wild-type PD and mutants activated by PD-AE. Representative EPR spectra are shown for wild-type PD (18% of protein monomers activated) and four PD mutants: PD-G817A (0%), PD-C438S (40%), PD-E440Q (7%), and PD-H166Q (12%). For all spectra (except PD-G817A), g = 2.0036 ± 0.0001 and A = 1.40 ± 0.02 mT. No spectrum was simulated for the PD-G817A mutant because no radical could be detected.

Next, I used gas chromatography–mass spectrometry (GC–MS) to confirm that activated PD converts (S)-1,2-propanediol to propionaldehyde (Figure 2.14A). Assays containing all components required for PD activation [S-adenosylmethionine (SAM), PD-AE, and 5-deazariboflavin] as well as substrate produced propionaldehyde as expected. Assays not containing activated PD did not produce any propionaldehyde, confirming that activated PD is necessary for catalysis. (R)-1,2-propanediol was also dehydrated by PD, and it was unclear from these endpoint experiments which compound was turned over with greater efficiency.

Figure 2.14: Validation of activity of PD. (A) End-point GC–MS assays for detection of activity by PD. Propionaldehyde was detected in end-point GC–MS assays containing 5 mM (S)-1,2-propanediol. Chromatograms were taken in single ion recording mode. Assays were quenched after 20 min of incubation. (B) Kinetic analysis of PD. Error bars represent the mean ± SD of three replicates.

Coupled spectrophotometric assays with horse liver alcohol dehydrogenase were performed to determine the catalytic efficiency of PD (Figure 2.14B). These analyses revealed a 26-fold difference in specificity for (S)- versus (R)-1,2-propanediol (kcat = 1500 ± 100 s⁻¹, Km = 7.8 ± 0.6 mM, kcat/Km = 1.9 ± 0.2 × 10⁵ M⁻¹ s⁻¹ vs. kcat = 330 ± 40 s⁻¹, Km = 44 ± 4 mM, kcat/Km = 7.5 ± 0.8 × 10³ M⁻¹ s⁻¹), a stereochemical preference in accordance with PD’s proposed role in L-fucose metabolism. The kinetic parameters obtained with the native substrate are similar to those of other GREs,57-58 while the increased Km value may indicate a high effective concentration of substrate in vivo, which would be consistent with the predicted localization of PD within a bacterial microcompartment.32,59 These findings were corroborated qualitatively by an independent study of PD undertaken by another laboratory.60 Overall, these kinetics experiments support that PD dehydrates (S)-1,2-propanediol derived from L-fucose.
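
The fold difference in specificity follows directly from the reported kcat and Km values. As a quick arithmetic check, the sketch below converts Km from mM to M and computes kcat/Km for each enantiomer. The function and variable names are illustrative and are not part of the original analysis, and the reported uncertainties are not propagated.

```python
def catalytic_efficiency(kcat_per_s: float, km_mM: float) -> float:
    """Return kcat/Km in M^-1 s^-1, given kcat in s^-1 and Km in mM."""
    return kcat_per_s / (km_mM * 1e-3)


# Central values reported in the text (uncertainties are ignored here).
eff_S = catalytic_efficiency(1500, 7.8)   # (S)-1,2-propanediol
eff_R = catalytic_efficiency(330, 44)     # (R)-1,2-propanediol

print(f"(S): {eff_S:.2e} M^-1 s^-1")
print(f"(R): {eff_R:.2e} M^-1 s^-1")
print(f"ratio (S)/(R): {eff_S / eff_R:.1f}")
```

The central values give a ratio of roughly 26, in line with the reported fold difference once rounding of the individual parameters is taken into account.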

2.2.7: Identification of conserved residues in dehydrating GREs

Jonathan Marks (Harvard) constructed a homology model of PD and docked both (S)- and (R)-1,2-propanediol into its active site. These models were compared to a crystal structure of GD in order to identify which residues may be important for substrate binding or catalysis (Figure 2.15A; Figure 2.16A and B).31 Several active site amino acids from PD are conserved in GD, including G817 and C438, the sites of the radical intermediates thought to initiate catalysis via hydrogen atom abstraction from C1 of the substrate. In addition, E440, a general base proposed to deprotonate the C1-hydroxyl group, and H166, which is predicted to protonate the departing C2-hydroxyl group, were also present in both enzymes.61-62 The homology models constructed agree well with crystal structures of PD that were reported after homology model generation (Figure 2.16C).60 Using site-directed mutagenesis, I confirmed that these four residues are critical for activity (Figure 2.15B). Crucially, I was able to detect the active glycyl radical on each of the mutants except for the G817A mutant (Figure 2.13), confirming that these proteins can still be posttranslationally modified by PD-AE. However, no turnover was observed for any of the PD mutants, verifying that these additional active site residues are critical for activity.

Figure 2.15: Identifying active site residues conserved in dehydrating GREs. (A) Comparing the PD homology model (green) with the GD crystal structure (yellow) highlights a characteristic set of active site residues found in GREs catalyzing dehydration. (B) GC–MS analysis of assays containing wild-type PD or PD active site mutants and substrate demonstrates that the residues of the “dehydratase motif” are essential for activity.

Figure 2.16: Overlays of the PD homology model with GD and PD crystal structures. (A) An overlay of the homology model of PD (green) with the crystal structure of GD (yellow) (PDB ID: 4MTJ). (B) An overlay of the homology model of PD with (R)-1,2-propanediol docked (green) and the crystal structure of GD with (S)-1,2-propanediol bound (yellow) (PDB ID: 4MTJ). (C) An overlay of the homology model of PD with (S)-1,2-propanediol bound (green) and the recently published crystal structure of PD with (S)-1,2-propanediol bound (purple) (PDB ID: 5I2G). The good agreement between the structures indicates that the homology model provides an accurate representation of PD.

The four amino acids G817, C438, E440, and H166 are found in both PD and GD and are essential for activity, but H166 and E440 are absent from GREs not catalyzing dehydration reactions.23 These residues therefore appear to constitute a “dehydratase motif” that is predictive of enzyme function. Using a multiple sequence alignment, we identified this motif in 100 out of 195 uncharacterized clusters in the GRE SSN, hinting that dehydration may be a widespread activity of the GREs in the human gut microbiome. However, because the proposed dehydratase motif is based only on the computational modeling of GD, characterization of PD, and empirical presence in other GREs, there are limitations to this analysis. It cannot be ruled out that an uncharacterized GRE may encode all of these residues but not catalyze dehydration, or it might lack these residues yet still catalyze dehydration. Future work characterizing additional GREs containing this putative dehydratase motif and site-directed mutagenesis of these residues will provide insight into their catalytic function in other GREs.
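
To illustrate how a motif of this kind can be screened for in a multiple sequence alignment, the following is a minimal sketch rather than the procedure actually used. It maps residue numbers in a reference sequence to alignment columns and then checks whether another aligned sequence carries the expected amino acid at each column. The toy sequences, the short residue numbering, and all function names are hypothetical; only the four residue identities (H, C, E, G) come from the text.

```python
def reference_columns(aligned_ref: str, positions: set[int]) -> dict[int, int]:
    """Map 1-based residue numbers of the reference to 0-based alignment columns."""
    columns = {}
    residue_number = 0
    for column, char in enumerate(aligned_ref):
        if char == "-":          # gap in the reference: no residue consumed
            continue
        residue_number += 1
        if residue_number in positions:
            columns[residue_number] = column
    return columns


def has_motif(aligned_seq: str, columns: dict[int, int], expected: dict[int, str]) -> bool:
    """True if the sequence carries the expected amino acid at every motif column."""
    return all(aligned_seq[columns[pos]] == aa for pos, aa in expected.items())


# Toy alignment (NOT real GRE sequences). In PD the motif positions would be
# H166, C438, E440 and G817; this short reference uses made-up positions.
toy_reference = "MH-CAEG"
toy_motif = {2: "H", 3: "C", 5: "E", 6: "G"}
cols = reference_columns(toy_reference, set(toy_motif))

for name, seq in {"seq_a": "MHQCAEG", "seq_b": "MHQCAQG", "seq_c": "M--CAEG"}.items():
    print(name, has_motif(seq, cols, toy_motif))
```

A check of this kind only reports residue identity at aligned positions, so it shares the limitation noted above: presence of the motif does not by itself establish dehydratase activity.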

2.2.8: Comparing abundances of PD and B12-PD in the healthy human gut microbiome

Despite PD’s presence in high abundance in 96% of the HMP stool metagenomes, it remained unclear whether PD or B12-PD was more abundant in the human gut microbiome. The two enzymes catalyze the same dehydration reaction, yet they differ in their sensitivity to oxygen, making it unclear whether one type of enzyme would predominate in the largely anaerobic environment of the healthy human gut. I used ShortBRED to determine the abundance of B12-PD in the 80 HMP stool metagenomes analyzed previously (Figure 2.17A). Although B12-PD is also widely distributed (present in 87% of samples), PD is significantly more abundant than B12-PD (P < 10⁻⁴, Mann-Whitney U test). Furthermore, by comparing the abundance of the two enzymes within each gut metagenome, I found that the median ratio of PD to B12-PD across all samples was 5.1 to 1 (Figure 2.17B). This observation suggests that PD may make a greater contribution to propionate production from L-fucose in the healthy human gut. However, the presence of both enzymes hints that organisms also rely on B12-PD. This enzyme may allow commensals and pathogens, including Salmonella enterica serovar Typhimurium, to process (S)-1,2-propanediol derived from L-fucose in the gut, particularly during periods of stress or inflammation.63-64
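
The within-sample comparison described above can be sketched as follows. This is an illustrative calculation on made-up abundances, not the data or script used in this work. Restricting the ratios to samples in which both enzymes were detected is an assumption made here to avoid division by zero, and all names are hypothetical.

```python
from statistics import median


def paired_ratios(pd_abundance, b12_abundance):
    """Per-sample PD : B12-PD ratios, restricted to samples encoding both enzymes."""
    return [p / b for p, b in zip(pd_abundance, b12_abundance) if p > 0 and b > 0]


# Made-up abundances (copies per microbial genome) for six hypothetical samples.
pd_toy = [0.5, 0.25, 0.0, 0.75, 1.0, 0.25]
b12_toy = [0.125, 0.125, 0.5, 0.0, 0.0625, 0.25]

ratios = paired_ratios(pd_toy, b12_toy)
print("samples with both enzymes:", len(ratios))
print("median ratio:", median(ratios))
print("fraction with at least 2-fold more PD:", sum(r >= 2 for r in ratios) / len(ratios))
```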

Figure 2.17: Comparison of the abundances of PD and B12-PD in stool metagenomes. (A) Abundances of PD and B12-PD in HMP stool metagenomes. PD is found in a slightly higher percentage of metagenomes (96.3% vs. 87.5%) and is significantly more abundant overall. (B) Ratio of PD to B12-PD in each metagenome. Among metagenomes encoding for both enzymes, >75% of samples contained at least twice as many copies of PD compared to B12-PD. The median abundance per sample of genomic copies of PD to B12-PD is 5.2-fold, implying that the GRE-mediated reactivity may be responsible for a greater portion of propanediol dehydration than the B12-dependent process.

2.2.9: Discovery and characterization of trans-4-hydroxy-L-proline dehydratase

In addition, we observed that the Cluster 15 GREs, which were found in high abundance in every stool metagenome analyzed (Figure 2.10B), also contain the characteristic dehydratase motif (Figure 2.3). However, the active site residues of the Cluster 15 and 16 GREs were distinct enough to propose that Cluster 15 was a dehydratase for a different substrate. My collaborator Yolanda Huang (Harvard) and I used sequences from Cluster 15 to search for this GRE in sequenced genomes, and identified it in >850 sequenced bacterial and archaeal genomes deposited in the National Center for Biotechnology Information (NCBI) genome database.23 Importantly, >97% of sequenced isolates of Clostridium difficile, a prominent human pathogen, encoded a Cluster 15 GRE.

The genomic context of this putative dehydratase was crucial for identifying its function. Dr. Yifeng Wei (MIT) observed that the genes encoding for this GRE and its partner AE are frequently colocalized with a gene encoding for a predicted Δ¹-pyrroline-5-carboxylate (P5C) reductase (Figure 2.18A). P5C reductase reduces P5C to L-proline as the final step in L-proline biosynthesis.65 Therefore, we predicted that the nonproteinogenic amino acid trans-4-hydroxy-L-proline (Hyp) was the native substrate of this GRE (Figure 2.18B). Dehydration of Hyp could yield P5C, which would be reduced to L-proline by the P5C reductase present in the gene cluster. Many Clostridiales can use L-proline as an electron acceptor in amino acid fermentations using known enzymes.66 Certain organisms, including C. difficile, also use Hyp as an electron acceptor, but the enzymes that mediate the process with Hyp have not been identified.67 The pathway proposed here accounts for this metabolic activity by providing a mechanism for microbes to convert Hyp to L-proline.

Figure 2.18: Gene cluster context and biological pathway for t4LHypD. (A) Conserved genomic context of GRE Cluster 15 in Clostridiales. (B) Proposed pathway for anaerobic Hyp metabolism consistent with the colocalization of these genes and the prediction that Cluster 15 GREs catalyze dehydration.

Yolanda Huang performed in vitro characterization of the putative Hyp dehydratase (t4LHypD), its partner activating enzyme (t4LHypD-AE), and the colocalized P5C reductase from C. difficile 70-100-2010, verifying that each enzyme had the expected activity. She first used a spectrophotometric assay to establish P5C reductase activity, showing that it interconverted P5C and L-proline (Figure 2.19). EPR experiments showed that t4LHypD-AE could install a glycine-centered radical on t4LHypD, demonstrating that it behaves similarly to characterized GREs (Figure 2.20A). Finally, assays with activated t4LHypD, P5C reductase, NADH, and Hyp yielded full conversion of Hyp to L-proline (Figure 2.20B). Crucially, assays lacking P5C reductase or NADH did not yield L-proline but did consume Hyp, indicating that t4LHypD catalyzes the dehydration of Hyp to P5C with subsequent reduction of P5C by P5C reductase.23 Kinetic parameters of t4LHypD further support the physiological relevance of this reaction (kcat = 45 ± 1 s⁻¹, Km = 1.2 ± 0.1 mM, kcat/Km = 3.8 ± 0.3 × 10⁴ M⁻¹ s⁻¹) (Figure 2.21).68 Improved growth of sequenced Clostridiales isolates encoding t4LHypD in Hyp-containing media, as well as the accompanying consumption of Hyp, support the biological relevance of this pathway.23

Figure 2.19: Spectrophotometric assay to detect P5CR activity. (A) Overview of assay. Although P5CR physiologically reduces P5C, the reaction is reversible and can be forced in the reverse direction using high concentrations of NAD+ and L-proline. Activity can be monitored by measuring the absorbance at 340 nm, where NADH absorbs. (B) Assays demonstrating that the full assay, including NAD+ and not NADP+, leads to NADH production.

Figure 2.20: Verification of t4LHypD activity. (A) EPR spectrum of the glycine-centered radical in activated t4LHypD. An average of 0.51 ± 0.01 (mean ± SD) glycyl radical per GRE monomer was observed with hyperfine coupling A = 1.44 mT. (B) LC–MS/MS detection of proline produced in vitro. Error bars represent the mean ± SD of three replicates.

Figure 2.21: Kinetic analysis of t4LHypD. Conversion of Hyp to P5C was coupled to the reduction of P5C to L-proline, which is accompanied by NADH consumption. The change in absorbance at 340 nm, which is characteristic of NADH, was monitored to calculate initial rates. Data points and error bars represent mean ± SD of three replicates.

These experiments show that this abundant, universally distributed human gut microbial GRE is a Hyp dehydratase and define a pathway for anaerobic 4-hydroxyproline metabolism. This discovery reveals a previously unappreciated host-gut microbe metabolic interaction. Many host and dietary proteins contain Hyp, including collagen, the most abundant host protein.69 In eukaryotes, Hyp is produced posttranslationally by prolyl 4-hydroxylase, a member of the nonheme iron-dependent dioxygenase family. Although C4-hydroxylation of L-proline is the most common posttranslational modification in the human proteome, it is rare in bacteria. In addition, this modification cannot be reversed by human metabolism; host proteins instead oxidize Hyp to yield glyoxylate without forming L-proline.70 t4LHypD and P5C reductase allow bacteria to reverse this transformation and incorporate L-proline into microbial metabolism. This enzyme is also interesting from an evolutionary perspective. Hyp formation requires molecular oxygen, yet O2 inactivates GREs and was not present during the evolution of ancestral GRE family members. Therefore, t4LHypD likely emerged after the oxygenation of Earth’s atmosphere in response to the evolution of this posttranslational modification in eukaryotic organisms.

2.2.10: Conclusions

My collaborators and I have incorporated knowledge of enzymatic chemistry into quantitative metagenomics, designing and implementing the CGFP strategy. My analysis of the GRE superfamily in human microbiomes provides new insights into both GREs of known activity, including enzymes linked to human disease, and enzymes of unknown activity in these communities. Our strategy revealed intriguing targets for further study, and a combination of bioinformatic experiments proved critical for linking these highly abundant, uncharacterized sequences to corresponding microbial metabolic processes. In particular, the many questions raised by the activity and distribution of t4LHypD illustrate how enzyme discovery efforts can inspire hypothesis-driven microbiome research.

CGFP is novel because it both facilitates the identification of functionally distinct microbial enzymes in complex multi-omics sequence data and prioritizes them for characterization based on their abundance, distribution, and expression. The use of ecological context to guide characterization of unknown enzymes represents an advance compared to methods that focus on targets present in sequenced organisms without consideration of their abundance in vivo. The general strategy may be applied broadly to investigate the chemistry present in microbial communities, and CGFP can be used to profile metagenomes and metatranscriptomes obtained from any environment. Moreover, it can be readily extended to identify other types of enzymes, provided that some superfamily members have been biochemically characterized. Further uses of CGFP could uncover novel metabolic interactions both within microbiomes and between microbes and hosts. By expanding our knowledge of microbial enzymes and metabolism, CGFP will advance progress towards a deeper mechanistic understanding of microbiomes.

2.3: Materials and methods

2.3.1: General materials and methods

All chemicals and solvents were purchased from Sigma-Aldrich, except where otherwise noted. Clostridiales strains were purchased from ATCC and DSMZ. Luria-Bertani Lenox (LB) medium was purchased from EMD Millipore or Alfa Aesar. Reinforced Clostridial Medium (RCM) was purchased from BD Difco. DNA sequencing results and multiple sequence alignments were analyzed with Geneious Pro 7.1.5 or Clustal Omega.71-72 Multiple sequence alignments were visualized with Jalview.73 Primers were purchased from Integrated DNA Technologies (Coralville, IA). PCR was performed with a MyCycler gradient cycler (Bio-Rad) or a C1000 Gradient Cycler (Bio-Rad). All PCR amplifications were analyzed by 1% agarose gel electrophoresis with ethidium bromide staining in 1×TAE buffer. PCR products and digested DNA were purified using an Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare). All plasmid constructs were verified by DNA sequencing (Beckman Coulter Genomics). All restriction enzymes, ligases, polymerases, and PCR mixes were obtained from New England Biolabs. SDS-PAGE (4-15% Tris-HCl gel, Bio-Rad) was routinely used to visualize fractions from protein purifications following staining (Biosafe Coomassie, Bio-Rad). Isopropyl β-D-1-thiogalactopyranoside (IPTG) was obtained from Teknova. Ni-NTA resin was obtained from Qiagen. Water and solvents used for GC–MS and LC–MS were B&J Brand high-purity solvents (Honeywell Burdick & Jackson). All absorbance measurements in 96-well plates were carried out using a PowerWave HT Spectrophotometer (BioTek) inside of an anaerobic chamber (MBraun).

Samples were made anaerobic as follows. Solids were brought into anaerobic chambers (MBraun and Coy Laboratory Products) in perforated 1.7 mL microcentrifuge tubes. Protein solutions with volumes greater than 1 mL were made anaerobic on a Schlenk line with 3 cycles of evacuation on vacuum (5 min) followed by filling with argon (5 min). Protein solutions with volumes less than 1 mL were made anaerobic by flushing the headspace of the solution with argon for 30 min. Buffers and other solutions were made anaerobic in 1 to 20 mL volumes by bubbling argon through the liquid for 30 min. Media solutions of 200 to 300 mL in volume were first microwaved until boiling, followed by bubbling nitrogen through the liquid for 1 h. Inside an anaerobic chamber, aliquots were dispensed in smaller volumes into 16×125 mm Hungate tubes, which were then autoclaved.

2.3.2: Construction of GRE SSNs

Dr. Spencer Peck (Harvard) was primarily responsible for constructing GRE SSNs. All sequence similarity networks were created as described previously.15 In brief, the InterPro family28 IPR004184 (pyruvate formate lyase domain, version 53.0, accessed on October 9, 2015) was used as the input for option B of the Enzyme Function Initiative’s Enzyme Similarity Tool (found at http://efi.igb.illinois.edu/efi-est). IPR004184 includes every characterized GRE except for RNR. RNR shares little sequence homology with PFL and other GREs and its uniqueness led to its exclusion from IPR004184 and our analysis.74-75 Networks were generated at initial score values of 10⁻⁵⁰ and 10⁻³⁰⁰ with a minimum length of 500 amino acids. To reduce the file size of the networks, the 10⁻⁵⁰ and 10⁻³⁰⁰ networks were downloaded as 60% and 95% representative node networks, respectively, in which collections of proteins with ≥60% or ≥95% ID to one another are represented by single nodes. Nodes with similarity greater than the threshold score are connected by an edge. For the 10⁻⁵⁰ network, this initial score value was 27 ± 1% ID (mean ± standard deviation), and for the 10⁻³⁰⁰ network, this initial edge value was 65 ± 2% ID (mean ± standard deviation). Due to the sheer number of edges within each network, there are still numerous edges that fall well below the initial average edge value (e.g., down to 55% ID in the 10⁻³⁰⁰ network). Cytoscape 3.2.0 was used to visualize the networks and further refine the threshold scores by raising the minimum identity required for two nodes to be connected.76 By scouring the literature and sequence databases, we connected reported GRE activities to sequences within our SSN. Moreover, all functionally characterized α-ketoacid lyases (PFLs; >20 instances) contain a catalytically essential Cys-Cys active site motif. Therefore, any sequence containing these residues in a multiple sequence alignment was also annotated as a PFL. The edge scores within the 10⁻⁵⁰ network were made more stringent so that GREs known to have different functions were clustered separately.57,77-78

While this initial analysis separated characterized functions from each other, it was possible that our clusters of proteins were still not isofunctional. We therefore examined the more stringent networks generated from an initial score value of 10⁻³⁰⁰ and produced a series of networks in which the minimum edge score was set to 55, 60, 62, or 65% ID (Figure 2.2). We used multiple sequence alignments in tandem with construction of homology models of uncharacterized GREs to predict the residues that occupied the active sites of these GREs. At edge identities <62% ID, we noticed significant heterogeneity in the predicted active site residues for a subset of the GREs, including GDE (Figure 2.4). Additionally, the genomic contexts associated with these GREs were strikingly different. GD itself has a relatively minimal operon including regulatory proteins and a single alcohol dehydrogenase.30 By contrast, two of the clusters that separate from GD by raising the edge score to 62% ID contain GREs that are consistently encoded in the vicinity of genes annotated as an alcohol dehydrogenase, aldehyde dehydrogenases, and microcompartment proteins. We suspected that these GREs operated on different substrates, a hypothesis borne out by the later determination that one of these proteins is PD, which does not act on glycerol. Only by raising the stringency threshold of our network to 62% ID do these diol dehydratases split apart into different clusters (Figure 2.4).
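
The effect of raising the minimum edge identity can be illustrated with a small sketch. This is not the EFI-EST or Cytoscape workflow itself; it is a toy reimplementation that assumes each edge carries a percent identity, drops edges below a chosen threshold, and reports the resulting connected components using a union-find structure. The node names and identities are made up.

```python
def clusters_at_threshold(nodes, edges, min_identity):
    """Connected components of the network after dropping edges below min_identity.

    nodes: iterable of node names; edges: iterable of (node_a, node_b, percent_id).
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b, percent_id in edges:
        if percent_id >= min_identity:
            parent[find(a)] = find(b)       # merge the two components

    groups = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in groups.values())


# Toy network with made-up node names and identities (not taken from the GRE SSN).
toy_nodes = ["A1", "A2", "B1", "B2"]
toy_edges = [("A1", "A2", 70.0), ("B1", "B2", 68.0), ("A2", "B1", 61.0)]

for threshold in (55, 60, 62, 65):
    print(threshold, clusters_at_threshold(toy_nodes, toy_edges, threshold))
```

In this toy case the single 61% ID edge holds the two groups together until the threshold reaches 62% ID, which mirrors the kind of split described above for the diol dehydratases.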

One cautionary note is that we cannot be certain that all of the clusters in our network are isofunctional with the edge score set to 62% ID. To gain further evidence either way, we constructed multiple sequence alignments of the sequences that are grouped with structurally characterized GREs including PFL, CutC, HPAD, benzylsuccinate synthase (BSS), and GD. We identified functionally important residues in the active sites of these crystal structures and checked to see if they were conserved in other sequences from these clusters. In nearly all cases, the residues present in the crystal structure were universally conserved in the other sequences from the same cluster of the SSN. Deviations invariably resulted in conservative mutations. We therefore think it quite likely that groups of proteins within our 10⁻³⁰⁰/62% ID network catalyze the same reaction.

2.3.3: Determination of enzyme abundances in metagenomes and metatranscriptomes

ShortBRED was used to determine the abundance of GREs in metagenomes.18 For a set of proteins of interest, ShortBRED groups the proteins at a specified amino acid similarity threshold (here, 85% ID) to identify non-redundant representative sequences. These representative sequences are then compared to a comprehensive, non-redundant protein catalog to identify distinguishing peptide markers among the proteins of interest. Metagenomic reads are then mapped to the identified markers to profile the abundance of their corresponding proteins with high specificity. To find markers to profile the glycyl radical enzyme family, the InterPro family IPR004184 was prepared for study by removing redundant (100% amino acid identity) entries, and all sequences containing fewer than 500 amino acids were discarded. The remaining sequences were used as input for ShortBRED-Identify and UniRef90 (downloaded on September 22, 2015) was used as the comprehensive protein reference catalog.40 ShortBRED-Identify was run with the default parameters.
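
The input preparation step described above can be sketched in a few lines. This is an illustrative reimplementation, not the script used in this work: it treats "redundant" as exact string identity between full-length sequences, which is an assumption, and the function name and toy records are hypothetical.

```python
def prepare_sequences(records, min_length=500):
    """Drop exact duplicate sequences and sequences shorter than min_length.

    records: iterable of (identifier, amino_acid_sequence) pairs.
    The first identifier seen for each distinct sequence is kept.
    """
    seen = set()
    kept = []
    for identifier, sequence in records:
        sequence = sequence.upper()
        if len(sequence) < min_length or sequence in seen:
            continue
        seen.add(sequence)
        kept.append((identifier, sequence))
    return kept


# Toy records with hypothetical identifiers; real input would be parsed from FASTA.
toy_records = [
    ("p1", "M" * 600),    # kept
    ("p2", "M" * 600),    # exact duplicate of p1, dropped
    ("p3", "MA" * 100),   # 200 residues, dropped as too short
    ("p4", "MK" * 300),   # 600 residues and distinct, kept
]
print([identifier for identifier, _ in prepare_sequences(toy_records)])
```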

Once markers were obtained, ShortBRED-Quantify was used to search for these markers in metagenomic datasets generated as part of the Human Microbiome Project (HMP). Shotgun sequencing reads for 378 metagenomes were obtained from the HMP website (www.hmpdacc.org).9,79 These samples had previously passed quality control assessment and originated from six different body sites: anterior nares, posterior fornix, buccal mucosa, supragingival plaque, tongue dorsum, and stool (reflective of the lower GI tract). The selected metagenomes were restricted to subjects’ first sampling visits, and thus we included no duplicate samples for a given subject within a given body site. ShortBRED-Quantify was also run with the default parameters. The markers file used was the file generated by running ShortBRED-Identify according to the parameters described above.

By default, ShortBRED reports protein abundance in RPKM units (reads mapped per kilobase of coding sequence per million sample reads). I converted these values to units of “copies per microbial genome” in the following way. First, each ShortBRED RPKM value (C) was converted to an equivalent coverage value (Scov). If H is the number of hits to a given sequence of length L, and R is the average read length for the sample, then coverage can be estimated as:

$$S_{cov} = \frac{H \times R}{L}$$

Similarly, the average coverage of a genome in the corresponding sample (Gcov) can be estimated. Taking N as the total number of reads in a sample and AGS as the sample’s average genome size, the coverage can be expressed as:

$$G_{cov} = \frac{N \times R}{AGS}$$

The AGS for all HMP samples used in this study were previously computed.41 The ratio of Scov/Gcov then serves as a measure of the relative copy number of a sequence of interest among genomes in the sample (“copies per microbial genome”). The computation of this ratio reduces considerably to:

$$\frac{S_{cov}}{G_{cov}} = \frac{\left(\dfrac{H \times R}{L}\right)}{\left(\dfrac{N \times R}{AGS}\right)} = \frac{H}{L \times N} \times AGS = \frac{H}{\left(\dfrac{L}{10^{3}}\right) \times \left(\dfrac{N}{10^{6}}\right)} \times AGS \times 10^{-9} = AGS \times C \times 10^{-9}$$

The factor of 10⁻⁹ arises because L and N as used by ShortBRED are in units of kb and million reads, respectively.
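
The conversion can be checked numerically. The sketch below computes the coverage ratio directly from the two coverage definitions and compares it with C × AGS × 10⁻⁹. It follows the definitions given in the text; the function names and example values are made up for illustration, and ShortBRED's internal normalization may differ in detail.

```python
def coverage_ratio(hits, read_length, seq_length_bp, total_reads, ags_bp):
    """S_cov / G_cov computed directly from the two coverage definitions."""
    s_cov = hits * read_length / seq_length_bp
    g_cov = total_reads * read_length / ags_bp
    return s_cov / g_cov


def rpkm(hits, seq_length_bp, total_reads):
    """Hits per kilobase of sequence per million sample reads (C)."""
    return hits / ((seq_length_bp / 1e3) * (total_reads / 1e6))


def copies_per_genome(c_rpkm, ags_bp):
    """Convert an RPKM value C to copies per microbial genome: C x AGS x 1e-9."""
    return c_rpkm * ags_bp * 1e-9


# Made-up example: 120 hits to a 300 bp marker, 100 bp reads,
# 20 million reads in the sample, average genome size of 3 Mb.
H, R, L, N, AGS = 120, 100, 300, 20_000_000, 3_000_000
C = rpkm(H, L, N)
print(C, copies_per_genome(C, AGS), coverage_ratio(H, R, L, N, AGS))
```

The read length R cancels in the ratio, which is why the shortcut needs only C and AGS.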

This analysis outputs a C value for each representative protein sequence. These representatives are chosen by ShortBRED by clustering the input sequences at 85% amino acid identity, so some of these sequences will have the same function. By summing the C values for all protein sequences that belong to the same GRE cluster on the SSN, we obtain C values that measure the abundances of each isofunctional group of GREs. As C and Scov/Gcov are related by the equation above, we can determine the total number of GREs with a particular function per microbial genome in a metagenome.
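
Summing the per-representative values within each SSN cluster can be sketched as follows. The identifiers, cluster labels, and abundances are hypothetical, and the choice to skip representatives without a cluster assignment is an assumption of this sketch rather than a documented step.

```python
from collections import defaultdict


def sum_by_cluster(abundance_by_protein, cluster_of_protein):
    """Sum per-protein abundances (e.g. C values) within each SSN cluster."""
    totals = defaultdict(float)
    for protein, abundance in abundance_by_protein.items():
        cluster = cluster_of_protein.get(protein)
        if cluster is not None:          # skip proteins with no cluster assignment
            totals[cluster] += abundance
    return dict(totals)


# Hypothetical representative identifiers, cluster labels and abundances.
toy_abundance = {"rep1": 1.5, "rep2": 0.5, "rep3": 2.0, "rep4": 4.0}
toy_clusters = {"rep1": "Cluster 15", "rep2": "Cluster 15", "rep3": "Cluster 16"}
print(sum_by_cluster(toy_abundance, toy_clusters))
```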

In addition to the metagenomes from the HMP, the distribution and abundance of GREs were measured in metagenomes and metatranscriptomes in stool samples from eight healthy individuals.49 The metagenomes were analyzed in the same way as above, except that because the average genome sizes of those samples were unavailable, the RPKM value (C) was used instead of Scov/Gcov. Metatranscriptomes can be processed by ShortBRED in the same manner as metagenomes and the RPKM value (C) was used to determine if particular GREs were being transcribed in those datasets. The heatmap shown in Figure 2.7 was assembled with a custom-written script in hclust2 (https://bitbucket.org/nsegata/hclust2).

For analyzing the abundance of B12-PD, the entire InterPro family IPR003206, containing sequences with the “diol/glycerol dehydratase, large subunit” domain, was used as the input for ShortBRED-Identify with UniRef90 as the reference set. The default parameters for ShortBRED-Identify were used. The resulting markers were used as the input to ShortBRED-Quantify to profile the abundances of these sequences in the same 80 HMP stool metagenomes analyzed for GREs. The default parameters for ShortBRED-Quantify were used. Results were normalized to counts per microbial genome as described above. Counts for all sequences in a given sample were summed to provide the total number of B12-dependent diol dehydratase sequences per microbial genome in that sample. Because the B12-dependent glycerol dehydratase can process 1,2-propanediol as well, we did not attempt to distinguish between members of this family, and our results consequently represent an upper limit for the abundance of B12-PD in this context.

For users interested in applying this workflow to analyze their own SSNs and meta’omics data, we have provided a detailed protocol online at http://scholar.harvard.edu/balskus/metagenomic-profiling. In addition to the protocol, we have included links to the necessary software, scripts helpful for extracting information from the SSN and merging ShortBRED results, examples of ShortBRED code, and sample input and output files.

2.3.4: Statistical analysis

Tukey boxplots in the main text show the median, first quartile (Q1), and third quartile (Q3). Whiskers are extended to include data points between Q1–1.5(Q3–Q1) and Q1 and between Q3 and Q3+1.5(Q3–Q1) (the lower and upper inner fences, respectively). Values outside of this range are individually marked with dots.
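As an illustration of the boxplot convention described above, the following sketch computes the quartiles, the inner fences, the whisker ends, and the individually marked values for one set of abundances. It is not the plotting code used for the figures; the function name and data are hypothetical, and the quartile interpolation rule (the "inclusive" method of the Python standard library) is an assumption, since the text does not specify one.

```python
import statistics


def tukey_summary(values):
    """Return quartiles, inner fences, whisker ends, and outliers for a Tukey boxplot."""
    # Quartile definition is an assumption; other interpolation rules give slightly different cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),   # most extreme data point within the lower fence
        "whisker_high": max(inside),  # most extreme data point within the upper fence
        "outliers": outliers,         # values drawn as individual dots
    }


# Hypothetical normalized abundances with one extreme value.
summary = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```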

I determined the statistical significance of enrichments of selected GREs in human microbiomes using Kruskal–Wallis (KW) tests (the nonparametric analog of one-way ANOVA) followed by Dunn’s multiple comparisons test. For each protein of interest, the protein’s normalized abundance across all 378 metagenomes was used as input for the KW test, treating sample body site as the single categorical label. As five such tests were performed (for the enzymes PFL, CutC, 4HPAD, PD, and t4LHypD), we adopted a conservative significance threshold of two-tailed P < 0.001. A Mann-Whitney U test (unpaired) was used to compare the normalized abundances of PD and B12-PD in the 80 stool metagenomes from the HMP with a significance threshold of P < 0.001.
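For readers unfamiliar with the Kruskal–Wallis test, the sketch below shows how its H statistic is obtained from ranks pooled across groups (here, body sites). It is a standard-library illustration with hypothetical function names and made-up data, not the statistics software used for this analysis; it stops at the H statistic, which is then compared with a chi-squared distribution with (number of groups − 1) degrees of freedom, and it does not implement Dunn’s post hoc test.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict mapping group label -> list of values."""
    pooled = [v for g in groups.values() for v in g]
    labels = [name for name, g in groups.items() for _ in g]
    n = len(pooled)
    rank_sums = Counter()
    for label, rank in zip(labels, average_ranks(pooled)):
        rank_sums[label] += rank
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[name] ** 2 / len(g) for name, g in groups.items()
    ) - 3 * (n + 1)
    # Tie correction; equals 1 when all values are distinct.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical normalized abundances of one enzyme at three body sites.
h_stat = kruskal_wallis_h({"stool": [1, 2, 3], "oral": [4, 5, 6], "skin": [7, 8, 9]})
```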

2.3.5: Cloning of expression plasmids for PD and PD-AE

The genes encoding PD (UniProt ID: Q1A666) and PD-AE (UniProt ID: Q1A665) were PCR amplified from R. inulinivorans DSM 16841 genomic DNA (DSMZ, Braunschweig, Germany) using the primers shown in Table 2.1. Reactions (total volume 20 µL) for amplifying the gene encoding PD-AE contained gDNA (1.3 ng), forward primer (10 µM), reverse primer (10 µM), DMSO (0.3 µL, 1.5%), MgCl2 (0.6 mM), and Phusion High-Fidelity Master Mix (10 µL) (New England Biolabs). Reactions (total volume 50 µL) for amplifying the gene encoding PD contained gDNA (100 ng), forward primer (2 µM), reverse primer (5 µM), DMSO (0.3 µL, 1.5%), MgCl2 (0.6 mM), and Phusion High-Fidelity Master Mix (2 µL) (New England Biolabs). PCR parameters were as follows: initial denaturation (98 °C for 30 s), 35 cycles of denaturation (98 °C for 30 s), annealing (60 °C for 30 s), and extension (72 °C for 90 s [PD] or 120 s [PD-AE]), and a final extension (72 °C for 10 min). Amplified fragments were analyzed by agarose gel electrophoresis and purified.

The fragment encoding PD-AE was digested with NheI and XhoI for 2.5 hours at 37 °C; digests contained amplified fragment (20 µL), NheI (1.5 µL, 30 U), XhoI (1.5 µL, 30 U), BSA (10×) (3 µL), NEB Buffer 1 (10×) (3 µL), and nuclease-free water (1 µL). The fragment encoding PD was digested with KpnI and XhoI for 1 hour at 23 °C; digests contained amplified fragment (44 µL), KpnI (2 µL, 40 U), XhoI (2 µL, 40 U), BSA (10×) (6 µL), and NEB Buffer 2 (10×) (6 µL). Restriction digests were purified and ligated into linearized expression vectors using T4 DNA ligase (New England Biolabs). The N-His6-tagged construct of PD-AE was formed by incubating digested insert DNA (3 µL), digested pET-28a (1 µL), T4 ligase buffer (10×) (1 µL), T4 DNA ligase (1 µL, 400 U), and nuclease-free water (3 µL) at 16 °C for 18 hours. The C-His6-tagged construct of PD was formed by incubating digested insert DNA (7 µL), digested pET-29b (10 µL), T4 ligase buffer (10×) (2 µL), and T4 DNA ligase (1 µL, 400 U) at 16 °C for 18 hours. The PD-AE ligation (6 µL) and the PD ligation (10 µL) were used to transform E. coli TOP10 competent cells (Invitrogen). The identities of the resulting constructs were confirmed by sequencing purified plasmid DNA. Plasmid DNA was purified with a QIAprep Spin Miniprep Kit (Qiagen) and transformed into chemically competent E. coli BL21 (DE3) cells (Invitrogen). After transformations, cells were stored at −80 °C as frozen LB/glycerol stocks.

Table 2.1: Primers used for cloning and site-directed mutagenesis.

Primer Name          Sequence (5ʹ to 3ʹ)
PD-pET29b-F          CGGGGTACCATGGGAAATTATGATAGTACTCC
PD-pET29b-R          CCGCTCGAGTCGATTATCAGCCTGTTCTG
PDAE-pET28a-F        TTAAGCTAGCATGAAAGAATATTTGAATACATCCGGC
PDAE-pET28a-R        TAACTCGAGTTAACCACCAATCTGGCAGTGTAATG
PD-G817A-F           CGTACGTGTAGCCGCATATTCTGCTTTG
PD-G817A-R           CGATAACGTAGTAAACAAAGCAGAATATGC
PD-C438S-F           CAACATCATCGGATCTGTAGAACCGCAGG
PD-C438S-R           CTGTTTTACCCGGAACCTGCGGTTCTACAG
PD-E440Q-F           CATCGGATGTGTACAACCGCAGGTTCCGG
PD-E440Q-R           CCCGGAACCTGCGGTTGTACACATCCGATG
PD-H166Q-F           GGTGTAGGACAGGTAACAGTTCAGTATG
PD-H166Q-R           CAATACGGTTTCATACTGAACTGTTACCTG
EcP5CR-aac(3)IV-F    GTATACACCCCCTCCCCGGATAAAGTCGCCGCCCTCCATGTGTAGGCTGGAGCTGCTTC
EcP5CR-aac(3)IV-R    AATGGTGGTGCCTCCCGGTGAGCAGACCATATCTTTCAGCCGGCCTTTGAATGGGTTCATG
CdHypD-F             CCTGGTGCCGCGCGGCAGCCATATGGCAAGAGGAACTTTTGAGAGAACTAAAAAATTAAG
CdHypD-R             GGTGCTCGAGTGCGGCCGCAAGCTTTAGAATGTTTGCTCAGTTCTTCCTATTATCTCATC
CdHypDAE-F           CCTGTATTTTCAGGGCGCCCATATGAATCCATTAGTTATAAACTTACAAAAATGTAGC
CdHypDAE-R           GCCCGTTTGATCTCGAGTGCGGCCGTTACCCTCCAATCTTAGTATTGAAATTACTTGC
CdP5CR-F             CCTGGTGCCGCGCGGCAGCCATATGAAAACTTTAGGATTTATTGGTTCAGGG
CdP5CR-R             GGTGCTCGAGTGCGGCCGCAAGCTTATTTACTCATATCTTTAGACTTATCTATACAAGC
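As a quick consistency check between the cloning primers in Table 2.1 and the restriction enzymes named in Section 2.3.5, the sketch below scans each primer for the recognition sequences of KpnI, XhoI, NheI, and NdeI. The helper name is hypothetical and the check is only illustrative; because all four recognition sites are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences (5' to 3') of the enzymes named in the text.
SITES = {"KpnI": "GGTACC", "XhoI": "CTCGAG", "NheI": "GCTAGC", "NdeI": "CATATG"}

# Cloning primers for PD and PD-AE, copied from Table 2.1.
PRIMERS = {
    "PD-pET29b-F": "CGGGGTACCATGGGAAATTATGATAGTACTCC",
    "PD-pET29b-R": "CCGCTCGAGTCGATTATCAGCCTGTTCTG",
    "PDAE-pET28a-F": "TTAAGCTAGCATGAAAGAATATTTGAATACATCCGGC",
    "PDAE-pET28a-R": "TAACTCGAGTTAACCACCAATCTGGCAGTGTAATG",
}


def sites_in(sequence):
    """Return the names of the enzymes whose recognition site occurs in the primer."""
    seq = sequence.upper()
    return sorted(name for name, site in SITES.items() if site in seq)


found = {name: sites_in(seq) for name, seq in PRIMERS.items()}
```

The PD primers carry KpnI and XhoI sites and the PD-AE primers carry NheI and XhoI sites, matching the digests described in the cloning procedure.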

2.3.6: Site-directed mutagenesis of PD

Plasmids encoding the C438S and H166Q mutants of PD were constructed as follows. The pET-29b vector encoding PD constructed previously was used as the template. The template (10 ng), forward and reverse primers (125 ng, Table 2.1), and 2× concentrated Phusion-HF Polymerase Master Mix (25 µL) were combined and brought to a total volume of 50 µL with the addition of nuclease-free water. Amplification was performed with initial denaturation (95 °C, 30 s), 18 cycles of denaturation (95 °C, 30 s), annealing (55 °C, 1 min), and extension (68 °C, 8 min), and a final extension step (68 °C, 8 min). DpnI (1 µL) was added to each reaction and the reactions were incubated for 1 h at 37 °C. Additional DpnI (1 µL) was added and the reactions were incubated at room temperature overnight. The digested PCR product (2 µL) was used to transform chemically competent E. coli TOP10 cells. Plasmid DNA was isolated and sequenced to confirm the identities of these constructs.

The G817A mutant was constructed in the same manner as above, except that 0.5 µM of each primer (Table 2.1) and 10 µL of polymerase in a total volume of 20 µL were used for the PCR. In addition, amplification was performed with initial denaturation (98 °C, 30 s), 21 cycles of denaturation (98 °C, 30 s), annealing (70 °C, 1 min), and extension (72 °C, 8 min), and a final extension step (72 °C, 8 min).

The E440Q mutant was constructed in the same manner as the G817A mutant, except that 100 ng of template DNA was used in the PCR. Amplification was performed with initial denaturation (98 °C, 30 s), 21 cycles of denaturation (98 °C, 20 s), annealing (57.8 °C, 40 s), and extension (72 °C, 320 s), and a final extension step (72 °C, 8 min). The digested PCR product (5 µL) was used to transform chemically competent E. coli TOP10 cells.

Mutant constructs were heterologously expressed in E. coli BL21 (DE3) in the same manner as wild-type PD (see below).

2.3.7: Overexpression and purification of PD

An LB agar plate containing 50 µg mL-1 kanamycin (LB-Kan50) was streaked with a frozen stock of E. coli BL21 (DE3) cells transformed with the plasmid pET-29b-PD. A single colony was inoculated into 50 mL of LB-Kan50, which was grown overnight at 37 °C. The starter culture was diluted 1:100 into a 4 L Erlenmeyer baffled flask containing 2 L of LB-Kan50, incubated at 37 °C with shaking at 175 rpm, moved to 25 °C at OD600 ≈ 0.3, induced with IPTG (0.5 mM) at OD600 ≈ 0.6, and incubated at 25 °C with shaking at 175 rpm for 12 h.

Cells were harvested by centrifugation (6,730×g, 10 min) and resuspended in 80 mL of PD lysis buffer (50 mM Tris-HCl pH 7.0, 150 mM NaCl, 10 mM MgCl2). All subsequent steps were performed at 4 °C unless otherwise specified. Cells were lysed by two passages through a cell disruptor (Avestin EmulsiFlex-C3) at ~6,000 psi. The lysate was clarified by centrifugation (28,800×g, 30 min). The supernatant was supplemented with imidazole (5 mM final concentration) and incubated with Ni-NTA resin (2.5 mL) for 2 h at 4 °C. The mixture was centrifuged (1,811×g, 5 min) and the supernatant removed. The Ni-NTA resin was resuspended in 4 mL of lysis buffer supplemented with 5 mM imidazole and loaded onto a column. Protein was eluted from the column using a stepwise gradient of imidazole in lysis buffer (25 mM, 50 mM, 75 mM, 100 mM, 125 mM, 150 mM, 200 mM) in 4 mL fractions. SDS-PAGE analysis was employed to determine the presence and purity of protein in each fraction. Fractions containing PD were combined, transferred to Slide-a-Lyzer™ dialysis cassettes with a 20 kDa MWCO (Thermo Scientific), and dialyzed twice against 2 L of PD dialysis buffer (50 mM Tris-HCl pH 7.0, 150 mM NaCl, 10 mM MgCl2). After dialysis the solutions were concentrated using a Spin-X® UF 20 mL centrifugal concentrator with a 30 kDa MWCO membrane (Corning®) to ~4.5 mL. Concentrated enzyme was frozen in N2 (l) and stored at –80 °C. This procedure yielded about 7 mg L-1 of PD (Figure 2.22). The concentrations of PD and mutants were determined with a NanoDrop 2000 UV-Vis Spectrophotometer (Thermo Scientific) using an extinction coefficient of 90,495 M-1 cm-1 as calculated by Geneious.71
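Determining concentration from an extinction coefficient relies on the Beer–Lambert law, A = ε·c·l. The sketch below applies it with the extinction coefficient reported for PD; the function name and the absorbance reading are hypothetical, and the reading is assumed to be already normalized to a 1 cm path length.

```python
PD_EXTINCTION = 90_495  # M^-1 cm^-1, value reported in the text for PD


def molar_concentration(absorbance, extinction, path_cm=1.0):
    """Beer-Lambert law rearranged for concentration: c = A / (epsilon * l), in mol/L."""
    return absorbance / (extinction * path_cm)


# Hypothetical absorbance reading of 0.905 at a 1 cm path length.
pd_micromolar = molar_concentration(0.905, PD_EXTINCTION) * 1e6
```

For this made-up reading the result is close to 10 µM of PD monomer; converting to mg mL-1 would additionally require the molecular weight of the construct, which is not stated here.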

Figure 2.22: SDS-PAGE of purified PD, PD mutants, and PD-AE. Precision Plus Protein™ All Blue Standards (Bio-Rad) (lanes 1 and 8), PD-AE (lane 2), wild-type PD (lane 3), PD-G817A (lane 4), PD-C438S (lane 5), PD-E440Q (lane 6), and PD-H166Q (lane 7).

2.3.8: Overexpression and purification of PD-AE

PD-AE was expressed in chemically competent E. coli BL21 (DE3) cells (Invitrogen) that had been co-transformed with the plasmid pET-28a-PDAE and the plasmid pPH149 containing the E. coli IscSUA-HscBA-Fd genes.80 An LB agar plate containing 50 µg mL-1 kanamycin (Kan50) and 50 µg mL-1 chloramphenicol (Cam50) was streaked with a frozen stock of the aforementioned E. coli cells. A single colony was used to inoculate 50 mL of LB-Kan50-Cam50 and the culture was grown overnight at 37 °C. The saturated starter culture was diluted 1:100 into a screw-capped 2.8 L Fernbach flask containing 2 L of LB-Kan50-Cam50 supplemented with 2 mM iron(III) ammonium citrate and 0.5% (w/v) glucose. The culture was incubated at 37 °C with shaking at 175 rpm until OD600 ≈ 0.3, at which point it was cooled to room temperature and sparged with N2 for ~30 min. Sodium fumarate and L-cysteine were then added as filter-sterilized aqueous solutions to give final concentrations of 20 mM and 2 mM, respectively. Sparging was continued until OD600 ≈ 0.6, at which point expression was induced by addition of IPTG (0.5 mM). The flask was capped tightly, sealed with electrical tape, and incubated at 15 °C with shaking at 50 rpm for 18 h.

All purification steps were performed in an anaerobic chamber (Coy Laboratory Products) at 4 °C, with the exception of centrifugation. Cells from the 2 L culture were harvested by centrifugation in 250 mL polypropylene centrifuge tubes with plug seal caps (6,730×g, 30 min) and resuspended in 50 mL anoxic PD-AE lysis buffer (50 mM HEPES pH 8.0, 200 mM NaCl, 10 mM MgCl2). Chicken egg lysozyme (8 mg) and dithiothreitol (DTT, 5 mM) were added, and the mixture was incubated for 1 h with occasional agitation. Cells were lysed by sonicating with a ½ inch horn at 25% amplitude for 10 min (10 s on followed by 30 s off) while being kept in a water bath at 4 °C. The resulting suspension became dark grey during lysis. The lysate was clarified by centrifugation (17,090×g, 30 min) and the supernatant was supplemented with 5 mM imidazole and incubated with 2 mL of anoxic Ni-NTA resin (previously sparged with N2) for 2 h at 4 °C with manual agitation every 15 min. The mixture was transferred to 50 mL centrifuge tubes and the tubes were sealed with plug seal caps and electrical tape. The tubes were removed from the anaerobic chamber, centrifuged (3,220×g, 5 min), and returned to the anaerobic chamber, where the supernatant was discarded. The Ni-NTA resin was resuspended inside the anaerobic chamber in PD-AE lysis buffer (2 mL) containing 5 mM imidazole and loaded into a column. Protein was eluted from the resin using a stepwise imidazole gradient in PD-AE lysis buffer (10 mM, 25 mM, 100 mM, 200 mM), collecting fractions of sizes 2, 4, 2, 2, and 6 mL, respectively. SDS–PAGE analysis was employed to ascertain the presence and purity of protein in each fraction. Fractions containing PD-AE were combined and stirred slowly for 16 h with 2 mM DTT, 0.2 mM Na2S•9H2O, and 0.25 mM Fe(NH4)2(SO4)2•6H2O to reconstitute the [4Fe–4S] clusters. Subsequently, the solution was filtered through a 25 mm, 0.22 μm pore-size Acrodisc syringe filter with HT Tuffryn Membrane (Pall Life Sciences) to remove particulates. This solution was dialyzed with three 1 L batches of anoxic PD-AE dialysis buffer (50 mM HEPES pH 8.0, 50 mM NaCl, 10 mM MgCl2) in Slide-a-Lyzer™ dialysis cassettes with a 10 kDa MWCO (Thermo Scientific). After dialysis, the enzyme was spin concentrated using a Spin-X® UF 6 mL centrifugal concentrator with a 10 kDa MWCO membrane (Corning®) sealed inside a 50 mL plug-seal conical tube. The concentrated protein solution was aliquoted into 0.5 mL cryogenic vials and placed in 18×150 mm Hungate tubes (Chemglass). The tubes were sealed with butyl stoppers and aluminum seals and frozen in N2 (l). This procedure yielded about 6 mg L-1 of PD-AE (Figure 2.22). The concentration of PD-AE was determined with a NanoDrop 2000 UV-Vis Spectrophotometer (Thermo Scientific) using an extinction coefficient of 23,755 M-1 cm-1 as calculated by Geneious.71

2.3.9: Construction of PD homology model

Jonathan Marks (Harvard) constructed a homology model for PD using a previously described approach.57,81 In brief, Modeller v9.12 was used to generate 100 randomly seeded homology models of PD.82-83 To generate these models, the structures of 1,2-propanediol-bound GD (PDB ID: 1R9E)31 and of a GRE of unknown function from Archaeoglobus fulgidus (PDB ID: 2F3O) were used as templates.75 Following model generation using the automodel algorithm, models were optimized using Modeller’s built-in conjugate gradient optimization algorithm set to the highest setting and optimization was repeated twice per model. Validation of the optimal model was performed as described previously.57,81 The PD homology model is consistent with known GRE structures (Figure 2.15 and Figure 2.16), and is highly similar (RMSD <0.56 Å, Cα alignment of 682 residues) to the recently published crystal structure of PD with (S)-1,2-propanediol bound (PDB ID: 5I2G).60
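The RMSD figure quoted above is a root-mean-square deviation over paired Cα atoms. A minimal sketch of that calculation follows; it assumes the two structures have already been superimposed and that the atoms are supplied in matched order, and it is not the alignment software used for the comparison. The function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units between two equal-length lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical, already-superimposed Calpha positions (in angstroms) for two residues.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
value = ca_rmsd(model, crystal)
```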

(S)-1,2-Propanediol and (R)-1,2-propanediol were docked into the PD homology model using a previously described approach.57,81 In brief, Glide84-86 and Prime87-88, components of Schrödinger Suite 2013, were used for active site refinement and induced-fit docking of calculated conformations of the two substrates. In addition to the validation methods described previously,57,81 the homology models are consistent with the recently published crystal structure of PD with (S)-1,2-propanediol bound (PDB ID: 5I2G) (Figure 2.16).60

2.3.10: Construction of a t4LHypD homology model

t4LHypD homology model construction was performed by Yolanda Huang. The t4LHypD amino acid sequence from C. difficile 70-100-2010 (UniProt ID: A0A031WDE4) was used as the query for an HHpred analysis to identify templates in the PDB database (pdb70_03Jun16) for homology modelling.83,89 The multiple sequence alignment generation was carried out using the HHBlits method.83,90

The hits with the highest secondary structure scores were the characterized GREs PD (PDB ID: 5I2A),60 GD (PDB ID: 1R9D),31 and CutC (PDB ID: 5A0U).91 A homology model of t4LHypD was constructed using Modeller v9.16 with GD (1R9D) as the template.82-83 The t4LHypD homology model and the template GD structure are highly similar (RMSD = 0.141 Å, Cα alignment of 789 (t4LHypD) vs 786 (GD) residues). The conserved Gly and Cys residues involved in glycyl radical formation and H-atom abstraction from the substrate were identified as G765 and C434 in t4LHypD, which map to the positions of G763 and C433 in GD, respectively (Figure 2.3). Notably, the “dehydratase motif” H160/E436 residue pair in t4LHypD is located in the active site in the homology model and aligns well with H164 and E435 of GD.

2.3.11: Compilation of sequenced prokaryotes encoding t4LHypD

I employed UCLUST to generate a set of representative amino acid sequences of t4LHypD with clustering at 60% ID using all t4LHypD sequences from the SSN (Cluster 15, >62% ID) as the input.92 Representative sequences were queried against the NCBI non-redundant protein sequence database using the blastp algorithm (performed on 2016-02-16).93 The top 500 unique amino acid sequence hits were aligned using the ClustalW alignment tool in Geneious.71 A total of 391 sequences were identified from the alignment that contained all of the predicted active site residues present in the t4LHypD homology model. A total of 850 deposited genomes encoding these sequences were retrieved from the NCBI database. Three additional organisms encoding t4LHypD were identified from the t4LHypD cluster in the SSN (InterPro database) and were added to those identified from the NCBI database to yield a total of 853 genomes.
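The filtering step described above (keeping only aligned hits that retain the predicted active site residues) can be sketched as follows. This is an illustration of the idea rather than the procedure carried out in Geneious: the function names, the toy alignment, and the residue positions are hypothetical, and gaps are assumed to be written as "-".

```python
def column_for_residue(aligned_reference, residue_number):
    """Map a 1-based residue number in the ungapped reference to its alignment column."""
    count = 0
    for column, char in enumerate(aligned_reference):
        if char != "-":
            count += 1
            if count == residue_number:
                return column
    raise ValueError("residue number exceeds reference length")


def with_required_residues(alignment, reference_name, required):
    """Return names of sequences that match the reference at every required position.

    `required` maps reference residue number -> expected one-letter amino acid.
    """
    reference = alignment[reference_name]
    columns = {column_for_residue(reference, pos): aa for pos, aa in required.items()}
    return [
        name
        for name, seq in alignment.items()
        if all(seq[col] == aa for col, aa in columns.items())
    ]


# Toy alignment; real input would be the aligned BLAST hits.
toy_alignment = {
    "reference": "MG-CHE",
    "hit_a": "MGACHE",  # retains G, C, and E at the required columns
    "hit_b": "MG-SHE",  # Cys replaced by Ser
    "hit_c": "MA-CHE",  # Gly replaced by Ala
}
kept = with_required_residues(toy_alignment, "reference", {2: "G", 3: "C", 5: "E"})
```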

2.3.12: Cloning of expression plasmids for t4LHypD, t4LHypD-AE, and P5C reductase

Cloning of these expression vectors was performed by Dr. Yifeng Wei and Yolanda Huang. Genomic DNA was purified from C. difficile 70-100-2010 (BEI Resources, USA) and was used for PCR amplification of genes encoding t4LHypD (UniProt ID: A0A031WDE4), t4LHypD-AE (UniProt ID: A0A069AMK2), and P5CR (UniParc ID: UPI000235AE56). Primers used to amplify these genes are shown in Table 2.1. PCR was carried out using Phusion-HF polymerase according to the manufacturer’s protocol. PCR products of t4LHypD and P5CR were purified and ligated into pET-28a to yield N-terminal His6-tagged constructs. t4LHypD-AE was ligated into pSV272-PfMBP94 to yield an N-terminal His6-tagged maltose-binding protein (MBP) fusion construct containing a TEV linker, PfMBP-CdHypDAE. The NdeI cleavage site (CATATG) at the 5ʹ start codon of the His6 tag in pSV272-PfMBP had been mutated to CACATG by site-directed mutagenesis to prevent digestion by NdeI and removal of the His6-tagged MBP during cloning. Both pET-28a and pSV272-PfMBP were linearized with NdeI and EagI. All vectors were constructed by Gibson isothermal assembly according to the manufacturer’s protocol to yield pET-28a-CdHypD, pET-28a-CdP5CR, and pSV-PfMBP-CdHypDAE. Plasmids were transformed into chemically competent E. coli BL21-CodonPlus(DE3)-RIL ΔproC::aac(3)IV(ApramycinR or AmR) and LB glycerol stocks were stored at –80 °C.

2.3.13: Generation of a P5CR deletion mutant in E. coli

Yolanda Huang (Harvard) constructed a P5CR deletion mutant of E. coli BL21(DE3)-CodonPlus-RIL for overexpression of t4LHypD and t4LHypD-AE. Deletion of proC from the protein overexpression strain was necessary to remove contaminating E. coli P5CR activity in purified t4LHypD and t4LHypD-AE solutions. E. coli BW25113/pKD46 (Coli Genetic Stock Center, #7739) was mutated to E. coli BW25113 ΔproC::aac(3)IV(AmR) to be used as the donor strain in P1 transduction. E. coli BL21(DE3)-CodonPlus-RIL was used as the recipient strain to generate E. coli BL21-CodonPlus(DE3)-RIL ΔproC::aac(3)IV(AmR) for protein overexpression.

The apramycin resistance marker (aac(3)IV) was PCR amplified from vector pIJ77395 to contain regions flanking the proC gene in E. coli BW25113 for homologous recombination (Table 2.1). PCR mixes contained Phusion-HF PCR Master Mix, primers, and purified pIJ773. The PCR product was excised and purified from a 1% agarose gel. The purified solution (50 µL) was digested with 1 µL DpnI and 6 µL CutSmart buffer (New England Biolabs) for 1 h at 37 °C. The digested PCR product was purified and stored at −20 °C.

A frozen glycerol stock of E. coli BW25113/pKD46 was streaked onto an LB agar plate containing 100 µg mL-1 ampicillin (Amp100) and incubated at 30 °C overnight. A single colony was inoculated into 2 mL LB-Amp100 and grown overnight at 30 °C. This overnight culture was diluted 1:100 into 5 mL LB-Amp100 containing 10 mM arabinose to prepare electrocompetent cells. This culture was incubated at 30 °C for 4 h with shaking at 220 rpm before harvesting by centrifugation (3,220×g, 10 min, 4 °C). The cell pellet was washed twice with 10 mL and then once with 5 mL chilled 10% (v/v) glycerol, with pelleting in between (3,220×g, 10 min, 4 °C). The aac(3)IV PCR product (110 ng) was added to 100 µL of the cell resuspension and incubated on ice for 1 min. Cells were electroporated in a 1 mm cuvette at 1.8 kV for 5.1 ms. Electroporated cells were resuspended in 1 mL LB medium and recovered at 37 °C for 3 h. The recovered cells were plated onto LB agar plates containing 25 µg mL-1 Am and grown at 37 °C. Colonies were screened for successful homologous recombination by colony PCR.

P1 transduction was performed to generate the E. coli BL21-CodonPlus(DE3)-RIL ΔproC::aac(3)IV(AmR) strain.96 An E. coli BW25113 ΔproC::aac(3)IV(AmR) starter culture was diluted 1:100 into 2 mL LB medium containing 5 mM CaCl2, 10 mM MgCl2, and 0.2% (w/v) glucose, and incubated at 37 °C, 200 rpm for 2 h. Two drops of P1 lysate were added to the culture, which was incubated at 37 °C with shaking at 200 rpm until the culture clarified. Chloroform (50 µL) was added to the tube and the tube was vortexed. Cells were pelleted (3,220×g, 10 min) and the P1 lysate supernatant was collected and stored at 4 °C. Cells from an overnight culture of E. coli BL21-CodonPlus(DE3)-RIL were resuspended in 550 µL of buffer (100 mM MgCl2, 5 mM CaCl2). Serial dilutions (10−1 to 10−4) of the P1 lysate of E. coli BW25113 ΔproC::aac(3)IV(AmR) were prepared with LB containing 5 mM CaCl2. Resuspended CodonPlus cells (100 µL) were mixed with 100 µL of P1 lysate at each dilution and all samples were incubated at 37 °C for 20 min. LB medium (1 mL) with 50 mM sodium citrate was added to each solution and cells were recovered at 30 °C for 2 h. Cells were harvested and washed twice with 100 µL LB containing 50 mM sodium citrate. Resuspended cells were plated on LB-Cam25-Am25 agar plates and were incubated at 37 °C overnight. Colonies were restreaked on LB-Cam25-Am25 agar plates three times to remove any remaining P1 phage. The proC deletion was confirmed by colony PCR and sequencing.

2.3.14: Overexpression and purification of t4LHypD

Yolanda Huang (Harvard) carried out the overexpression and purification of t4LHypD. A frozen stock of E. coli BL21-CodonPlus(DE3)-RIL ΔproC::aac(3)IV(AmR) transformed with pET-28a-CdHypD was streaked onto an LB-Kan50-Cam25-Am25 agar plate. A single colony was inoculated into 100 mL LB-Kan50-Cam25-Am25 and was grown overnight at 37 °C with shaking at 175 rpm. The starter culture was diluted 1:50 into 4 L Erlenmeyer baffled flasks containing 2 L LB-Kan50-Cam25. Cultures were incubated at 37 °C at 175 rpm until OD600 = 0.6. t4LHypD overexpression was induced by addition of 0.1 mM IPTG and cultures were incubated at 25 °C for 20 h at 200 rpm. Cells were harvested by centrifugation (7,900×g, 20 min, 4 °C). Cell pellets from 2 L cultures were transferred into 50 mL conical tubes, flash frozen with N2 (l), and stored at −80 °C prior to protein purification.

All purification steps were carried out at 4 °C. HypD lysis buffer (30 mL; 20 mM Tris-HCl pH 7.5, 200 mM KCl, 5 mM 2-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride, 0.2 mg mL-1 lysozyme, 1% (w/v) streptomycin sulfate) was added to each frozen cell pellet. Cell pellets were incubated with lysis buffer on a mixer for 30-60 min until completely thawed. Resuspended cells were lysed by passaging three times through a cell disruptor (Avestin EmulsiFlex-C3) at 10,000 psi. Cell debris was removed by centrifugation (20,000×g, 20 min). Clarified lysate was incubated with 10 mL TALON metal affinity resin (Clontech) for 30 min with mixing. The resin was transferred onto a column and the flow through was passed through the resin again. The resin was washed with 40 mL HypD wash buffer (20 mM Tris-HCl pH 7.5, 200 mM KCl, 5 mM 2-mercaptoethanol). Protein was eluted using 3 mL HypD wash buffer supplemented with 5 mM imidazole and 60 mL HypD wash buffer supplemented with 150 mM imidazole. Samples were collected at every step of the purification for analysis of purity by SDS-PAGE. Fractions containing protein, typically both the 5 mM and 150 mM imidazole fractions, were combined, concentrated, and dialyzed twice in HypD dialysis buffer (25 mM Tris-HCl pH 7.5, 50 mM KCl, 5 mM DTT) overnight. The dialyzed protein solution was concentrated using a Spin-X® UF 20 mL 30 kDa MWCO concentrator (3,220×g, 4 °C). Concentrated protein solution was incubated on ice and rendered anaerobic using a Schlenk line as described in the general materials and methods section. In an anaerobic chamber (Coy Laboratory Products), purified protein solution was passed through a 0.22 μm pore-size Acrodisc syringe filter to remove precipitates and aliquoted into 0.5 mL cryogenic vials that were then placed in 18×150 mm Hungate tubes that were sealed with butyl stoppers and crimped aluminum seals. Hungate tubes were frozen with N2 (l) and stored at −80 °C. An average yield of 60 mg L-1 culture was obtained for t4LHypD based on protein concentration determined by Nanodrop (Figure 2.23). An extinction coefficient of 78,300 M-1 cm-1 was calculated using the ProtParam tool.97

Figure 2.23: SDS-PAGE of purified t4LHypD, t4LHypD-AE, and P5CR. Precision Plus Protein™ All Blue Standards (Bio-Rad) (lane 1), t4LHypD (lane 2), MBP-t4LHypD-AE (lane 3), and P5CR (lane 4).

2.3.15: Overexpression and purification of t4LHypD-AE

Yolanda Huang (Harvard) carried out the overexpression and purification of t4LHypD-AE. A frozen stock of E. coli BL21-CodonPlus(DE3)-RIL ΔproC::aac(3)IV(AmR) transformed with pSV272-CdHypDAE was streaked onto an LB-Kan50-Cam25-Am25 agar plate. A single colony was inoculated into 85 mL LB-Kan50-Cam25-Am50 and was grown overnight at 37 °C with shaking at 175 rpm. The starter culture was diluted 1:100 into 4 L Erlenmeyer baffled flasks each containing 2 L LB-Kan50-Cam25. Cultures were incubated at 37 °C with shaking until OD600 = 0.6. t4LHypD-AE overexpression was induced by addition of 0.1 mM IPTG and cultures were incubated at 15 °C for 16 h. Cells were harvested by centrifugation (7,900×g, 20 min, 4 °C).

All purification steps were carried out at 4 °C under aerobic conditions until [4Fe–4S] cluster reconstitution. HypD lysis buffer (30 mL) was added to each cell pellet harvested from a 2 L culture. Cell pellets were incubated with lysis buffer for 30 min on a mixer until homogeneous. Resuspended cells were lysed by passaging two times through a cell disruptor (Avestin EmulsiFlex-C3) at 10,000 psi. Cell debris was removed by centrifugation (20,000×g, 20 min, 4 °C). Clarified lysate was passed over 7 mL TALON metal affinity resin by gravity, and the resin was washed with 35 mL HypD wash buffer. Protein was eluted using 35 mL HypD wash buffer supplemented with 5 mM imidazole and 80 mL HypD wash buffer supplemented with 150 mM imidazole. Samples were collected at every step of purification for analysis of purity by SDS-PAGE. Fractions containing protein, typically the 150 mM imidazole eluates, were combined, concentrated, and dialyzed twice in HypD-AE dialysis buffer (20 mM Tris-HCl pH 7.5, 100 mM KCl, 5 mM DTT) overnight. The dialyzed protein solution was concentrated using 20 mL Spin-X® UF 10 kDa MWCO concentrators (3,220×g, 4 °C). The concentrated protein solution was rendered anaerobic using a Schlenk line as described in the general materials and methods section. In an anaerobic chamber (Coy Laboratory Products), t4LHypD-AE was incubated with 10 mM DTT, 12 equiv Na2S•9H2O relative to protein, and 12 equiv Fe(NH4)2(SO4)2•6H2O relative to protein for 12 h to reconstitute the [4Fe–4S] clusters. Reconstituted t4LHypD-AE was buffer exchanged into anoxic storage buffer (25 mM Tris-HCl pH 7.5, 100 mM KCl, 5 mM DTT) by repeatedly concentrating and diluting in 6 mL Spin-X® UF 30 kDa MWCO concentrators (3,220×g, 4 °C). The concentrated protein solution was passed through a 0.22 μm pore-size Acrodisc syringe filter to remove precipitates. t4LHypD-AE was frozen and stored as previously described for t4LHypD. An average yield of 6.5 mg L-1 culture was obtained for t4LHypD-AE based on protein concentration determined by Nanodrop (Figure 2.23). An extinction coefficient of 109,550 M-1 cm-1 was calculated using the ProtParam tool.97

2.3.16: Overexpression and purification of P5CR

Yolanda Huang (Harvard) performed the overexpression and purification of P5CR. A frozen stock of E. coli BL21-CodonPlus(DE3)-RIL ΔproC::aac(3)IV(AmR) transformed with pET-28a-CdP5CR was streaked onto an LB-Kan50-Cam25-Am25 agar plate. A single colony was inoculated into 5 mL LB-Kan50-Cam25-Am25 and was grown overnight at 37 °C with shaking at 175 rpm. This overnight culture was diluted 1:100 into 50 mL LB-Kan50-Cam25-Am25 and was grown overnight at 37 °C with shaking at 175 rpm. The starter culture was diluted 1:100 into 4 L Erlenmeyer baffled flasks containing 2 L LB-Kan50-Cam35-Am25. Cultures were incubated at 37 °C at 190 rpm until OD600 = 0.5. P5CR overexpression was induced by addition of 0.5 mM IPTG and cultures were incubated at 25 °C for 16 h. Cells were harvested by centrifugation (7,900×g, 20 min, 4 °C). Cell pellets from 2 L cultures were transferred into 50 mL conical tubes, flash frozen with N2 (l), and stored at −80 °C prior to protein purification.

All purification steps were carried out at 4 °C. HypD lysis buffer (30 mL) was added to each frozen cell pellet. Cell pellets were incubated with lysis buffer for 30-60 min on a mixer until homogeneous. Resuspended cells were lysed by passaging three times through a cell disruptor (Avestin EmulsiFlex-C3) at 10,000 psi. Cell debris was removed by centrifugation (20,000×g, 20 min). Clarified lysate was slowly passed over 5 mL TALON metal affinity resin. The resin was washed with 35 mL HypD wash buffer. Protein was eluted using 25 mL HypD wash buffer supplemented with 5 mM imidazole and 60 mL HypD wash buffer supplemented with 150 mM imidazole. Samples were collected at every step of purification for analysis of purity by SDS-PAGE. Fractions containing protein, the 150 mM imidazole eluates, were combined, concentrated, and dialyzed twice in P5CR dialysis buffer (20 mM Tris-HCl pH 7.5, 100 mM KCl, 5 mM DTT) overnight. Dialyzed protein solution was concentrated using Spin-X® UF 20 mL 10 kDa MWCO concentrators (3,220×g, 4 °C). The concentrated protein solution was rendered anaerobic using a Schlenk line as described in the general materials and methods section. Purified P5CR was frozen and stored as previously described for t4LHypD. An average yield of 39 mg L-1 culture was obtained for P5CR based on protein concentration determined by Bradford assay using BSA as an external standard (Figure 2.23).

2.3.17: Glycyl radical detection and quantification by EPR spectroscopy

Wild-type PD and mutants were prepared for EPR spectroscopy as follows. All assay concentrations are final concentrations. The enzyme was rendered anoxic by sparging the headspace with argon for 30 min. In an anaerobic chamber (Coy Laboratory Products), DTT (10 mM), 5-deazariboflavin (100 µM), and PD-AE (100 µM) were incubated together in buffer (25 mM Tris-HCl pH 8.0, 40 mM NaCl) for 20 min under ambient light. PD (100 µM total monomer concentration) and S-adenosylmethionine (SAM) (500 µM) were then added. After 2 h, the samples were diluted from 100 µL to 250 µL into the same buffer before analysis.

t4LHypD was prepared for EPR spectroscopy as follows. All assay concentrations are final concentrations. Anoxic t4LHypD and t4LHypD-AE aliquots were brought into an anaerobic chamber (MBraun). Acriflavine (0.1 mM) and t4LHypD-AE (60 µM) were incubated together in buffer (20 mM Tris-HCl pH 7.5, 100 mM KCl, 50 mM bicine) for 20 min approximately 10 inches below a light source. t4LHypD (15 µM total monomer concentration) and SAM (1.5 mM) were then added and the solution was incubated for 2 h. The entire volume of 220 µL for each sample was used for analysis.

Perpendicular mode X-band EPR spectra were recorded on a Bruker ElexSys E500 EPR instrument fitted with a quartz dewar (Wilmad Lab-Glass) for measurements at 77 K. All samples were loaded into EPR tubes with 4 mm outer diameter and 8” length (Wilmad Lab-Glass, 734-LPV-7), sealed, and frozen in N2 (l). Data acquisition was performed with Xepr software (Bruker). The magnetic field was calibrated with an external standard of α,γ-bisdiphenylene-β-phenylallyl (BDPA), g = 2.0026 (Bruker). The experimental spectra for the glycyl radicals were modeled with EasySpin (Version 5.0.22)98 for MATLAB (MathWorks) to obtain g values, hyperfine coupling constants, and line widths. Spin concentration measurements were performed by numerically calculating the double integral of the simulated spectra and comparing the area with that of a K2(SO3)2NO standard. This standard was prepared before each set of EPR measurements by dissolving solid K2(SO3)2NO under anaerobic conditions in anoxic 0.5 M KHCO3 and diluting to a final concentration of 0.3–0.7 mM. To account for any decomposition during dissolution, the concentration was determined from the absorbance at 248 nm (ε = 1,690 M−1 cm−1), using a NanoDrop 2000 UV-Vis Spectrophotometer.99 EPR spectra represent the average of 1 to 25 scans and were recorded under the following conditions: temperature, 77 K; center field, 3350 Gauss; sweep width, 200 Gauss; microwave power, 20 μW; microwave frequency, 9.45 GHz; modulation amplitude, 0.4 mT; modulation frequency, 100 kHz; time constant, 20.48 ms; conversion time, 20.48 ms; scan time, 20.97 s; receiver gain, 60 dB (for enzymatic assays) or 30 dB (for standards). Normalization for the difference in receiver gain was performed by the spectrometer. Simulated spectra were integrated twice to quantify the number of spins in each sample.
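The spin quantification described above can be illustrated numerically. The following is a minimal Python sketch, not the EasySpin/MATLAB workflow used in this work: it assumes first-derivative spectra sampled on a uniform field axis, already baseline-corrected and normalized for receiver gain and number of scans. All function names and the synthetic line shapes are hypothetical and serve only to show the double-integration step and the comparison with a standard of known concentration.

```python
import math

def cumulative_trapezoid(y, dx):
    """Running trapezoidal integral of y sampled at uniform spacing dx."""
    out = [0.0]
    for a, b in zip(y, y[1:]):
        out.append(out[-1] + 0.5 * (a + b) * dx)
    return out

def double_integral(derivative_spectrum, dx):
    """Integrate a first-derivative spectrum twice to obtain the absorption area."""
    absorption = cumulative_trapezoid(derivative_spectrum, dx)
    return cumulative_trapezoid(absorption, dx)[-1]

def spin_concentration(sample, standard, dx, standard_conc_uM):
    """Scale the known standard concentration by the ratio of double integrals."""
    return standard_conc_uM * double_integral(sample, dx) / double_integral(standard, dx)

def gaussian_derivative(fields, center, width, amplitude):
    """Synthetic first-derivative Gaussian line (illustrative data only)."""
    return [-amplitude * (b - center) / width**2
            * math.exp(-((b - center) ** 2) / (2 * width**2)) for b in fields]

# Synthetic example: 200 G sweep centered at 3350 G, sampled every 0.1 G.
dx = 0.1
fields = [3250 + i * dx for i in range(2001)]
standard = gaussian_derivative(fields, 3350, 8.0, 1.0)  # hypothetical 500 µM standard
sample = gaussian_derivative(fields, 3350, 8.0, 0.1)    # same shape, one tenth the spins

print(round(spin_concentration(sample, standard, dx, 500.0), 1))
```

By construction the synthetic sample contains one tenth of the spins of the standard, so the sketch returns one tenth of the standard concentration. Dividing a measured spin concentration by the total monomer concentration in the sample would then give the fraction of monomers carrying a radical.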

All EPR assays were performed in triplicate. Negative controls, in which GRE, GRE activating enzyme, or SAM was excluded, were performed for the PD and t4LHypD systems. No signal was observed in any of the negative controls.

2.3.18: Synthesis of 5-deazariboflavin and enantioenriched propanediols

5-Deazariboflavin was prepared as previously described and the 1H NMR and HRMS matched previously reported spectra.100-101 (S)-1,2-Propanediol and (R)-1,2-propanediol were prepared from propylene oxide using a previously reported protocol and the 1H NMR spectra matched those of the previously characterized compounds.102 The enantiomeric excess of each propanediol substrate was determined as previously described and was found to be >99% for both compounds, in accordance with the previously reported synthesis.102

2.3.19: GC–MS assay for PD activity

All concentrations of assay components are final concentrations. Samples were prepared by first incubating DTT (10 mM), 5-deazariboflavin (70 µM), and PD-AE (70 µM) in buffer (25 mM Tris-HCl pH 8.0, 50 mM NaCl) for 20 min under ambient light inside an anaerobic chamber (Coy Laboratory Products). Wild-type PD (70 µM, previously sparged with argon for 30 min) and SAM (750 µM) were then added and the solution was incubated under ambient light for 2 h to install the glycyl radical on PD. Assays for each substrate were prepared by incubating PD from the activation mixture (1 µM total monomer concentration) with (S)-1,2-propanediol (5 mM) in buffer (25 mM Tris-HCl pH 8.0, 50 mM NaCl) for 20 min. The 200 μL assay mixtures were then transferred to 10 mL screw cap headspace vials (Sigma Aldrich) containing 2.3 mL of water and 1.8 g NaCl. They were quickly sealed and capped tightly. The vials were kept at 4 °C until analyzed.

All GC–MS experiments were conducted with an inlet helium carrier gas flow rate of 2.3 mL min−1 in constant flow mode with a fused-silica capillary column of cross-linked DB-624UI (30 m × 0.32 mm × 1.80 μm, Agilent, Santa Clara, CA) on a Quattro micro GC Mass Spectrometer (Waters, Milford, MA) equipped with a Combi PAL autosampler (CTC Analytics, Zwingen, Switzerland) and a split/splitless injector. For headspace-GC–MS experiments, 1 mL was injected into the GC–MS via a transfer syringe held at 120 °C. The needle flush time was 120 s and a 1 mm straight single taper Ultra Inert liner (Agilent) was used. The inlet and transfer line temperatures were set to 220 °C, and the ion source temperature was set to 200 °C. All GC–MS data were acquired and analyzed with the Waters MassLynx V4.1 software package.

Headspace (HS)-GC–MS was used for the detection of propionaldehyde. The headspace extraction was carried out at 50 °C under agitation at 500 rpm for 15 min. The GC conditions were as follows: oven temperature program, 30 °C for 3 min, 13 °C min-1 to 90 °C, 50 °C min-1 to 250 °C; split ratio, 20:1. The electron impact (EI)-MS conditions were as follows: full scan m/z range, 10-100 Da at 1.31-10.82 min; selected ion recording (SIR) mode from 0-10.82 min for m/z 29 for detection of propionaldehyde.
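As a consistency check on the oven temperature program, its total duration can be computed from the hold time and the two ramps. The short Python sketch below assumes that there is no final hold at 250 °C (none is stated); the function name is hypothetical.

```python
def program_duration(initial_temp, initial_hold_min, ramps):
    """Total run time (min) of a GC oven program.

    ramps is a list of (rate in °C per min, target temperature in °C) tuples.
    """
    total = initial_hold_min
    temp = initial_temp
    for rate, target in ramps:
        total += (target - temp) / rate  # time spent on this ramp
        temp = target
    return total

# 30 °C for 3 min, 13 °C/min to 90 °C, 50 °C/min to 250 °C
run_time = program_duration(30.0, 3.0, [(13.0, 90.0), (50.0, 250.0)])
print(f"{run_time:.2f} min")
```

The result, 3 + 60/13 + 160/50 ≈ 10.82 min, agrees with the 10.82 min end point of the MS acquisition windows.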

2.3.20: Coupled spectrophotometric assay for PD kinetics

All concentrations of assay components are final concentrations. PD was activated by first incubating DTT (10 mM), 5-deazariboflavin (70 µM), and PD-AE (70 µM) in buffer (25 mM Tris-HCl pH 7.5, 50 mM NaCl) for 20 min under ambient light inside an anaerobic chamber (MBraun). PD (70 µM, previously sparged with argon for 30 min) and SAM (500 µM) were then added, and the solution was incubated under ambient light for 2 h. The activated PD was used in the kinetic assays. Kinetic assays contained horse liver alcohol dehydrogenase (5 µM), NADH (200 µM), substrate as indicated in the text, and activated PD (2.5 or 25 nM total monomer concentration) in buffer (25 mM Tris-HCl pH 7.5, 50 mM NaCl). The coupled spectrophotometric assays were conducted at 20 °C in 96-well plates, and the absorbance at 340 nm was recorded at 20 s intervals for 10 min. Initial rates were calculated from absorbance measurements that decreased linearly. Pathlengths were corrected to 1 cm and absorbance values were converted to concentrations assuming ε340 = 6,220 M−1 cm−1 for NADH. The rate from assays containing no substrate was subtracted from the rates of assays containing substrate to account for background activity. The same rates were recorded when the concentration of alcohol dehydrogenase was doubled, confirming that PD activity was being measured. Assays were performed in triplicate on separate days. All of the data were fit to the Michaelis–Menten equation simultaneously using nonlinear regression in GraphPad Prism 7.00. The kobs was calculated assuming 18 ± 1% (mean ± standard error (SE)) activation of PD monomers as measured by EPR.
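The conversion from absorbance traces to kobs can be sketched as follows. This is an illustrative Python calculation rather than the analysis performed in Prism. It uses the NADH extinction coefficient, the 1 cm corrected pathlength, the 25 nM enzyme concentration, and the 18% activation stated above; the absorbance traces themselves are hypothetical, as are the function names.

```python
EPSILON_NADH = 6220.0  # M^-1 cm^-1 at 340 nm

def linear_slope(times_s, absorbances):
    """Least-squares slope of absorbance versus time (absorbance units per second)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx

def nadh_rate_uM_per_s(slope_per_s, pathlength_cm=1.0):
    """Convert dA340/dt to an NADH consumption rate in µM/s (NADH loss gives a negative slope)."""
    return -slope_per_s / (EPSILON_NADH * pathlength_cm) * 1e6

def k_obs(rate_uM_per_s, background_uM_per_s, enzyme_uM, active_fraction):
    """Background-corrected turnovers per second per activated monomer."""
    return (rate_uM_per_s - background_uM_per_s) / (enzyme_uM * active_fraction)

# Hypothetical traces over the linear portion of an assay, read every 20 s.
times = [20.0 * i for i in range(10)]
with_substrate = [0.900 - 3.11e-4 * t for t in times]
no_substrate = [0.900 - 3.11e-6 * t for t in times]

rate = nadh_rate_uM_per_s(linear_slope(times, with_substrate))
background = nadh_rate_uM_per_s(linear_slope(times, no_substrate))
print(k_obs(rate, background, enzyme_uM=0.025, active_fraction=0.18))
```

Dividing by the activated fraction, rather than by the total monomer concentration, reports turnover per radical-bearing monomer, which is why the EPR-derived activation value enters the calculation.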

2.3.21: Spectrophotometric assay for detection of P5CR activity

Yolanda Huang (Harvard) performed the following assays. P5CR from C. difficile 70-100-2010 was used in coupled enzyme assays to characterize t4LHypD activity and kinetics. Biochemical and structural studies of bacterial P5CRs have been previously reported.103-104 The non-physiological conversion of L-proline to P5C was shown to be favored for P5CR at high pH and at high concentrations of L-proline and cofactor.104 A similar assay was used in this study to investigate the activity of purified C. difficile P5CR (Figure 2.19).

Assay mixtures contained 200 mM sodium bicarbonate (pH 10), 5 µM P5CR, 50 mM proline, and 10 mM NAD+ in a final volume of 200 µL. Assays were initiated by addition of NAD+ cofactor. All assays were carried out in triplicate. The pathlength-corrected absorbance at 340 nm was measured every 15 s over a period of 10 min in a 96-well plate. Absorbance of buffer blanks was subtracted from assay conditions tested. P5CR purified from C. difficile 70-100-2010 was shown to have high specificity toward L-proline and NADH.23

2.3.22: Coupled spectrophotometric assay for t4LHypD kinetics

Yolanda Huang (Harvard) performed the following assays. t4LHypD was first activated following conditions described for EPR spectroscopic assays. Activated t4LHypD was used for coupled enzyme kinetic assays. Kinetic assays contained 20 mM Tris-HCl pH 7.5, 50 mM bicine pH 7.5, 100 mM KCl, 400 µM NADH, 900 nM P5CR, and 30 nM t4LHypD (total monomer concentration). Assays were initiated by addition of Hyp to a final concentration of 0, 0.25, 0.5, 1, 2, 5, or 10 mM. The absorbance at 340 nm was measured every 10 s over a period of 5 min in a 96-well plate. Initial rates were calculated from absorbance measurements that decreased linearly. Pathlengths were corrected to 1 cm and absorbance values were converted to concentrations assuming ε340 = 6,220 M−1 cm−1 for NADH. The average initial rate from 0 mM t4LHyp assays was subtracted from the rates of assays containing t4LHyp. The same rates were recorded at higher concentrations of P5CR, confirming that t4LHypD activity was being measured. Each condition was carried out in triplicate within a single experiment. The data were fit simultaneously to the Michaelis–Menten equation using nonlinear regression in GraphPad Prism 7.00. The kobs parameter was calculated based on 51 ± 1% (mean ± SE) activation of t4LHypD monomers as determined by EPR spectroscopic assays (Figure 2.20A).
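A simultaneous fit of all replicates to the Michaelis–Menten equation, v = kcat·[S]/(Km + [S]), can be sketched in plain Python. This is not the algorithm used by Prism: here kcat is solved in closed form for each trial Km, and Km is located by a golden-section search, which assumes a single minimum of the squared error within the search bracket. The substrate concentrations are those listed above; the rate values, the "true" parameters used to simulate them, and the function name are hypothetical.

```python
def fit_michaelis_menten(substrate, rates, km_low=1e-3, km_high=100.0, iterations=200):
    """Least-squares fit of v = kcat * S / (Km + S); returns (kcat, Km)."""

    def best_kcat(km):
        # For a fixed Km the model is linear in kcat, so the optimum is closed-form.
        f = [s / (km + s) for s in substrate]
        return sum(v * fi for v, fi in zip(rates, f)) / sum(fi * fi for fi in f)

    def sse(km):
        kcat = best_kcat(km)
        return sum((v - kcat * s / (km + s)) ** 2 for s, v in zip(substrate, rates))

    # Golden-section search for the Km that minimizes the squared error.
    phi = (5 ** 0.5 - 1) / 2
    a, b = km_low, km_high
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iterations):
        if sse(c) < sse(d):
            b = d
        else:
            a = c
        c, d = b - phi * (b - a), a + phi * (b - a)
    km = (a + b) / 2
    return best_kcat(km), km

# Simulated triplicate data (hypothetical parameters, for illustration only).
concentrations_mM = [0.25, 0.5, 1, 2, 5, 10]
true_kcat, true_km = 40.0, 1.5
substrate, rates = [], []
for scale in (0.97, 1.00, 1.03):  # three replicates with small systematic offsets
    for s in concentrations_mM:
        substrate.append(s)
        rates.append(scale * true_kcat * s / (true_km + s))

kcat, km = fit_michaelis_menten(substrate, rates)
print(f"kcat = {kcat:.2f} s^-1, Km = {km:.2f} mM")
```

Pooling every replicate into one fit, instead of averaging the replicates first, lets each measurement contribute equally to the estimated parameters and their uncertainty.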

2.3.23: LC–MS/MS assays for t4LHypD enzymatic activity

Yolanda Huang (Harvard) performed the following assays. LC–MS/MS analyses were performed on an Agilent 6410 Triple Quadrupole LC–MS instrument (Agilent Technologies). The proline detection method was performed as described previously using a Luna SCX column (5 µm, 100 Å, 50 × 2.0 mm, Phenomenex) with the following modifications.105 The capillary voltage was set to 4000 V, the collision energy to 13 eV, and the fragmentor voltage to 80 V. The drying gas temperature was maintained at 300 °C with a flow rate of 12 L min−1 and a nebulizer pressure of 35 psi. Proline was monitored using an isocratic flow of 30 mM ammonium acetate (solvent A) and 5% acetic acid (solvent B) at a ratio of 15:85 with a flow rate of 0.4 mL min−1 for 6 min. The hydroxyproline detection method used 200 mM ammonium acetate as solvent A with the other parameters identical to the proline detection method. The mass spectrometer was operated in multiple reaction monitoring (MRM) mode with positive ionization monitoring. Precursor and product ions of m/z 116.1 → m/z 70.1 were monitored for proline and m/z 132.1 → m/z 86.1 were monitored for hydroxyproline. On the Agilent MassHunter Workstation Data Acquisition software, the MS1 resolution was set to “wide” for proline and “unit” for hydroxyproline. MS2 resolution was set to “unit” for both amino acids. The injection volume for all samples was 3 µL. Data analysis was performed with Agilent MassHunter Qualitative Analysis software. Amino acid standards were dissolved in water and diluted to various concentrations prior to sample injections. A standard curve was used to calculate proline and hydroxyproline concentrations in samples based on peak integrations.
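The monitored MRM transitions (116.1 to 70.1 for proline and 132.1 to 86.1 for hydroxyproline) can be rationalized from monoisotopic masses. The Python sketch below assumes that the precursors are the protonated amino acids and that the product ions arise from the combined neutral loss of H2O and CO, which is a common fragmentation of protonated amino acids; this assignment is an assumption and is not stated in the method above.

```python
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # mass of H+ (hydrogen atom minus one electron)

def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

proline = {"C": 5, "H": 9, "N": 1, "O": 2}
hydroxyproline = {"C": 5, "H": 9, "N": 1, "O": 3}
neutral_loss = {"C": 1, "H": 2, "O": 2}  # H2O + CO (assumed fragmentation)

for name, formula in (("proline", proline), ("hydroxyproline", hydroxyproline)):
    precursor = mass(formula) + PROTON
    product = precursor - mass(neutral_loss)
    print(f"{name}: m/z {precursor:.1f} -> m/z {product:.1f}")
```

Under these assumptions the calculated values round to the transitions used in the method, 116.1 to 70.1 and 132.1 to 86.1.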

All assays were prepared and incubated inside an anaerobic chamber (MBraun). P5CR coupled enzymatic assays contained 20 mM Tris-HCl pH 7.5, 100 mM KCl, 0.4 mM NADH, 3 µM P5CR, 0.2 mM t4LHyp, and 0.3 µM t4LHypD. t4LHypD was first activated under conditions described for EPR spectroscopic assays. Activated t4LHypD mix was then diluted to a final concentration of 0.3 µM protein monomer into buffer containing P5CR and NADH. All assays were carried out in triplicate and were initiated by adding t4LHyp into 2 mL glass vials. Vials were capped immediately and allowed to incubate for 1 h at 22 °C (mixtures incubated for 23 h yielded similar results). Upon removal from the anaerobic chamber, reactions were quenched with a 2× volume of methanol and protein precipitates were removed by centrifugation (15,200×g, 10 min). Supernatants were further diluted with water 60-fold for proline detection and 12-fold for hydroxyproline detection by LC–MS/MS (Figure 2.20B).
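Concentrations read from the standard curve refer to the injected sample and must be scaled back to the assay. The Python sketch below illustrates this bookkeeping under two assumptions about the wording above: a "2× volume" quench means two volumes of methanol per volume of assay (a 3-fold dilution), and an "n-fold" dilution means the final volume is n times the supernatant volume. The function names are hypothetical.

```python
def total_dilution(quench_volumes, further_dilution):
    """Overall dilution factor after quenching and subsequent dilution in water."""
    return (1 + quench_volumes) * further_dilution

def assay_concentration_uM(measured_uM, quench_volumes, further_dilution):
    """Back-calculate the concentration in the original assay mixture."""
    return measured_uM * total_dilution(quench_volumes, further_dilution)

proline_factor = total_dilution(2, 60)         # proline detection
hydroxyproline_factor = total_dilution(2, 12)  # hydroxyproline detection
print(proline_factor, hydroxyproline_factor)

# Expected injected proline concentration if 0.2 mM t4LHyp were fully converted:
print(round(200.0 / proline_factor, 2), "µM")
```

Under these assumptions the overall factors are 180 and 36, so complete conversion of 0.2 mM t4LHyp would correspond to roughly 1.1 µM proline in the injected sample.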

The enantiopurity of the Hyp commercial stock was confirmed by chiral HPLC. Analysis was performed using a Dionex Ultimate 3000 instrument (Thermo Scientific). A Lux® Amylose-1 column (5 µm, 100 Å, 100 × 4.6 mm, Phenomenex) was used to separate each stereoisomer using a flow gradient of 0-15% solvent A over 15 min at a flow rate of 1 mL min−1. Solvent A consisted of 10 mM aqueous ammonium acetate and solvent B consisted of acetonitrile with no additive. Hydroxyproline isomers were derivatized as previously reported prior to injection.106 All samples were diluted 2-fold and a volume of 1 µL was injected onto the HPLC. Derivatized amino acids were monitored by absorbance at 340 nm. Single peaks were observed for each commercial stock, indicating >99% ee.
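For reference, the relationship between integrated peak areas and enantiomeric excess can be written as a short Python sketch. It assumes equal detector response for the two derivatized enantiomers; the function names and the example areas are hypothetical.

```python
def enantiomeric_excess(area_major, area_minor):
    """Percent ee from the integrated peak areas of the two enantiomers."""
    return 100.0 * (area_major - area_minor) / (area_major + area_minor)

def max_minor_fraction(ee_percent):
    """Largest fraction of the total area the minor enantiomer can have at a given ee."""
    return (1 - ee_percent / 100.0) / 2

print(enantiomeric_excess(995.0, 5.0))  # hypothetical areas
print(max_minor_fraction(99.0))
```

An ee above 99% therefore corresponds to a minor-enantiomer peak below about 0.5% of the total area, which is consistent with observing a single peak.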

2.4: References

(1) Lynch, S. V.; Pedersen, O. The Human Intestinal Microbiome in Health and Disease. N. Engl. J. Med. 2016, 375, 2369-2379.

(2) Blaser, M. J.; Cardon, Z. G.; Cho, M. K.; Dangl, J. L.; Donohue, T. J.; Green, J. L.; Knight, R.; Maxon, M. E.; Northen, T. R.; Pollard, K. S.; Brodie, E. L. Toward a Predictive Understanding of Earth’s Microbiomes to Address 21st Century Challenges. mBio 2016, 7, e00714-16.

(3) Sender, R.; Fuchs, S.; Milo, R. Revised Estimates for the Number of Human and Bacteria Cells in the Body. PLoS Biol. 2016, 14, e1002533.

(4) Nicholson, J. K.; Holmes, E.; Kinross, J.; Burcelin, R.; Gibson, G.; Jia, W.; Pettersson, S. Host-Gut Microbiota Metabolic Interactions. Science 2012, 336, 1262-1267.

(5) Joice, R.; Yasuda, K.; Shafquat, A.; Morgan, X. C.; Huttenhower, C. Determining Microbial Products and Identifying Molecular Targets in the Human Microbiome. Cell Metab. 2014, 20, 731-741.

(6) Levy, S. E.; Myers, R. M. Advancements in Next-Generation Sequencing. Annu. Rev. Genomics Hum. Genet. 2016, 17, 95-115.

(7) Shendure, J.; Balasubramanian, S.; Church, G. M.; Gilbert, W.; Rogers, J.; Schloss, J. A.; Waterston, R. H. DNA sequencing at 40: past, present and future. Nature 2017, 550, 345-353.

(8) Knight, R.; Vrbanac, A.; Taylor, B. C.; Aksenov, A.; Callewaert, C.; Debelius, J.; Gonzalez, A.; Kosciolek, T.; McCall, L.-I.; McDonald, D.; Melnik, A. V.; Morton, J. T.; Navas, J.; Quinn, R. A.; Sanders, J. G.; Swafford, A. D.; Thompson, L. R.; Tripathi, A.; Xu, Z. Z.; Zaneveld, J. R.; Zhu, Q.; Caporaso, J. G.; Dorrestein, P. C. Best practices for analysing microbiomes. Nat. Rev. Microbiol. 2018, 16, 410-422.

(9) The Human Microbiome Project Consortium; Huttenhower, C. Structure, function and diversity of the healthy human microbiome. Nature 2012, 486, 207-214.

(10) Franzosa, E. A.; Hsu, T.; Sirota-Madi, A.; Shafquat, A.; Abu-Ali, G.; Morgan, X. C.; Huttenhower, C. Sequencing and beyond: integrating molecular 'omics' for microbial community profiling. Nat. Rev. Microbiol. 2015, 13, 360-372.

(11) Schnoes, A. M.; Brown, S. D.; Dodevski, I.; Babbitt, P. C. Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies. PLoS Comput. Biol. 2009, 5, e1000605.

(12) Lagier, J.-C.; Hugon, P.; Khelaifia, S.; Fournier, P.-E.; La Scola, B.; Raoult, D. The Rebirth of Culture in Microbiology through the Example of Culturomics To Study Human Gut Microbiota. Clin. Microbiol. Rev. 2015, 28, 237-264.

(13) Greenblum, S.; Carr, R.; Borenstein, E. Extensive Strain-Level Copy-Number Variation across Human Gut Microbiome Species. Cell 2015, 160, 583-594.

(14) Quince, C.; Walker, A. W.; Simpson, J. T.; Loman, N. J.; Segata, N. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 2017, 35, 833-844.

(15) Gerlt, J. A.; Bouvier, J. T.; Davidson, D. B.; Imker, H. J.; Sadkhin, B.; Slater, D. R.; Whalen, K. L. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta 2015, 1854, 1019-1037.

(16) Gerlt, J. A. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence–Function Space and Genome Context to Discover Novel Functions. Biochemistry 2017, 56, 4293-4308.

(17) Zallot, R.; Oberg, N. O.; Gerlt, J. A. ‘Democratized’ genomic enzymology web tools for functional assignment. Curr. Opin. Chem. Biol. 2018, 47, 77-85.

(18) Kaminski, J.; Gibson, M. K.; Franzosa, E. A.; Segata, N.; Dantas, G.; Huttenhower, C. High-Specificity Targeted Functional Profiling in Microbial Communities with ShortBRED. PLoS Comput. Biol. 2015, 11, e1004557.

(19) Atkinson, H. J.; Morris, J. H.; Ferrin, T. E.; Babbitt, P. C. Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies. PLoS One 2009, 4, e4345.

(20) Huang, H.; Carter, M. S.; Vetting, M. W.; Al-Obaidi, N.; Patskovsky, Y.; Almo, S. C.; Gerlt, J. A. A General Strategy for the Discovery of Metabolic Pathways: D-Threitol, L-Threitol, and Erythritol Utilization in Mycobacterium smegmatis. J. Am. Chem. Soc. 2015, 137, 14570-14573.

(21) Carter, M. S.; Zhang, X.; Huang, H.; Bouvier, J. T.; Francisco, B. S.; Vetting, M. W.; Al-Obaidi, N.; Bonanno, J. B.; Ghosh, A.; Zallot, R. G.; Andersen, H. M.; Almo, S. C.; Gerlt, J. A. Functional assignment of multiple catabolic pathways for D-apiose. Nat. Chem. Biol. 2018, 14, 696-705.

(22) Brown, S. D.; Babbitt, P. C. Inference of Functional Properties from Large-scale Analysis of Enzyme Superfamilies. J. Biol. Chem. 2012, 287, 35-42.

(23) Levin, B. J.; Huang, Y. Y.; Peck, S. C.; Wei, Y.; Martínez-del Campo, A.; Marks, J. A.; Franzosa, E. A.; Huttenhower, C.; Balskus, E. P. A prominent glycyl radical enzyme in human gut microbiomes metabolizes trans-4-hydroxy-L-proline. Science 2017, 355, eaai8386.

(24) Kurokawa, K.; Itoh, T.; Kuwahara, T.; Oshima, K.; Toh, H.; Toyoda, A.; Takami, H.; Morita, H.; Sharma, V. K.; Srivastava, T. P.; Taylor, T. D.; Noguchi, H.; Mori, H.; Ogura, Y.; Ehrlich, D. S.; Itoh, K.; Takagi, T.; Sakaki, Y.; Hayashi, T.; Hattori, M. Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes. DNA Res. 2007, 14, 169-181.

(25) Ellrott, K.; Jaroszewski, L.; Li, W.; Wooley, J. C.; Godzik, A. Expansion of the Protein Repertoire in Newly Explored Environments: Human Gut Microbiome Specific Protein Families. PLoS Comput. Biol. 2010, 6, e1000798.

(26) Kolmeder, C. A.; de Been, M.; Nikkilä, J.; Ritamo, I.; Mättö, J.; Valmu, L.; Salojärvi, J.; Palva, A.; Salonen, A.; de Vos, W. M. Comparative Metaproteomics and Diversity Analysis of Human Intestinal Microbiota Testifies for Its Temporal Stability and Expression of Core Functions. PLoS One 2012, 7, e29913.

(27) Martínez-del Campo, A.; Bodea, S.; Hamer, H. A.; Marks, J. A.; Haiser, H. J.; Turnbaugh, P. J.; Balskus, E. P. Characterization and Detection of a Widely Distributed Gene Cluster That Predicts Anaerobic Choline Utilization by Human Gut Bacteria. mBio 2015, 6, e00042-15.

(28) Finn, R. D.; Attwood, T. K.; Babbitt, P. C.; Bateman, A.; Bork, P.; Bridge, A. J.; Chang, H.-Y.; Dosztányi, Z.; El-Gebali, S.; Fraser, M.; Gough, J.; Haft, D.; Holliday, G. L.; Huang, H.; Huang, X.; Letunic, I.; Lopez, R.; Lu, S.; Marchler-Bauer, A.; Mi, H.; Mistry, J.; Natale, D. A.; Necci, M.; Nuka, G.; Orengo, C. A.; Park, Y.; Pesseat, S.; Piovesan, D.; Potter, S. C.; Rawlings, N. D.; Redaschi, N.; Richardson, L.; Rivoire, C.; Sangrador-Vegas, A.; Sigrist, C.; Sillitoe, I.; Smithers, B.; Squizzato, S.; Sutton, G.; Thanki, N.; Thomas, P. D.; Tosatto, S. C. E.; Wu, C. H.; Xenarios, I.; Yeh, L.-S.; Young, S.-Y.; Mitchell, A. L. InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Res. 2017, 45, D190-D199.

(29) Becker, A.; Fritz-Wolf, K.; Kabsch, W.; Knappe, J.; Schultz, S.; Volker Wagner, A. F. Structure and mechanism of the glycyl radical enzyme pyruvate formate-lyase. Nat. Struct. Biol. 1999, 6, 969.

(30) Raynaud, C.; Sarçabal, P.; Meynial-Salles, I.; Croux, C.; Soucaille, P. Molecular characterization of the 1,3-propanediol (1,3-PD) operon of Clostridium butyricum. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 5010-5015.

(31) O'Brien, J. R.; Raynaud, C.; Croux, C.; Girbal, L.; Soucaille, P.; Lanzilotta, W. N. Insight into the Mechanism of the B12-Independent Glycerol Dehydratase from Clostridium butyricum: Preliminary Biochemical and Structural Characterization. Biochemistry 2004, 43, 4635-4645.

(32) Scott, K. P.; Martin, J. C.; Campbell, G.; Mayer, C.-D.; Flint, H. J. Whole-Genome Transcription Profiling Reveals Genes Up-Regulated by Growth on Fucose in the Human Gut Bacterium “Roseburia inulinivorans”. J. Bacteriol. 2006, 188, 4340-4349.

(33) Zarzycki, J.; Erbilgin, O.; Kerfeld, C. A. Bioinformatic Characterization of Glycyl Radical Enzyme-Associated Bacterial Microcompartments. Appl. Environ. Microbiol. 2015, 81, 8315-8329.

(34) Zarzycki, J.; Sutter, M.; Cortina, N. S.; Erb, T. J.; Kerfeld, C. A. In Vitro Characterization and Concerted Function of Three Core Enzymes of a Glycyl Radical Enzyme-Associated Bacterial Microcompartment. Sci. Rep. 2017, 7, 42757.

(35) Smillie, C. S.; Smith, M. B.; Friedman, J.; Cordero, O. X.; David, L. A.; Alm, E. J. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 2011, 480, 241-244.

(36) Abubucker, S.; Segata, N.; Goll, J.; Schubert, A. M.; Izard, J.; Cantarel, B. L.; Rodriguez-Mueller, B.; Zucker, J.; Thiagarajan, M.; Henrissat, B.; White, O.; Kelley, S. T.; Methé, B.; Schloss, P. D.; Gevers, D.; Mitreva, M.; Huttenhower, C. Metabolic Reconstruction for Metagenomic Data and Its Application to the Human Microbiome. PLoS Comput. Biol. 2012, 8, e1002358.

(37) Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, K. S.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F.; Yamada, T.; Mende, D. R.; Li, J.; Xu, J.; Li, S.; Li, D.; Cao, J.; Wang, B.; Liang, H.; Zheng, H.; Xie, Y.; Tap, J.; Lepage, P.; Bertalan, M.; Batto, J.-M.; Hansen, T.; Le Paslier, D.; Linneberg, A.; Nielsen, H. B.; Pelletier, E.; Renault, P.; Sicheritz-Ponten, T.; Turner, K.; Zhu, H.; Yu, C.; Li, S.; Jian, M.; Zhou, Y.; Li, Y.; Zhang, X.; Li, S.; Qin, N.; Yang, H.; Wang, J.; Brunak, S.; Doré, J.; Guarner, F.; Kristiansen, K.; Pedersen, O.; Parkhill, J.; Weissenbach, J.; MetaHIT Consortium; Antolin, M.; Artiguenave, F.; Blottiere, H.; Borruel, N.; Bruls, T.; Casellas, F.; Chervaux, C.; Cultrone, A.; Delorme, C.; Denariaz, G.; Dervyn, R.; Forte, M.; Friss, C.; van de Guchte, M.; Guedon, E.; Haimet, F.; Jamet, A.; Juste, C.; Kaci, G.; Kleerebezem, M.; Knol, J.; Kristensen, M.; Layec, S.; Le Roux, K.; Leclerc, M.; Maguin, E.; Melo Minardi, R.; Oozeer, R.; Rescigno, M.; Sanchez, N.; Tims, S.; Torrejon, T.; Varela, E.; de Vos, W.; Winogradsky, Y.; Zoetendal, E.; Bork, P.; Ehrlich, S. D.; Wang, J. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464, 59-65.

(38) Carr, R.; Borenstein, E. Comparative Analysis of Functional Metagenomic Annotation and the Mappability of Short Reads. PLoS One 2014, 9, e105776.

(39) Berendzen, J.; Bruno, W. J.; Cohn, J. D.; Hengartner, N. W.; Kuske, C. R.; McMahon, B. H.; Wolinsky, M. A.; Xie, G. Rapid phylogenetic and functional classification of short genomic fragments with signature peptides. BMC Res. Notes 2012, 5, 460.

(40) Suzek, B. E.; Wang, Y.; Huang, H.; McGarvey, P. B.; Wu, C. H.; The UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 2015, 31, 926-932.

(41) Nayfach, S.; Pollard, K. S. Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome. Genome Biol. 2015, 16, 51.

(42) Knappe, J.; Sawers, G. A radical-chemical route to acetyl-CoA: the anaerobically induced pyruvate formate-lyase system of Escherichia coli. FEMS Microbiol. Rev. 1990, 6, 383-398.

(43) Knappe, J.; Volker Wagner, A. F., Stable glycyl radical from pyruvate formate-lyase and ribonucleotide reductase (III). In Adv. Protein Chem., Academic Press: 2001; Vol. 58, pp 277-315.

(44) Wagner, A. F. V.; Schultz, S.; Bomke, J.; Pils, T.; Lehmann, W. D.; Knappe, J. YfiD of Escherichia coli and Y06I of Bacteriophage T4 as Autonomous Glycyl Radical Cofactors Reconstituting the Catalytic Center of Oxygen-Fragmented Pyruvate Formate-Lyase. Biochem. Biophys. Res. Commun. 2001, 285, 456-462.

(45) Mattila, K. J.; Nieminen, M. S.; Valtonen, V. V.; Rasi, V. P.; Kesäniemi, Y. A.; Syrjälä, S. L.; Jungell, P. S.; Isoluoma, M.; Hietaniemi, K.; Jokinen, M. J. Association between dental health and acute myocardial infarction. Br. Med. J. 1989, 298, 779-781.

(46) Qin, N.; Yang, F.; Li, A.; Prifti, E.; Chen, Y.; Shao, L.; Guo, J.; Le Chatelier, E.; Yao, J.; Wu, L.; Zhou, J.; Ni, S.; Liu, L.; Pons, N.; Batto, J. M.; Kennedy, S. P.; Leonard, P.; Yuan, C.; Ding, W.; Chen, Y.; Hu, X.; Zheng, B.; Qian, G.; Xu, W.; Ehrlich, S. D.; Zheng, S.; Li, L. Alterations of the human gut microbiome in liver cirrhosis. Nature 2014, 513, 59-64.

(47) Craciun, S.; Balskus, E. P. Microbial conversion of choline to trimethylamine requires a glycyl radical enzyme. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 21307-21312.

(48) Selmer, T.; Andrei, P. I. p-Hydroxyphenylacetate decarboxylase from Clostridium difficile. Eur. J. Biochem. 2001, 268, 1363-1372.

(49) Franzosa, E. A.; Morgan, X. C.; Segata, N.; Waldron, L.; Reyes, J.; Earl, A. M.; Giannoukos, G.; Boylan, M. R.; Ciulla, D.; Gevers, D.; Izard, J.; Garrett, W. S.; Chan, A. T.; Huttenhower, C. Relating the metatranscriptome and metagenome of the human gut. Proc. Natl. Acad. Sci. U. S. A. 2014, 111, E2329-E2338.

(50) Hooper, L. V.; Xu, J.; Falk, P. G.; Midtvedt, T.; Gordon, J. I. A molecular sensor that allows a gut commensal to control its nutrient foundation in a competitive ecosystem. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 9833-9838.

(51) Reichardt, N.; Duncan, S. H.; Young, P.; Belenguer, A.; McWilliam Leitch, C.; Scott, K. P.; Flint, H. J.; Louis, P. Phylogenetic distribution of three pathways for propionate production within the human gut microbiota. ISME J. 2014, 8, 1323-1335.

(52) Chen, Y.-M.; Zhu, Y.; Lin, E. C. C. The organization of the fuc regulon specifying L-fucose dissimilation in Escherichia coli K12 as determined by gene cloning. Mol. Gen. Genet. 1987, 210, 331-337.

(53) Becker, D. J.; Lowe, J. B. Fucose: biosynthesis and biological function in mammals. Glycobiology 2003, 13, 41R-53R.

(54) Daniel, R.; Bobik, T. A.; Gottschalk, G. Biochemistry of coenzyme B12-dependent glycerol and diol dehydratases and organization of the encoding genes. FEMS Microbiol. Rev. 1998, 22, 553-566.

(55) Wagner, A. F.; Frey, M.; Neugebauer, F. A.; Schäfer, W.; Knappe, J. The free radical in pyruvate formate-lyase is located on glycine-734. Proc. Natl. Acad. Sci. U. S. A. 1992, 89, 996-1000.

(56) Eaton, G. R.; Eaton, S. S.; Barr, D. P.; Weber, R. T., Quantitative EPR. Springer: Wien; New York, 2010.

(57) Craciun, S.; Marks, J. A.; Balskus, E. P. Characterization of Choline Trimethylamine-Lyase Expands the Chemistry of Glycyl Radical Enzymes. ACS Chem. Biol. 2014, 9, 1408-1413.

(58) Ragsdale, S. W. Metals and Their Scaffolds To Promote Difficult Enzymatic Reactions. Chem. Rev. 2006, 106, 3317-3337.

(59) Bobik, T. A.; Havemann, G. D.; Busch, R. J.; Williams, D. S.; Aldrich, H. C. The Propanediol Utilization (pdu) Operon of Salmonella enterica Serovar Typhimurium LT2 Includes Genes Necessary for Formation of Polyhedral Organelles Involved in Coenzyme B12-Dependent 1,2-Propanediol Degradation. J. Bacteriol. 1999, 181, 5967-5975.

(60) LaMattina, J. W.; Keul, N. D.; Reitzer, P.; Kapoor, S.; Galzerani, F.; Koch, D. J.; Gouvea, I. E.; Lanzilotta, W. N. 1,2-Propanediol Dehydration in Roseburia inulinivorans: Structural Basis for Substrate and Enantiomer Selectivity. J. Biol. Chem. 2016, 291, 15515-15526.

(61) Feliks, M.; Ullmann, G. M. Glycerol Dehydratation by the B12-Independent Enzyme May Not Involve the Migration of a Hydroxyl Group: A Computational Study. J. Phys. Chem. B 2012, 116, 7076-7087.

(62) Kovačević, B.; Barić, D.; Babić, D.; Bilić, L.; Hanževački, M.; Sandala, G. M.; Radom, L.; Smith, D. M. Computational Tale of Two Enzymes: Glycerol Dehydration With or Without B12. J. Am. Chem. Soc. 2018, 140, 8487-8496.

(63) Faber, F.; Thiennimitr, P.; Spiga, L.; Byndloss, M. X.; Litvak, Y.; Lawhon, S.; Andrews-Polymenis, H. L.; Winter, S. E.; Bäumler, A. J. Respiration of Microbiota-Derived 1,2-propanediol Drives Salmonella Expansion during Colitis. PLoS Pathog. 2017, 13, e1006129.

(64) Campbell, E. L.; Bruyninckx, W. J.; Kelly, C. J.; Glover, L. E.; McNamee, E. N.; Bowers, B. E.; Bayless, A. J.; Scully, M.; Saeedi, B. J.; Golden-Mason, L.; Ehrentraut, S. F.; Curtis, V. F.; Burgess, A.; Garvey, J. F.; Sorensen, A.; Nemenoff, R.; Jedlicka, P.; Taylor, C. T.; Kominsky, D. J.; Colgan, S. P. Transmigrating Neutrophils Shape the Mucosal Microenvironment through Localized Oxygen Depletion to Influence Resolution of Inflammation. Immunity 2014, 40, 66-77.

(65) Deutch, A. H.; Smith, C. J.; Rushlow, K. E.; Kretschmer, P. J. Escherichia coli Δ1-pyrroline-5-carboxylate reductase: gene sequence, protein overproduction and purification. Nucleic Acids Res. 1982, 10, 7701-7714.

(66) Mead, G. C. The Amino Acid-fermenting Clostridia. J. Gen. Microbiol. 1971, 67, 47-56.

(67) Jackson, S.; Calos, M.; Myers, A.; Self, W. T. Analysis of Proline Reduction in the Nosocomial Pathogen Clostridium difficile. J. Bacteriol. 2006, 188, 8487-8495.

(68) Bar-Even, A.; Noor, E.; Savir, Y.; Liebermeister, W.; Davidi, D.; Tawfik, D. S.; Milo, R. The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters. Biochemistry 2011, 50, 4402-4410.

(69) Gorres, K. L.; Raines, R. T. Prolyl 4-hydroxylase. Crit. Rev. Biochem. Mol. Biol. 2010, 45, 106-124.

(70) Adams, E.; Frank, L. Metabolism of Proline and the Hydroxyprolines. Annu. Rev. Biochem. 1980, 49, 1005-1061.

(71) Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C.; Thierer, T.; Ashton, B.; Meintjes, P.; Drummond, A. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012, 28, 1647-1649.

(72) Sievers, F.; Wilm, A.; Dineen, D.; Gibson, T. J.; Karplus, K.; Li, W.; Lopez, R.; McWilliam, H.; Remmert, M.; Söding, J.; Thompson, J. D.; Higgins, D. G. Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011, 7, 539.

(73) Waterhouse, A. M.; Procter, J. B.; Martin, D. M. A.; Clamp, M.; Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25, 1189-1191.

(74) Leppänen, V.-M.; Merckel, M. C.; Ollis, D. L.; Wong, K. K.; Kozarich, J. W.; Goldman, A. Pyruvate formate lyase is structurally homologous to type I ribonucleotide reductase. Structure 1999, 7, 733-744.

(75) Lehtiö, L.; Goldman, A. The pyruvate formate lyase family: sequences, structures and activation. Protein Eng., Des. Sel. 2004, 17, 545-552.

(76) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 2003, 13, 2498-2504.

(77) Selmer, T.; Pierik, A., J.; Heider, J. New glycyl radical enzymes catalysing key metabolic steps in anaerobic bacteria. Biol. Chem. 2005, 386, 981-988.

(78) Funk, M. A.; Marsh, E. N. G.; Drennan, C. L. Substrate-bound Structures of Benzylsuccinate Synthase Reveal How Toluene Is Activated in Anaerobic Hydrocarbon Degradation. J. Biol. Chem. 2015, 290, 22398-22408.

(79) The Human Microbiome Project Consortium; Methé, B. A. A framework for human microbiome research. Nature 2012, 486, 215-221.

(80) Wecksler, S. R.; Stoll, S.; Tran, H.; Magnusson, O. T.; Wu, S.-p.; King, D.; Britt, R. D.; Klinman, J. P. Pyrroloquinoline Quinone Biogenesis: Demonstration That PqqE from Klebsiella pneumoniae Is a Radical S-Adenosyl-L-methionine Enzyme. Biochemistry 2009, 48, 10151-10161.

(81) Bharadwaj, V. S.; Dean, A. M.; Maupin, C. M. Insights into the Glycyl Radical Enzyme Active Site of Benzylsuccinate Synthase: A Computational Study. J. Am. Chem. Soc. 2013, 135, 12279-12288.

(82) Šali, A.; Potterton, L.; Yuan, F.; van Vlijmen, H.; Karplus, M. Evaluation of comparative protein modeling by MODELLER. Proteins: Struct. Funct., Genet. 1995, 23, 318-326.

(83) Alva, V.; Nam, S.-Z.; Söding, J.; Lupas, A. N. The MPI bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis. Nucleic Acids Res. 2016, 44, W410-W415.

(84) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin,

109

P. S. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 2004, 47, 1739-1749.

(85) Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren, T. A.; Sanschagrin, P. C.; Mainz, D. T. Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein−Ligand Complexes. J. Med. Chem. 2006, 49, 6177-6196.

(86) Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.; Banks, J. L. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening. J. Med. Chem. 2004, 47, 1750-1759.

(87) Jacobson, M. P.; Pincus, D. L.; Rapp, C. S.; Day, T. J. F.; Honig, B.; Shaw, D. E.; Friesner, R. A. A hierarchical approach to all-atom protein loop prediction. Proteins: Struct., Funct., Bioinf. 2004, 55, 351-367.

(88) Jacobson, M. P.; Friesner, R. A.; Xiang, Z.; Honig, B. On the Role of the Crystal Environment in Determining Protein Side-chain Conformations. J. Mol. Biol. 2002, 320, 597-608.

(89) Hildebrand, A.; Remmert, M.; Biegert, A.; Söding, J. Fast and accurate automatic structure prediction with HHpred. Proteins: Struct., Funct., Bioinf. 2009, 77, 128-132.

(90) Remmert, M.; Biegert, A.; Hauser, A.; Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 2011, 9, 173.

(91) Kalnins, G.; Kuka, J.; Grinberga, S.; Makrecka-Kuka, M.; Liepinsh, E.; Dambrova, M.; Tars, K. Structure and Function of CutC Choline Lyase from Human Microbiota Bacterium Klebsiella pneumoniae. J. Biol. Chem. 2015, 290, 21732-21740.

(92) Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26, 2460-2461.

(93) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403-410.

(94) Wei, Y.; Li, B.; Prakash, D.; Ferry, J. G.; Elliott, S. J.; Stubbe, J. A Ferredoxin Disulfide Reductase Delivers Electrons to the Methanosarcina barkeri Class III Ribonucleotide Reductase. Biochemistry 2015, 54, 7019-7028.

(95) Gust, B.; Challis, G. L.; Fowler, K.; Kieser, T.; Chater, K. F. PCR-targeted Streptomyces gene replacement identifies a protein domain needed for biosynthesis of the sesquiterpene soil odor geosmin. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 1541-1546.

(96) Thomason, L. C.; Costantino, N.; Court, D. L. E. coli Genome Manipulation by P1 Transduction. Curr. Protoc. Mol. Biol. 2007, 79, 1.17.1-1.17.8.

110

(97) Gasteiger, E.; Gattiker, A.; Hoogland, C.; Ivanyi, I.; Appel, R. D.; Bairoch, A. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003, 31, 3784-3788.

(98) Stoll, S.; Schweiger, A. EasySpin, a comprehensive software package for spectral simulation and analysis in EPR. J. Magn. Reson. 2006, 178, 42-55.

(99) Murib, J. H.; Ritter, D. M. Decomposition of Nitrosyl Disulfonate Ion. I. Products and Mechanism of Color Fading in Acid Solution. J. Am. Chem. Soc. 1952, 74, 3394-3398.

(100) Carlson, E. E.; Kiessling, L. L. Improved Chemical Syntheses of 1- and 5- Deazariboflavin. J. Org. Chem. 2004, 69, 2614-2617.

(101) Mansurova, M.; Koay, M. S.; Gärtner, W. Synthesis and Electrochemical Properties of Structurally Modified Flavin Compounds. Eur. J. Org. Chem. 2008, 2008, 5401-5406.

(102) Schaus, S. E.; Brandes, B. D.; Larrow, J. F.; Tokunaga, M.; Hansen, K. B.; Gould, A. E.; Furrow, M. E.; Jacobsen, E. N. Highly Selective Hydrolytic Kinetic Resolution of Terminal Epoxides Catalyzed by Chiral (salen)CoIII Complexes. Practical Synthesis of Enantioenriched Terminal Epoxides and 1,2-Diols. J. Am. Chem. Soc. 2002, 124, 1307- 1315.

(103) Nocek, B.; Chang, C.; Li, H.; Lezondra, L.; Holzle, D.; Collart, F.; Joachimiak, A. Crystal Structures of Δ1-Pyrroline-5-carboxylate Reductase from Human Pathogens Neisseria meningitides and Streptococcus pyogenes. J. Mol. Biol. 2005, 354, 91-106.

(104) Kenklies, J.; Ziehn, R.; Fritsche, K.; Pich, A.; Andreesen, J. R. Proline biosynthesis from 1 L-ornithine in Clostridium sticklandii: purification of Δ -pyrroline-5-carboxylate reductase, and sequence and expression of the encoding gene, proC. Microbiology 1999, 145, 819-826.

(105) Thiele, B.; Stein, N.; Oldiges, M.; Hofmann, D., Direct Analysis of Underivatized Amino Acids in Plant Extracts by LC-MS-MS. In Amino Acid Analysis: Methods and Protocols, Alterman, M. A.; Hunziker, P., Eds. Humana Press: Totowa, NJ, 2012; pp 317-328.

(106) Langrock, T.; García-Villar, N.; Hoffmann, R. Analysis of hydroxyproline isomers and hydroxylysine by reversed-phase HPLC and mass spectrometry. J. Chromatogr. B 2007, 847, 282-288.

111

Chapter 3: Biochemical characterization of propanediol dehydratases reveals distinct mechanisms of 1,2-diol dehydration by glycyl radical and B12-dependent enzymesⁱ

3.1: Introduction

Enzymes use radical chemistry to catalyze challenging transformations that cannot be achieved solely with two-electron chemistry.1 Understanding how proteins employ cofactor- and backbone-based radicals is crucial for elucidating the novel chemistry they use and ultimately critical for designing therapeutics to modulate their activities in vivo. As described in Chapter 1, the dehydration of 1,2-propanediol by radical enzymes in gut microbes influences host health. In Chapter 2, I detailed how genes encoding propanediol dehydratase (PD) were found to be widely distributed and highly abundant in healthy human gut microbiomes.2 In this Chapter, I report the biochemical and mechanistic characterization of the glycyl radical enzyme (GRE) PD and its activating enzyme (PD-AE).3 I establish that PD and PD-AE are biochemically similar to other canonical GRE and GRE-AE pairs, and isotope labeling experiments with 18O-labeled substrates show that PD and B12-dependent propanediol dehydratase (B12-PD) use distinct mechanisms to catalyze 1,2-diol dehydration, with the former mediating a direct elimination of the C2 hydroxyl group and the latter catalyzing hydroxyl group migration (Figure 3.1).

ⁱ Parts of this chapter are adapted from the following publication: Levin, B. J.; Balskus, E. P. Characterization of 1,2-Propanediol Dehydratases Reveals Distinct Mechanisms for B12-Dependent and Glycyl Radical Enzymes. Biochemistry 2018, 57, 3222-3226.

Figure 3.1: Overview of 18O labeling experiments.

PD has an important role in the metabolism of the host-derived deoxysugar L-fucose, which is found on epithelial glycans.4-6 As described in Chapter 1, gut microbial fucosidases cleave this sugar from glycans, allowing other microbes to use it as a source of carbon and energy. Microbes metabolize L-fucose via the L-fucose utilization (fuc) pathway to dihydroxyacetone phosphate (DHAP) and (S)-1,2-propanediol.7 The enzymes in the fuc pathway are analogous to those of glycolysis. Although DHAP can be shunted directly into primary metabolism, the catabolism of (S)-1,2-propanediol is more challenging. Organisms possessing propanediol utilization (pdu) genes can convert (S)-1,2-propanediol to propionaldehyde, which can either be reduced to 1-propanol or oxidized to produce the beneficial short-chain fatty acid propionate.8-10 The C2 hydroxyl group is a poor leaving group and the C1 hydrogen atoms are not acidic, which limits how an enzyme catalyst could promote direct elimination of the C2 hydroxyl group. Therefore, enzymes catalyzing the dehydration of 1,2-diols have evolved to use radical chemistry.11

As described in Chapter 1, B12-dependent diol dehydratases were discovered by the Abeles laboratory in the 1960s, and seminal isotope labeling experiments from the Abeles and Arigoni groups established that B12-PD mediates an intramolecular rearrangement (Figure 3.2).12-13 Homolysis of the Co–C bond of the adenosylcobalamin cofactor, initiated by substrate binding, generates a 5ʹ-deoxyadenosine radical (5ʹ-dA•), which abstracts a hydrogen atom from C1 of (S)-1,2-propanediol.11 The resulting α-hydroxyalkyl radical then rearranges, forming a 1,1-gem diol intermediate. This intermediate collapses stereoselectively in the B12-PD active site to yield propionaldehyde.

Figure 3.2: Consensus mechanism of B12-PD.

As detailed in Chapters 1 and 2, the discovery of the GRE PD revealed an alternative enzyme capable of performing this reaction.14-15 In contrast to its B12-dependent analog, PD contains no cofactor. Instead, it relies on its backbone glycyl radical for catalysis. The highly similar GRE glycerol dehydratase (GD) was initially proposed to operate like the B12-dependent 1,2-diol dehydratases, with a thiyl radical intermediate abstracting a C1 hydrogen atom from glycerol, and glutamate and histidine residues facilitating hydroxyl group migration.16 However, a quantum mechanics/molecular mechanics study cast doubt on this proposal, revealing that shifting the C2 hydroxyl group closer to C1 has a barrier of >20 kcal/mol versus a barrier of <6 kcal/mol for eliminating the C2 hydroxyl group from the initial α-hydroxyalkyl substrate-based radical.17 More recent computational work has also raised doubts that GRE dehydratases can catalyze hydroxyl migration, as B12-PD is predicted to lower the barrier to migration substantially, while PD is unable to replicate that reactivity.18

In this Chapter, I report 18O labeling experiments with PD and B12-PD to compare their mechanisms of catalysis. Assays with 18O-labeled substrates support a direct elimination mechanism for PD, as no evidence of hydroxyl migration was detected. Results with B12-PD obtained from heterologous expression as well as from the native host corroborate the involvement of hydroxyl migration. These experiments reveal how PD and other GREs important to human health perform challenging transformations and illustrate mechanistic differences between GREs and B12-dependent enzymes that have evolved identical functions.

3.2: Results and discussion

3.2.1: PD-AE has one [4Fe–4S] cluster that is similar to other GRE activating enzymes

PD and PD-AE were heterologously overexpressed and purified as described in Chapter 2. In brief, the genes encoding these enzymes were cloned from gDNA from Roseburia inulinivorans A2-194. The gene encoding PD was ligated into pET-29b to express C-His6-tagged PD, and the gene encoding PD-AE was ligated into pET-28a to express N-His6-tagged PD-AE. PD was overexpressed aerobically and sparged with argon before use, while PD-AE was overexpressed and purified under anaerobic conditions. I had previously confirmed that PD catalyzes (S)-1,2-propanediol dehydration and contains a glycyl radical under anaerobic conditions upon activation by PD-AE,2 but it was an open question how biochemically similar PD and PD-AE are to other GREs. Most notably, the characterized GRE and GRE-AE pair with the highest sequence similarity to PD and PD-AE, glycerol dehydratase (GD) (53% amino acid identity) and GD-AE (35% amino acid identity), seemed distinct from other GREs. In particular, GD-AE had been reported to cleave S-adenosylmethionine (SAM) into 5ʹ-deoxy-5ʹ-(methylthio)adenosine and 2-aminobutyrate.19 Nearly all other radical SAM enzymes cleave SAM into 5ʹ-deoxyadenosine (5ʹ-dA) and L-methionine,20 with the sole and notable exception of diphthamide synthase.21 I hoped to clarify how PD-AE reacts with SAM and identify which SAM derivatives this enzyme forms.

I began by biochemically characterizing the [4Fe–4S] cluster of PD-AE. PD-AE possesses the CX3CX2C motif characteristic of radical SAM enzymes, and this is the sole predicted [4Fe–4S] binding motif in PD-AE. Its ultraviolet–visible spectra in the presence and absence of an external reductant were consistent with those of previously reported radical SAM enzymes (Figure 3.3).22-23 After reconstitution, the cluster is in the [4Fe–4S]2+ state and has a characteristic absorbance at ~410 nm. The addition of sodium dithionite reduces the cluster to the [4Fe–4S]+ state, which is accompanied by the disappearance of the peak at ~410 nm. The amounts of iron and sulfur in the PD-AE preparations were also measured, and the results were consistent with PD-AE containing one [4Fe–4S] cluster (2.80 ± 0.03 equiv of iron and 2.71 ± 0.02 equiv of sulfur per PD-AE monomer). The levels of incorporated iron and sulfide suggest this enzyme preparation contains a mixture of properly reconstituted and inactive [4Fe–4S] clusters, which is not uncommon for in vitro preparations of these enzymes.23
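The iron and sulfide stoichiometry quoted above can be put in per-cluster terms with a short calculation. The sketch below is illustrative only: it assumes, as a simplification not made in the text, that every measured iron or sulfide atom sits in an intact [4Fe–4S] cluster, so the values are upper bounds on cluster occupancy rather than measured occupancies. The function and variable names are hypothetical.

```python
# Upper-bound estimate of [4Fe-4S] cluster occupancy per PD-AE monomer.
# Assumption (not from the text): all measured Fe and sulfide belong to
# intact clusters, so these numbers are ceilings, not measurements.
FE_PER_CLUSTER = 4
S_PER_CLUSTER = 4

measured_fe = 2.80  # equiv of iron per PD-AE monomer (from the text)
measured_s = 2.71   # equiv of sulfide per PD-AE monomer (from the text)


def max_cluster_occupancy(equiv_per_monomer, atoms_per_cluster):
    """Return the largest number of intact clusters per monomer
    that the measured element content could support."""
    return equiv_per_monomer / atoms_per_cluster


fe_occupancy = max_cluster_occupancy(measured_fe, FE_PER_CLUSTER)
s_occupancy = max_cluster_occupancy(measured_s, S_PER_CLUSTER)

print(f"Fe-limited occupancy: {fe_occupancy:.2f} clusters per monomer")
print(f"S-limited occupancy:  {s_occupancy:.2f} clusters per monomer")
```

Under this assumption both elements point to roughly 0.7 clusters per monomer at most, which fits the statement that the preparation is only partially reconstituted.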

[Plot: molar absorptivity (mM-1 cm-1, 0 to 30) versus wavelength (300 to 700 nm) for 10 µM PD-AE and for 10 µM PD-AE + 100 µM sodium dithionite.]

Figure 3.3: UV–Vis spectra of PD-AE (blue) and PD-AE reduced with sodium dithionite (red).

3.2.2: PD-AE catalyzes the formation of 5ʹ-deoxyadenosine from S-adenosylmethionine

I incubated PD-AE with SAM in the presence of the reductants 5-deazariboflavin and dithiothreitol and verified that SAM was cleaved to 5ʹ-dA using high-performance liquid chromatography (HPLC) (Figure 3.4) and liquid chromatography–mass spectrometry (LC–MS) (Figure 3.5), as is typical for radical SAM enzymes. The activated [4Fe–4S]+ cluster performs a one-electron reduction of SAM, cleaving a C–S bond to produce L-methionine and 5ʹ-dA•. The latter species is ultimately responsible for abstracting a hydrogen atom from PD to form the glycyl radical. As mentioned, GD-AE was reported to produce methylthioadenosine and 2-aminobutyrate via an alternative mode of SAM cleavage.19 I was unable to detect methylthioadenosine in the HPLC assays and observed only background levels of this compound using LC–MS. These experiments confirm that PD-AE cleaves SAM in a similar manner to other GRE activating enzymes and radical SAM enzymes.24

[HPLC traces (4 to 16 min), top to bottom: full assay; no PD-AE; no 5-deazariboflavin; 5ʹ-dA standard; MTA standard; 5-deazariboflavin standard.]

Figure 3.4: HPLC assays for determination of SAM cleavage products by PD-AE. Compounds were detected by monitoring the absorbance at 260 nm.


Figure 3.5: LC–MS assays for determination of SAM cleavage products by PD-AE. Extracted ion chromatograms of 5ʹ-deoxyadenosine and methylthioadenosine in full assays and controls indicate robust 5ʹ-deoxyadenosine production and only background levels of methylthioadenosine generation.

3.2.3: PD forms a dimer in solution

As described in Chapter 2, I had previously verified that after activation by PD-AE, PD contains a glycyl radical and that activated PD catalyzes the dehydration of (S)-1,2-propanediol. These results, combined with the SAM cleavage experiments and [4Fe–4S] cluster characterization data, indicate that PD is highly similar to other characterized GREs. Another conserved feature of most GREs is that their active oligomeric state is dimeric. GD,16 choline trimethylamine-lyase,23 pyruvate formate-lyase,25 and ribonucleotide reductase (Class III)26 are all active as dimers. Measuring these states is important for quantifying GRE activation, as these dimeric enzymes display “half of sites” reactivity. The measured activation of these GREs is always ≤1 glycyl radical per dimer, implying only one GRE monomer can be active at a given time in the GRE dimer.27 Both to determine how similar PD is biochemically to those other GREs and to better quantify the extent of PD activation, size exclusion chromatography was performed to determine the native oligomeric state of PD (Figure 3.6). The calculated molecular weight of the major peak in the chromatogram is consistent with the PD dimer. The oligomeric state of PD is therefore consistent with that of other GREs.

[Size exclusion chromatograms: absorbance at 280 nm versus elution volume (0 to 300 mL) for the molecular weight standards (left) and for PD (right).]

Figure 3.6: Purification of PD and determination of native molecular mass. Molecular weight markers used were bovine thyroglobulin (670 kDa), bovine γ-globulin (158 kDa), chicken ovalbumin (44 kDa), horse myoglobin (17 kDa), and vitamin B12 (1,350 Da). The molecular weights of the protein fractions collected after gel filtration were calculated by fitting a second order polynomial between elution volume and log(molecular weight). The observed molecular weight of the major peak is 225 kDa. The expected mass of the N-terminal His6-tagged PD construct is 96.2 kDa.
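The dimer assignment follows from comparing the observed native mass with the expected monomer mass. The short sketch below makes that comparison explicit using only the two masses quoted in the caption; the variable names are hypothetical, and rounding the mass ratio to the nearest integer is a simplifying assumption that ignores shape effects on elution.

```python
# Estimate the oligomeric state of PD from size exclusion data.
# Both masses come from the Figure 3.6 caption; rounding the ratio to the
# nearest integer is an assumption that ignores molecular shape effects.
observed_kda = 225.0  # native mass of the major peak
monomer_kda = 96.2    # expected mass of the His6-tagged PD monomer

ratio = observed_kda / monomer_kda
n_subunits = round(ratio)

# Compare the observed mass with the ideal mass of that oligomer.
oligomer_kda = n_subunits * monomer_kda
percent_over = 100 * (observed_kda - oligomer_kda) / oligomer_kda

print(f"observed / monomer = {ratio:.2f}")
print(f"nearest oligomeric state: {n_subunits} subunits ({oligomer_kda:.1f} kDa)")
print(f"observed mass exceeds that value by {percent_over:.0f}%")
```

The ratio of about 2.3 is closer to a dimer than to a trimer, which matches the assignment made in the text.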


3.2.4: PD catalyzes 1,2-diol dehydration of (S)-1,2-propanediol and other similar substrates

As described in Chapters 1 and 2, PD was predicted to dehydrate (S)-1,2-propanediol as a part of L-fucose metabolism, and so I wanted to verify that PD could catalyze this reaction and determine if any other 1,2-diols could serve as substrates for this enzyme. Gas chromatography–mass spectrometry (GC–MS) assays confirmed that activated PD converted (S)-1,2-propanediol to propionaldehyde in vitro, and additional GC–MS assays were performed to assess the reactivity of PD towards alternative diol substrates (Figure 3.7). Ethylene glycol and (S)-1,2-butanediol were both transformed into the corresponding aldehydes, while meso-2,3-butanediol was converted to 2-butanone. I did not detect the formation of the alternative propanediol dehydration product acetone.15 Glycerol, (R,R)- and (S,S)-2,3-butanediol, and (R)-1,2-butanediol were not processed by PD. Additional experiments revealed that PD dehydrates ethylene glycol and (S)-1,2-propanediol with similar kinetic parameters (for ethylene glycol: kcat = 2,700 ± 416, Km = 27 ± 3, kcat/Km = 1.0 ± 0.2 × 10^5 M^-1 s^-1; rates with the native substrate were previously reported).2
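The relationship between the three kinetic parameters for ethylene glycol can be checked numerically. The sketch below is a worked example under loud assumptions: the text does not give units for kcat or Km, and the code assumes kcat in s^-1 and Km in mM because that combination reproduces the quoted kcat/Km in M^-1 s^-1. It also assumes the two uncertainties are independent and uses first-order error propagation, which may not be how the reported uncertainty was derived.

```python
import math

# Reported values for ethylene glycol (from the text).
# ASSUMED units (not stated in the text): kcat in s^-1, Km in mM.
kcat, kcat_sd = 2700.0, 416.0
km_mM, km_sd_mM = 27.0, 3.0

# Convert Km to molar so the ratio comes out in M^-1 s^-1.
km_M = km_mM * 1e-3
efficiency = kcat / km_M

# First-order propagation for a quotient, assuming independent errors:
# relative variance of the ratio = sum of the relative variances.
rel_sd = math.sqrt((kcat_sd / kcat) ** 2 + (km_sd_mM / km_mM) ** 2)
efficiency_sd = efficiency * rel_sd

print(f"kcat/Km = {efficiency:.2e} +/- {efficiency_sd:.1e} M^-1 s^-1")
```

Under these assumptions the propagated uncertainty is close to 0.2 × 10^5 M^-1 s^-1, in line with the value reported in the text.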

[GC–MS selected ion recording (SIR) chromatogram panels, counts versus time (min), each comparing a reaction with the indicated substrate to a product standard: ethylene glycol (acetaldehyde standard); (S)-1,2-propanediol and (R)-1,2-propanediol (propionaldehyde standard); glycerol (3-hydroxypropanal standard); (S)-1,2-butanediol and (R)-1,2-butanediol (butyraldehyde standard); meso-, (S,S)-, and (R,R)-2,3-butanediol (2-butanone standard).]

Figure 3.7: GC–MS assays for PD activity with alternative potential diol substrates. Traces are of the selected ion recording (SIR) chromatograms for activated PD with the indicated substrate.


3.2.5: Stable isotope labeling experiments support a direct elimination mechanism for PD

Because the mechanism of B12-PD has been extensively studied and there was some evidence that GRE dehydratases cannot catalyze dehydration in the same manner as B12-PD,17-18 I used stable isotope labeling experiments to study the mechanism of PD (Figure 3.8A). 18O-labeled substrates have been used previously to study B12-dependent dehydratases, and those experiments were crucial for establishing that B12-PD mediates hydroxyl group migration.11,13 To explore the possibility of a hydroxyl migration in the PD-catalyzed reaction, I used the Jacobsen hydrolytic kinetic resolution to synthesize all four possible 18O-labeled 1,2-propanediol stereoisomers in high enantiomeric excess (≥98% ee) (Figure 3.8B).28 Although similar substrates have been constructed previously, my 18O-labeled 1,2-propanediols are labeled to a much greater extent (>95% vs. ~10%).13 In addition, syntheses of enantioenriched 1,2-(2-18O)propanediols have not been reported previously, so these substrates have never been tested with these enzymes.

Figure 3.8: Design of 18O-labeling experiments to probe PD mechanism. (A) Two potential mechanisms for radical enzyme-mediated diol dehydration. (B) Asymmetric synthesis of differentially 18O-labeled 1,2-propanediol substrates.

I incubated the four isotopomers of 1,2-(18O)propanediol individually with activated PD and yeast alcohol dehydrogenase and used GC–MS/MS to assess the presence of 18O in the 1-propanol product. I observed nearly complete retention of the label with both enantiomers of 1,2-(1-18O)propanediol and nearly complete loss of the label with both enantiomers of 1,2-(2-18O)propanediol (Table 3.1, Figure 3.9, and Figure 3.10). These results are distinct from those observed by Arigoni and co-workers for B12-PD and do not afford analogous support for a mechanism involving hydroxyl migration.13 Instead, they suggest GRE-mediated dehydration reactions proceed in a manner different from those performed by B12-dependent enzymes. A direct elimination mechanism, where the C2 hydroxyl group is eliminated without first migrating to C1, is more consistent with these results.

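Table 3.1: Dehydration of 18O-labeled substrates. Samples contain 1,2-propanediol (10 mM), PD or B12-PD (1 µM), yeast alcohol dehydrogenase (35 µM), NADH (10 mM), adenosylcobalamin (B12-PD only) (15 µM), Tris-HCl (25 mM, pH 7.5), NaCl (50 mM), 1.5 h, 23 °C (PD), 37 °C (B12-PD). 18O-Enrichment was measured by GC–MS. Mean ± SD are from three replicates. Substrates were drawn as structures in the original table; the first two rows are assigned from the retention values quoted in the text, and the enantiomer in each of the last two rows is not indicated.

Oxygen-18 enrichment (%)

Substrate                    | 1,2-Propanediol | 1-Propanol: PD | 1-Propanol: B12-PD | 1-Propanol: K. oxytoca extract
(S)-1,2-(1-18O)propanediol   | 96.1 ± 0.4      | 91.4 ± 1.1     | 51.7 ± 0.1         | 61 ± 2
(R)-1,2-(1-18O)propanediol   | 96.1 ± 0.5      | 92.2 ± 0.5     | 49.6 ± 1.4         | 13.8 ± 0.2
1,2-(2-18O)propanediol       | 96 ± 3          | 2.5 ± 0.4      | 47.7 ± 0.3         | 14.2 ± 0.3
1,2-(2-18O)propanediol       | 94 ± 5          | 2.5 ± 0.4      | 48.9 ± 0.2         | 48 ± 5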
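The reasoning from Table 3.1 to a mechanism can be sketched as a comparison between observed label retention and idealized predictions. The code below is a simplified illustration, not the analysis used in this work: it assumes that direct elimination keeps the C1 oxygen completely and loses the C2 oxygen completely, and that hydroxyl migration followed by unselective loss of either oxygen from the 1,1-gem diol gives 50% retention for either label position. The names `predicted_retention` and `closest_mechanism` are hypothetical, and only the mean values for the purified enzymes are used.

```python
# Mean 18O enrichment (%) from Table 3.1 for the purified enzymes.
# Each entry: (label position, substrate, PD product, B12-PD product).
ROWS = [
    (1, 96.1, 91.4, 51.7),
    (1, 96.1, 92.2, 49.6),
    (2, 96.0, 2.5, 47.7),
    (2, 94.0, 2.5, 48.9),
]

MECHANISMS = ("direct elimination", "hydroxyl migration")


def predicted_retention(mechanism, label_position):
    """Idealized fraction of substrate label expected in 1-propanol.

    Assumptions (simplifications, not from the text): direct elimination
    keeps O1 and loses O2 completely; migration scrambles both oxygens
    through a gem-diol that loses either one with equal probability.
    """
    if mechanism == "direct elimination":
        return 1.0 if label_position == 1 else 0.0
    return 0.5


def closest_mechanism(retention, label_position):
    """Return the idealized mechanism whose prediction is nearest."""
    return min(
        MECHANISMS,
        key=lambda m: abs(retention - predicted_retention(m, label_position)),
    )


pd_calls, b12_calls = [], []
for position, substrate, pd_product, b12_product in ROWS:
    pd_retention = pd_product / substrate
    b12_retention = b12_product / substrate
    pd_calls.append(closest_mechanism(pd_retention, position))
    b12_calls.append(closest_mechanism(b12_retention, position))
    print(f"O{position} label: PD {pd_retention:.2f}, B12-PD {b12_retention:.2f}")

print("PD:", set(pd_calls))
print("B12-PD:", set(b12_calls))
```

With these idealized predictions, every PD row falls nearest to direct elimination and every B12-PD row nearest to migration, mirroring the qualitative argument in the text.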

[Panels A to C: EI mass spectra, relative intensity (%) versus m/z (23 to 63).]

Figure 3.9: EI spectra of 1-propanol derived from the reaction of activated PD with different enantioenriched 1,2-propanediol isotopologues. (A) (S)-1,2-propanediol, (B) (R)-1,2-propanediol, and (C) (S)-1,2-(1-18O)propanediol. Experiments were done in triplicate. Representative spectra are shown.

[Panels A to C: EI mass spectra, relative intensity (%) versus m/z (23 to 63).]

Figure 3.10: EI spectra of 1-propanol derived from the reaction of activated PD with additional enantioenriched 1,2-propanediol isotopologues. (A) (R)-1,2-(1-18O)propanediol, (B) (S)-1,2-(2-18O)propanediol, and (C) (R)-1,2-(2-18O)propanediol. Experiments were done in triplicate. Representative spectra are shown.

3.2.6: Stable isotope labeling experiments confirm a hydroxyl group migration mechanism is operative for B12-PD

To more directly compare the evidence for the two mechanistic proposals, I also tested the reactivity of B12-PD toward these isotope-labeled substrates. Previous work with this enzyme has supported a mechanism involving hydroxyl group migration,11 and obtaining different results with PD and B12-PD would suggest the two enzymes operate with distinct mechanisms. Crucially, the 18O-labeled substrates used for these studies are labeled to a significantly greater extent than those previously used,13 and enantioenriched 1,2-(2-18O)propanediol has never been tested with B12-PD. I heterologously expressed B12-PD from Klebsiella oxytoca ATCC 8724 and verified that this enzyme dehydrated 1,2-propanediol using previously reported procedures.29 I then incubated B12-PD and yeast alcohol dehydrogenase with the 18O-labeled substrates. Regardless of which 1,2-(18O)propanediol substrate was used, the product 1-propanol always contained ~50% 18O (Table 3.1, Figure 3.11, and Figure 3.12). This scrambling of oxygen atoms supports the existence of the 1,1-gem diol intermediate and a mechanism involving hydroxyl group migration.

Surprisingly, the results obtained with the purified B12-PD were inconsistent with the original 18O labeling experiments performed by Arigoni et al., who used cell-free extracts of K. oxytoca ATCC 8724 instead of purified B12-PD as well as less extensively labeled substrates.13 To explain this inconsistency, I replicated their work using the same strain of K. oxytoca and generating cell-free extracts according to an Abeles group protocol.30 When the cell-free extracts, NADH, and alcohol dehydrogenase were incubated with (S)-1,2-(1-18O)propanediol and (R)-1,2-(1-18O)propanediol, I observed 61% and 14% retention of the 18O label, respectively (Table 3.1, Figure 3.13, and Figure 3.14). Complementary results were obtained with both enantiomers of 1,2-(2-18O)propanediol, confirming that the fate of each oxygen atom depends on the substrate’s stereochemistry. These experiments are consistent with the results from the Arigoni group, but it remains unclear which component(s) of the cell-free extract alter the stereochemical outcome of this reaction. Regardless, the experiments with purified B12-PD confirm that this enzyme mediates 1,2-diol dehydration via hydroxyl group migration.

[Panels A to C: EI mass spectra, relative intensity (%) versus m/z (28 to 63).]

Figure 3.11: EI spectra of 1-propanol derived from the reaction of B12-PD with different enantioenriched 1,2-propanediol isotopologues. (A) (S)-1,2-propanediol, (B) (R)-1,2-propanediol, and (C) (S)-1,2-(1-18O)propanediol. Experiments were done in triplicate. Representative spectra are shown.

[Panels A to C: EI mass spectra, relative intensity (%) versus m/z (28 to 63).]

Figure 3.12: EI spectra of 1-propanol derived from the reaction of B12-PD with additional enantioenriched 1,2-propanediol isotopologues. (A) (R)-1,2-(1-18O)propanediol, (B) (S)-1,2-(2-18O)propanediol, and (C) (R)-1,2-(2-18O)propanediol. Experiments were done in triplicate. Representative spectra are shown.

[Panels A to C: EI mass spectra, relative intensity (%) versus m/z (28 to 63).]

Figure 3.13: EI spectra of 1-propanol derived from the reaction of cell-free extracts of K. oxytoca with different enantioenriched 1,2-propanediol isotopologues. (A) (S)-1,2-propanediol, (B) (R)-1,2-propanediol, and (C) (S)-1,2-(1-18O)propanediol. Experiments were done in triplicate. Representative spectra are shown.

[Panels A to C: EI mass spectra, relative intensity (%) versus m/z (28 to 63).]

Figure 3.14: EI spectra of 1-propanol derived from the reaction of cell-free extracts of K. oxytoca with additional enantioenriched 1,2-propanediol isotopologues. (A) (R)-1,2-(1-18O)propanediol, (B) (S)-1,2-(2-18O)propanediol, and (C) (R)-1,2-(2-18O)propanediol. Experiments were done in triplicate. Representative spectra are shown.

3.2.7: Thermodynamic analysis of intermediates generated during 1,2-propanediol dehydration

Chemical differences between the glycyl radical in PD and the adenosylcobalamin cofactor in B12-PD could explain why these enzymes evolved distinct mechanisms to catalyze the same transformation. To initiate catalysis, B12-PD uses the highly reactive 5ʹ-dA• [5ʹ-dA–H bond dissociation energy (BDE) of 99.8 kcal/mol]31 to abstract the C1 hydrogen of (S)-1,2-propanediol (C–H BDE of 92 kcal/mol).32 Although the thiyl radical in PD is less reactive than 5ʹ-dA• (S–H BDE of ~87 kcal/mol),33 hydrogen bonding between E440 and the C1 hydroxyl group may decrease the C1–H BDE.32 Similarly, the propanal-2-yl radical intermediate formed by PD would not be reactive enough (C–H BDE of ~90 kcal/mol)34 to abstract a hydrogen atom from 5ʹ-dA to regenerate the 5ʹ-dA• cofactor, but it is able to regenerate the thiyl radical due to the lower BDE of the S–H bond. Conversely, the product-centered radical generated by B12-PD cannot be delocalized and is therefore reactive enough (C–H BDE of ~98.6 kcal/mol)33 to regenerate the adenosylcobalamin cofactor. In addition, while the Ca2+ ion in B12-PD lowers the barrier to hydroxyl migration, no such Lewis acid is found in PD, further suggesting key mechanistic differences.35
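The BDE comparison above can be made concrete with a rough estimate: for a hydrogen atom transfer, the reaction enthalpy is approximately the BDE of the bond broken minus the BDE of the bond formed. The sketch below applies that approximation to the values quoted in the text. The dictionary keys and the function name are illustrative labels, and the estimate ignores entropy, the protein environment, and the proposed hydrogen-bonding effect of E440.

```python
# Approximate bond dissociation energies (kcal/mol) quoted in the text.
# The key names are illustrative labels, not nomenclature from the source.
BDE = {
    "5'-dA-H": 99.8,
    "substrate C1-H": 92.0,
    "Cys S-H": 87.0,
    "PD product C-H": 90.0,      # propanal C2-H (delocalized radical)
    "B12-PD product C-H": 98.6,  # non-delocalized product radical
}

def hat_enthalpy(bond_broken, bond_formed):
    """Rough enthalpy (kcal/mol) of H-atom transfer: BDE(broken) - BDE(formed)."""
    return BDE[bond_broken] - BDE[bond_formed]

steps = {
    "B12-PD initiation (5'-dA radical abstracts C1-H)":
        hat_enthalpy("substrate C1-H", "5'-dA-H"),
    "PD initiation (thiyl radical abstracts C1-H)":
        hat_enthalpy("substrate C1-H", "Cys S-H"),
    "PD turnover (product radical abstracts Cys S-H)":
        hat_enthalpy("Cys S-H", "PD product C-H"),
    "Hypothetical: PD product radical abstracts 5'-dA-H":
        hat_enthalpy("5'-dA-H", "PD product C-H"),
    "B12-PD turnover (product radical abstracts 5'-dA-H)":
        hat_enthalpy("5'-dA-H", "B12-PD product C-H"),
}

for name, dh in steps.items():
    print(f"{name}: {dh:+.1f} kcal/mol")
```

On these numbers, abstraction from 5ʹ-dA–H by the delocalized PD product radical would be uphill by roughly 10 kcal/mol, whereas abstraction from the cysteine S–H bond is mildly downhill, which is the asymmetry the paragraph describes.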

3.2.8: Mechanistic proposal for PD

The results of these experiments allow for the proposal of a mechanism for PD (Figure 3.15). Analogously to other GREs, the glycine-centered radical, derived from G817, abstracts a hydrogen atom from C438, generating a thiyl radical that in turn abstracts a hydrogen atom from C1 of the 1,2-propanediol substrate. Results from our 18O experiments and previous calculations with GD suggest that PD catalyzes the direct elimination of the C2 hydroxyl group from this substrate-based radical.17-18 This spin-center shift is likely triggered by deprotonation of the C1 hydroxyl group. α-Hydroxyalkyl radicals are ~5 pKa units more acidic than the corresponding alcohols,36 and E440 is well-positioned to perform this chemistry. The electron density from this ketyl radical fills the σ* orbital of the C2–O bond, triggering loss of the hydroxyl group, which is likely protonated by H166. The direct elimination of a hydroxyl group adjacent to an α-hydroxyalkyl radical has precedent in enzymatic catalysis, as all known types of ribonucleotide reductase demonstrate this reactivity. To complete the reaction, the propanal-2-yl radical abstracts a hydrogen atom from C438, regenerating the protein-based radical intermediates.

Figure 3.15: Proposed mechanism for PD.

Structural data also corroborate this proposed mechanism for PD. The crystal structure of PD with (S)-1,2-propanediol bound reveals how C438 can interact with G817 and the pro-S C1-hydrogen atom of the substrate due to its close proximity to both (≤4.0 Å), supporting its role in reaction initiation (Figure 3.16).15 The dihedral angle between the two hydroxyl groups of (S)-1,2-propanediol in the crystal structure is 57°, and we can thus infer that the pro-S hydrogen atom of C1 is antiperiplanar (~177°) to the C2 hydroxyl group, facilitating dehydration. Furthermore, E440 and H166 are well-positioned to deprotonate and protonate the C1 and C2 hydroxyl groups, respectively. These residues are essential for PD activity and are conserved in all characterized GRE dehydratases.2 Moreover, choline trimethylamine-lyase binds choline in a manner similar to how PD binds (S)-1,2-propanediol,37 suggesting GRE dehydratases and other 1,2-eliminases use similar mechanisms. Though the labeling experiments presented here do not completely rule out the possibility of migration followed by stereospecific dehydration of a 1,1-gem diol intermediate, these studies together with computational and structural studies provide strong support for a direct elimination mechanism.
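The step from a 57° hydroxyl–hydroxyl dihedral to a ~177° antiperiplanar hydrogen can be illustrated with idealized geometry. The sketch below assumes a perfectly staggered sp3 carbon, so the two C1 hydrogens sit 120° to either side of the C1 hydroxyl group in a Newman projection along the C1–C2 bond. That idealization and the function name are assumptions made for illustration; which offset corresponds to the pro-S hydrogen is taken from the assignment in the text, not derived here.

```python
def wrap_dihedral(angle_deg):
    """Wrap an angle in degrees into the interval (-180, 180]."""
    wrapped = (angle_deg + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped

# O1-C1-C2-O2 dihedral reported for the bound substrate (Figure 3.16).
oh_oh_dihedral = 57.4

# Idealized tetrahedral C1: each hydrogen is offset by +/-120 degrees
# from the C1 hydroxyl group when viewed down the C1-C2 bond.
h_dihedrals = [wrap_dihedral(oh_oh_dihedral + offset) for offset in (120.0, -120.0)]

for angle in h_dihedrals:
    relation = "antiperiplanar" if abs(angle) > 150.0 else "gauche"
    print(f"H-C1-C2-O2 dihedral: {angle:.1f} degrees ({relation})")
```

Under this idealization one C1 hydrogen lies near 177° from the C2 hydroxyl group and the other near –63°, consistent with only one of the two hydrogens being positioned for elimination.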

Figure 3.16: Crystal structure of PD with substrate bound (PDB 5I2G). C438 is close to both G817 and substrate to mediate H atom transfers. E440 is predicted to deprotonate the C1-hydroxyl group, while H166 is proposed to protonate the C2-hydroxyl group. The dihedral angle between the hydroxyl groups is 57.4°, positioning the substrate to facilitate direct elimination of the C2-hydroxyl group from an initial substrate-based radical.

3.2.9: Conclusions

The wide distribution of GREs and their roles in important microbial metabolic processes underscore the importance of understanding their mechanisms. PD and B12-PD both perform the same reaction and are both encoded by the healthy human gut microbiome, although PD is present to a significantly greater extent.2 Selectively manipulating their activity in vivo could help to study the biological roles of this activity and potentially have therapeutic applications.

Despite PD’s prominence in anaerobic microorganisms and B12-PD’s distribution in pathogens, it remained unclear if the mechanism of PD was similar to that of B12-PD. The 18O labeling experiments presented here suggest that PD and B12-PD employ distinct mechanisms. Unlike B12-PD, PD appears to catalyze the direct elimination of a hydroxyl group from an initially formed substrate-based radical, avoiding the generation of a 1,1-gem diol intermediate. These results provide insight into the mechanisms of PD and homologous GREs, which could be used to develop inhibitors of these enzymes. In particular, there is significant interest in inhibiting gut microbial choline trimethylamine-lyase because the trimethylamine it produces is metabolized by the host to trimethylamine-N-oxide, which is linked to several disease states.38-40

Two other characterized GREs, choline trimethylamine-lyase and ribonucleotide reductase (Class III), have B12-dependent counterparts [ethanolamine ammonia-lyase and ribonucleotide reductase (Class II)], hinting that these protein families may have additional functional overlap. Future efforts to understand the functions of both enzyme families will continue to enhance our understanding of host–microbe interactions as they pertain to carbohydrate metabolism.

3.3: Materials and methods

3.3.1: Materials and general methods

All chemicals and solvents were purchased from Sigma-Aldrich, except where otherwise noted. Luria-Bertani Lennox (LB) medium was purchased from Alfa Aesar. Water and solvents used for GC–MS and LC–MS were B&J Brand high-purity solvents (Honeywell Burdick & Jackson). DNA sequencing results were analyzed with Geneious Pro 11.0.5 (Biomatters). Primers were purchased from Sigma-Aldrich and DNA sequencing was performed by Eton Bioscience. Proton nuclear magnetic resonance (1H NMR) spectra were recorded in the Magnetic Resonance Laboratory in the Harvard University Department of Chemistry and Chemical Biology on a Varian Mercury-400 (400 MHz) NMR spectrometer. Chemical shifts are reported in parts per million downfield from tetramethylsilane using the solvent resonance as an internal standard for 1H (CDCl3 = 7.26 ppm). Data are reported as follows: chemical shift, multiplicity (s = singlet, d = doublet, t = triplet), coupling constant, and integration. NMR spectra were visualized using ACD/NMR Processor Academic Edition.

3.3.2: UV–Vis spectroscopy of PD-AE

To obtain UV–Vis spectra, PD-AE was diluted to 10 µM with anoxic buffer (25 mM Tris-HCl, pH 8, 50 mM NaCl) inside of an anaerobic chamber containing N2 and <5 ppm O2 (MBraun). The absorbance of the solution was measured from 260 nm to 800 nm in a quartz 96-well plate using a PowerWave HT Microplate Spectrophotometer (Biotek). To obtain a spectrum for the reduced protein, a 10 µM PD-AE solution was incubated with 100 µM sodium dithionite in the same anoxic buffer for 20 min before the absorbance was measured. This experiment was performed in triplicate, and the average and standard deviation of these assays were reported.

3.3.3: Quantification of iron content of PD-AE

The iron content of a 6 µM solution of PD-AE was determined using Ferene (3-(2-pyridyl)-5,6-di(2-furyl)-1,2,4-triazine-5′,5′′-disulfonic acid disodium salt), according to a previously published procedure,41 with the only differences being that the assay volume was tripled and the standard curve was prepared with ammonium iron(II) sulfate hexahydrate. This experiment was performed in triplicate, and the average and standard deviation of these assays were reported.

3.3.4: Quantification of sulfide content of PD-AE

The sulfide content of a 6 µM solution of PD-AE was determined using a previously published procedure42 with the following differences: Assays were performed in microcentrifuge tubes, vortexing was performed instead of stirring, and the mixture was incubated for 20 min after the addition of NaOH.

3.3.5: HPLC assay for detection of S-adenosylmethionine cleavage products

HPLC samples were prepared by incubating PD-AE (70 µM), S-adenosylmethionine (SAM) (500 µM), 5-deazariboflavin (70 µM), and dithiothreitol (DTT) (10 mM) in anoxic buffer (25 mM Tris-HCl, pH 7.5, 50 mM NaCl) in a total volume of 50 µL in an anaerobic chamber containing 97% N2 and 3% H2 (Coy Laboratory Products). Samples were incubated at room temperature for 3 h, then quenched by the addition of formic acid (5 µL). The samples were incubated on ice for 10 min and centrifuged (16,100×g, 10 min). The supernatant was analyzed by analytical HPLC on a Dionex Ultimate 3000 instrument (Thermo Scientific) on a Chromolith HighResolution RP-18e column (4.6 × 100 mm) (EMD Millipore). 40 µL of each sample was injected onto the column. The flow rate was 1 mL min–1 using 10 mM ammonium acetate in water (pH 6) as mobile phase A and acetonitrile as mobile phase B. The column was maintained at room temperature. The following gradient was applied: 0–2 min: 0% B isocratic, 2–9 min: 0–21% B, 9–9.1 min: 21–85% B, 9.1–11.6 min: 85% B isocratic, 11.6–11.7 min: 85–0% B, 11.7–17.2 min: 0% B isocratic. SAM, 5′-deoxyadenosine (5′-dA), methylthioadenosine (MTA), and 5-deazariboflavin were detected by measuring the absorbance at 260 nm. The SAM standard was run at 500 µM, 5′-dA and MTA standards were run at 100 µM, and the 5-deazariboflavin standard was run at 70 µM.
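The gradient described above is a piecewise-linear program, and it can be convenient to encode it as a list of breakpoints so that the mobile phase composition at any time can be looked up. The following is a minimal sketch, assuming linear ramps between the stated time points. The names `GRADIENT` and `percent_b` are illustrative and are not part of any instrument software.

```python
# Gradient breakpoints as (time in min, % mobile phase B), from the text.
GRADIENT = [
    (0.0, 0.0),
    (2.0, 0.0),    # end of initial isocratic hold
    (9.0, 21.0),   # shallow ramp, 0-21% B
    (9.1, 85.0),   # steep ramp to the wash step
    (11.6, 85.0),  # isocratic wash
    (11.7, 0.0),   # return to starting conditions
    (17.2, 0.0),   # re-equilibration
]

def percent_b(t_min):
    """Linearly interpolate % B at time t_min (minutes) within the program."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

for t in (1.0, 5.5, 10.0, 15.0):
    print(f"{t:5.1f} min: {percent_b(t):5.1f}% B")
```

Writing the program as data also makes it easy to confirm that the method ends at 17.2 min and returns to 0% B before the next injection.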


3.3.6: LC–MS assays for detection of S-adenosylmethionine cleavage products

LC–MS samples were prepared by combining PD-AE (70 µM), SAM (500 µM), 5-deazariboflavin (70 µM), and DTT (10 mM) in anoxic buffer (25 mM Tris-HCl, pH 8, 50 mM NaCl) in a total volume of 100 µL in an anaerobic chamber. Samples were incubated at room temperature for 3 h. They were subsequently quenched by the addition of 0.4 mL of acetonitrile, incubated on ice for 10 min, centrifuged (16,100×g, 10 min), flash frozen in liquid nitrogen, and lyophilized. The samples were redissolved in 50 μL of 1:1 water:methanol, vortexed, and centrifuged (16,100×g, 10 min). The supernatant was analyzed by LC–MS on an XTerra MS C18 analytical column (2.1 × 100 mm, 3.5 μm) (Waters Corporation, Milford, MA). 8 μL of each sample was injected onto the column. The flow rate was 0.3 mL min–1 using 10 mM ammonium acetate in water (pH 6) as mobile phase A and acetonitrile as mobile phase B. The column was maintained at room temperature. The following gradient was applied: 0–2 min: 0% B isocratic, 2–9 min: 0–21% B, 9–9.1 min: 21–85% B, 9.1–11.6 min: 85% B isocratic, 11.6–11.7 min: 85–0% B, 11.7–17.2 min: 0% B isocratic.

Simultaneous analysis of SAM, 5′-dA, and MTA by LC–MS was carried out on an Agilent 1290 Infinity UHPLC system (Agilent Technologies, Palo Alto, CA) coupled to a maXis impact UHR time-of-flight mass spectrometer system (Bruker Daltonics Inc, Billerica, MA) equipped with an electrospray ionization (ESI) source. The UHPLC system included a G4220A binary pump with a built-in vacuum degasser and thermostatted G6226A high performance autosampler. Data were acquired with Bruker Daltonics HyStar software version 3.2 for UHPLC and Compass OtofControl software version 3.4 for mass spectrometry and processed with Bruker Compass DataAnalysis software version 4.2. For the MS system, the ESI mass spectra were recorded in positive ionization mode for a mass range of m/z 50 to 1200; calibration mode, HPC; spectra rate, 1.00 Hz; capillary voltage, 3800 V; nebulizer pressure, 25.0 psi; drying gas (N2) flow and temperature, 9.3 L min–1 and 220 °C, respectively. A mass window of ±0.005 Da was used to extract the [M+H]+ ion. Targets were considered detected when the mass error was less than 5 ppm, the observed isotopic pattern matched the theoretical pattern, and the retention time in experimental samples matched that of the standard.

Table 3.2: LC–MS analysis of standards used for assays.

Standard              Formula        Charge State   Calc. Mass   Obs. Mass   Error (ppm)
5′-Deoxyadenosine     C10H14N5O3     [M+1]+         252.1091     252.1084    2.8
Methylthioadenosine   C11H16N5O3S    [M+1]+         298.0968     298.0965    1.2
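The calculated masses and ppm errors in Table 3.2 follow from standard monoisotopic masses, and the arithmetic can be sketched as below. The function names are illustrative; the isotope masses are standard reference values, and the compositions are those of the protonated ions listed in the table. Errors recomputed from the four-decimal masses shown in the table may differ slightly from the tabulated values, presumably because those were calculated from unrounded masses.

```python
# Monoisotopic masses (Da) of the most abundant isotopes (standard values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
ELECTRON_MASS = 0.00054858  # Da

def cation_mz(composition):
    """m/z of a singly charged cation with the given elemental composition."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in composition.items())
    return neutral - ELECTRON_MASS

def ppm_error(calculated, observed):
    """Absolute mass error in parts per million."""
    return abs(observed - calculated) / calculated * 1e6

def extraction_window(calculated, half_width_da=0.005):
    """Lower and upper m/z bounds for an extracted-ion chromatogram."""
    return calculated - half_width_da, calculated + half_width_da

# Protonated ions, with compositions as listed in Table 3.2.
dA_mz = cation_mz({"C": 10, "H": 14, "N": 5, "O": 3})
mta_mz = cation_mz({"C": 11, "H": 16, "N": 5, "O": 3, "S": 1})

print(f"5'-dA  [M+H]+ calc. m/z: {dA_mz:.4f}")
print(f"MTA    [M+H]+ calc. m/z: {mta_mz:.4f}")
print(f"5'-dA  error: {ppm_error(252.1091, 252.1084):.1f} ppm")
print("5'-dA  window: %.4f to %.4f" % extraction_window(252.1091))
```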

3.3.7: Gel filtration chromatography for analysis of oligomeric state of PD

PD was purified as previously described using Ni–NTA affinity chromatography.2 The protein was further purified by preparative size exclusion chromatography on a HiLoad 26/600 Superdex 200 pg column (GE Healthcare) using a Biologic DuoFlow Chromatography System (BioRad). A 5 mL loop was used to inject the protein onto the column. An isocratic method was used (buffer was 50 mM Tris-HCl at pH 7, 150 mM NaCl, and 10 mM MgCl2), the flow rate was kept at 0.5 mL min–1, and 3 mL fractions were collected. After FPLC purification, fractions corresponding to the major peak were combined and concentrated using a Spin-X® UF 20 mL centrifugal concentrator with a 30 kDa MWCO membrane (Corning®), frozen in liquid nitrogen, and stored at –80 °C. This procedure yielded about 3.5 mg L–1 of PD. Protein concentration was determined according to the method of Bradford43 with a BSA standard or by using a NanoDrop 2000 UV-Vis Spectrophotometer (Thermo Scientific) with an extinction coefficient calculated by Geneious Pro 7.1.5, with both methods of quantification yielding similar results.

3.3.8: GC–MS assays for assessing the substrate scope of PD

All GC–MS experiments were conducted with an inlet helium carrier gas flow rate of 2.3 mL min–1 in the constant flow mode with a fused-silica capillary column of cross-linked DB-624UI (30 m × 0.32 mm × 1.80 µm, Agilent, Santa Clara, CA) on a Quattro micro GC Mass Spectrometer (Waters, Milford, MA) equipped with a Combi PAL autosampler (CTC Analytics, Zwingen, Switzerland) and a split/splitless injector. For headspace-GC/MS experiments, 1 mL was injected into the GC–MS via a transfer syringe held at 120 °C, the needle flush time was 120 s, a 1 mm straight single taper Ultra Inert liner (Agilent) was used, the inlet and transfer line temperatures were set at 220 °C, and the ion source temperature was set at 200 °C. All GC–MS data were acquired and analyzed with the Waters MassLynx V4.1 software package.

Activated PD was prepared by first incubating 25 mM Tris-HCl at pH 8, 50 mM NaCl, 10 mM DTT, 70 µM 5-deazariboflavin, and 70 µM PD-AE for 20 min under ambient light inside an anaerobic chamber (Coy Laboratory Products). Then, 70 µM PD (sparged with argon for 30 min) and 750 µM SAM were added, and the solution was incubated under ambient light for 2 h. Assays for each substrate were prepared by incubating 25 mM Tris-HCl at pH 8, 50 mM NaCl, 5 mM of the diol substrate, and 1 µM of the activated PD for 6 h. Following this, the 200 µL assays were transferred to 10 mL headspace vials containing 2.3 mL of water and 1.8 g NaCl and quickly sealed and capped tightly. The vials were kept at 4 °C until analyzed further.

Headspace (HS)-GC/MS was used for the simultaneous detection of acetaldehyde, propionaldehyde, acetone, butyraldehyde, 2-butanone, and 3-hydroxypropionaldehyde. The headspace extraction was carried out at 50 °C under agitation at 500 rpm for 15 min. The GC conditions were as follows: oven temperature program, 30 °C for 3 min, 13 °C min–1 to 90 °C, 50 °C min–1 to 250 °C; split ratio, 20:1. The electron impact (EI)-MS conditions were as follows: full scan m/z range, 10-100 Da at 1.31-10.82 min; selected ion recording (SIR) mode at 0-10.82 min, m/z 29 for detection of acetaldehyde and propionaldehyde, m/z 43 for detection of acetone and 2-butanone, m/z 44 for detection of butyraldehyde, and m/z 46 for detection of 3-hydroxypropionaldehyde. The SIR peaks for the ions indicated above, at retention times of ∼1.44, 2.46, 2.57, 4.74, 4.51, and 5.50 min, were used for the final detection of acetaldehyde, propionaldehyde, acetone, 2-butanone, butyraldehyde, and 3-hydroxypropionaldehyde, respectively.
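The 10.82 min end point of the MS acquisition window matches the length of the oven program, which can be checked by summing the hold and ramp times. The sketch below is a simple illustration of that arithmetic; the function name and the tuple layout for ramps are assumptions made for the example, and no final hold is included because none is stated for this program.

```python
def program_minutes(start_temp_c, initial_hold_min, ramps):
    """Total run time (min) of a GC oven temperature program.

    ramps is a list of (rate in deg C per min, target temp in deg C,
    hold in min at the target) tuples, applied in order.
    """
    total = initial_hold_min
    temp = start_temp_c
    for rate, target, hold in ramps:
        total += abs(target - temp) / rate + hold
        temp = target
    return total

# 30 deg C for 3 min, 13 deg C/min to 90 deg C, 50 deg C/min to 250 deg C.
run_time = program_minutes(30, 3, [(13, 90, 0), (50, 250, 0)])
print(f"Oven program length: {run_time:.2f} min")
```

The first ramp takes 60/13 ≈ 4.62 min and the second 160/50 = 3.2 min, so together with the 3 min hold the program lasts about 10.82 min.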

3.3.9: GC–MS assays for determination of 18O-enrichment of 1-propanol

Assays with PD and 18O-labeled substrates were prepared as follows. First, PD-AE (70 µM), 5-deazariboflavin (70 µM), and DTT (10 mM) were incubated in anoxic buffer (25 mM Tris-HCl, pH 7.5, 50 mM NaCl) for 20 min under ambient light inside of an anaerobic chamber (Coy Laboratory Products). Then, PD (70 µM) (previously sparged with argon for 30 min) and SAM (500 µM) were added and the solution was incubated under ambient light for 2 h. Assays for each substrate were prepared by incubating PD (0.7 µM), 1,2-propanediol (1 mM), NADH (10 mM), and yeast alcohol dehydrogenase (35 µM) in buffer (25 mM Tris-HCl, pH 7.5, 50 mM NaCl). After 90 min, the 200 µL assay mixtures were transferred to 10 mL headspace vials containing 2.3 mL water and 1.8 g NaCl. The vials were immediately capped and sealed tightly, and they were kept at 4 °C until analyzed.

Assays with B12-PD were prepared as follows. Samples (total volume 200 µL) containing B12-PD (0.8 µM), adenosylcobalamin (15 µM), 1,2-propanediol (1 mM), NADH (10 mM), and yeast alcohol dehydrogenase (35 µM) in buffer (35 mM potassium phosphate, pH 8, 50 mM KCl) were incubated at 37 °C for 90 min. The assay mixtures were quenched by adding them to 10 mL headspace vials containing 2.3 mL water and 1.8 g NaCl. The vials were immediately capped and sealed tightly, and they were kept at 4 °C until analyzed.

Assays with cell-free extracts derived from Klebsiella oxytoca ATCC 8724 were prepared in the same manner as the B12-PD samples, except lyophilized cell-free extract (2 mg) was added instead of B12-PD. Assay mixtures were quenched by adding them to 10 mL headspace vials containing 2.3 mL water and 1.8 g NaCl. The vials were immediately capped and sealed tightly, and they were kept at 4 °C until analyzed.

Headspace GC–MS experiments were used for measuring the isotopic enrichment of 1-propanol. These experiments were performed with a Quattro micro GC Mass Spectrometer (Waters, Milford, MA) or a TRACE 1310 Gas Chromatograph with Q Exactive GC Orbitrap (Thermo Scientific, Waltham, MA). Experiments performed on the Quattro micro GC Mass Spectrometer were carried out as follows. The instrument was equipped with a Combi PAL autosampler (CTC Analytics, Zwingen, Switzerland) and split/splitless injector. An inlet helium carrier gas flow rate of 2.3 mL min–1 in the constant flow mode was used with a fused-silica capillary column of cross-linked DB-624UI (30 m × 0.32 mm × 1.80 μm, Agilent, Santa Clara, CA). 1 mL was injected into the GC–MS via a transfer syringe held at 120 °C, the needle flush time was 120 s, a 1 mm straight single taper Ultra Inert liner (Agilent) was used, the inlet and transfer line temperatures were set at 220 °C, and the ion source temperature was set at 200 °C. The headspace extraction was carried out at 105 °C under agitation at 500 rpm for 15 min. The GC conditions were as follows: oven temperature program, 30 °C for 6 min, 50 °C min–1 to 250 °C for 0.6 min; split ratio, 10:1. The electron impact (EI)-MS conditions were as follows: full scan m/z range, 16-100 Da at 1.43-6.5 min; SIR mode at 0-6.5 min, m/z 29, 31, and 33, with 31 and 33 representing [M–C2H5]+ for unlabeled and labeled 1-propanol, respectively, and 29 and 31 representing a fragment ion of unlabeled and labeled 1-propanol, respectively.44 Data were acquired and analyzed with the Waters MassLynx V4.1 software package.

Experiments performed on the TRACE 1310 Gas Chromatograph with a Q Exactive GC Orbitrap were performed as follows. The headspace extraction was carried out at 98 °C under agitation on an autosampler (Thermo Scientific TriPlus RSH) for 10 min. Then, 1 mL of the headspace sample was injected into the GC–MS system via a transfer syringe held at 120 °C. A fused-silica capillary column of cross-linked DB-624UI (30 m × 0.32 mm × 1.80 µm, Agilent) and a 1 mm straight single taper Ultra Inert liner (Agilent) were used. The GC conditions were as follows: inlet and transfer line temperatures, 220 °C; oven temperature program, 30 °C for 6 min, 50 °C min–1 to 250 °C; inlet helium carrier gas flow rate, 2.3 mL min–1; split ratio, 5:1. The electron impact (EI)-MS conditions were as follows: ion source temperature, 200 °C; full scan m/z range, 30–450 Da from 1.7 to 6.5 min; resolution, 60,000; AGC target, 1e6; maximum IT, 200 ms; SIM scan m/z range, 58–62 Da from 1.7 to 6.5 min; resolution, 60,000; AGC target, 1e6; maximum IT, auto. A mass window of ±5 ppm was used to extract the EI fragment ion [M-H]+ of m/z 59.0491 and 61.0534 under the SIM scan mode for the detection of the unlabeled and labeled 1-propanol. The peak areas of the EIC were used to estimate the isotope enrichment.44 Data were acquired and analyzed with the Thermo TraceFinder 4.1 software package.
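As an illustration of how extracted-ion peak areas might be turned into an enrichment estimate, the sketch below computes the ±5 ppm extraction window around each [M-H]+ ion and then takes the labeled fraction of the summed peak areas. The simple area ratio is an assumption made for this example and is not necessarily the exact procedure of the cited method; it ignores natural-abundance corrections. The peak areas are hypothetical placeholders, not measured data.

```python
def ppm_window(mz, ppm=5.0):
    """Lower and upper m/z bounds of a +/- ppm extraction window."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta

def labeled_fraction(area_unlabeled, area_labeled):
    """Fraction of the signal carried by the 18O-labeled ion (assumed estimator)."""
    total = area_unlabeled + area_labeled
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return area_labeled / total

# [M-H]+ ions of unlabeled and 18O-labeled 1-propanol, from the text.
for label, mz in (("unlabeled", 59.0491), ("18O-labeled", 61.0534)):
    low, high = ppm_window(mz)
    print(f"{label}: extract m/z {low:.4f} to {high:.4f}")

# Hypothetical peak areas, for illustration only.
example = labeled_fraction(area_unlabeled=4.0e5, area_labeled=9.6e6)
print(f"Estimated 18O enrichment: {100 * example:.0f}%")
```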

3.3.10: Cloning of pduCDE for heterologous overexpression

To overexpress B12-PD, the genes encoding it, pduCDE, were amplified from the pUCDD11 plasmid,45 provided by Prof. Takamasa Tobimatsu (Okayama University), using PCR. The primers used are shown in Table 3.3. Reactions (total volume 50 µL) contained pUCDD11 (10 ng), forward primer (pdu-29b-F) (0.5 µM), reverse primer (pdu-29b-R) (0.5 µM), DMSO (3%), and Phusion-HF Master Mix (25 µL) (New England Biolabs). PCR parameters were as follows: initial denaturation (98 °C for 30 s), 30 cycles of denaturation (98 °C for 10 s), annealing (60 °C for 30 s), and extension (72 °C for 90 s), and a final extension (72 °C for 10 min). The vector pET-29b was also amplified by PCR. Reactions (total volume 50 µL) contained pET-29b (5 ng), forward primer (29b_pdu_NdeI_XhoI_F) (0.5 µM), reverse primer (29b_pdu_NdeI_XhoI_R) (0.5 µM), and Phusion-HF Master Mix (25 µL). PCR parameters were as follows: initial denaturation (98 °C for 30 s), 30 cycles of denaturation (98 °C for 10 s), annealing (63 °C for 30 s), and extension (72 °C for 2.75 min), and a final extension (72 °C for 10 min). DpnI (1 µL, 20 U total) (New England Biolabs) was added to the amplified vector and the solution was kept at 37 °C for 3 h. The amplified insert and vector were analyzed by agarose gel electrophoresis and purified using an Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare).

Table 3.3: Primers used for cloning and sequencing.

Primer Name            Sequence (5′ to 3′)
pdu-29b-F              GTTTAACTTTAAGAAGGAGATATACATATGAGATCGAAAAGATTTGAAGCACTGGC
pdu-29b-R              GGATCTCAGTGGTGGTGGTGGTGGTGCTCGAGTTAATCGTCGCCTTTGAGTTTTTTACGC
29b_pdu_NdeI_XhoI_F    GCGTAAAAAACTCAAAGGCGACGATTAACTCGAGCACCACCACCACCACCACTGAGATCC
29b_pdu_NdeI_XhoI_R    GCCAGTGCTTCAAATCTTTTCGATCTCATATGTATATCTCCTTCTTAAAGTTAAAC
pduCDE-610-F           GAAGAAGCCACCGAGCTGA
pduCDE-1206-F          GGTTCGCGAAGAGGACGT
pduCDE-1709-F          ATTGAAGACGTGCTCAGCGA
pduCDE-2228-F          CCGACGCTGAATGACCAGAT
pduCDE-2588-F          CCGCCCAGGATATGCGTATT
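For Gibson assembly, the ends of the amplified insert and vector must share overlapping sequence. A quick consistency check on the cloning primers in Table 3.3 is to confirm that each vector primer is the reverse complement of the corresponding insert primer, so that the two PCR products share identical ends. The sketch below performs that check; the function name is illustrative, and the sequences are copied from the table.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

# Cloning primers from Table 3.3 (5' to 3').
PRIMERS = {
    "pdu-29b-F": "GTTTAACTTTAAGAAGGAGATATACATATGAGATCGAAAAGATTTGAAGCACTGGC",
    "pdu-29b-R": "GGATCTCAGTGGTGGTGGTGGTGGTGCTCGAGTTAATCGTCGCCTTTGAGTTTTTTACGC",
    "29b_pdu_NdeI_XhoI_F": "GCGTAAAAAACTCAAAGGCGACGATTAACTCGAGCACCACCACCACCACCACTGAGATCC",
    "29b_pdu_NdeI_XhoI_R": "GCCAGTGCTTCAAATCTTTTCGATCTCATATGTATATCTCCTTCTTAAAGTTAAAC",
}

# Each insert primer is paired with the vector primer that binds the same junction.
PAIRS = [
    ("pdu-29b-F", "29b_pdu_NdeI_XhoI_R"),
    ("pdu-29b-R", "29b_pdu_NdeI_XhoI_F"),
]

for insert_primer, vector_primer in PAIRS:
    matches = reverse_complement(PRIMERS[insert_primer]) == PRIMERS[vector_primer]
    print(f"{insert_primer} / {vector_primer}: reverse complements = {matches}")
```

Both pairs are exact reverse complements, so the insert and vector amplicons carry 56 and 60 bp of shared sequence at the two junctions, respectively.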

Gibson assembly was used to ligate the insert and vector. The reaction (total volume 20 µL) contained amplified pET-29b (75 ng), amplified pduCDE (103 ng), and Gibson Assembly Master Mix (10 µL) (New England Biolabs). The reaction was kept at 50 °C for 1 h. The assembled plasmid was used to transform chemically competent Escherichia coli TOP10 cells (Invitrogen). The sequence of the resulting plasmid was verified by DNA sequencing using the primers given in Table 3.3.

3.3.11: Heterologous overexpression and purification of B12-PD

B12-PD was overexpressed and purified following a reported protocol.29 Chemically competent E. coli BL21(DE3) was transformed with the B12-PD expression vector. Starter cultures of this strain were grown in LB containing kanamycin (50 µg/mL). A starter culture (20 mL) was used to inoculate 2 L of LB medium containing kanamycin (50 µg/mL) and 1,2-propanediol (0.1%). This culture was incubated at 37 °C with shaking at 175 rpm until reaching an OD600 = 0.9, at which point isopropyl β-D-1-thiogalactopyranoside (IPTG) (Teknova) was added (final concentration 1 mM). The culture was incubated at 37 °C with shaking at 175 rpm for a further 6 h, and then the cells were harvested, washed with buffer (50 mM potassium phosphate, pH 8, 2% 1,2-propanediol), frozen with liquid nitrogen, and stored at –80 °C.

The cells were thawed on ice and resuspended in lysis buffer (50 mM potassium phosphate, pH 8, 2 mM EDTA, 2 mM PMSF, 2% 1,2-propanediol). The cells were lysed by sonication, and the lysate was clarified by centrifugation (20 min, 17,400×g). The pellet was resuspended in lysis buffer, sonicated, and centrifuged (30 min, 17,400×g) two additional times to remove soluble impurities. Next, the pellet was resuspended in lysis buffer containing 0.2% Brij 35 (Acros Organics), and then sonicated and centrifuged to remove additional impurities. The 0.2% Brij 35 treatment was repeated one additional time. Finally, the pellet was resuspended in 20 mL of buffer (10 mM potassium phosphate, pH 8, 1% Brij 35, 2% 1,2-propanediol, 2 mM EDTA, 1 mM PMSF), sonicated, and centrifuged (30 min, 100,000×g) to solubilize the B12-PD. 5 mL of the supernatant was transferred into a 10 kDa 12 mL Slide-A-Lyzer cassette (Thermo Fisher) and dialyzed against buffer (50 mM potassium phosphate, pH 8, 0.1% Brij 35). SDS-PAGE analysis (4-15% Tris-HCl gel, BioRad) was employed to determine the presence and purity of protein after dialysis was complete (Figure 3.17). Protein concentration was determined using a NanoDrop 2000 UV-Vis Spectrophotometer (Thermo Scientific) with an extinction coefficient calculated by Geneious Pro 11.0.5 or by using a Pierce BCA Protein Assay with a BSA standard (Thermo Fisher). The activity of the dialyzed B12-PD was verified using 3-methyl-2-benzothiazolinone hydrazone hydrochloride hydrate (MBTH) as previously described.46

Figure 3.17: SDS-PAGE analysis of heterologously expressed B12-PD. Precision Plus Protein All Blue Standards (BioRad) (lane 1) and purified B12-PD (lane 2).

3.3.12: Preparation of cell-free extracts from Klebsiella oxytoca ATCC 8724

Cell-free extracts of Klebsiella oxytoca ATCC 8724 were prepared following a reported protocol.30 Lyophilized K. oxytoca ATCC 8724 (Microbiologics) was streaked onto nutrient agar and grown overnight at 37 °C. A single colony was used to inoculate 50 mL of nutrient broth, which was grown for 8 h at 37 °C. 10 mL of the starter culture was used to inoculate 2 L of anaerobic glycerol media (0.54% KH2PO4, 0.12% (NH4)2SO4, 0.04% MgSO4·7H2O, 1.5% glycerol, 0.2% yeast extract, adjusted to pH 7.1 with KOH, made to volume with tap water). The culture was sealed tightly in a Fernbach flask and grown for 18 h at 37 °C with gentle shaking (60 rpm). The cells were harvested by centrifugation (10 min, 6,000×g), resuspended in 20 mL of buffer (10 mM Tris-HCl, pH 8.0), and centrifuged once again (10 min, 6,000×g). Of the ~9 g of cells collected, 4 g of cells were resuspended in 10 mL of buffer (10 mM Tris-HCl at pH 8.0) and lysed by passage through a cell disruptor (Avestin Emulsiflex-C3) three times at ~10,000 psi. The lysate was treated with 0.5 g of Darco G-60 charcoal for 15 min with occasional stirring, and then centrifuged (30 min, 28,000×g). The charcoal treatment was repeated once more. After the final centrifugation, the supernatant was flash frozen in liquid nitrogen and lyophilized. Activity of the lyophilized cell-free extracts towards 1,2-propanediol dehydration was verified with the previously reported MBTH assay.46

3.3.13: Synthetic methods and characterization data

5-Deazariboflavin (4.4 mg) was prepared as previously described47 and 1H NMR and HRMS matched previously reported spectra.47-48 Unlabeled (S)-1,2-propanediol and (R)-1,2-propanediol were prepared as described28 ((S)-1,2-propanediol: 4.71 g, 36%; (R)-1,2-propanediol: 4.97 g, 38%) and 1H NMR matched spectra of previously prepared compounds.

The enantiomeric excess of the synthesized 1,2-propanediols was determined by derivatizing samples with N-methyl-bis(trifluoroacetamide) (MBTFA) and using GC–FID. 1 μL of 1,2-propanediol was dissolved in 100 μL of N-methyl-bis(trifluoroacetamide) and heated at 80 °C for 30 min. After cooling to room temperature, the samples were diluted with 700 μL of acetonitrile. The GC–FID was performed with an Agilent 7890A gas chromatograph with a commercially available Chiraldex column (γ-TA; 30 m × 0.25 mm × 0.12 μm). The following method was used: at 60 °C and 7 psi, 1 μL was injected with a split ratio of 25:1, the temperature kept at 60 °C for 2 min, then increased to 75 °C at a rate of 1 °C min–1.
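Enantiomeric excess is conventionally calculated from the integrated peak areas of the two enantiomers resolved on the chiral column. The sketch below shows that standard formula; the function name is illustrative, the peak areas are hypothetical placeholders rather than measured values, and it assumes both derivatized enantiomers give an equal detector response.

```python
def enantiomeric_excess(area_major, area_minor):
    """Enantiomeric excess (%) from chiral GC peak areas.

    Assumes equal detector response for both enantiomers.
    """
    total = area_major + area_minor
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * (area_major - area_minor) / total

# Hypothetical peak areas (arbitrary units), for illustration only.
ee = enantiomeric_excess(area_major=99.5, area_minor=0.5)
print(f"ee = {ee:.0f}%")
```

An ee of 99% therefore corresponds to a 99.5:0.5 ratio of the two enantiomers.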


Liquid injection GC–MS was used for the measurement of isotopic enrichment of 1,2-(1-18O)propanediol by positive chemical ionization (PCI) mode with ammonia as the reagent gas and for the measurement of isotopic enrichment of 1,2-(2-18O)propanediol by electron impact (EI) mode. These experiments were conducted with an inlet helium carrier gas flow rate of 2.3 mL min–1 in the constant flow mode with a fused-silica capillary column of cross-linked DB-624UI (30 m × 0.32 mm × 1.80 μm) on a Quattro micro GC Mass Spectrometer equipped with a Combi PAL autosampler and a split/splitless injector. 1 μL of the sample dissolved in 1 mL methanol was injected into the GC–MS. The GC conditions were as follows: inlet and transfer line temperatures, 240 °C; oven temperature program, 50 °C for 0.5 min, 10 °C min–1 to 105 °C, 50 °C min–1 to 250 °C for 3 min; split ratio, 15:1 (PCI) and 200:1 (EI). The MS conditions were as follows: electron energy, 45 eV (PCI), 70 eV (EI); ion source temperature, 120 °C (PCI), 200 °C (EI); full scan m/z range, 16-300 Da (PCI) and 16-100 Da (EI) at 3.75-11.9 min; SIR mode at 3.75–11.9 min, m/z of 92, 94, and 96 for PCI, m/z of 43, 45, and 47 for EI. The m/z values were interpreted as follows: 94 and 96 represent [M+NH4]+ for unlabeled and labeled 1,2-propanediol, respectively, by PCI; 45 and 47 represent [CH3CHOH]+ for unlabeled and labeled 1,2-propanediol, respectively, by EI; 92 and 94 represent a fragment ion of unlabeled and labeled 1,2-propanediol, respectively, by PCI; and 43 and 45 represent a fragment ion of unlabeled and labeled 1,2-propanediol, respectively, by EI.44 All GC/MS data were acquired and analyzed with the Waters MassLynx V4.1 software package.

Representative small scale hydrolytic kinetic resolution to synthesize 1,2-(18O)propanediols

Figure 3.18: Representative small scale hydrolytic kinetic resolution to synthesize 1,2-(18O)propanediols.

The following was adapted from a previously reported procedure. The (R,R)-Co2+-salen catalyst 1 (11.4 µmol, 0.002 equiv) was dissolved in toluene (143 µL) and acetic acid (7 µL) and stirred in air for 30 min. The catalyst was concentrated in vacuo to give a residue which was dissolved in the epoxide (5.72 mmol, 1.0 equiv). This solution was cooled to 0 °C and H2O (2.57 mmol, 0.45 equiv) was added. This mixture was left to stir at 23 °C for 12 h, at which point residual epoxide was removed in vacuo. The remaining material was dissolved in 5:3 hexanes:EtOAc and filtered. The filtrate was extracted with 15 mL of water three times. Removal of the water in vacuo gave the final product. Yields are reported out of a maximum possible yield of 50%.

(S)-1,2-(1-18O)Propanediol

The procedure used was identical to the representative small scale hydrolytic kinetic

18 resolution using propylene oxide (332 mg) as the epoxide, except that H2 O (Cambridge

149

Isotopes, 97%) (51.5 mg, 2.57 mmol, 0.45 equiv) was added instead of H2O. This yielded 144 mg (35%) of (S)-1,2-(1-18O)propanediol with an ee of 99% and an isotopic enrichment of 96%.

1H NMR (400 MHz, CDCl3): δ 3.96–3.87 (m, 1H), 3.64 (dd, J = 11, 3 Hz, 1H), 3.40 (dd, J = 11, 8 Hz, 1H), 1.17 (d, J = 7 Hz, 3H). The 1H NMR matched that of unlabeled 1,2-propanediol.49

(R)-1,2-(1-18O)Propanediol

The procedure used was identical to the representative small scale hydrolytic kinetic resolution using propylene oxide (332 mg) as the epoxide, except that H2(18O) (Cambridge Isotopes, 97%) (51.5 mg, 2.57 mmol, 0.45 equiv) was added instead of H2O and that (S,S)-1 was used. This procedure yielded 145 mg (36%) of (R)-1,2-(1-18O)propanediol with an ee of 99% and an isotopic enrichment of 96%.

1H NMR (400 MHz, CDCl3): δ 3.96–3.87 (m, 1H), 3.64 (dd, J = 11, 3 Hz, 1H), 3.40 (dd, J = 11, 8 Hz, 1H), 1.17 (d, J = 7 Hz, 3H). The 1H NMR matched that of unlabeled 1,2-propanediol.49

Synthesis of 1,2-(2-18O)propanediols

Figure 3.19: Synthesis of 1,2-(2-18O)propanediols.

Chloroacetone dimethyl ketal

Chloroacetone (9.25 g, 100 mmol, 1 equiv), trimethyl orthoformate (10.7 g, 101 mmol, 1.01 equiv), and p-toluenesulfonic acid monohydrate (190 mg, 1.00 mmol, 0.01 equiv) were combined in a flame dried flask and refluxed for 6 h. After cooling to 23 °C, the solution was washed with 10 mL of a 10% KOH solution to remove residual chloroacetone, dried with K2CO3, and used without further purification. This yielded 9.15 g (66%) of chloroacetone dimethyl ketal.

1H NMR (400 MHz, CDCl3): δ 3.49 (s, 3H), 3.23 (s, 6H), 1.41 (s, 3H). 1H NMR is consistent with previously reported spectra.50

1-Chloropropan-2-(18O)ol

Chloroacetone dimethyl ketal (5.37 g, 33.3 mmol, 1 equiv), H2(18O) (Cambridge Isotopes, 97%) (1.00 g, 50.0 mmol, 1.5 equiv), and p-toluenesulfonic acid monohydrate (63 mg, 0.33 mmol, 0.01 equiv) were combined and stirred at 80 °C for 5 h. After cooling to 23 °C, the solution was added to 20 mL methanol. The temperature was lowered to 0 °C and NaBH4 (0.945 g, 25.0 mmol, 0.75 equiv) was slowly added. The reaction was stirred for 2.5 h at 23 °C. Most of the methanol was removed in vacuo, leaving about 3 mL of solution. The remaining solution was dissolved in 10 mL of water and then extracted with 10 mL of ether three times. The organic layers were combined and solvent was removed in vacuo to give 1.33 g (42%) of 1-chloropropan-2-(18O)ol over two steps. Oxygen-18 enrichment was not determined for this intermediate.

1H NMR (400 MHz, CDCl3): δ 4.00 (m, 1H), 3.59 (dd, J = 11, 4 Hz, 1H), 3.45 (dd, J = 11, 7 Hz, 1H), 1.26 (d, J = 6 Hz, 3H). This result is consistent with the spectrum of unlabeled 1-chloro-2-propanol.51

Propylene (18O)oxide

This procedure was adapted from a previous method.52 To an oven dried 3 mL vial with a stir bar was added KOH (0.898 g, 16.0 mmol, 1.16 equiv). 1.2 mL of water was added and a Hickman still was attached. A rubber septum was attached to the still, closing the system to the atmosphere. The vial was heated to 50 °C and 1-chloropropan-2-(18O)ol was added dropwise via a needle through the septum. 233.7 mg (28%) of propylene (18O)oxide was collected from the still. The oxygen-18 enrichment was not determined for this intermediate.

1H NMR (400 MHz, CDCl3): δ 3.02–2.95 (m, 1H), 2.75 (t, J = 5 Hz, 1H), 2.42 (dd, J = 5, 3 Hz, 1H), 1.31 (d, J = 5 Hz, 3H). This result is consistent with the spectrum of unlabeled propylene oxide.53

(S)-1,2-(2-18O)Propanediol

The procedure used was identical to the representative small scale hydrolytic kinetic resolution, except that 80 mg (1.33 mmol) of propylene (18O)oxide, 4 mg (R,R)-1 (0.007 mmol, 0.005 equiv), 83 µL toluene, 4 µL acetic acid, and 11 µL water (0.600 mmol, 0.45 equiv) were used instead of the amounts listed above. This yielded 40.0 mg (43%) of (S)-1,2-(2-18O)propanediol with an ee of 98% and an isotopic enrichment of 96 ± 3%.

1H NMR (400 MHz, CDCl3): δ 3.97–3.85 (m, 1H), 3.64 (dd, J = 11, 3 Hz, 1H), 3.40 (dd, J = 11, 8 Hz, 1H), 1.17 (d, J = 7 Hz, 3H). The 1H NMR matched that of unlabeled 1,2-propanediol.49

(R)-1,2-(2-18O)Propanediol

The procedure used was identical to the representative small scale hydrolytic kinetic resolution, except that 75 mg (1.25 mmol) of propylene (18O)oxide, 4 mg (S,S)-1 (0.007 mmol, 0.005 equiv), 83 µL toluene, 4 µL acetic acid, and 10 µL water (0.561 mmol, 0.45 equiv) were used instead of the amounts listed above. This yielded 34.2 mg (39%) of (R)-1,2-(2-18O)propanediol with an ee of 99% and an isotopic enrichment of 94 ± 5%.

1H NMR (400 MHz, CDCl3): δ 3.96–3.86 (m, 1H), 3.64 (dd, J = 11, 3 Hz, 1H), 3.40 (dd, J = 11, 8 Hz, 1H), 1.17 (d, J = 7 Hz, 3H). The 1H NMR matched that of unlabeled 1,2-propanediol.49


GC-MS (CI) of 1,2-(1-18O)propanediols to determine oxygen-18 enrichment

[Mass spectra: relative intensity (%) vs. m/z (88–100), with the [M+NH4]+ peak labeled. Panels: 1,2-Propanediol Standard; (S)-1,2-(1-18O)Propanediol; (R)-1,2-(1-18O)Propanediol.]

Figure 3.20: GC-MS (CI) of 1,2-(1-18O)propanediols to determine oxygen-18 enrichment.

GC-MS (EI) of 1,2-(2-18O)propanediols to determine oxygen-18 enrichment

[Mass spectra: relative intensity (%) vs. m/z (17–69). Panels: 1,2-Propanediol Standard; (S)-1,2-(2-18O)Propanediol; (R)-1,2-(2-18O)Propanediol.]

Figure 3.21: GC-MS (EI) of 1,2-(2-18O)propanediols to determine oxygen-18 enrichment.


GC-FID

Racemic 1,2-Propanediol

Peak #  RetTime [min]  Type  Width [min]  Area [pA*s]  Height [pA]  Area %
1       7.144          MM    0.3937       141.01578    5.98531      49.50659
2       9.188          MM    0.3982       143.82669    6.02022      50.49341

(S)-1,2-Propanediol

Peak #  RetTime [min]  Type  Width [min]  Area [pA*s]  Height [pA]  Area %
1       7.166          MM    0.2370       1.95086      1.37208e-1   0.58663
2       9.133          MM    0.4175       330.60010    13.19849     99.41337

(R)-1,2-Propanediol

Peak #  RetTime [min]  Type  Width [min]  Area [pA*s]  Height [pA]  Area %
1       7.108          MM    0.4072       424.82587    17.38660     99.63594
2       9.233          MM    0.2003       1.55229      1.29157e-1   0.36406

Figure 3.22: GC-FID analysis to measure enantioenrichment of 1,2-propanediols.
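For readers who want to see how peak areas such as those above translate into an enantiomeric excess, the following minimal sketch applies the standard definition, ee = (major − minor) / (major + minor). The function name is hypothetical, and the sketch assumes the two integrated peaks are the only contributors to the signal.

```python
def enantiomeric_excess(area_major: float, area_minor: float) -> float:
    """Return percent ee from the integrated areas of two enantiomer peaks."""
    total = area_major + area_minor
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * (area_major - area_minor) / total


# Peak areas (pA*s) from the (S)-1,2-propanediol trace in Figure 3.22.
ee_s = enantiomeric_excess(area_major=330.60010, area_minor=1.95086)
print(f"ee = {ee_s:.1f}%")
```

The same value is obtained by subtracting the two Area % entries, since those already sum to 100.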


(S)-1,2-(1-18O)Propanediol

Peak #  RetTime [min]  Type  Width [min]  Area [pA*s]  Height [pA]  Area %
1       7.152          MM    0.2844       3.21198      1.88213e-1   1.07783
2       9.143          MM    0.4034       294.79245    12.17993     98.92217

(R)-1,2-(1-18O)Propanediol

Peak #  RetTime [min]  Type  Width [min]  Area [pA*s]  Height [pA]  Area %
1       7.120          MM    0.4002       296.24286    12.33627     98.70665
2       9.225          MM    0.2587       3.88167      2.5119e-1    1.29335

Figure 3.23: GC-FID analysis to measure enantioenrichment of 1,2-(1-18O)propanediols.

(S)-1,2-(2-18O)Propanediol

Peak #  RetTime [min]  Type  Width [min]  Area [pA*s]  Height [pA]  Area %
1       7.117          MM    0.2883       3.79368      2.19298e-1   1.56787
2       9.155          MM    0.4110       238.17026    9.65728      98.43213

(R)-1,2-(2-18O)Propanediol

Peak #  RetTime [min]  Type  Width [min]  Area [pA*s]  Height [pA]  Area %
1       7.130          MM    0.3967       219.38962    9.21725      98.60419
2       9.229          MM    0.2033       3.10560      2.54554e-1   1.39581

Figure 3.24: GC-FID analysis to measure enantioenrichment of 1,2-(2-18O)propanediols.

3.4: References

(1) Buckel, W.; Golding, B. T. Radical Enzymes in Anaerobes. Annu. Rev. Microbiol. 2006, 60, 27-49.

(2) Levin, B. J.; Huang, Y. Y.; Peck, S. C.; Wei, Y.; Martínez-del Campo, A.; Marks, J. A.; Franzosa, E. A.; Huttenhower, C.; Balskus, E. P. A prominent glycyl radical enzyme in human gut microbiomes metabolizes trans-4-hydroxy-L-proline. Science 2017, 355, eaai8386.

(3) Levin, B. J.; Balskus, E. P. Characterization of 1,2-Propanediol Dehydratases Reveals Distinct Mechanisms for B12-Dependent and Glycyl Radical Enzymes. Biochemistry 2018, 57, 3222-3226.


(4) Hooper, L. V.; Xu, J.; Falk, P. G.; Midtvedt, T.; Gordon, J. I. A molecular sensor that allows a gut commensal to control its nutrient foundation in a competitive ecosystem. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 9833-9838.

(5) Goto, Y.; Obata, T.; Kunisawa, J.; Sato, S.; Ivanov, I. I.; Lamichhane, A.; Takeyama, N.; Kamioka, M.; Sakamoto, M.; Matsuki, T.; Setoyama, H.; Imaoka, A.; Uematsu, S.; Akira, S.; Domino, S. E.; Kulig, P.; Becher, B.; Renauld, J.-C.; Sasakawa, C.; Umesaki, Y.; Benno, Y.; Kiyono, H. Innate lymphoid cells regulate intestinal epithelial cell glycosylation. Science 2014, 345, e1254009.

(6) Kashyap, P. C.; Marcobal, A.; Ursell, L. K.; Smits, S. A.; Sonnenburg, E. D.; Costello, E. K.; Higginbottom, S. K.; Domino, S. E.; Holmes, S. P.; Relman, D. A.; Knight, R.; Gordon, J. I.; Sonnenburg, J. L. Genetically dictated change in host mucus carbohydrate landscape exerts a diet-dependent effect on the gut microbiota. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 17059-17064.

(7) Chen, Y.-M.; Zhu, Y.; Lin, E. C. C. The organization of the fuc regulon specifying L-fucose dissimilation in Escherichia coli K12 as determined by gene cloning. Mol. Gen. Genet. 1987, 210, 331-337.

(8) Bobik, T. A.; Havemann, G. D.; Busch, R. J.; Williams, D. S.; Aldrich, H. C. The Propanediol Utilization (pdu) Operon of Salmonella enterica Serovar Typhimurium LT2 Includes Genes Necessary for Formation of Polyhedral Organelles Involved in Coenzyme B12-Dependent 1,2-Propanediol Degradation. J. Bacteriol. 1999, 181, 5967-5975.

(9) Reichardt, N.; Duncan, S. H.; Young, P.; Belenguer, A.; McWilliam Leitch, C.; Scott, K. P.; Flint, H. J.; Louis, P. Phylogenetic distribution of three pathways for propionate production within the human gut microbiota. ISME J. 2014, 8, 1323-1335.

(10) Hosseini, E.; Grootaert, C.; Verstraete, W.; Van de Wiele, T. Propionate as a health-promoting microbial metabolite in the human gut. Nutr. Rev. 2011, 69, 245-258.

(11) Toraya, T. Cobalamin-dependent dehydratases and a deaminase: Radical catalysis and reactivating chaperones. Arch. Biochem. Biophys. 2014, 544, 40-57.

(12) Zagalak, B.; Frey, P. A.; Karabatsos, G. L.; Abeles, R. H. The Stereochemistry of the Conversion of D and L 1,2-Propanediols to Propionaldehyde. J. Biol. Chem. 1966, 241, 3028-3035.

(13) Rétey, J.; Umani-Ronchi, A.; Seibl, J.; Arigoni, D. Zum Mechanismus der Propandioldehydrase-Reaktion. Experientia 1966, 22, 502-503.

(14) Scott, K. P.; Martin, J. C.; Campbell, G.; Mayer, C.-D.; Flint, H. J. Whole-Genome Transcription Profiling Reveals Genes Up-Regulated by Growth on Fucose in the Human Gut Bacterium “Roseburia inulinivorans”. J. Bacteriol. 2006, 188, 4340-4349.

(15) LaMattina, J. W.; Keul, N. D.; Reitzer, P.; Kapoor, S.; Galzerani, F.; Koch, D. J.; Gouvea, I. E.; Lanzilotta, W. N. 1,2-Propanediol Dehydration in Roseburia inulinivorans: Structural Basis for Substrate and Enantiomer Selectivity. J. Biol. Chem. 2016, 291, 15515-15526.

(16) O'Brien, J. R.; Raynaud, C.; Croux, C.; Girbal, L.; Soucaille, P.; Lanzilotta, W. N. Insight into the Mechanism of the B12-Independent Glycerol Dehydratase from Clostridium butyricum: Preliminary Biochemical and Structural Characterization. Biochemistry 2004, 43, 4635-4645.

(17) Feliks, M.; Ullmann, G. M. Glycerol Dehydratation by the B12-Independent Enzyme May Not Involve the Migration of a Hydroxyl Group: A Computational Study. J. Phys. Chem. B 2012, 116, 7076-7087.

(18) Kovačević, B.; Barić, D.; Babić, D.; Bilić, L.; Hanževački, M.; Sandala, G. M.; Radom, L.; Smith, D. M. Computational Tale of Two Enzymes: Glycerol Dehydration With or Without B12. J. Am. Chem. Soc. 2018, 140, 8487-8496.

(19) Demick, J. M.; Lanzilotta, W. N. Radical SAM Activation of the B12-Independent Glycerol Dehydratase Results in Formation of 5′-Deoxy-5′-(methylthio)adenosine and Not 5′-Deoxyadenosine. Biochemistry 2011, 50, 440-442.

(20) Broderick, J. B.; Duffus, B. R.; Duschene, K. S.; Shepard, E. M. Radical S-Adenosylmethionine Enzymes. Chem. Rev. 2014, 114, 4229-4317.

(21) Zhang, Y.; Zhu, X.; Torelli, A. T.; Lee, M.; Dzikovski, B.; Koralewski, R. M.; Wang, E.; Freed, J.; Krebs, C.; Ealick, S. E.; Lin, H. Diphthamide biosynthesis requires an organic radical generated by an iron–sulphur enzyme. Nature 2010, 465, 891-896.

(22) Wecksler, S. R.; Stoll, S.; Tran, H.; Magnusson, O. T.; Wu, S.-p.; King, D.; Britt, R. D.; Klinman, J. P. Pyrroloquinoline Quinone Biogenesis: Demonstration That PqqE from Klebsiella pneumoniae Is a Radical S-Adenosyl-L-methionine Enzyme. Biochemistry 2009, 48, 10151-10161.

(23) Craciun, S.; Marks, J. A.; Balskus, E. P. Characterization of Choline Trimethylamine-Lyase Expands the Chemistry of Glycyl Radical Enzymes. ACS Chem. Biol. 2014, 9, 1408-1413.

(24) Shisler, K. A.; Broderick, J. B. Glycyl radical activating enzymes: Structure, mechanism, and substrate interactions. Arch. Biochem. Biophys. 2014, 546, 64-71.

(25) Unkrig, V.; Neugebauer, F. A.; Knappe, J. The free radical of pyruvate formate-lyase. Eur. J. Biochem. 1989, 184, 723-728.

(26) Sun, X.; Ollagnier, S.; Schmidt, P. P.; Atta, M.; Mulliez, E.; Lepape, L.; Eliasson, R.; Gräslund, A.; Fontecave, M.; Reichard, P.; Sjöberg, B.-M. The Free Radical of the Anaerobic Ribonucleotide Reductase from Escherichia coli Is at Glycine 681. J. Biol. Chem. 1996, 271, 6827-6831.


(27) Backman, L. R. F.; Funk, M. A.; Dawson, C. D.; Drennan, C. L. New tricks for the glycyl radical enzyme family. Crit. Rev. Biochem. Mol. Biol. 2017, 52, 674-695.

(28) Schaus, S. E.; Brandes, B. D.; Larrow, J. F.; Tokunaga, M.; Hansen, K. B.; Gould, A. E.; Furrow, M. E.; Jacobsen, E. N. Highly Selective Hydrolytic Kinetic Resolution of Terminal Epoxides Catalyzed by Chiral (salen)CoIII Complexes. Practical Synthesis of Enantioenriched Terminal Epoxides and 1,2-Diols. J. Am. Chem. Soc. 2002, 124, 1307-1315.

(29) Tobimatsu, T.; Sakai, T.; Hashida, Y.; Mizoguchi, N.; Miyoshi, S.; Toraya, T. Heterologous Expression, Purification, and Properties of Diol Dehydratase, an Adenosylcobalamin-Dependent Enzyme of Klebsiella oxytoca. Arch. Biochem. Biophys. 1997, 347, 132-140.

(30) Abeles, R. H.; Lee, H. A. An Intramolecular Oxidation-Reduction Requiring a Cobamide Coenzyme. J. Biol. Chem. 1961, 236, 2347-2350.

(31) Wetmore, S. D.; Smith, D. M.; Golding, B. T.; Radom, L. Interconversion of (S)-Glutamate and (2S,3S)-3-Methylaspartate: A Distinctive B12-Dependent Carbon-Skeleton Rearrangement. J. Am. Chem. Soc. 2001, 123, 7963-7972.

(32) Jeffrey, J. L.; Terrett, J. A.; MacMillan, D. W. C. O–H hydrogen bonding promotes H-atom transfer from α C–H bonds for C-alkylation of alcohols. Science 2015, 349, 1532-1536.

(33) Blanksby, S. J.; Ellison, G. B. Bond Dissociation Energies of Organic Molecules. Acc. Chem. Res. 2003, 36, 255-263.

(34) da Silva, G.; Bozzelli, J. W. Enthalpies of Formation, Bond Dissociation Energies, and Molecular Structures of the n-Aldehydes (Acetaldehyde, Propanal, Butanal, Pentanal, Hexanal, and Heptanal) and Their Radicals. J. Phys. Chem. A 2006, 110, 13058-13067.

(35) Toraya, T.; Honda, S.; Mori, K. Coenzyme B12-Dependent Diol Dehydratase Is a Potassium Ion-Requiring Calcium Metalloenzyme: Evidence That the Substrate-Coordinated Metal Ion Is Calcium. Biochemistry 2010, 49, 7210-7217.

(36) Hayon, E.; Simic, M. Acid-base properties of free radicals in solution. Acc. Chem. Res. 1974, 7, 114-121.

(37) Bodea, S.; Funk, M. A.; Balskus, E. P.; Drennan, C. L. Molecular Basis of C–N Bond Cleavage by the Glycyl Radical Enzyme Choline Trimethylamine-Lyase. Cell Chem. Biol. 2016, 23, 1206-1216.

(38) Orman, M.; Bodea, S.; Funk, M. A.; Campo, A. M.-d.; Bollenbach, M.; Drennan, C. L.; Balskus, E. P. Structure-Guided Identification of a Small Molecule That Inhibits Anaerobic Choline Metabolism by Human Gut Bacteria. J. Am. Chem. Soc. 2018.

(39) Wang, Z.; Roberts, A. B.; Buffa, J. A.; Levison, B. S.; Zhu, W.; Org, E.; Gu, X.; Huang, Y.; Zamanian-Daryoush, M.; Culley, M. K.; DiDonato, A. J.; Fu, X.; Hazen, J. E.; Krajcik, D.; DiDonato, J. A.; Lusis, A. J.; Hazen, S. L. Non-lethal Inhibition of Gut Microbial Trimethylamine Production for the Treatment of Atherosclerosis. Cell 2015, 163, 1585-1595.

(40) Roberts, A. B.; Gu, X.; Buffa, J. A.; Hurd, A. G.; Wang, Z.; Zhu, W.; Gupta, N.; Skye, S. M.; Cody, D. B.; Levison, B. S.; Barrington, W. T.; Russell, M. W.; Reed, J. M.; Duzan, A.; Lang, J. M.; Fu, X.; Li, L.; Myers, A. J.; Rachakonda, S.; DiDonato, J. A.; Brown, J. M.; Gogonea, V.; Lusis, A. J.; Garcia-Garcia, J. C.; Hazen, S. L. Development of a gut microbe–targeted nonlethal therapeutic to inhibit thrombosis potential. Nat. Med. 2018, 24, 1407-1417.

(41) Kennedy, M. C.; Kent, T. A.; Emptage, M.; Merkle, H.; Beinert, H.; Münck, E. Evidence for the formation of a linear [3Fe-4S] cluster in partially unfolded aconitase. J. Biol. Chem. 1984, 259, 14463-14471.

(42) Beinert, H. Semi-micro methods for analysis of labile sulfide and of labile sulfide plus sulfane sulfur in unusually stable iron-sulfur proteins. Anal. Biochem. 1983, 131, 373-378.

(43) Bradford, M. M. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 1976, 72, 248-254.

(44) Wolfe, R. R., Radioactive and Stable Isotope Tracers in Biomedicine: Principles and Practice of Kinetic Analysis. Wiley-Liss: New York, 1992.

(45) Tobimatsu, T.; Hara, T.; Sakaguchi, M.; Kishimoto, Y.; Wada, Y.; Isoda, M.; Sakai, T.; Toraya, T. Molecular Cloning, Sequencing, and Expression of the Genes Encoding Adenosylcobalamin-dependent Diol Dehydrase of Klebsiella oxytoca. J. Biol. Chem. 1995, 270, 7142-7148.

(46) Toraya, T.; Ushio, K.; Fukui, S.; Hogenkamp, P. C. Studies on the mechanism of the adenosylcobalamin-dependent diol dehydrase reaction by the use of analogs of the coenzyme. J. Biol. Chem. 1977, 252, 963-970.

(47) Carlson, E. E.; Kiessling, L. L. Improved Chemical Syntheses of 1- and 5-Deazariboflavin. J. Org. Chem. 2004, 69, 2614-2617.

(48) Mansurova, M.; Koay, M. S.; Gärtner, W. Synthesis and Electrochemical Properties of Structurally Modified Flavin Compounds. Eur. J. Org. Chem. 2008, 2008, 5401-5406.

(49) Troev, K.; Koseva, N.; Hägele, G. Novel Routes to Aminophosphonic Acids: Interaction of Dimethyl H-Phosphonate with Hydroxyalkyl Carbamates. Heteroat. Chem. 2008, 19, 119-124.

(50) Zhong-shi, Z.; Li, L.; Xue-han, H. A one-pot method synthesis of α-chloroketone dimethyl acetals. J. Chem. Res. 2013, 37, 633-635.

(51) Mathieu-Pelta, I.; Evans, S. A. Highly Regioselective and Stereospecific Functionalization of 1,2-Propanediol with Trimethyl(X)silanes Employing the 1,3,2λ5-Dioxaphospholane Methodology. J. Org. Chem. 1992, 57, 3409-3413.

(52) Levene, P. A.; Walti, A. Note on the Action of Ammonia on Propylene Oxide. J. Biol. Chem. 1927, 71, 461-463.

(53) Tormena, C. F.; Rittner, R.; Contreras, R. H.; Peralta, J. E. Anomeric Effect on Geminal and Vicinal JHH NMR Coupling Constants. J. Phys. Chem. A 2004, 108, 7762-7768.


Chapter 4: Profiling glycyl radical enzymes and other enzyme families in metagenomes

4.1: Introduction

The ability to accurately characterize the metabolic capabilities of microbial communities is essential to identifying the molecular mechanisms underlying interactions between gut microbes and their hosts. As described in Chapter 1, advances in DNA sequencing technology have proven invaluable for functional profiling of this complex ecosystem.1

However, accurately connecting metagenomic sequences to specific enzymatic functions is challenging. Many metagenomic functional profiling strategies rely on taxonomic profiling, inferring the metabolic capabilities of organisms present in a community based on microbial identity (16S rRNA sequences).2 However, taxonomic profiling cannot always reveal strain and species level differences, and functional profiling from taxonomic data is further complicated by horizontal gene transfer.3 Likewise, reliance on sequenced microbial reference genomes fails to capture the potential diversity of metabolic pathways and enzymes not found in sequenced, cultured organisms.4 Finally, approaches that use databases like the Kyoto Encyclopedia of Genes and Genomes (KEGG) to rapidly annotate open reading frames (ORFs) in metagenomic data suffer from misannotation.5 Overall, these methods are not precise enough to accurately determine the activities of genes in metagenomes, and complementary strategies are needed to guide studies of how the biochemical functions encoded in microbiomes impact host health.

In Chapter 2, I described the development of chemically guided functional profiling (CGFP), a bioinformatics method that integrates biochemical knowledge of a protein family, encapsulated in a sequence similarity network (SSN), with quantitative metagenomic analysis to measure the abundance of genes encoding functionally distinct enzymes in microbiomes.6 My collaborators and I used this approach to quantify the glycyl radical enzymes (GREs) in gut metagenomes from healthy humans, including genes encoding the characterized GREs pyruvate formate-lyase (PFL), 4-hydroxyphenylacetate decarboxylase (4-HPAD), choline trimethylamine-lyase (CutC), and propanediol dehydratase (PD). This analysis also revealed the striking abundance and wide distribution of the previously uncharacterized GRE trans-4-hydroxy-L-proline dehydratase (HypD). This application of CGFP highlights a need to integrate bioinformatic and biochemical approaches to characterize the wealth of diverse enzymatic activities present in microbial communities.

Our initial application of CGFP highlighted additional opportunities to use this approach as well as some important limitations. CGFP was only applied to metagenomes from the first phase of the Human Microbiome Project, and it had not been demonstrated that CGFP could be valuable for studying datasets from alternative patient cohorts and microbial habitats. One could also envision using CGFP to compare the abundance of genes in gut microbiomes from healthy cohorts and patients with disease. Finally, it was not clear whether our original analysis captured all of the GRE diversity present in the human gut microbiota, as the SSN only included GREs found in sequenced microbial genomes.

In this Chapter, I describe our initial efforts to expand CGFP, including the use of this method to quantify the abundance and distribution of the GREs and other enzyme families in diverse metagenomic datasets. These results illustrate that CGFP is useful for comparative metagenomic profiling and reveal differences in gene abundance between healthy and disease cohorts. I also profiled different types of enzymes in metagenomes, reinforcing that quantifying individual genes in metagenomes is useful for proteins beyond the GREs. In addition, I mined metagenomic assemblies for proteins that are absent from sequenced microbial genomes. Applying this method to the GRE superfamily identified hundreds of contigs that encode novel GREs. These functional profiling strategies should prove useful for guiding future studies that aim to characterize the molecular mechanisms of host-gut microbial interactions.

4.2: Results and discussion

4.2.1: Quantifying glycyl radical enzymes in gut metagenomes from healthy and cirrhotic patient cohorts

As our previous efforts to characterize GREs in gut microbiomes had focused on healthy humans,6 we sought to expand on these experiments by examining the GREs present in disease cohorts. We began by quantifying the GREs present in the gut microbiomes of humans with cirrhosis. Qin et al. collected 114 metagenomes from healthy patients and 123 metagenomes from cirrhotic patients in order to compare their gut microbiomes.7 They identified novel genes in and biomarkers for the cirrhosis cohort, ultimately developing a strategy to diagnose this disease based on genes present in patient metagenomes.7 We used the metagenomes generated during that study for two primary reasons. First, the dataset was large with consistent metagenome generation across samples, which is valuable for comparative profiling to enable robust and reliable statistical analyses.8 Second, gut microbial choline metabolism, which is mediated by the GRE choline trimethylamine-lyase (CutC), has been linked to nonalcoholic fatty liver disease (NAFLD).9-10 Cirrhosis can be a complication of NAFLD,11 and so examining whether the abundances of CutC and other GREs are altered with liver dysfunction is of great interest.

We used CGFP to compare the abundances of GREs encoded in the gut metagenomes of the healthy and cirrhosis cohorts. We began with a list of GREs and a corresponding SSN generated previously,6 and we used ShortBRED to quantify these genes in each of the 237 sequenced metagenomes.7,12 Summing the abundances of GREs in the same SSN cluster provided the abundance of each putative isofunctional group of enzymes in the two sets of metagenomes.
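The cluster-level summation described above can be sketched as follows. This is an illustrative outline only: the sequence identifiers, the cluster assignments, and the abundance values are hypothetical, and the sketch assumes per-sequence abundances have already been parsed into a dictionary rather than read from ShortBRED output files.

```python
from collections import defaultdict


def sum_by_cluster(abundances: dict, cluster_of: dict) -> dict:
    """Sum per-sequence abundances (RPKM) within each SSN cluster."""
    totals = defaultdict(float)
    for seq_id, rpkm in abundances.items():
        totals[cluster_of[seq_id]] += rpkm
    return dict(totals)


# Hypothetical abundances for one metagenome and hypothetical cluster labels.
sample = {"gre_001": 1.0, "gre_002": 2.0, "gre_003": 10.0}
clusters = {"gre_001": "CutC", "gre_002": "CutC", "gre_003": "PFL"}
cluster_totals = sum_by_cluster(sample, clusters)
print(cluster_totals)
```

Repeating this for every metagenome gives a samples-by-clusters table in which each column approximates the abundance of one putative isofunctional group.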

Principal component analysis (PCA) was used to visualize and detect differences in GRE abundance and distribution between the healthy and cirrhotic gut microbiomes (Figure 4.1).13 PCA is used to reduce the number of dependent variables in multidimensional data. In this case, PCA can be applied to visualize abundance data for all of the GREs (or SSN clusters) in fewer dimensions than there are GREs (or SSN clusters). PCA using the abundance of each GRE (Figure 4.1A) results in a plot where the first two principal components account for only 62% of the total variation. However, using the abundances of each cluster in the GRE SSN instead results in a PCA plot where the first principal component accounts for 92% of the total variation (Figure 4.1B). This difference suggests that although there may be significant variability of the species in the gut microbiota, there is less variation in the functional capabilities between individuals. In the PCA plots, there is no significant clustering of healthy or cirrhotic individuals, suggesting that the differences in GRE abundances are not the primary difference between the two cohorts.
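To make the "percentage of total variation" figures concrete, the sketch below computes the fraction of variance captured by the first principal component of a small samples-by-features table. It is a standard-library illustration using power iteration on the sample covariance matrix, not the software used for Figure 4.1; in practice a numerical library would normally be used, and the example data are hypothetical.

```python
import math


def first_pc_variance_fraction(rows, iterations=200):
    """Fraction of total variance explained by the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_variance = sum(cov[i][i] for i in range(d))
    if total_variance == 0:
        raise ValueError("data have no variance")
    # Power iteration converges to the leading eigenvector of cov, provided
    # the starting vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the leading eigenvalue (v has unit norm).
    leading = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
    return leading / total_variance


# Hypothetical abundances for three samples and two clusters that vary together.
fraction = first_pc_variance_fraction([[10.0, 1.0], [20.0, 2.0], [30.0, 3.0]])
print(f"PC1 explains {100 * fraction:.0f}% of the variance")
```

When features rise and fall together across samples, as in this toy example, nearly all of the variance collapses onto one component, which is the pattern reported for the SSN cluster abundances.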

[Figure 4.1 plots: (A) PC2 (12%) vs. PC1 (50%); (B) PC2 (4%) vs. PC1 (92%). Samples are labeled Healthy or Cirrhotic.]

Figure 4.1: Principal component analysis (PCA) of GREs encoded in human gut metagenomes from a previous study of cirrhosis. (A) PCA generated from all GRE abundances clustered at 85%, as computed by ShortBRED. The absence of clustering of samples from either cohort suggests there is no significant difference in overall GRE distribution, although individual GREs may have differing abundance. (B) PCA generated using GRE SSN cluster abundances suggests the overall distribution of GRE functions is similar between the two cohorts.

Although the overall GRE distribution may be similar, individual GREs exhibited significant differences between the two cohorts. Mann-Whitney U-tests (MW) were used to compare the abundance of each GRE cluster between the two cohorts, using the Benjamini-Hochberg procedure to control the false discovery rate arising from performing multiple comparisons.14 Although most GREs did not show significantly different abundances between cohorts, the abundance of CutC was significantly lower in metagenomes from patients with healthy livers compared to the subjects with cirrhotic livers (P < 0.0026, MW) (Figure 4.2). In addition, the percentages of healthy and diseased patients with detected gut microbial CutC were 70.2% and 74.0%, respectively. Although these samples were from Chinese individuals, the percentage of healthy American individuals with detected gut microbial CutC was similar (65.0%).6 Genes encoding CutC were identified in oral microbiomes of healthy individuals and the presence of oral bacteria in the gut is correlated with cirrhosis,6-7 in line with our observation that CutC is present in elevated abundance in gut metagenomes from cirrhosis patients relative to healthy individuals. Although this correlation hints at a possible association between CutC and cirrhosis, more mechanistic work is needed to precisely determine the connection between gut microbial metabolism and liver dysfunction.
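The multiple-comparison correction mentioned above can be illustrated with a short sketch of the Benjamini-Hochberg adjustment. This is a generic textbook formulation written for illustration, not the code used in this work, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, one per SSN cluster tested.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(p, 3) for p in adj])
```

A cluster would then be called significantly different at a chosen false discovery rate if its adjusted value falls below that threshold.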

Figure 4.2: Abundance of CutC in metagenomes from healthy and cirrhosis cohorts. There is a significantly greater amount of CutC in the cirrhosis samples compared to the healthy samples. Abundance values are in units of reads per kilobase per million reads (RPKM) and are presented as Tukey boxplots.

The other characterized GRE found in significantly different abundance in gut metagenomes between humans with healthy and cirrhotic livers is isethionate sulfite-lyase (IslA) (Figure 4.3).15 IslA was originally identified in Bilophila wadsworthia, a gut microbial pathobiont linked to colitis and inflammatory bowel disease.16-18 B. wadsworthia can liberate taurine from conjugated bile acids, convert it to isethionate, and then use its IslA to access acetaldehyde and sulfite for further processing. Intriguingly, the abundance of IslA is significantly lower in the gut metagenomes of the cirrhosis cohort (P < 2.6 × 10^−5, MW), in agreement with the decreased levels of B. wadsworthia observed in these samples.7 The much lower percentage of cirrhosis cohort members’ microbiomes encoding IslA corresponds to decreased levels of bile acid-metabolizing microbes in their gut microbiotas, indicating that fewer gut microbes can rely on host-derived bile acids in this environment. Humans with cirrhosis seem to have decreased intestinal bile acid concentrations due to their biosynthesis being inhibited by liver inflammation, and this observation may be consistent with decreased gut B. wadsworthia abundance.19-20

[Figure 4.3 panels: (A) reaction scheme; (B) boxplots of IslA abundance in metagenomes (RPKM) for the Healthy and Cirrhotic cohorts. IslA, % of samples > 0: 57.9% (Healthy), 32.5% (Cirrhotic).]

Figure 4.3: Isethionate sulfite-lyase (IslA) in gut metagenomes from healthy and cirrhotic patients. (A) The reaction catalyzed by the GRE IslA. (B) The abundance of IslA is significantly greater in healthy individuals than cirrhosis patients. This difference is directly correlated with the decrease in B. wadsworthia observed in the disease cohort. Abundance values are in units of reads per kilobase per million reads (RPKM) and are presented as Tukey boxplots.

4.2.2: Quantifying glycyl radical enzymes in pig gut metagenomes

The bioinformatics strategies described in Chapter 2 are not limited to studies of human microbiomes. The pig gut microbiota has been studied extensively due to this animal’s role in agriculture and as a model for human physiology.21-24 We were particularly interested in studying GREs encoded in pig gut microbiomes because one such enzyme, indole-3-acetate decarboxylase (IAD), is involved in skatole biosynthesis in this environment.25 Skatole is a cause of boar taint, an adverse smell and odor found in some intact male pigs.26 The importance of skatole and the GRE that produces it are described in further detail in Chapter 5. A large collection of swine gut metagenomes was recently reported,27 and I aimed to extend CGFP to quantify the GREs in these samples. The metagenomes were analyzed in the same manner as the cirrhosis datasets described previously, except that an additional parameter was required to account for varying average read lengths in these metagenomes. Characterized GREs were found in high abundance in all samples (Figure 4.4). PFL and PD were found in 100% of samples, HypD in all but two of the 295 samples, and CutC in 97% of samples. IslA was also found in 26% of these pig gut metagenomes. However, no GRE decarboxylases, including 4-HPAD and a recently disclosed indole-3-acetate decarboxylase,25 could be identified in these datasets despite their predicted roles in p-cresol and skatole production in this microbial habitat.28-29 This result may indicate that more sensitive methods of metagenome profiling are necessary to detect these low-abundance GREs in these samples. Alternatively, as ShortBRED is highly specific to genes present in the input sequence file, decarboxylases present in this habitat may not closely resemble enzymes encoded in microbial genomes and would therefore not be detected by ShortBRED. It is also possible that GREs catalyzing skatole formation in this environment share low sequence similarity with known IADs, hindering our ability to identify them. Altogether, these results highlight the utility of CGFP for investigating microbiomes from other environments and show that the pig gut microbiota contains many GREs. However, this analysis also illustrates the limitations of ShortBRED for detecting low-abundance and previously unsequenced genes.
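The "% of samples > 0" values quoted for each enzyme are prevalence figures, which can be computed from a list of per-sample abundances as in the sketch below. The function name is hypothetical and the abundance list is a stand-in built only to match the stated counts (detection in all but two of 295 samples).

```python
def prevalence(abundances) -> float:
    """Percent of samples in which the gene was detected (abundance > 0)."""
    if not abundances:
        raise ValueError("at least one sample is required")
    detected = sum(1 for a in abundances if a > 0)
    return 100.0 * detected / len(abundances)


# Stand-in data: 293 samples with a nonzero abundance and 2 with none.
hypd_like = [1.0] * 293 + [0.0] * 2
print(f"{prevalence(hypd_like):.0f}% of samples > 0")
```

Prevalence ignores how abundant a gene is once detected, so it complements rather than replaces the RPKM distributions.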

[Figure 4.4 plot. Panels: (A) PFL; (B) PD and HypD; (C) CutC and IslA. Y-axis: abundance in metagenomes (RPKM). Percentage of samples with abundance > 0: PFL, 100; PD, 100; HypD, 99; CutC, 97; IslA, 26.]

Figure 4.4: Abundances of characterized GREs in pig gut metagenomes. The abundances of (A) PFL, (B) PD and HypD, and (C) CutC and IslA in pig gut metagenomes. Abundance values are in units of reads per kilobase per million reads (RPKM) and are presented as Tukey boxplots.

4.2.3: Comparing folate biosynthetic gene abundance in infant gut metagenomes

Certain gut microbes are known to produce vitamins that can be taken up by the host.30-31

In particular, folate produced by gut bacteria belonging to the genus Bifidobacterium can be absorbed and used by the host, and a deficiency in gut bacterial folate biosynthesis has been linked to malnutrition and cancer.32-34 Taxonomic profiling of humans at different ages revealed that Bifidobacterium longum dominates the gut microbiota during the first year of infancy.35

Notably, this study also found much less B. longum in the adult gut microbiota, which also contains significantly more genes involved in metabolizing dietary folate and tetrahydrofolate.

This result suggests that de novo folate biosynthesis by gut microbes is important for infant health and that after the first year of life, the microbial metabolism of dietary folate and related compounds is much more significant.

Dr. Lauren Rajakovich (Harvard) and I wished to quantify genes essential to folate biosynthesis in the infant gut microbiome as well as compare their abundances in healthy infants and infants with kwashiorkor, a form of severe acute malnutrition. Characterized by generalized

edema, dermatitis, anorexia, and hepatomegaly (enlarged liver), kwashiorkor is most common in poor, rural areas with a limited food supply.36 The gut microbiota has been proposed to be a causal factor in kwashiorkor,37 and we sought to determine whether folate biosynthesis by gut microbes is linked to this condition. Previous efforts to study folate biosynthesis genes in gut metagenomes relied on taxonomic profiling or the KEGG database to rapidly annotate genes from shotgun sequencing data sets.37 We hoped to use our knowledge of genes essential to folate biosynthesis to empower functional profiling strategies to clarify how gut microbial folate biosynthesis is linked to infant health.

We began with a list of 11,735 homologs of the enzymes aminodeoxychorismate synthase and aminodeoxychorismate lyase, which are encoded by pabB and pabC, respectively.

These enzymes are involved in the conversion of chorismate to p-aminobenzoate, a process that is essential for tetrahydrofolate biosynthesis.38-40 Initial attempts to quantify pabB and pabC in metagenome data from healthy infants and infants with kwashiorkor were unsuccessful because these datasets were sequenced using 454 pyrosequencing, which cannot be analyzed by

ShortBRED due to the low sequencing depth this method provides.12,35,37 Instead, a strategy was developed using USEARCH, a bioinformatics tool for searching sequence databases.41 The

USEARCH algorithm is able to rapidly find sequences in a database that align locally or globally with query sequences of interest. To begin, the protein sequences of interest were clustered at

85% sequence identity to reduce the redundancy and complexity of this dataset; 4,289 clusters were obtained. For each cluster, a centroid sequence representing each cluster and a consensus sequence derived from a multiple sequence alignment of all cluster members were determined; similar quantification results were obtained regardless of whether centroids or consensus sequences were used. Next, USEARCH databases were made for the centroid (and consensus)

sequences of interest. Then, every read from each of the metagenomes of interest was aligned against the USEARCH databases to determine which metagenomic reads contained pabB or pabC. Reads were matched to these genes if they shared ≥95% sequence identity over a minimum alignment length of 30 amino acids, using parameters similar to a previous study.12

Finally, abundance values were normalized by sequencing depth and gene length so that comparisons could be made across samples.
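The read-matching criterion described above can be sketched as a filter over tab-separated hit lines. This is an illustrative Python sketch and not the code used in this work: it assumes BLAST6-style output in which column 3 holds percent identity and column 4 holds alignment length, and the function name and example lines are hypothetical.

```python
def passes_criteria(blast6_line, min_identity=95.0, min_length=30):
    """Check one tab-separated BLAST6-style hit against the accept criteria."""
    fields = blast6_line.rstrip("\n").split("\t")
    percent_identity = float(fields[2])  # column 3: percent identity
    alignment_length = int(fields[3])    # column 4: alignment length
    return percent_identity >= min_identity and alignment_length >= min_length


# Hypothetical hits: query, target, identity, length, then the remaining columns.
example_hits = [
    "read1\tcentroidA\t97.1\t33\t1\t0\t1\t99\t10\t42\t1e-15\t70.0",
    "read2\tcentroidA\t94.9\t40\t2\t0\t1\t120\t5\t44\t1e-12\t60.0",
    "read3\tcentroidB\t100.0\t29\t0\t0\t1\t87\t1\t29\t1e-10\t55.0",
]
accepted = [line.split("\t")[0] for line in example_hits if passes_criteria(line)]
print(accepted)
```

Only the first hypothetical read satisfies both thresholds; the second fails on identity and the third on alignment length.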

Using the above strategy, the abundances of pabB and pabC were analyzed in 308 metagenomes.35,37 Of these metagenomes, 200 were from healthy infants (aged 0.6 to 2.5 years), while the remaining 108 were from infants with kwashiorkor. Despite the low sequencing depth in these samples, pabB and pabC could be detected. In fact, the majority of the pabBC genes found in these metagenomes were from B. longum and related Bifidobacterium species, consistent with the previous taxonomic profiling of these datasets (Figure 4.5A).37 These results further suggest that Bifidobacterium is the major source of pabBC in the infant gut.42

Additionally, similar levels of pabB and pabC from Bifidobacterium were found in each metagenome (Figure 4.5B). Because these two genes are encoded adjacent to one another in the

B. longum genome and are involved in the same metabolic pathway, similar abundances would be expected.43 This finding suggests that the quantification of these genes is accurate.

[Figure 4.5 plot. Y-axis: normalized abundance (RPKM). Panel A categories: Bifidobacterium, other organisms. Panel B categories: pabB, pabC.]

Figure 4.5: Abundances of pabBC from Bifidobacterium and other organisms in infant gut metagenomes. (A) Total pabBC abundances from all microbes in infant gut metagenomes. pabBC from B. longum is present in higher abundance than homologous genes in other organisms. (B) Comparing pabB and pabC abundances. These two genes are present at similar levels, as expected. Abundance values are in units of reads per kilobase per million reads (RPKM) and are presented as Tukey boxplots.

We were particularly interested in comparing pabB and pabC abundances between the healthy and malnourished cohorts. Most notably, the abundance of pabBC in the healthy infants is significantly greater than in infants with kwashiorkor (P < 0.04, Mann-Whitney U-test) (Figure 4.6A).

This correlation suggests that the disease cohort contains fewer microbes capable of folate biosynthesis and that differences in folate production may contribute to or be caused by the malnutrition linked to kwashiorkor. Furthermore, the abundance of pabBC decreased as the infants aged (Spearman r = −0.681, P < 10^−4), in agreement with gut microbial folate biosynthesis becoming less important as the host consumes more solid food (Figure 4.6B).42,44-45

Further quantification of these genes in healthy adult stool metagenomes from the HMP46 using

ShortBRED verified that pabBC from B. longum was absent in these samples and that pabBC from other microbes was significantly less abundant (by approximately two orders of magnitude). Abundance values obtained from ShortBRED and from USEARCH quantification

can be compared despite coming from metagenomes obtained in different projects because both methods include a normalization step that accounts for differences in sequencing depth.12

However, other factors can complicate direct comparisons between samples, including

differences in DNA extraction, sequencing, and data processing.
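The age trend reported in this section is summarized with a Spearman rank correlation, which is the Pearson correlation computed on ranks. The original statistics were computed in Prism; the sketch below is a plain-Python illustration of the statistic itself, with hypothetical function names and toy values, and it does not compute a P value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    # Note: undefined (division by zero) if either input is constant.
    return covariance / (spread_x * spread_y)


# Toy data: abundance falls steadily with age, so r is close to -1.
ages_months = [3, 9, 15, 24]
abundances = [40.0, 30.0, 20.0, 10.0]
print(spearman_r(ages_months, abundances))
```

A value near −1 indicates a strictly decreasing relationship; the reported r of −0.681 reflects a weaker but still negative monotonic trend.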

[Figure 4.6 plot. Y-axis: normalized abundance (RPKM). Panel A categories: healthy, kwashiorkor. Panel B x-axis: age (months).]

Figure 4.6: Comparative analyses of pabBC abundances in infant gut metagenomes. (A) pabBC was significantly more abundant in the healthy subset of the population than in the kwashiorkor cohort. (B) The abundance of pabBC in infant gut metagenomes decreased as the host aged, consistent with Bifidobacterium decreasing in abundance over time. Abundance values are in units of reads per kilobase per million reads (RPKM) and are presented as Tukey boxplots.

4.2.4: Mining assembled metagenomes for new glycyl radical enzymes

The applications described in this Chapter and in Chapter 2 illustrate how CGFP can be used to effectively quantify enzyme-encoding genes in microbiomes. However, CGFP requires that all proteins of interest be present in the inputted SSN, as ShortBRED is highly specific for target sequences. The sequences used to construct an SSN are typically from an existing protein database.47 Databases like InterPro48 and Pfam49 sort and classify the millions of different proteins identified by genome sequencing projects and can be used to find homologs of proteins of interest. However, these tools are limiting in that SSNs constructed with these databases only

contain proteins encoded in sequenced genomes. Quantifying these previously sequenced genes in metagenomes is certainly valuable, but there are likely additional genes in microbiomes that encode proteins that have important functions but are not encoded in any sequenced genomes.

CGFP and ShortBRED are unable to detect these proteins due to the latter program’s high specificity for target proteins, and so novel methods are needed to find novel sequences.

To identify genes in metagenomes, metagenomes must first be assembled. In 2010, Qin et al. constructed a comprehensive list of genes present in the human gut.4 The read lengths for the metagenomes they studied were ≤75 base pairs. Because sequences of this size only rarely encode full-length genes, reads with overlapping ends must be assembled into longer contigs and scaffolds for further analysis of gene sequence and genomic context. They used the program

SOAPdenovo to assemble these metagenomes, generating 6.58 million contigs >500 base pairs in length.50 Identifying open reading frames (ORFs) and genes within this data requires automated tools such as MetaGene.51 Qin et al. relied solely on the KEGG database to classify predicted genes and did not attempt to precisely functionally characterize them.52 Relying on the gene catalog prepared by Qin et al., the Babbitt laboratory searched for acid-sugar dehydratases using existing data in the Structure-Function Linkage Database.53-54 The discovered sequences were compared to known acid-sugar dehydratases, and they predicted that some of the encoded enzymes catalyze reactions different from the acid-sugar dehydratases previously characterized.

However, they did not speculate as to what their function might be. Ultimately, strategies are needed both to identify genes in metagenomes and to determine their functions.

We developed a bioinformatics strategy to identify GRE-encoding genes in metagenomes that have not previously been found in any sequenced organism. We specifically wanted to locate GREs that are distinct from GREs encoded in sequenced microbial genomes, and so we directed our efforts to finding sequences sharing <62% amino acid identity with these GREs. This value corresponds to the similarity cutoff used previously in SSN construction, meaning these new GREs may have distinct activities. Our approach begins with metagenomic assemblies. The raw sequencing data for every metagenome obtained by the HMP consist of millions of reads that are each ~100 base pairs in length.46,55 Metagenomic assemblies for all of the samples collected by the HMP were generated using SOAPdenovo, and these data were used to construct a gene catalog for the human microbiome.50,55

We took advantage of their previously generated stool metagenomic assemblies to mine for novel GREs. I began with the protein sequences in our previously constructed GRE SSN and clustered them at 65% amino acid identity using USEARCH. This reduced the redundancy of this dataset while retaining representatives of all GREs encoded in sequenced genomes.6,41 Each of the resulting 329 GRE sequences was used to search BLAST databases of the 139 HMP stool metagenomic assemblies that passed quality control.56 In brief, a BLAST database was constructed for each assembly, and these translated nucleotide databases were searched with each representative GRE sequence. These computations outputted a list of every GRE in these metagenomes; the resulting lists were dereplicated (redundant contigs found to contain GREs were removed) and short GRE fragments (<350 amino acids) were removed, as our goal was to identify full-length sequences. Next, each hit was compared to the GRE SSN to determine if it was similar to previously sequenced GREs (i.e. could be placed in an existing SSN cluster) or was potentially novel (would form a new cluster on the SSN). Finally, two of the contigs encoding previously unrecognized GREs were extracted from the assemblies and manually annotated; more complex automated tools are necessary to analyze all of the sequence fragments containing novel GREs.


In all, 338 contigs were identified that encode putative GREs that are significantly different from GREs found in previously sequenced microbial genomes and are longer than 350 amino acids. Manual annotation of two such contigs illustrates the information that can be gained from this approach. In one sample (SRS011134), a 20 kb contig (contig-100_52.301538) contains a putative GRE at most 53% identical to GREs encoded in sequenced microbial genomes, as well as its putative activating enzyme (Figure 4.7). This enzyme’s function is unknown; it lacks the conserved glutamate in the “dehydratase motif” found in all characterized dehydrating GREs, instead coding for lysine. In addition, there are numerous other enzymes predicted to be involved in sugar metabolism encoded in this contig, hinting that the GRE may be part of a gene cluster. The functions of these other genes cannot be verified without further genetic and/or biochemical characterization, but several can be linked to carbohydrate and polysaccharide metabolism, including an acylneuraminate cytidylyltransferase, an N-acetylneuraminate synthase, and two glycosyltransferases (Table 4.1). These annotations suggest that this gene cluster is involved in metabolizing a neuraminic acid derivative such as N-acetylneuraminic acid (sialic acid). Sialic acid is known to be important for host-gut microbial interactions,57 and GREs of unknown function in Escherichia coli have been previously linked to the metabolism of mucins, which line the intestinal epithelial surface and frequently contain sialic acid.58 The discovery of this previously unrecognized enzyme confirms that metagenomic sequencing data encode GREs that are absent from sequenced microbial genomes, and even though this GRE’s function is unknown, its potential link to carbohydrate utilization in the gut suggests that the involvement of GREs in sugar metabolism by gut microbes is underexplored.

[Figure 4.7 gene cluster diagram. Scale bar: 1,000 bp.]

Figure 4.7: Example contig from a metagenomic assembly encoding a GRE. This contig is 20 kb long and encodes a putative GRE, its cognate activating enzyme, and numerous enzymes predicted to be involved in sugar metabolism.

Table 4.1: Predicted ORFs in a GRE-encoding contig from an HMP metagenome assembly. Predicted functions or broad annotations are deduced by sequence homology. Accession codes are GenBank identifiers.

Protein | Size (amino acids) | Annotation | Most similar protein sequence | Amino acid identity/similarity (%)
G1 | 450 | Helicase | PWL77291.1; Clostridiales bacterium | 98/99
G2 | 96 | Acyltransferase | OKZ28528.1; Bacteroides uniformis | 98/97
G3 | 490 | Lipopolysaccharide biosynthesis protein | WP_117710107.1; Bacteroides vulgatus | 99/99
G4 | 369 | NAD-dependent epimerase/dehydratase | WP_117710106.1; Bacteroides vulgatus | 99/99
G5 | 413 | [4Fe–4S] dicluster domain-containing protein | WP_117710105.1; Bacteroides vulgatus | 99/99
G6 | 189 | Maltose O-acetyltransferase | RIB31608.1; Bacteroides vulgatus | 100/100
G7 | 336 | Acyltransferase | WP_118236402.1; Bacteroides vulgatus | 99/99
G8 | 412 | Hypothetical | WP_117710103.1; Bacteroides vulgatus | 99/99
G9 | 404 | Polysaccharide pyruvyl transferase family | WP_117710102.1; Bacteroides vulgatus | 99/99
G10 | 384 | Glycosyltransferase | WP_117710101.1; Bacteroides vulgatus | 94/97
G11 | 368 | Glycosyltransferase | WP_117710100.1; Bacteroides vulgatus | 84/93
G12 | 168 | Serine acetyltransferase | RIB31614.1; Bacteroides vulgatus | 92/98
G13 | 290 | N-Acetylneuraminate synthase | WP_118327969.1; Bacteroides vulgatus | 99/100
G14 | 384 | Acylneuraminate cytidylyltransferase | WP_087406520.1; Bacteroides sp. An279 | 92/95
G15 | 373 | No homolog identified | |
GRE | 670 | Hypothetical GRE | WP_013063575.1; Prevotella ruminicola | 53/69
AE | 285 | Radical SAM protein | WP_117596626.1; Bacteroides vulgatus | 40/55
G16 | 234 | Hypothetical | AII62346.1; Bacteroides dorei | 71/80
G17 | 97 | No homolog identified | |
G18 | 114 | No homolog identified | |

A 13.5 kb contig in a different metagenome (contig 41475 in metagenome SRS012273) also encoded a GRE not found in any sequenced genome (Figure 4.8). Unlike the prior example, this GRE does not appear to be part of a readily identifiable gene cluster, as many of the ORFs in this contig are predicted to have unrelated roles (Table 4.2). However, the genes encoding the GRE and GRE-AE are immediately adjacent to genes encoding a putative transcriptional regulator and alcohol dehydrogenase. This GRE contains the histidine and glutamate residues of the dehydratase motif, suggesting that it may catalyze dehydration of an unknown substrate. The role of the adjacent alcohol dehydrogenase could be to reduce a carbonyl-containing product of this GRE. Unfortunately, this contig does not include genes beyond the putative alcohol dehydrogenase, so the boundaries of this potential gene cluster remain undefined.

[Figure 4.8 gene cluster diagram. Scale bar: 1,000 bp.]

Figure 4.8: Second contig encoding a GRE of unknown function. This 13.5 kb contig does not appear to contain a complete gene cluster, but the GRE is encoded immediately adjacent to a putative transcriptional regulator, and the GRE-AE is adjacent to a putative alcohol dehydrogenase.

Table 4.2: Predicted ORFs in a second GRE-encoding contig. Predicted functions and annotations are deduced by sequence homology. Accession codes are GenBank identifiers.

Protein | Size (amino acids) | Annotation | Most similar protein sequence | Amino acid identity/similarity (%)
H1 | 507 | ABC-F type ribosomal protection protein | WP_053984093.1; Niameybacter massiliensis | 65/78
H2 | 144 | D-Tyrosyl-tRNA deacylase | WP_016440508.1; Coprococcus sp. HPP0048 | 67/88
H3 | 333 | 2-Hydroxyacid dehydrogenase | WP_091825957.1; Butyrivibrio sp. ob235 | 71/84
H4 | 103 | Thioredoxin | WP_105203535.1; Neobitarella massiliensis | 56/83
H5 | 126 | Cytidine deaminase | WP_009265120.1; Lachnospiraceae bacterium 5_1_63FAA | 56/74
H6 | 251 | Biotin-acetyl-CoA-carboxylase ligase | WP_090014977.1; Clostridium sp. DSM 8431 | 44/60
H7 | 210 | Biotin transporter | WP_118798666.1; Lachnospiraceae bacterium OM04-12BH | 54/74
H8 | 404 | Hypothetical | WP_070086450.1; Merdimonas faecis | 33/50
H9 | 258 | NAD-dependent deacylase | WP_087398501.1; Flavonifractor sp. An9 | 60/72
H10 | 276 | AraC family transcriptional regulator | WP_112331163.1; Ruminococcaceae bacterium | 27/49
GRE | 703 | Hypothetical GRE | WP_112331162.1; Ruminococcaceae bacterium | 50/68
AE | 270 | GRE activating enzyme | PWM20629.1; Clostridiales bacterium | 70/82
H11 | 366 | Alcohol dehydrogenase | PHX93693.1; Pedosphaera sp. | 35/53

Future work will involve improving methods to mine for GREs encoded in metagenomes and developing strategies to determine their functions. We likely identified most, if not all, of the

GREs encoded in these metagenomes, but optimization of the parameters used for BLAST searches could lead to identifying additional contigs encoding GREs. Furthermore, performing these searches in better assembled metagenomes with longer contigs would lead to obtaining more full-length GREs as well as a better understanding of nearby encoded genes. The large number of identified contigs makes prioritizing them challenging, and the GREs in these contigs should be classified and sorted to identify common enzymes and conserved genomic contexts.

SSN or phylogenetic analysis would be valuable for determining how they compare to characterized GREs. ShortBRED could also be used to quantify them in metagenomes.

Furthermore, metagenomic assemblies from other studies could also be analyzed to determine if their encoded GREs are similar to the HMP metagenomes analyzed here. Determining the activities of these GREs will likely be a major challenge. However, many GREs have already been biochemically and structurally characterized, and adopting strategies used for other family members should prove useful.59-60 In particular, the discovery of the gene encoding CutC relied on predicting the function of its gene cluster,61 and this approach may be useful for analyzing contigs encoding GREs. Strategies that leverage existing knowledge of characterized GREs with genomic context will perhaps be most effective.

4.2.5: Conclusions

These results illustrate how CGFP can be used for comparative metagenomic profiling and how alternative strategies can identify GREs in metagenomic assemblies that are not found in sequenced microbial genomes. Using CGFP, the abundance of GREs encoded in the gut microbiomes of healthy and cirrhotic individuals could be measured and compared. Although


PCA did not reveal significant differences in abundance of the entire GRE superfamily between the two patient cohorts, individual GREs were present in significantly different amounts. For example, CutC was found in significantly higher abundance in the disease cohort, while IslA was more abundant in the healthy individuals. Importantly, the methods presented here expand on our previous work, which focused only on metagenomes from healthy individuals. The links between

GRE abundance and cirrhosis are intriguing, but further research is needed to determine whether these are simply correlations resulting from changes in the gut microbiota composition due to disease or if GREs are playing important roles in cirrhosis development.

Quantification of the GREs encoded in pig gut microbiomes illustrates how CGFP can be extended to metagenomes from sources beyond the human gut. Some of the most abundant

GREs in the human gut microbiota, including PFL, PD, and HypD, were also found in high levels in the pig gut microbiota. Other enzymes were more prevalent in the pig gut metagenomes than the HMP stool metagenomes, including the choline-metabolizing CutC. GRE decarboxylases involved in the production of p-cresol and skatole, compounds generated by pig gut microbes, were not found with CGFP, indicating that additional methods for detecting low-abundance and previously unsequenced genes are needed.

Comparative metagenomics studies were also performed to determine if genes involved in folate biosynthesis were present in differing abundances in infants that were healthy or that had kwashiorkor. Though existing metagenomes lacked the sequencing depth required for

ShortBRED, we quantified pabBC abundance using an alternate approach. The genera of gut microbes encoding these genes were determined, and differences in gene abundance were measured between the healthy and malnourished cohorts. A decrease in pabBC abundance as the

infant host aged could also be observed. These results further support a proposed role of

Bifidobacterium in producing folate in the infant gut.

Finally, human gut metagenomic assemblies were mined for GREs significantly different from those in sequenced microbial genomes. Over 300 contigs from these metagenomic assemblies were found to encode such GREs, hinting at a wide range of GRE-mediated biochemistry that has not yet been explored. Although additional analyses of these novel enzymes are needed, manual annotation of two contigs encoding novel GREs illustrates how the genomic context of such GREs may facilitate their functional characterization. Future efforts towards optimizing assembly searches, analyzing contigs encoding putative GREs, and developing strategies for functional prediction will be crucial for exploring the role GREs play in the gut microbiota.

The human gut microbiota is extraordinarily complex, and functional profiling strategies that can provide insights into new host-gut microbial interactions are essential for guiding future studies. Extending CGFP to additional metagenomic datasets and protein families further highlights how biochemical knowledge can be used to develop bioinformatics methods to study microbial communities from the human gut and other environments. The discovery that there are many GREs encoded in the human gut microbiome that are not found in sequenced microbial genomes underscores how much chemistry remains to be characterized in this habitat.

4.3: Materials and methods

4.3.1: General materials and methods

Computations were run on the Odyssey cluster supported by the FAS Division of

Science, Research Computing Group at Harvard University. All graphs were generated with

Prism 7.05 (GraphPad Software Inc.). Gene cluster figures were constructed with SnapGene


Viewer 4.2 (GSL Biotech LLC). All boxplots are Tukey boxplots, with boxes illustrating the median, first quartile (Q1), and third quartile (Q3). Whiskers are extended to include data points between Q1−1.5(Q3−Q1) and Q1 and between Q3 and Q3+1.5(Q3−Q1) (the lower and upper inner fences, respectively). Values outside of this range are individually marked with dots.
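The boxplot convention described above can be sketched as follows. This is an illustration only: the figures in this Chapter were generated with Prism, whose quartile definition may differ from the one assumed here (Python's `statistics.quantiles` with the "inclusive" method), and the function name and toy data are hypothetical.

```python
import statistics


def tukey_summary(values):
    """Return quartiles, inner fences, whisker ends, and outliers for a Tukey boxplot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # lower inner fence
    upper_fence = q3 + 1.5 * iqr  # upper inner fence
    # Whiskers extend to the most extreme data points that lie within the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Toy abundances with one extreme value that falls outside the upper fence.
summary = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary["whisker_high"], summary["outliers"])
```

In this toy case the upper whisker stops at the largest value inside the fence, and the extreme value is reported separately, which corresponds to the individually marked dots in the figures.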

4.3.2: Bioinformatics methods for comparative metagenomic profiling

GREs were quantified in the gut metagenomes from healthy and cirrhotic individuals by adapting a reported protocol.6 The list of GREs and the SSN grouping these GREs into putatively isofunctional clusters were generated previously.6 The 237 metagenomes analyzed were also previously reported.7 ShortBRED was used to quantify the genes of interest in these metagenomes using the default parameters.12

PCA was performed using MATLAB R2018b (MathWorks Inc.). The PCA command used is a component of the standard Statistics and Machine Learning Toolbox. The default parameters were used, with the inputted data matrix consisting of rows corresponding to each metagenome and columns corresponding to abundances of GREs (or GRE clusters).

Comparative analyses and Mann-Whitney U-tests were performed with Prism 7.05 (GraphPad,

Inc.).

4.3.3: Quantifying GREs in pig gut metagenomes

The 295 pig gut metagenomes analyzed were reported previously.27 The GRE SSN used for studying the cirrhosis datasets above was also used for this analysis. The CGFP bioinformatics methodology used for analyzing the cirrhosis datasets was also used for analyzing the pig gut metagenomes, except that two flags were added to the ShortBRED-Quantify program: the aveBP flag was set by using a BioPython script to calculate the average read length of each pig gut metagenome and the minBP flag was then set equal to 0.9 × aveBP.62
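The two read-length values described above can be computed as sketched below. The original analysis used a BioPython script; this sketch swaps in a minimal standard-library FASTA reader so that it is self-contained. It assumes the reads are available as FASTA-formatted lines, and the function names and toy reads are hypothetical. It only computes the values and does not show ShortBRED command-line syntax.

```python
def read_lengths_from_fasta(lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def read_length_settings(lines, min_fraction=0.9):
    """Return (average read length, min_fraction * average read length)."""
    lengths = list(read_lengths_from_fasta(lines))
    if not lengths:
        raise ValueError("no reads found")
    average = sum(lengths) / len(lengths)
    return average, min_fraction * average


# Toy metagenome with three reads of 100, 80, and 120 bases.
toy_reads = [">r1", "ACGT" * 25, ">r2", "ACGT" * 20, ">r3", "ACGT" * 15, "ACGT" * 15]
ave_bp, min_bp = read_length_settings(toy_reads)
print(ave_bp, min_bp)
```

For this toy input the average read length is 100 and the minimum read length is about 90, following the 0.9 × aveBP rule stated above.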


4.3.4: Quantifying genes in metagenomes using USEARCH

CGFP was used to quantify pabBC in gut metagenomes, except that USEARCH was used instead of ShortBRED for gene quantification.41 The 308 metagenomes analyzed, including 200 from healthy infants and 108 from infants with kwashiorkor, were previously reported.35,37 The list of 11,735 protein sequences encoded by pabB and pabC sequences was generated by Dr.

Lauren Rajakovich (Harvard) by compiling sequences from two InterPro families (version 65.0,

Nov. 6, 2017): IPR005802 (p-aminobenzoate synthase, component I) and IPR001544

(aminotransferase, class IV). The former family includes the translation of the characterized pabB from E. coli (UniProt ID: P05041) and the predicted pabB from B. longum (UniProt ID:

D6ZY47), while the latter family includes translations of the characterized and predicted pabC from E. coli (UniProt ID: P28305) and B. longum (UniProt ID: A0A1S2W2H0), respectively.

Redundant sequences were removed in Geneious 7.1.5 (Biomatters) to generate the list of 11,735 sequences.63 These sequences were clustered using USEARCH to obtain centroid and consensus sequences with the following command:

    usearch -cluster_fast ./171106_PabBPabChomologs.fasta -id 0.85 -sort length \
        -centroids ./Clustering/PabBCCentroids.fasta \
        -consout ./PabBCConsensus.fasta \
        -alnout ./Alignments/PabBCHuman.aln \
        -msaout ./Alignments/PabBCAlignment.aln \
        -clusters ./Clusters/Cluster.fasta \
        -uc ./clusters.uc

The sets of centroid sequences and consensus sequences were both quantified in the metagenomes of interest. Results were nearly identical; the results from centroid searches are presented in this Chapter. Next, a USEARCH database file was constructed for each set of sequences using the following commands:

    usearch -makeudb_usearch PabBCcentroids.fasta -output PabBCCentroids.db
    usearch -makeudb_usearch PabBCconsensus.fasta -output PabBCconsensus.db

Local USEARCH was used to quantify the number of contigs in each metagenome encoding for homologs of these genes. The following command was run against each

metagenome. Note that the accept criteria required ≥95% shared amino acid sequence identity over a minimum alignment length of 30 amino acids:

    usearch -usearch_local ${metagenome} -db ./PabBCCentroids.db -id 0.95 \
        -mincols 30 -blast6out ./Output/${metagenome}output.txt

The following script was executed for the centroid and consensus sequence searches to tabulate the number and identities of the hits in each metagenome:

    #!/bin/bash
    # Write the header row: one column per metagenome output file
    touch Output.tsv
    echo "Centroid\Metagenome" > Output.tsv
    for i in ./Output/*; do
        sed -i "s|$|\t${i##*/}|" Output.tsv  # Note pipe as separator
    done
    # Add one row per unique target (column 2 of the hit tables)
    cat ./Output/* | awk '{print $2}' | sort -u >> Output.tsv

    # Output.tsv is read back in; sed 1d discards the header line
    # While loop makes list of centroids from first column of output file
    # For each centroid, each metagenome is searched to count number of hits
    # The resulting value is added to the end of the correct line
    sed 1d ./Output.tsv | while read centroid; do
        for metagenome in ./Output/*; do
            value="$(grep "${centroid}" "${metagenome}" | wc -l)"
            sed -i "/${centroid}/ s/$/\t${value}/" Output.tsv
        done
    done
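The same tabulation can be sketched in Python. This is not the script used in this work: it is an illustrative equivalent that assumes each metagenome's hits are available as tab-separated BLAST6-style lines with the target label in column 2, and the function name and toy data are hypothetical. Unlike the pattern matching done with grep above, it counts exact matches of the target label.

```python
from collections import Counter


def tabulate_hits(hits_by_metagenome):
    """Build a centroid-by-metagenome table of hit counts.

    hits_by_metagenome maps a metagenome name to an iterable of
    tab-separated BLAST6-style lines (target label in column 2).
    """
    counts = {
        name: Counter(line.split("\t")[1] for line in lines if line.strip())
        for name, lines in hits_by_metagenome.items()
    }
    names = sorted(counts)
    targets = sorted(set().union(*counts.values()))
    table = [["Centroid\\Metagenome"] + names]
    for target in targets:
        # A Counter returns 0 for targets with no hits in a metagenome.
        table.append([target] + [counts[name][target] for name in names])
    return table


# Toy hit tables for two hypothetical metagenomes.
toy_hits = {
    "m1": ["r1\tcA\t96.0\t31", "r2\tcA\t99.0\t40", "r3\tcB\t95.0\t30"],
    "m2": ["r9\tcB\t97.0\t35"],
}
for row in tabulate_hits(toy_hits):
    print("\t".join(str(cell) for cell in row))
```

Building the counts in memory avoids rewriting the output file once per centroid and metagenome, which may matter when there are thousands of centroids.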

The resulting abundance values are not normalized and so were converted to units of reads per kilobase per million reads (RPKM). For each gene and metagenome, the number of hits was divided by the length of the gene and the number of reads in each metagenome and then multiplied by 10^9. These normalized units enable comparisons between genes and metagenomes of differing size and sequencing depth.12
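The normalization described above can be written as a one-line formula. The sketch below assumes the gene length is expressed in base pairs, which the text does not state explicitly; the function name and the example numbers are hypothetical.

```python
def rpkm(hit_count, gene_length_bp, total_reads):
    """Reads per kilobase per million reads.

    hits / (length in kb * reads in millions)
      = hits * 1e9 / (length in bp * total reads)
    """
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and read count must be positive")
    return hit_count * 1e9 / (gene_length_bp * total_reads)


# Hypothetical example: 30 hits to a 1,500 bp gene in a metagenome of 2,000,000 reads.
print(rpkm(30, 1500, 2_000_000))
```

Here 30 hits over 1.5 kb is 20 hits per kilobase, and dividing by 2 million reads gives an RPKM of 10.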

The pabB and pabC genes were also quantified in high-quality HMP stool metagenomes following a previously reported protocol, except that the list of sequences generated here was used as the inputted proteins of interest.6,46


4.3.5: Identifying genes in metagenomic assemblies

To begin, all of the GREs collected in a previous study6 were clustered at 65% amino acid identity using USEARCH:

    usearch -cluster_smallmem GRE_List.fasta -id 0.65 -usersort \
        -centroids 65RepSeq.fasta -uc 65RepSeq.uc -clusters Clusters/c_

Next, BLAST databases were made for all of the metagenomes studied, and then the resulting translated nucleotide databases were searched using BLAST for GREs of interest.56

    makeblastdb -in /scratch/${metagenome}/${metagenome}.fna -input_type fasta \
        -parse_seqids -dbtype 'nucl'
    tblastn -query ./65RepSeq.fasta -db /scratch/${metagenome}/${metagenome}.fna \
        -evalue 1e-20 -max_hsps 1 -outfmt "6 std sseq" \
        -out ./Output/${metagenome}Output.txt

Each output file contained many redundant entries; many contigs were identified by several GRE queries as encoding GREs, and so the output files were dereplicated. Repeated rows in output files (representing multiple hits to the same contig) were removed. Contigs whose alignments with GREs were less than 350 amino acids were also removed. Finally, the GRE (or partial GRE) encoded by each contig was compared to a BLAST database constructed with the GREs present in the original SSN. GREs encoded in contigs were then sorted as either similar to known GREs (i.e. ≥62% shared sequence identity), a new GRE (<62% and ≥35%), or too low in identity to accurately assess (<35%). The resulting 338 contigs encoded the putative GREs of interest. ORFs were identified manually in SnapGene Viewer 4.2 (GSL Biotech LLC), and the protein sequence in sequenced genomes most similar to each ORF was found using the protein BLAST service at the National Center for Biotechnology Information (NCBI).
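The dereplication and sorting steps above can be sketched as follows. This is an illustrative sketch only, not the procedure as it was actually carried out: it assumes the tabular layout produced by -outfmt "6 std sseq" (subject ID in column 2, percent identity in column 3, alignment length in column 4, bit score in column 12), it assumes that the highest-scoring row is the one kept for each contig, and all function names and example rows are hypothetical.

```python
def best_hit_per_contig(tabular_text, min_alignment_length=350):
    """Keep one row per contig and drop alignments that are too short.

    Assumption: when several queries hit the same contig, the row with
    the highest bit score is retained.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        contig = fields[1]
        alignment_length = int(fields[3])
        bitscore = float(fields[11])
        if alignment_length < min_alignment_length:
            continue
        if contig not in best or bitscore > float(best[contig][11]):
            best[contig] = fields
    return best


def classify_by_identity(percent_identity):
    """Sort a contig-encoded GRE using the identity thresholds in the text."""
    if percent_identity >= 62:
        return "similar to known GRE"
    if percent_identity >= 35:
        return "new GRE"
    return "identity too low to assess"


def _row(query, contig, pident, length, bitscore):
    # Helper that builds one hypothetical 13-column tabular row.
    return "\t".join([query, contig, str(pident), str(length), "0", "0",
                      "1", str(length), "1", str(3 * length),
                      "1e-50", str(bitscore), "MKV"])


example_text = "\n".join([
    _row("GRE_A", "contig_1", 70.0, 400, 500),
    _row("GRE_B", "contig_1", 65.0, 380, 450),  # redundant hit to contig_1
    _row("GRE_A", "contig_2", 40.0, 200, 150),  # alignment shorter than 350
])
example_hits = best_hit_per_contig(example_text)
```

In this sketch the classification function would be applied to the percent identity from the second search against the SSN-derived database, not to the identity from the initial tblastn search.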

4.4: References

(1) Wang, J.; Jia, H. Metagenome-wide association studies: fine-mining the microbiome. Nat. Rev. Microbiol. 2016, 14, 508-522.


(2) Jovel, J.; Patterson, J.; Wang, W.; Hotte, N.; O'Keefe, S.; Mitchel, T.; Perry, T.; Kao, D.; Mason, A. L.; Madsen, K. L.; Wong, G. K.-S. Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics. Front. Microbiol. 2016, 7, 459.

(3) Smillie, C. S.; Smith, M. B.; Friedman, J.; Cordero, O. X.; David, L. A.; Alm, E. J. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 2011, 480, 241-244.

(4) Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, K. S.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F.; Yamada, T.; Mende, D. R.; Li, J.; Xu, J.; Li, S.; Li, D.; Cao, J.; Wang, B.; Liang, H.; Zheng, H.; Xie, Y.; Tap, J.; Lepage, P.; Bertalan, M.; Batto, J.-M.; Hansen, T.; Le Paslier, D.; Linneberg, A.; Nielsen, H. B.; Pelletier, E.; Renault, P.; Sicheritz-Ponten, T.; Turner, K.; Zhu, H.; Yu, C.; Li, S.; Jian, M.; Zhou, Y.; Li, Y.; Zhang, X.; Li, S.; Qin, N.; Yang, H.; Wang, J.; Brunak, S.; Doré, J.; Guarner, F.; Kristiansen, K.; Pedersen, O.; Parkhill, J.; Weissenbach, J.; MetaHIT Consortium; Antolin, M.; Artiguenave, F.; Blottiere, H.; Borruel, N.; Bruls, T.; Casellas, F.; Chervaux, C.; Cultrone, A.; Delorme, C.; Denariaz, G.; Dervyn, R.; Forte, M.; Friss, C.; van de Guchte, M.; Guedon, E.; Haimet, F.; Jamet, A.; Juste, C.; Kaci, G.; Kleerebezem, M.; Knol, J.; Kristensen, M.; Layec, S.; Le Roux, K.; Leclerc, M.; Maguin, E.; Melo Minardi, R.; Oozeer, R.; Rescigno, M.; Sanchez, N.; Tims, S.; Torrejon, T.; Varela, E.; de Vos, W.; Winogradsky, Y.; Zoetendal, E.; Bork, P.; Ehrlich, S. D.; Wang, J. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464, 59-65.

(5) Schnoes, A. M.; Brown, S. D.; Dodevski, I.; Babbitt, P. C. Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies. PLoS Comput. Biol. 2009, 5, e1000605.

(6) Levin, B. J.; Huang, Y. Y.; Peck, S. C.; Wei, Y.; Martínez-del Campo, A.; Marks, J. A.; Franzosa, E. A.; Huttenhower, C.; Balskus, E. P. A prominent glycyl radical enzyme in human gut microbiomes metabolizes trans-4-hydroxy-L-proline. Science 2017, 355, eaai8386.

(7) Qin, N.; Yang, F.; Li, A.; Prifti, E.; Chen, Y.; Shao, L.; Guo, J.; Le Chatelier, E.; Yao, J.; Wu, L.; Zhou, J.; Ni, S.; Liu, L.; Pons, N.; Batto, J. M.; Kennedy, S. P.; Leonard, P.; Yuan, C.; Ding, W.; Chen, Y.; Hu, X.; Zheng, B.; Qian, G.; Xu, W.; Ehrlich, S. D.; Zheng, S.; Li, L. Alterations of the human gut microbiome in liver cirrhosis. Nature 2014, 513, 59-64.

(8) Knight, R.; Vrbanac, A.; Taylor, B. C.; Aksenov, A.; Callewaert, C.; Debelius, J.; Gonzalez, A.; Kosciolek, T.; McCall, L.-I.; McDonald, D.; Melnik, A. V.; Morton, J. T.; Navas, J.; Quinn, R. A.; Sanders, J. G.; Swafford, A. D.; Thompson, L. R.; Tripathi, A.; Xu, Z. Z.; Zaneveld, J. R.; Zhu, Q.; Caporaso, J. G.; Dorrestein, P. C. Best practices for analysing microbiomes. Nat. Rev. Microbiol. 2018, 16, 410-422.

(9) Dumas, M.-E.; Barton, R. H.; Toye, A.; Cloarec, O.; Blancher, C.; Rothwell, A.; Fearnside, J.; Tatoud, R.; Blanc, V.; Lindon, J. C.; Mitchell, S. C.; Holmes, E.; McCarthy, M. I.; Scott, J.; Gauguier, D.; Nicholson, J. K. Metabolic profiling reveals a contribution of gut microbiota to fatty liver phenotype in insulin-resistant mice. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 12511-12516.

(10) Zeisel, S. H.; Warrier, M. Trimethylamine N-Oxide, the Microbiome, and Heart and Kidney Disease. Annu. Rev. Nutr. 2017, 37, 157-181.

(11) Chalasani, N.; Younossi, Z.; Lavine, J. E.; Diehl, A. M.; Brunt, E. M.; Cusi, K.; Charlton, M.; Sanyal, A. J. The Diagnosis and Management of Non-alcoholic Fatty Liver Disease: Practice Guideline by the American Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological Association. Am. J. Gastroenterol. 2012, 107, 811-826.

(12) Kaminski, J.; Gibson, M. K.; Franzosa, E. A.; Segata, N.; Dantas, G.; Huttenhower, C. High-Specificity Targeted Functional Profiling in Microbial Communities with ShortBRED. PLoS Comput. Biol. 2015, 11, e1004557.

(13) Dinsdale, E.; Edwards, R.; Bailey, B.; Tuba, I.; Akhter, S.; McNair, K.; Schmieder, R.; Apkarian, N.; Creek, M.; Guan, E.; Hernandez, M.; Isaacs, K.; Peterson, C.; Regh, T.; Ponomarenko, V. Multivariate Analysis of Functional Metagenomes. Front. Genet. 2013, 4, 41.

(14) Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Series B Stat. Methodol. 1995, 57, 289-300.

(15) Peck, S. C.; Denger, K.; Burrichter, A.; Irwin, S. M.; Balskus, E. P.; Schleheck, D. A glycyl radical enzyme enables hydrogen sulfide production by the human intestinal bacterium Bilophila wadsworthia. Proc. Natl. Acad. Sci. U. S. A. 2019, accepted.

(16) David, L. A.; Maurice, C. F.; Carmody, R. N.; Gootenberg, D. B.; Button, J. E.; Wolfe, B. E.; Ling, A. V.; Devlin, A. S.; Varma, Y.; Fischbach, M. A.; Biddinger, S. B.; Dutton, R. J.; Turnbaugh, P. J. Diet rapidly and reproducibly alters the human gut microbiome. Nature 2013, 505, 559-563.

(17) Natividad, J. M.; Lamas, B.; Pham, H. P.; Michel, M.-L.; Rainteau, D.; Bridonneau, C.; da Costa, G.; van Hylckama Vlieg, J.; Sovran, B.; Chamignon, C.; Planchais, J.; Richard, M. L.; Langella, P.; Veiga, P.; Sokol, H. Bilophila wadsworthia aggravates high fat diet induced metabolic dysfunctions in mice. Nat. Commun. 2018, 9, 2802.

(18) Devkota, S.; Wang, Y.; Musch, M. W.; Leone, V.; Fehlner-Peach, H.; Nadimpalli, A.; Antonopoulos, D. A.; Jabri, B.; Chang, E. B. Dietary-fat-induced taurocholic acid promotes pathobiont expansion and colitis in Il10−/− mice. Nature 2012, 487, 104-108.

(19) Ridlon, J. M.; Alves, J. M.; Hylemon, P. B.; Bajaj, J. S. Cirrhosis, bile acids and gut microbiota. Gut Microbes 2013, 4, 382-387.

(20) Chiang, J. Y. L. Bile acids: regulation of synthesis. J. Lipid Res. 2009, 50, 1955-1966.


(21) Looft, T.; Johnson, T. A.; Allen, H. K.; Bayles, D. O.; Alt, D. P.; Stedtfeld, R. D.; Sul, W. J.; Stedtfeld, T. M.; Chai, B.; Cole, J. R.; Hashsham, S. A.; Tiedje, J. M.; Stanton, T. B. In-feed antibiotic effects on the swine intestinal microbiome. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 1691-1696.

(22) Crespo-Piazuelo, D.; Estellé, J.; Revilla, M.; Criado-Mesas, L.; Ramayo-Caldas, Y.; Óvilo, C.; Fernández, A. I.; Ballester, M.; Folch, J. M. Characterization of bacterial microbiota compositions along the intestinal tract in pigs and their interactions and functions. Sci. Rep. 2018, 8, 12727.

(23) Mach, N.; Berri, M.; Estellé, J.; Levenez, F.; Lemonnier, G.; Denis, C.; Leplat, J.-J.; Chevaleyre, C.; Billon, Y.; Doré, J.; Rogel-Gaillard, C.; Lepage, P. Early-life establishment of the swine gut microbiome and impact on host phenotypes. Environ. Microbiol. Rep. 2015, 7, 554-569.

(24) Lunney, J. K. Advances in Swine Biomedical Model Genomics. Int. J. Biol. Sci. 2007, 3, 179-184.

(25) Liu, D.; Wei, Y.; Liu, X.; Zhou, Y.; Jiang, L.; Yin, J.; Wang, F.; Hu, Y.; Nanjaraj Urs, A. N.; Liu, Y.; Ang, E. L.; Zhao, S.; Zhao, H.; Zhang, Y. Indoleacetate decarboxylase is a glycyl radical enzyme catalysing the formation of malodorant skatole. Nat. Commun. 2018, 9, 4224.

(26) Zamaratskaia, G.; Squires, E. J. Biochemical, nutritional and genetic effects on boar taint in entire male pigs. Animal 2009, 3, 1508-1521.

(27) Xiao, L.; Estellé, J.; Kiilerich, P.; Ramayo-Caldas, Y.; Xia, Z.; Feng, Q.; Liang, S.; Pedersen, A. Ø.; Kjeldsen, N. J.; Liu, C.; Maguin, E.; Doré, J.; Pons, N.; Le Chatelier, E.; Prifti, E.; Li, J.; Jia, H.; Liu, X.; Xu, X.; Ehrlich, S. D.; Madsen, L.; Kristiansen, K.; Rogel-Gaillard, C.; Wang, J. A reference gene catalogue of the pig gut microbiome. Nat. Microbiol. 2016, 1, 16161.

(28) Yokoyama, M. T.; Carlson, J. R.; Holdeman, L. V. Isolation and characteristics of a skatole-producing Lactobacillus sp. from the bovine rumen. Appl. Environ. Microbiol. 1977, 34, 837-842.

(29) Jensen, M. T.; Cox, R. P.; Jensen, B. B. 3-Methylindole (skatole) and indole production by mixed populations of pig fecal bacteria. Appl. Environ. Microbiol. 1995, 61, 3180-3184.

(30) Nicholson, J. K.; Holmes, E.; Kinross, J.; Burcelin, R.; Gibson, G.; Jia, W.; Pettersson, S. Host-Gut Microbiota Metabolic Interactions. Science 2012, 336, 1262-1267.

(31) LeBlanc, J. G.; Milani, C.; de Giori, G. S.; Sesma, F.; van Sinderen, D.; Ventura, M. Bacteria as vitamin suppliers to their host: a gut microbiota perspective. Curr. Opin. Biotechnol. 2013, 24, 160-168.


(32) Pompei, A.; Cordisco, L.; Amaretti, A.; Zanoni, S.; Matteuzzi, D.; Rossi, M. Folate Production by Bifidobacteria as a Potential Probiotic Property. Appl. Environ. Microbiol. 2007, 73, 179-185.

(33) Klipstein, F. A.; Samloff, I. M. Folate Synthesis by Intestinal Bacteria. Am. J. Clin. Nutr. 1966, 19, 237-246.

(34) Rossi, M.; Amaretti, A.; Raimondi, S. Folate Production by Probiotic Bacteria. Nutrients 2011, 3, 118.

(35) Yatsunenko, T.; Rey, F. E.; Manary, M. J.; Trehan, I.; Dominguez-Bello, M. G.; Contreras, M.; Magris, M.; Hidalgo, G.; Baldassano, R. N.; Anokhin, A. P.; Heath, A. C.; Warner, B.; Reeder, J.; Kuczynski, J.; Caporaso, J. G.; Lozupone, C. A.; Lauber, C.; Clemente, J. C.; Knights, D.; Knight, R.; Gordon, J. I. Human gut microbiome viewed across age and geography. Nature 2012, 486, 222-227.

(36) Williams, C. D. Kwashiorkor: A Nutritional Disease of Children Associated with a Maize Diet. Lancet 1935, 226, 1151-1152.

(37) Smith, M. I.; Yatsunenko, T.; Manary, M. J.; Trehan, I.; Mkakosya, R.; Cheng, J.; Kau, A. L.; Rich, S. S.; Concannon, P.; Mychaleckyj, J. C.; Liu, J.; Houpt, E.; Li, J. V.; Holmes, E.; Nicholson, J.; Knights, D.; Ursell, L. K.; Knight, R.; Gordon, J. I. Gut Microbiomes of Malawian Twin Pairs Discordant for Kwashiorkor. Science 2013, 339, 548-554.

(38) He, Z.; Toney, M. D. Direct Detection and Kinetic Analysis of Covalent Intermediate Formation in the 4-Amino-4-deoxychorismate Synthase Catalyzed Reaction. Biochemistry 2006, 45, 5019-5028.

(39) Ye, Q. Z.; Liu, J.; Walsh, C. T. p-Aminobenzoate synthesis in Escherichia coli: Purification and characterization of PabB as aminodeoxychorismate synthase and enzyme X as aminodeoxychorismate lyase. Proc. Natl. Acad. Sci. U. S. A. 1990, 87, 9391-9395.

(40) Green, J. M.; Nichols, B. P. p-Aminobenzoate biosynthesis in Escherichia coli. Purification of aminodeoxychorismate lyase and cloning of pabC. J. Biol. Chem. 1991, 266, 12971-12975.

(41) Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26, 2460-2461.

(42) Koenig, J. E.; Spor, A.; Scalfone, N.; Fricker, A. D.; Stombaugh, J.; Knight, R.; Angenent, L. T.; Ley, R. E. Succession of microbial consortia in the developing infant gut microbiome. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 4578-4585.

(43) Wei, Y.-X.; Zhang, Z.-Y.; Liu, C.; Zhu, Y.-Z.; Zhu, Y.-Q.; Zheng, H.; Zhao, G.-P.; Wang, S.; Guo, X.-K. Complete Genome Sequence of Bifidobacterium longum JDM301. J. Bacteriol. 2010, 192, 4076-4077.


(44) Sela, D. A.; Chapman, J.; Adeuya, A.; Kim, J. H.; Chen, F.; Whitehead, T. R.; Lapidus, A.; Rokhsar, D. S.; Lebrilla, C. B.; German, J. B.; Price, N. P.; Richardson, P. M.; Mills, D. A. The genome sequence of Bifidobacterium longum subsp. infantis reveals adaptations for milk utilization within the infant microbiome. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 18964-18969.

(45) Palmer, C.; Bik, E. M.; DiGiulio, D. B.; Relman, D. A.; Brown, P. O. Development of the Human Infant Intestinal Microbiota. PLoS Biol. 2007, 5, e177.

(46) The Human Microbiome Project Consortium; Huttenhower, C. Structure, function and diversity of the healthy human microbiome. Nature 2012, 486, 207-214.

(47) Gerlt, J. A.; Bouvier, J. T.; Davidson, D. B.; Imker, H. J.; Sadkhin, B.; Slater, D. R.; Whalen, K. L. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta 2015, 1854, 1019-1037.

(48) Mitchell, A. L.; Attwood, T. K.; Babbitt, P. C.; Blum, M.; Bork, P.; Bridge, A.; Brown, S. D.; Chang, H.-Y.; El-Gebali, S.; Fraser, M. I.; Gough, J.; Haft, D. R.; Huang, H.; Letunic, I.; Lopez, R.; Luciani, A.; Madeira, F.; Marchler-Bauer, A.; Mi, H.; Natale, D. A.; Necci, M.; Nuka, G.; Orengo, C.; Pandurangan, A. P.; Paysan-Lafosse, T.; Pesseat, S.; Potter, S. C.; Qureshi, M. A.; Rawlings, N. D.; Redaschi, N.; Richardson, L. J.; Rivoire, C.; Salazar, G. A.; Sangrador-Vegas, A.; Sigrist, C. J. A.; Sillitoe, I.; Sutton, G. G.; Thanki, N.; Thomas, P. D.; Tosatto, S. C. E.; Yong, S.-Y.; Finn, R. D. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 2018, gky1100.

(49) Finn, R. D.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Mistry, J.; Mitchell, A. L.; Potter, S. C.; Punta, M.; Qureshi, M.; Sangrador-Vegas, A.; Salazar, G. A.; Tate, J.; Bateman, A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016, 44, D279-D285.

(50) Li, R.; Zhu, H.; Ruan, J.; Qian, W.; Fang, X.; Shi, Z.; Li, Y.; Li, S.; Shan, G.; Kristiansen, K.; Li, S.; Yang, H.; Wang, J.; Wang, J. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20, 265-272.

(51) Noguchi, H.; Park, J.; Takagi, T. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. 2006, 34, 5623-5630.

(52) Kanehisa, M.; Sato, Y.; Furumichi, M.; Morishima, K.; Tanabe, M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 2018, gky962-gky962.

(53) Brown, S. D.; Babbitt, P. C. Inference of Functional Properties from Large-scale Analysis of Enzyme Superfamilies. J. Biol. Chem. 2012, 287, 35-42.

(54) Pegg, S. C. H.; Brown, S. D.; Ojha, S.; Seffernick, J.; Meng, E. C.; Morris, J. H.; Chang, P. J.; Huang, C. C.; Ferrin, T. E.; Babbitt, P. C. Leveraging Enzyme Structure−Function Relationships for Functional Inference and Experimental Design: The Structure−Function Linkage Database. Biochemistry 2006, 45, 2545-2555.

(55) The Human Microbiome Project Consortium; Methé, B. A. A framework for human microbiome research. Nature 2012, 486, 215-221.

(56) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403-410.

(57) Tailford, L. E.; Crost, E. H.; Kavanaugh, D.; Juge, N. Mucin glycan foraging in the human gut microbiome. Front. Genet. 2015, 6, 81.

(58) Chang, D.-E.; Smalley, D. J.; Tucker, D. L.; Leatham, M. P.; Norris, W. E.; Stevenson, S. J.; Anderson, A. B.; Grissom, J. E.; Laux, D. C.; Cohen, P. S.; Conway, T. Carbon nutrition of Escherichia coli in the mouse intestine. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 7427-7432.

(59) Levin, B. J.; Balskus, E. P. Discovering radical-dependent enzymes in the human gut microbiota. Curr. Opin. Chem. Biol. 2018, 47, 86-93.

(60) Backman, L. R. F.; Funk, M. A.; Dawson, C. D.; Drennan, C. L. New tricks for the glycyl radical enzyme family. Crit. Rev. Biochem. Mol. Biol. 2017, 52, 674-695.

(61) Craciun, S.; Balskus, E. P. Microbial conversion of choline to trimethylamine requires a glycyl radical enzyme. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 21307-21312.

(62) Cock, P. J. A.; Antao, T.; Chang, J. T.; Chapman, B. A.; Cox, C. J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; de Hoon, M. J. L. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25, 1422-1423.

(63) Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C.; Thierer, T.; Ashton, B.; Meintjes, P.; Drummond, A. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012, 28, 1647-1649.


Chapter 5: Discovery of a glycyl radical enzyme catalyzing indole-3-acetate decarboxylation^i

5.1: Introduction

The gut microbiotas of animals other than humans have been studied thoroughly over the past decade,1 including from macaques,2 bats,3 flying squirrels,4 sea lions,5 dolphins,6 alligators,7 zebrafish,8 honey bees,9-10 termites,11 and other organisms. In particular, swine and ruminant gut microbiomes have been repeatedly studied due to these animals’ important roles in agriculture.12-18 For example, the addition of subtherapeutic doses of antibiotics in pig diets has been shown to alter their gut microbiotas and increase the abundance of both antibiotic resistance genes and pathogenic bacteria.12 Other studies have found correlations between pig gut microbiota composition and feed efficiency and other host phenotypes.19-20 In cattle, characterization of rumen microbes is critical for deciphering how otherwise indigestible plant fibers are fermented, for improving the efficiency of raising livestock, and for reducing their greenhouse gas emissions.13,16-17 While both swine and ruminant gut microbiotas are known to impact host physiology, the underlying biochemical mechanisms of many microbe-host interactions and their implications for agriculture remain unidentified.18,21

In pigs, gut microbes have been linked to a condition known as boar taint (Figure 5.1A).22-23 Boar taint is an unpleasant odor or taste found in pork products derived primarily from intact (uncastrated) male pigs, and it is the main reason that male pigs are not used for pork production in most countries. Two compounds that accumulate in pig meat are responsible for boar taint: androstenone24 and skatole (3-methylindole).25 The former compound is a steroid produced in the testis of male pigs.26 Because castration removes the source of androstenone, this procedure is the primary method of preventing boar taint. The role of skatole in boar taint is more complex. Skatole is a tryptophan-derived metabolite produced by pig gut microbes.27 Although fecal skatole concentrations are similar in all pigs,28 skatole is only found in high concentrations in adipose tissue from intact male pigs.22 Boar taint caused by skatole is correlated with the increased production, more rapid intestinal absorption, and faster hepatic metabolism of this metabolite, but the precise mechanisms by which these factors might affect skatole accumulation and their link to castration are largely unknown.22

^i Beverly Fu (Harvard) was responsible for collecting some of the data presented in this Chapter. Her contributions are highlighted in the Materials and methods section.

Figure 5.1: Biological relevance of skatole production in livestock. (A) In boars, host- or diet-derived tryptophan is metabolized by gut microbes to skatole, which accumulates within adipose tissue. This process gives rise to an unpleasant taste and odor in pork. (B) Dietary tryptophan, found in lush pastures, is processed by the cattle rumen microbiota to skatole, which is oxidized by host enzymes to form compounds that cause respiratory distress.


In cattle, skatole production by gut microbes has been linked to acute bovine pulmonary emphysema and edema (ABPEE) (Figure 5.1B).29-30 ABPEE is characterized by acute respiratory distress and mainly affects cattle over two years old. Morbidity rates are near 50%, while mortality rates are closer to 10%.29 ABPEE typically occurs after cattle are moved to better, often lush, pastures that are rich in protein. The L-tryptophan present at increased concentrations in such grasses is converted by gut microbes to indole-3-acetate (I3A) and ultimately to skatole.31 The skatole enters the bloodstream and eventually reaches the lungs, where it is oxidized by host proteins to form 3-methyleneindolenine and other uncharacterized reactive metabolites that cause pneumotoxicity.32-34 Treating cattle with antibiotics before transitioning to lush pastures reduces rates of ABPEE by decreasing rates of skatole production.35 The efficacy of such an intervention also confirms a causal role for gut microbes in this disease.

Despite the connections between skatole production and both boar taint and ABPEE, the gut microbial genes and enzymes responsible for skatole production were unknown at the beginning of these studies. Several gut microbes found in animals had been reported to produce skatole from tryptophan or indole-3-acetate (I3A), but the enzymes responsible were not identified. The first bacterial isolate found to convert I3A to skatole was a Lactobacillus species; interestingly, this strain could not convert tryptophan to I3A, suggesting that multiple organisms could be required for the full conversion of tryptophan to skatole.36-37 Subsequent experiments led to the identification of two Clostridium species, Clostridium scatologenes ATCC 25775 and Clostridium drakei SL1 DSM 12750, that produced skatole from I3A.38 It was also reported that the indole derivatives indole-3-carbinol and indole-3-acetonitrile inhibit the decarboxylation of I3A to skatole in pig fecal slurries.27 More recent studies have focused on microbiological characterization of C. scatologenes and Olsenella scatoligenes, which can both produce skatole.38-42

In this Chapter, I describe the discovery of the GRE indole-3-acetate decarboxylase (IAD) from two gut microbes. Based on parallels with the chemistry performed by characterized GREs, we hypothesized that the conversion of I3A to skatole was likely performed by a new member of this enzyme family. Comparative genomics was used to identify putative skatole-forming GREs in Olsenella uli and C. scatologenes. Both enzymes were verified to catalyze the decarboxylation of I3A to produce skatole in vitro. Intriguingly, the IAD from C. scatologenes only produced skatole in the presence of both a canonical GRE activating enzyme (IAD-AE) and a second enzyme encoded adjacent to the GRE in the C. scatologenes genome. This additional protein resembles members of the radical S-adenosylmethionine (SAM) enzyme superfamily but lacks significant sequence similarity to other GRE-AEs. The discovery of these enzymes establishes a molecular mechanism for skatole formation in gut microbiotas and expands the scope of transformations performed by GREs. Moreover, the observation that an additional enzyme is required for C. scatologenes IAD activity suggests that distinct processes for GRE activation and/or catalysis remain to be characterized.

5.2: Results and discussion

5.2.1: Identification of a putative indole-3-acetate decarboxylase (IAD) in Olsenella uli

Our search for the enzymes responsible for I3A decarboxylation was guided by the similarities between this reaction and the transformation catalyzed by the characterized GRE 4-hydroxyphenylacetate decarboxylase (4-HPAD) (Figure 5.2A). Biological decarboxylation reactions typically involve β-ketoacid substrates and can proceed via two-electron mechanisms. However, as 4-hydroxyphenylacetate lacks an electron-accepting functional group, its decarboxylation requires radical chemistry.43 A reaction mechanism for 4-HPAD was initially proposed that involved the thiyl radical of the enzyme abstracting the phenolic hydrogen atom from the substrate.44-45 With less electron density in its aromatic system, this substrate-based radical intermediate could undergo decarboxylation; the polarity inversion, or “Umpolung”, resulting from hydrogen atom abstraction makes the typically electron-donating 4-hydroxyphenyl group significantly more electrophilic. After decarboxylation, protonation of this radical anion’s benzylic carbon followed by hydrogen atom abstraction by the phenolic oxygen atom would generate p-cresol. However, a crystal structure of 4-HPAD with substrate bound seemingly indicated that the carboxylate group, instead of the hydroxyl group, was potentially situated near the catalytic cysteine.46 Computational work guided by this structure suggested that 4-HPAD operates via a Kolbe decarboxylation mechanism.47 The Kolbe decarboxylation involves the one-electron oxidation of carboxylate groups, which decarboxylate to produce radical intermediates.48 These studies have led to a proposed mechanism for 4-HPAD that involves an initial proton-coupled electron transfer from the thiyl radical of the GRE to the substrate carboxylate (Figure 5.2B).43 The resulting substrate-centered radical can then undergo decarboxylation, with the subsequent benzylic radical stabilized by overlap with the adjacent aromatic system. Finally, the benzylic radical intermediate abstracts a hydrogen atom from the conserved cysteine, which produces p-cresol and regenerates the thiyl radical that initiated catalysis. As discussed in more detail below, the details of this mechanism are potentially inconsistent with the reactivity of other decarboxylating GREs, so additional biochemical studies of 4-HPAD are needed to clarify how this enzyme mediates catalysis.


Figure 5.2: Similarities between I3A and 4-hydroxyphenylacetate decarboxylation. (A) The reaction catalyzed by the GRE 4-HPAD resembles the proposed decarboxylation of I3A to skatole, suggesting that I3A decarboxylation may be performed by a GRE. (B) Proposed mechanism for the decarboxylation of 4-hydroxyphenylacetate by 4-HPAD. Proton coupled electron transfer leads to oxidation of the carboxylate group of the substrate, generating a substrate-based radical. This facilitates decarboxylation, and a hydrogen atom transfer to the resulting product-based radical generates p-cresol.

The similarities between 4-hydroxyphenylacetate and I3A led us to hypothesize that a GRE might be responsible for converting I3A to skatole. 4-Hydroxyphenylacetate and I3A both contain an electron-rich aromatic ring bonded to a carboxylate group, and factors similar to those complicating the decarboxylation of the former compound also apply to the decarboxylation of I3A. 4-HPAD circumvents these challenges with radical chemistry, and we predicted a similar enzyme may act on I3A. Notably, I3A lacks the phenolic hydrogen atom that is present in 4-hydroxyphenylacetate and predicted to be important for decarboxylation in both previously proposed mechanisms for 4-HPAD. However, the recent discovery of a GRE catalyzing phenylacetate decarboxylation suggests that no such hydroxyl group is necessary, which contradicts the key role this group has in the previously proposed 4-HPAD mechanisms.49

We therefore searched the genomes of skatole-producing organisms for candidate I3A decarboxylating GREs. We began by examining the genome of Olsenella scatoligenes, a Gram-positive, anaerobic bacterium that was isolated from pig feces and was found to decarboxylate I3A to skatole.41 The ability of three related Olsenella species to produce skatole was also determined.41 In an effort to identify pathways for skatole production in these microbes, Li et al. sequenced the O. scatoligenes genome, but no enzymes involved in skatole production were identified.42 Overall, when we began our search, four sequenced Olsenella species had been characterized for their ability to produce skatole: O. scatoligenes SK9K4T, O. uli DSM 7084T, O. profusa DSM 13989T, and O. umbonata DSM 22620T.41-42,50 O. scatoligenes and O. uli can decarboxylate I3A, while the other two organisms cannot perform this reaction. With this knowledge, I identified all of the GREs encoded by these four organisms by performing BLAST searches using the sequences of known GREs, including 4-HPAD.51 Crucially, the O. scatoligenes and O. uli genomes encode four and five GREs respectively, while the O. profusa and O. umbonata genomes each encode only three GREs. Comparing the GREs in the skatole-producers to those in the non-producers revealed one GRE that was found only in the skatole-producers. Therefore, this enzyme became a prime candidate for catalyzing I3A decarboxylation.
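The logic of this comparison can be expressed as a simple set operation: keep the GRE families present in every skatole producer and absent from every non-producer. The sketch below is illustrative only. It assumes that the GREs from each genome have already been grouped into families (for example by sequence identity), and the family labels GRE_1 through GRE_5 are hypothetical placeholders, not the actual enzymes; only the number of GREs per genome is taken from the text.

```python
def candidate_families(families_by_genome, producers, non_producers):
    """Return families found in all producers and in no non-producer."""
    if not producers:
        raise ValueError("at least one producer genome is required")
    shared = set.intersection(
        *(set(families_by_genome[g]) for g in producers)
    )
    excluded = set().union(
        *(set(families_by_genome[g]) for g in non_producers)
    )
    return shared - excluded


# Hypothetical family assignments; only the counts (4, 5, 3, 3) follow the text.
example_families = {
    "O. scatoligenes": {"GRE_1", "GRE_2", "GRE_3", "GRE_4"},
    "O. uli": {"GRE_1", "GRE_2", "GRE_3", "GRE_4", "GRE_5"},
    "O. profusa": {"GRE_1", "GRE_2", "GRE_3"},
    "O. umbonata": {"GRE_1", "GRE_2", "GRE_3"},
}
example_candidates = candidate_families(
    example_families,
    producers=["O. scatoligenes", "O. uli"],
    non_producers=["O. profusa", "O. umbonata"],
)
```

Requiring presence in all producers, and not just in any one of them, is what removes a family that occurs in only a single producing strain from consideration.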


5.2.2: Identification of a putative IAD in Clostridium scatologenes

We sought to complement our search for IAD in Olsenella strains by mining the genome of Clostridium scatologenes ATCC 25775, another strain known to produce skatole, for an additional putative IAD. Isotope labeling studies with deuterated I3A established that C. scatologenes decarboxylates I3A to yield skatole,38 and a GRE was previously hypothesized to be responsible for this activity, but previous efforts to identify and isolate this enzyme were unsuccessful.44 We relied on the published genome sequence of C. scatologenes in order to identify a putative IAD.52 Using a search strategy similar to that employed for the Olsenella genomes,51 we identified five GREs encoded by C. scatologenes: choline trimethylamine-lyase53 (79% amino acid identity to a characterized representative), trans-4-hydroxy-L-proline dehydratase54 (81% identical), 4-HPAD,55 and two putative GREs of unknown function. One of the uncharacterized GREs was 33% identical to 4-HPAD and 51% identical to the putative IAD in O. uli, suggesting that it was a promising candidate for skatole production.

Unexpectedly, the gene encoding this putative IAD in C. scatologenes is colocalized with two genes encoding putative radical SAM enzymes (Figure 5.3). The enzyme encoded immediately downstream of the putative IAD is expected to be the IAD activating enzyme (IAD-AE), as it is more similar to characterized GRE-AEs than the other enzyme (32% amino acid identity with pyruvate formate-lyase activating enzyme vs. 8% identity). The other protein (rSAM2), encoded upstream of the GRE, is significantly longer than typical GRE-AEs (475 amino acids vs. <300 amino acids for GRE-AEs), and it lacks a CX3CX2C motif near its N-terminus, which is found in all GRE-AEs and other radical SAM enzymes.56 This enzyme does contain an internal CX3CX2C motif predicted to bind a [4Fe–4S] cluster, as well as a C2X4C motif that could also be involved in binding an iron–sulfur cluster. The characterized GREs 4-HPAD and benzylsuccinate synthase (BSS) require small subunits that bind [4Fe–4S] clusters. In 4-HPAD the small γ-subunit was found to bind two [4Fe–4S] clusters, while BSS contains two small subunits that each bind one [4Fe–4S] cluster, but the roles of all of these clusters remain unknown.43,57 At this point in our studies, the relationship between rSAM2 and IAD was unclear, but we discovered later that the presence of rSAM2 is necessary to observe skatole formation by CsIAD.
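Locating a CX3CX2C motif in a protein sequence, and judging whether it lies near the N-terminus or internally, is a simple pattern search. The following sketch uses a regular expression for this purpose; it is an illustration and not the analysis performed here, and the example sequence is an invented placeholder, not the sequence of rSAM2 or of any GRE-AE.

```python
import re

# C, any three residues, C, any two residues, C. The lookahead allows
# overlapping occurrences to be reported as well.
CX3CX2C = re.compile(r"(?=(C.{3}C.{2}C))")


def find_cx3cx2c(protein_sequence):
    """Return (1-based start position, matched motif) for each occurrence."""
    sequence = protein_sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CX3CX2C.finditer(sequence)]


# Invented placeholder sequence containing one motif starting at residue 4.
example_motifs = find_cx3cx2c("MSTCGHNCPYCANP")
```

The reported start positions can then be compared with the total length of the protein to decide whether a motif is N-terminal or internal; no position cutoff is implied by the text, so none is encoded in this sketch.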


Figure 5.3: Genomic context for the putative IAD in C. scatologenes. This GRE is encoded next to two radical SAM enzymes. The enzyme immediately downstream of the GRE shares greater similarity to characterized GRE-AEs, and is predicted to be the activating enzyme for this GRE. The upstream enzyme can be annotated as a putative radical SAM enzyme, but its function is unknown.

5.2.3: Construction of expression hosts constitutively expressing genes involved in iron-sulfur cluster assembly

Radical SAM enzymes, including GRE-AEs, require at least one and sometimes multiple [4Fe–4S] clusters for activity.58-59 However, heterologous expression and purification of these enzymes can be challenging, as some radical SAM enzymes do not fold properly if the [4Fe–4S] clusters are not present.60-61 The native system for assembling iron–sulfur clusters in E. coli is encoded by the isc operon.62 Coexpression of the isc pathway with iron–sulfur cluster containing proteins can sometimes improve the solubility, yield, and purity of heterologously expressed proteins in E. coli.63 Similar work has also been performed with the suf gene cluster, which is involved in iron–sulfur cluster assembly under conditions of oxidative stress or iron limitation.63-64 GRE-AEs have been successfully overexpressed in E. coli coexpressing the isc cluster.53-54 However, in practice these methods rely on a second plasmid that encodes the isc (or suf) cluster, and it is unclear how well the bacterial host accommodates this second plasmid.65

To potentially improve expression of GRE-AEs, I constructed an E. coli expression host that constitutively expresses its native isc system. The construction of this E. coli mutant has been reported previously.66 The isc operon is regulated by the regulatory protein encoded by iscR. An expression host with this gene deleted should therefore constitutively express iron–sulfur cluster assembly proteins.67 An E. coli mutant with an antibiotic resistance cassette inserted into the iscR gene had been previously constructed as a part of the Keio collection.68 I used P1 transduction to transfer this mutation into the expression strain E. coli BL21(DE3) to generate E. coli BL21(DE3) ΔiscR::kan, and then used the FLP-recombinase system to remove the antibiotic resistance cassette,69 resulting in the E. coli BL21(DE3) ΔiscR expression host. This expression host has been used previously to improve the activity of enzymes containing iron–sulfur clusters heterologously expressed in E. coli when plasmid-based overexpression of the isc operon had no effect.66 To our knowledge, this expression system has not previously been used for the purification of GRE-AEs. SDS-PAGE analyses of the lysates from overexpressions of IAD-AEs seemed to indicate that more soluble protein was present in samples derived from the iscR knockout strain, but rigorous comparisons were not performed.

5.2.4: Confirmation of the activity of O. uli IAD (OuIAD)

The predicted IAD from Olsenella uli (OuIAD) was heterologously overexpressed and purified from E. coli BL21(DE3) following protocols similar to those used previously.54 The predicted IAD-AE (OuIAD-AE) was expressed and purified under anoxic conditions from E. coli BL21(DE3) ΔiscR. Initially, combining the GRE, GRE-AE, S-adenosylmethionine (SAM), dithiothreitol, and 5-deazariboflavin did not result in glycyl radical formation as measured by EPR spectroscopy. However, use of a halogen lamp to accelerate the photoreduction of the 5-deazariboflavin, as previously reported for pyruvate formate-lyase,70 did generate a glycyl radical on the GRE (Figure 5.4). The increased light from the halogen lamp likely leads to more rapid one-electron reduction of the [4Fe–4S]2+ cluster in OuIAD-AE, which is necessary for glycyl radical formation on OuIAD. Reductants besides 5-deazariboflavin have not been tested, and additional screening may indicate that other reductants can reduce OuIAD-AE more efficiently. Reconstitution of the [4Fe–4S] clusters of OuIAD-AE had no effect on EPR signal or catalytic activity.

[Figure 5.4 plot: experimental (Exp.) and simulated (Sim.) EPR spectra; x-axis: Field (G), 3300 to 3400; g = 2.0035]

Figure 5.4: EPR spectrum of activated OuIAD. This experiment was performed in triplicate; representative and simulated spectra are shown. Spin quantification of all spectra revealed that 17 ± 7% (mean ± standard deviation) of OuIAD monomers contained glycyl radical species.

Conversion of I3A to skatole by the activated GRE was measured using HPLC. We found that the predicted IAD was able to decarboxylate I3A to skatole (Figure 5.5). As with other GREs, excluding the GRE-AE, SAM, or the external reductant resulted in no conversion. The presence of oxygen also resulted in loss of activity due to the rapid and irreversible quenching of the glycyl radical species. Finally, OuIAD activity is dependent on light from a halogen lamp, which likely accelerates the rate of OuIAD-AE reduction and OuIAD activation. Together, these results confirm our hypothesis that a GRE catalyzes the decarboxylation of I3A and indicate that we have correctly identified the IAD in O. uli.

[Figure 5.5 plot: bar chart; y-axis: Conversion of I3A to skatole (%), 0 to 100; x-axis: full assay and control conditions (omission of deazariboflavin, AE, GRE, SAM, or light; addition of O2)]

Figure 5.5: Skatole production by OuIAD. Samples were prepared by combining dithiothreitol (10 mM), 5-deazariboflavin (200 µM), OuIAD-AE (50 µM), OuIAD (50 µM), and S-adenosylmethionine (500 µM) in anoxic buffer containing HEPES (100 mM, pH 8.0) and NaCl (50 mM). Assay mixtures were illuminated by a 500 W halogen lamp for 1 hour prior to addition of I3A (500 µM). After 3 hours, sample mixtures were quenched and analyzed by HPLC.

These results are consistent with an independent study characterizing a similar IAD.71 In their report, Liu et al. used a similar comparative genomics strategy to identify the IAD in O. scatologenes (OsIAD). They expressed and purified OsIAD with an N-terminal His6 tag and used size exclusion chromatography to show that it exists predominantly as a homodimer, like most other GREs. OsIAD-AE was expressed and purified with an N-terminal maltose-binding protein tag and was shown to cleave SAM to 5ʹ-deoxyadenosine after being reduced by titanium(III) citrate. EPR spectroscopy demonstrated that OsIAD-AE could install a glycyl radical on OsIAD, with an average of 15% radical species per OsIAD monomer observed. Activated OsIAD was found to convert I3A to skatole as detected by gas chromatography–mass spectrometry, and this enzyme was found to be highly specific for I3A and unable to decarboxylate 4-hydroxyphenylacetate or phenylacetate. OsIAD and OuIAD are highly similar (89% shared amino acid identity), and our results with OuIAD thus far are consistent with the conclusions drawn from this study.

5.2.5: Confirmation of the activity of C. scatologenes IAD (CsIAD)

We also sought to determine if the predicted IAD in C. scatologenes (CsIAD) could decarboxylate I3A. The putative CsIAD and CsIAD-AE were heterologously overexpressed and purified in a similar manner to the O. uli analogs, except the expression hosts used were E. coli BL21(DE3)-Gold-CodonPlus-RIL and E. coli BL21(DE3)-Gold-CodonPlus-RIL ΔiscR, respectively. The CodonPlus-RIL strains contain extra copies of tRNA genes frequently used by organisms with AT-rich genomes, including C. scatologenes. All attempts to measure glycyl radical formation and skatole production by these two enzymes were unsuccessful (Figure 5.6).

We heterologously overexpressed and purified rSAM2 in the same manner as CsIAD-AE. Adding this enzyme to assays containing CsIAD and CsIAD-AE resulted in conversion of I3A to skatole. Control experiments omitting CsIAD, CsIAD-AE, rSAM2, or other compounds required for glycyl radical formation indicated that all assay components are necessary for activity.

[Figure 5.6 plot: bar chart; y-axis: Conversion of I3A to skatole (%), 0 to 80; x-axis: full assay and control conditions]

Figure 5.6: Skatole production by CsIAD. Samples were prepared by combining dithiothreitol (DTT) (10 mM), 5-deazariboflavin (200 µM), CsIAD-AE (50 µM), CsIAD (50 µM), CsrSAM2 (50 µM), and S-adenosylmethionine (500 µM) in anoxic buffer containing HEPES (100 mM, pH 8.0) and NaCl (50 mM). Assay mixtures were illuminated by a 500 W halogen lamp for 1 hour prior to addition of I3A (500 µM). After 3 hours, sample mixtures were quenched and analyzed by HPLC.

The apparent requirement of rSAM2 for I3A decarboxylation in vitro is highly unusual, as no other characterized GRE depends on an additional protein of this size for activity. This result could explain why previous efforts to isolate CsIAD were unsuccessful.44 Experiments with heat-inactivated rSAM2 suggest that this enzyme must be properly folded for CsIAD activity, but its precise role in catalysis is unclear. As highlighted earlier, the small subunits of 4-HPAD and BSS contain [4Fe–4S] clusters, and it is unknown if the additional cluster(s) predicted to be present in rSAM2 have a similar function. In addition, because OuIAD does not require any proteins aside from OuIAD-AE for activity, enzymatic I3A decarboxylation does not inherently require this extra protein.


5.2.6: Conclusions

In conclusion, we have identified GREs catalyzing I3A decarboxylation in two organisms. We find that while OuIAD requires only its partner AE, CsIAD appears to depend on an additional protein, rSAM2, for activity. This mechanism of skatole production is relevant to the gut microbiotas of livestock, and inhibition of this enzyme could be valuable for mitigating the effects of boar taint and ABPEE. The decarboxylation reaction performed by IAD is mechanistically distinct from the reaction performed by 4-HPAD, as I3A lacks the phenolic hydrogen atom proposed to be involved in 4-hydroxyphenylacetate decarboxylation. Further characterization of these IADs will help elucidate how GREs are able to facilitate challenging reactions and clarify their roles in skatole production in the gut microbiotas of livestock.

5.3: Materials and methods

5.3.1: General materials and methods

All chemicals and solvents were purchased from Sigma-Aldrich, except where otherwise noted. Olsenella uli DSM 7084T and Clostridium scatologenes ATCC 25775 strains were purchased from DSMZ. Luria-Bertani Lennox (LB) medium was purchased from Alfa Aesar. DNA sequencing results and multiple sequence alignments were analyzed with Geneious Pro 11.0.4 (Biomatters).72 Primers were purchased from Sigma-Aldrich. All restriction enzymes, polymerases, and PCR mixes were obtained from New England Biolabs. SDS-PAGE (4–15% Tris-HCl gel, Bio-Rad) was routinely used to visualize fractions from protein purifications following staining (EZBlue Gel Staining Reagent, Sigma-Aldrich). Isopropyl β-D-1-thiogalactopyranoside (IPTG) was obtained from Teknova. Ni-NTA resin was obtained from Qiagen.


Samples were made anaerobic as follows. Solids were brought into anaerobic chambers (MBraun and Coy Laboratory Products) in perforated 1.7 mL microcentrifuge tubes. Protein solutions with volumes greater than 1 mL were made anaerobic on a Schlenk line with 20 cycles of evacuation under vacuum followed by filling with argon. Buffers and other solutions were made anaerobic in 1 to 20 mL volumes by bubbling argon through the liquid for 20 min.

5.3.2: Cloning of IAD expression vectors

I performed cloning of IAD and IAD-AE expression vectors and Beverly Fu was responsible for cloning the rSAM2 expression vector. The genes encoding IAD and IAD-AE were PCR amplified from O. uli DSM 7084 and C. scatologenes ATCC 25775 genomic DNA (DSMZ, Braunschweig, Germany), and the gene encoding rSAM2 was amplified from the latter genomic DNA, using the primers shown in Table 5.1. All reactions (total volume 25 µL) contained genomic DNA (25 ng), forward primer (0.5 µM), reverse primer (0.5 µM), and Phusion-HF Master Mix (12.5 µL). PCR parameters were as follows: initial denaturation (98 °C for 30 s); 30 cycles of denaturation (98 °C for 10 s), annealing (60 °C for 30 s), and extension (72 °C for 90 s); and a final extension (72 °C for 10 min). The amplified inserts were analyzed by agarose gel electrophoresis and purified using an Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare).
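As a planning aid, the cycling parameters above can be tallied to estimate the programmed hold time of one run, and the primer volume per reaction follows from C1V1 = C2V2. The sketch below is illustrative only: the 10 µM primer working stock is an assumed value that is not stated in this section, the function names are hypothetical, and thermocycler ramp times are ignored.

```python
# Cycling program taken from the text (all times in seconds).
INITIAL_DENATURATION_S = 30
CYCLE_STEPS_S = {"denaturation": 10, "annealing": 30, "extension": 90}
N_CYCLES = 30
FINAL_EXTENSION_S = 10 * 60

# Hypothetical primer working stock; this value is NOT given in the text.
ASSUMED_PRIMER_STOCK_UM = 10.0


def total_hold_time_s(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial_s + n_cycles * sum(cycle_steps_s.values()) + final_s


def stock_volume_ul(final_conc_um, reaction_volume_ul, stock_conc_um):
    """Stock volume needed to reach a final concentration (C1*V1 = C2*V2)."""
    return final_conc_um * reaction_volume_ul / stock_conc_um


if __name__ == "__main__":
    seconds = total_hold_time_s(
        INITIAL_DENATURATION_S, CYCLE_STEPS_S, N_CYCLES, FINAL_EXTENSION_S
    )
    print(f"Programmed hold time: {seconds / 60:.1f} min")
    primer_ul = stock_volume_ul(0.5, 25.0, ASSUMED_PRIMER_STOCK_UM)
    print(f"Each primer: {primer_ul:.2f} µL per 25 µL reaction")
```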


Table 5.1: Primers used for cloning IADs, IAD-AE, and rSAM2.

Primer Name          Sequence (5ʹ to 3ʹ)
Cs-IAD-pET28a-F      GCCGCGCGGCAGCCATATGGAATTTAAAAAGAATCAAACAC
Cs-IAD-pET29b-F      TTTAAGAAGGAGATATACATATGGAATTTAAAAAGAATCAAACAC
Cs-IAD1-IAD2-R       TGAACTTACCTGGACTCCAAGGGTTGG
Cs-IAD1-IAD2-F       GAGTCCAGGTAAGTTCAAAGACCAGCAATAAG
Cs-IAD-pET28a-R      CAGTGGTGGTGGTGGTGGTGCTCGAGTTATACTGATGTTCCGTATTCTG
Cs-IAD-pET29b-R      CAGTGGTGGTGGTGGTGGTGCTCGAGTGATACTGATGTTCCGTATTC
Cs-IADAE-pET28a-F    GCCGCGCGGCAGCCATATGAAAATAACAAATAAAACAGG
Cs-IADAE-pET28a-R    CAGTGGTGGTGGTGGTGGTGCTCGAGTTATCCGACTATACATTTGATACC
Cs-IADAE-pET29b-F    TTTAAGAAGGAGATATACATATGAAAATAACAAATAAAACAGG
Cs-IADAE-pET29b-R    CAGTGGTGGTGGTGGTGGTGCTCGAGTGATCCGACTATACATTTGATAC
Cs-IAD-568-F         ATGCCTGATGCAGAAACCTT
Cs-IAD-1216-F        CTGCCAGCAGAAAATGACCT
Cs-rSAM2-F           GGCCTGGTGCCGCGCGGCAGCCATATGCATCTAATGACGGTGGTAAATTATAAAGGAGAC
Cs-rSAM2-R           CTCAGTGGTGGTGGTGGTGGTGCTCGAGTTATATTTTGTTGGTGCCCACACATAGTGAAC
Ou-IAD-pET28a-F      GCCGCGCGGCAGCCATATGGAGGAGTCTCTGGTTCTCG
Ou-IAD-pET28a-R      CAGTGGTGGTGGTGGTGGTGCTCGAGCTAGAGAGCGTCGTACTCGGTACG
Ou-IAD-PET29b-F      CTTTAAGAAGGAGATATACATATGGAGGAGTCTCTGGTTCTCGAG
Ou-IAD-PET29b-R      CAGTGGTGGTGGTGGTGGTGCTCGAGGAGAGCGTCGTACTCGGTACG
Ou-IADAE-pET28a-F    GCCGCGCGGCAGCCATATGGACGGCAAGGGCAAAGAGC
Ou-IADAE-pET28a-R    CAGTGGTGGTGGTGGTGGTGCTCGAGTCAGTTGATAACGCACTCGATCCC
Ou-IADAE-pET29b-F    TTTAAGAAGGAGATATACATATGGACGGCAAGGGCAAAGAGC
Ou-IADAE-pET29b-R    CAGTGGTGGTGGTGGTGGTGCTCGAGGTTGATAACGCACTCGATCCC
Ou-IAD-538-F         GAGTATTGGAAGACGCGCTG
Ou-IAD-1201-F        TTCATGAACCTTGCCGTTGG
Ou-IAD-1802-F        CGGTTGACAATCTCGGCTG
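One quick check on a forward cloning primer from Table 5.1 is to translate it in frame from the ATG within the NdeI site (CATATG) and compare the result with the N-terminus of the target protein. The sketch below is only an illustration of that check, not a procedure described in this work. It uses the standard genetic code and the Ou-IAD-pET28a-F primer; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

NDEI_SITE = "CATATG"  # the ATG of the site serves as the start codon


def translate_from_ndei(primer: str) -> str:
    """Translate complete codons starting at the ATG inside the first NdeI site."""
    index = primer.find(NDEI_SITE)
    if index < 0:
        raise ValueError("no NdeI site found in primer")
    orf = primer[index + 3:]
    usable = len(orf) - len(orf) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


OU_IAD_PET28A_F = "GCCGCGCGGCAGCCATATGGAGGAGTCTCTGGTTCTCG"
OU_IAD_NTERM = "MEESLVLEMLQTGKTATWPAKNQ"  # first residues of OuIAD

if __name__ == "__main__":
    peptide = translate_from_ndei(OU_IAD_PET28A_F)
    print(peptide, OU_IAD_NTERM.startswith(peptide))
```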


Restriction digests of pET28a and pET29b were prepared by incubating (total volume 50 µL) intact plasmid (3 µg), NdeI (5 µL), XhoI (5 µL), and 10× CutSmart buffer (New England Biolabs) (5 µL) at 37 °C for 3 h. The plasmid digests were purified using the Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare) and visualized using agarose gel electrophoresis to confirm digestion was complete.

Gibson assembly was used to join the PCR-amplified inserts and digested vectors. Reactions (total volume 20 µL) contained digested vector (50 ng), insert (56 ng for GREs, 23 ng for AEs, 31 ng for rSAM2), and Gibson Assembly Master Mix (5 µL), and were kept at 50 °C for 1 h. The assembled plasmids were used to transform chemically competent E. coli TOP10 cells (Invitrogen). The sequences of the resulting plasmids were verified by DNA sequencing (Eton Bio.).
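The insert and vector masses above can be converted to molar amounts with the common approximation pmol = ng × 1,000 / (bp × 650), where 650 Da is an approximate average mass per base pair. The sketch below shows the arithmetic only. The fragment lengths are assumptions made for illustration and are not stated in the text: roughly 5,300 bp for the digested vector, and an rSAM2 insert length estimated from the 475-residue protein plus a stop codon, ignoring primer overhangs.

```python
AVG_BP_MASS_DA = 650.0  # approximate average mass of one DNA base pair

# Illustrative lengths only; neither value is stated in the text.
ASSUMED_VECTOR_BP = 5300
ASSUMED_RSAM2_INSERT_BP = (475 + 1) * 3  # 475 codons plus a stop codon


def dsdna_pmol(mass_ng: float, length_bp: float) -> float:
    """Convert a mass of double-stranded DNA to picomoles."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS_DA)


def insert_to_vector_ratio(insert_ng, insert_bp, vector_ng, vector_bp) -> float:
    """Molar ratio of insert to vector in an assembly reaction."""
    return dsdna_pmol(insert_ng, insert_bp) / dsdna_pmol(vector_ng, vector_bp)


if __name__ == "__main__":
    ratio = insert_to_vector_ratio(
        31.0, ASSUMED_RSAM2_INSERT_BP, 50.0, ASSUMED_VECTOR_BP
    )
    print(f"Estimated rSAM2 insert:vector molar ratio: {ratio:.2f}")
```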

The amino acid sequences of OuIAD (UniProt accession ID E1QXZ2), OuIAD-AE (E1QXZ4), CsIAD (A0A0E3M8P3), CsIAD-AE (UniParc accession ID UPI0004902F0C), and rSAM2 (UniParc accession ID UPI0009E3B466) are as follows:

>OuIAD MEESLVLEMLQTGKTATWPAKNQLQESGEVVDKEVKGTPSTDRTRRMKERFMNAKCKMDM EAPIAYTKAWREHEGKPLYVRRGLAYKYMLEHLTPAIREDELITMSKTRYDRGATQVPQF ATDFMISFLTQAEDQKEEAKLYSVEGKDEAHTVEEEGWTKVGQLFSIREEEVKPMLEVLE YWKTRCVENVSDDWMKTSFPYYQDYVNAKKVGLFPGSGLHAGCDGRWIPAYDVALGGLNR VIEECREKIEKTVVTTKEVADKVFFWQGCIYACEGAIAWAHNYAVEARRLAETAVEPRKT ELLEMAERLDRVPAEAPRNFMEAVQALWTTQILVISDSLALGVSPGRWGKFLEPYYEKDL AEGRITKGQALEVMELLRIKFSTEEYITPSLWAAMASSNSFMNLAVGGLDPKTGKCTDNE IEDLILEAGINMPTPQPTLSILLSDKTTDHLAIKAAECTKAGNGYPAWFNYDMMVQHNLA MYRDEGITLEDARNCALSGCVENGLAGTGHPIAHPAFYNEGKTIELACNEGVDPRTGIKV MDGIVPIKTYEDLWDNFIKIREHFMHVYMQYWNEVVACQRDIHPKIMGSVLMHDCIESGR PVDNLGCRYNGSVTLLDSGTVNVVNGLAAIKKLVFEDHKYTWDEFKEAMDNNFGFVLGAE KGNFSMLNQEIDPEKHMKYAHIHRDVLNAPKFGNDDDFVDDIFVDLWHDYDRVTASETTY NGYRWITAALSISAHGPHGRVTGATPDGRLSGVTLCDGILSASPGTDVNGPIALIRSGVK LDPTEFASVQLNMKFHPTAIRGDEGSRNFVDFIHSYFQMGGYHVQFNIVDSKMLRDAQDH PQNYRDLMVRVAGFSAYWNELGKPIQDEVIARTEYDAL

>OuIADAE MDGKGKELRAVIFDVQSFSTHDGPGIRTNIFFKGCALRCPWCANPESQKFSPQLLYTKMK CIGCMCCARACPHGAVTAYTAPEDIERYGHVRYDRGKCDKCTTHECVDACFQEALAVSGK EMTVDDVMDKIRRDSPVFRGKGGVTVSGGDPLLYPEFLAELLGRCRDEGYSVALESELCV PTRNLETVMPFVSYYLTDCKIIDSAEHRRITGVDNEIILRNLRLIGDVCPERMCLRIPII PGYTDSDENVGGIAAFAASCQFTKINILPYHKLGVPKHERLGTVYQLPHVQPPDDGHMRH LADVIEAHGIECVIN

>CsIAD MEFKKNQTPTWPPTTEEKEESGFIDREVKGQPSTERNKKIKQRYLDARLMLDPEFSILFT KKWRECDGQPVLIRHAKAYAYALENVTPSILPDELIVMQKTRYTRGAPVHLQYSQQFYPI MLSHAESLEDKKIYDIGMGGGRKHVEIKGLKQCGIYAIKDEDVQPLLDACNYWKGKCIDE SAEKFINENMPDAETFNNGYKVNMWPLSVVSIMEGRWVPAYDIIVERGLEDVINECKEHI ANTLPTTYDVAEKILFWRASIISCEAVINWAKNYAKKAREDADTETDITRKKELLNIAEM LEWVPAKPARNFMEALQSAWIGHIAVGQDCSVVGLSPGRWGQLLYPYYKKDLEKGKLTRA QVIEAMEQIRIKFSGNEYIAPRAWSAMASGNAYQHLVVGGVNKNGLPAENDLEFDILQAG INMQTIQPTLGVQVSSKTSNKLMMKAAECCKSGGGYPAFFNNDVSIQHLLIDESEEDITL EDARDVAIAGCVEIGTQGTSHGITHPAFFNEPKILEIVLNDGVDPRTNVRCYDPLGEIDS YEKLWDAWCKVESKYLKFYMDSWNYTVQMRREINPLVFSSVLMKDCIKTGRPMDENGCRY NKSVTLLNSGMVNVANSFAAIKKCVFEENLFTMDELKQSLKENFGYEKSDNRTSMLEQKR IDMKWAKIHKLCLDAPKFGNDDDYVDSIFVDLWQHYKDVVSKQTTYLGYHWVPAALSISS HGPFGRVCGATPDGRLAGVTLTDGILSATPGTDVNGPIALLNSGIKLDCTDMRSVQLNMK FHPNAVKGTEGSHHLVDLIRGYFSKGGYHIQFNIVDSKMLRDAQAHPENYRDLIVRVAGF SAYWVELGKPIQDEIIARTEYGTSV

>CsIADAE MKITNKTGTVFDIQSFSVHDGPGIRTIVFLKGCPLKCWWCSNPEGQDALPEVCYHVDKCQ HCMSCVIACKNKAIEEITELTSNEDYIKINKEKCRKCLTFDCVDACPNKGLVTWGNLKTV EDVMKYINRDISYFRKNGGVTLSGGEPLYQHEFALEILKACKEEYINTAIETTLYAPFEV IEPFIPFVDLFLCDIKQMDNSKHKEYTGVSNKIILSNICSLAQKSKNILIRIPLIPGCND DILNIKNTSKFAYDNGISRINILPYHNLGQSKYDKLGKEYKLKDTKSPEADKLEQLKKVV EEQGIKCIVG

>rSAM2 MHLMTVVNYKGDKRMKNDMMKALSQLNSEEKYESIISLTSTILIEHPDDIKVRYCLAMAY IAKGDKTKGLKHLQNLITQSLSTNKVDKLHRLFVNQVILQTVQITELTLHGGDVLGANTI QMVKEVKLYAHKLGEMDLEHSAENILKRWIHINSYNHPIVALIPDSPLTLQVEPTNACNL NCTMCPRSKMTRKVGFMDTAVFDEMLNGWKNRVIIKQVQHLIFGTTFPIIKKGSIKLFFM GEPLIHNQLDKLIESGKRAGCTVGIQTNGISLINKEVRQKLLSAKPSVIGISLDGINEMS YEAVRQGARLMDICKGLEALYKEREEMNLHRKIWIMISSIIPKWNQPSLERAQKFLEPIR PFVDHIGFIPLSRERDPKFYDENGNITLYSKQPITSVSKLQPLCVEPFTKLNVLWDGSIA PCCYDIDCDMPLGHIKDGIDNVWKSSKIKELQNALLNQDIKKYHLCSLCVGTNKI

5.3.3: Construction of expression hosts constitutively expressing the isc operon

The expression host E. coli BL21(DE3) ΔiscR was constructed as follows. The strain E. coli BL21(DE3) ΔiscR::kan was generated using P1 transduction with E. coli strain JW2515-3 (Coli Genetic Stock Center), which contains the ΔiscR777::kan mutation, as the donor strain and E. coli BL21(DE3) as the recipient strain as previously described.66,73 The antibiotic resistance cassette was removed from E. coli BL21(DE3) ΔiscR::kan to produce E. coli BL21(DE3) ΔiscR using the plasmid pCP20 (Coli Genetic Stock Center).69,74 Confirmation of each strain’s identity was obtained by colony PCR with Phusion® HF PCR Master Mix with HF Buffer (New England BioLabs) (Figure 5.7). Two colonies for each strain were picked and resuspended in 6 µL of nuclease-free water. PCR solutions (final volume of 10 µL) contained the following: the resuspended colony (1 µL), forward primer (Table 5.2) (0.5 µM), reverse primer (0.5 µM), and Phusion-HF master mix (5 µL). PCR parameters were as follows: initial denaturation (95 °C for 5 min); 30 cycles of denaturation (98 °C for 10 s), annealing (60 °C for 30 s), and extension (72 °C for 60 s); and a final extension (72 °C for 10 min). The amplified products were analyzed by agarose gel electrophoresis to confirm first that the antibiotic cassette had been inserted into the iscR gene, and then that the cassette had been removed. The expression host E. coli BL21(DE3)-Gold-CodonPlus-RIL ΔiscR was generated in the same manner, except that the recipient strain used was E. coli BL21(DE3)-Gold-CodonPlus-RIL (Agilent).

Table 5.2: Primers used for colony PCR of ΔiscR expression hosts.

Primer Name        Sequence (5ʹ to 3ʹ)
iscR-Primer-1-F    TGCTATGCAATACCCCCACT
iscR-Primer-2-R    CTACCGGCTGGATGTACGAC

Figure 5.7: Colony PCR for characterization of ΔiscR expression hosts. Two colonies were picked for each construct and PCR was performed to confirm the identity of each E. coli strain. Lanes in the gel are as follows: 1, 100 bp ladder (New England Biolabs); 2, Keio collection parent strain (BW25113); 3, BL21(DE3); 4, BL21(DE3)-Gold-CodonPlus-RIL; 5, Keio collection ΔiscR::kan (the donor strain for P1 transduction); 6, BL21(DE3) ΔiscR::kan; 7, CodonPlus-RIL ΔiscR::kan; 8, BL21 ΔiscR; 9, CodonPlus-RIL ΔiscR.


To prepare chemically competent cells for plasmid transformations, the appropriate E. coli strain was cultured from a single colony in 100 mL LB broth, harvested at OD600 of 0.4-0.5, and resuspended in 10 mL of sterile Transformation-Storage Solution (10% w/v PEG 6000, 5% v/v DMSO, 20 mM MgCl2 in LB broth). Aliquots (100 µL) were flash frozen in liquid nitrogen and stored at –80 °C for later use. Plasmid transformations were performed by adding one or more plasmids, 20 µL 5× KCM solution (0.5 M KCl, 0.15 M CaCl2, 0.25 M MgCl2), and 80 µL sterile water to a thawed competent cell aliquot of the appropriate strain. After 10 min incubation on ice, cells were heat shocked at 42 °C for 90 s, diluted to 1 mL with LB broth, and incubated at 37 °C for 1 h before plating with sterile beads on LB agar containing appropriate antibiotics.

5.3.4: Heterologous overexpression and purification of IADs

Beverly Fu and I each performed the following experiments. OuIAD and CsIAD were expressed and purified using the following protocol. Chemically competent E. coli BL21(DE3) (for OuIAD) or E. coli BL21(DE3) CodonPlus-RIL (for CsIAD) were freshly transformed with the corresponding pET28a expression vector. An overnight culture of the appropriate strain was grown in LB containing 50 µg/mL kanamycin (cultures of CodonPlus-RIL strains also contained 50 µg/mL chloramphenicol). LB medium (2 L) in a 4 L baffled Erlenmeyer flask with 50 µg/mL of the appropriate antibiotic(s) was inoculated with 40 mL of the starter culture (2%). The culture was grown at 37 °C with shaking at 175 rpm to OD600 ≈ 0.6, at which point it was cooled on ice for 20 min. Protein expression was induced with IPTG (500 µM), and the culture was incubated at 16 °C with shaking at 175 rpm for 18 h. Cell pellets were harvested by centrifugation (6,730×g, 10 min), flash frozen in N2 (l), and stored at −80 °C.

All subsequent steps were performed at 4 °C unless otherwise specified. Frozen cell pellets were thawed and resuspended in lysis buffer (20 mL, 25 mM Tris-HCl, 200 mM NaCl, pH 7.5). Cells were lysed by two passages through a cell disrupter (Avestin EmulsiFlex-C3) at 10,000 psi or by sonicating with a ½ inch horn at 30% amplitude for 6 min (10 s on followed by 20 s off) while being kept in an ice bath. Lysate was centrifuged (28,800×g, 30 min) to separate soluble and insoluble fractions. The soluble portion was supplemented with imidazole (5 mM final concentration) and incubated with Ni-NTA (3 mL, 30 min). The mixture was loaded onto a glass column. The column was washed with 15 column volumes of wash buffer (25 mM Tris-HCl, 200 mM NaCl, 25 mM imidazole, pH 7.5) until no protein could be detected by Bradford Protein Assay (Bio-Rad). Protein was eluted from the column with elution buffer (25 mM Tris-HCl, 200 mM NaCl, 250 mM imidazole, pH 7.5) in 5 mL fractions. Fractions containing protein (as detected by Bradford assay) were combined, transferred to a Slide-a-Lyzer™ dialysis cassette with a 20 kDa MWCO (Thermo Scientific), and dialyzed three times against 1 L of dialysis buffer (25 mM Tris-HCl, 200 mM NaCl, pH 7.5). After dialysis, the solution was concentrated using a SpinX® UF 20 mL centrifugal concentrator with a 30 kDa MWCO membrane (Corning) to a protein concentration of ~350 µM. Protein concentrations were estimated using a NanoDrop 2000 UV-Vis Spectrophotometer (Thermo Scientific) with extinction coefficients computed with Geneious as follows: OuIAD (ε280 = 149,660 M-1 cm-1) and CsIAD (ε280 = 151,150 M-1 cm-1). The concentrated protein solutions were made anaerobic on a Schlenk line with 20 cycles of evacuation under vacuum followed by filling with argon. The protein solution was brought into an anaerobic chamber at 4 °C (Coy Laboratory Products), aliquoted into 0.5 mL cryogenic vials, and placed in 18×150 mm Hungate tubes (Chemglass). The tubes were sealed with butyl stoppers and aluminum seals, frozen in N2 (l), and stored at –80 °C.
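The concentration estimate from absorbance at 280 nm follows the Beer–Lambert law, c = A / (ε × l). The sketch below shows that calculation with the extinction coefficients quoted above. It is not the instrument's own software: the function name, the 1 cm path length default, and the example absorbance and dilution are hypothetical values chosen for illustration.

```python
# Extinction coefficients at 280 nm (M^-1 cm^-1), as quoted in the text.
EXTINCTION_M_CM = {"OuIAD": 149_660, "CsIAD": 151_150}


def concentration_um(a280, protein, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert estimate of protein concentration in µM.

    a280            -- absorbance of the measured (possibly diluted) sample
    path_cm         -- optical path length; 1 cm is an assumed default
    dilution_factor -- fold dilution applied before measurement
    """
    molar = a280 / (EXTINCTION_M_CM[protein] * path_cm)
    return molar * dilution_factor * 1e6


if __name__ == "__main__":
    # Hypothetical reading: a 35-fold diluted OuIAD sample with A280 = 1.4966.
    print(f"{concentration_um(1.4966, 'OuIAD', dilution_factor=35):.0f} µM")
```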


5.3.5: Heterologous overexpression and purification of IAD-AEs and rSAM2

Beverly Fu was responsible for preparing rSAM2, while she and I both prepared IAD-AE at different times. The protocol used to overexpress and purify OuIAD-AE, CsIAD-AE, and CsrSAM2 resembled a procedure used to purify other radical SAM enzymes.75 Chemically competent E. coli BL21(DE3) ∆iscR (for OuIAD-AE) or E. coli BL21(DE3)-CodonPlus-RIL ∆iscR (for CsIAD-AE and CsrSAM2) were transformed with the corresponding pET28a expression vector. Overnight cultures of the appropriate strains were grown in LB medium containing 50 µg/mL kanamycin (CodonPlus-RIL cultures also contained 50 µg/mL chloramphenicol). For each construct, 2 L of M9 medium (48 µM Na2HPO4, 330 µM KH2PO4, 34 µM NaCl, 19 µM NH4Cl, pH 7.4) in a screw-capped 2.8 L baffled Fernbach flask (Corning) was equilibrated overnight at 37 °C. The next day, the following compounds were added to the medium as filter-sterilized aqueous solutions: CaCl2 (final concentration 100 µM), MgSO4 (2 mM), glucose (20 mM), (NH4)2Fe(SO4)2 (25 µM), and the appropriate antibiotics (50 µg/mL). The medium was then inoculated with 40 mL of the starter culture. The culture was grown aerobically at 37 °C with shaking at 175 rpm to OD600 ≈ 0.3, at which point the following compounds were added as filter-sterilized solutions: L-cysteine hydrochloride (300 µM) and (NH4)2Fe(SO4)2 (25 µM). The culture was incubated at 37 °C with shaking at 175 rpm until OD600 ≈ 0.6, at which point the culture was cooled on ice for 20 min. After induction of protein expression with IPTG (500 µM), the flask was tightly capped and incubated for 18 h at 16 °C with shaking at 100 rpm. Cells were chilled on ice for 10 min before being moved into an anaerobic chamber at 4 °C (Coy Laboratory Products) and transferred into 250 mL polypropylene centrifuge bottles. Cell pellets were harvested by centrifugation (6,730×g, 10 min) at 4 °C, flash frozen in N2 (l), and stored at –80 °C.


All subsequent purification steps were performed in an anaerobic chamber at 4 °C (Coy Laboratory Products). Frozen cell pellets were thawed, resuspended in 20 mL anoxic lysis buffer (25 mM Tris-HCl, 200 mM NaCl, pH 7.5) supplemented with 8 mg of chicken egg lysozyme and 5 mM dithiothreitol (DTT), and incubated for 15 min with occasional agitation. Cells were lysed by sonicating with a ½ inch horn at 30% amplitude for 6 min (10 s on followed by 20 s off). The lysate was centrifuged (28,800×g, 30 min) to separate soluble and insoluble fractions. The soluble fraction was incubated with anoxic Ni-NTA (3 mL) for 30 min with occasional manual agitation. The mixture was loaded into a glass column. The column was washed with 15 column volumes of wash buffer (25 mM Tris-HCl, 200 mM NaCl, 25 mM imidazole, pH 7.5) until no protein could be detected by a Bradford Protein Assay (Bio-Rad). Protein was eluted from the column with elution buffer (25 mM Tris-HCl, 200 mM NaCl, 250 mM imidazole, pH 7.5) in 3 mL fractions. Fractions containing protein were combined, transferred to a Slide-a-Lyzer™ dialysis cassette with a 10 kDa MWCO (Thermo Scientific), and dialyzed three times against 667 mL of dialysis buffer (25 mM Tris-HCl, 200 mM NaCl, pH 7.5). After dialysis, the solution was concentrated using a SpinX® UF 20 mL centrifugal concentrator with a 10 kDa MWCO membrane (Corning) to a protein concentration of ~200 µM. Protein concentrations were estimated using a NanoDrop 2000 UV-Vis Spectrophotometer (Thermo Scientific) with extinction coefficients computed with Geneious as follows: OuIAD-AE (ε280 = 21,890 M-1 cm-1), CsIAD-AE (ε280 = 34,380 M-1 cm-1), and CsrSAM2 (ε280 = 50,880 M-1 cm-1). The protein solution was aliquoted into 0.5 mL cryogenic vials and placed in 18×150 mm Hungate tubes (Chemglass). The tubes were sealed with butyl stoppers and aluminum seals, frozen in N2 (l), and stored at –80 °C.


5.3.6: Detection and quantification of glycyl radicals by EPR spectroscopy

Beverly Fu performed the following experiments. OuIAD was prepared for EPR spectroscopy as follows. Anoxic OuIAD and OuIAD-AE aliquots were brought into an anaerobic chamber (Coy Laboratory Products). DTT (final concentration 10 mM), 5-deazariboflavin (200 µM), OuIAD-AE (50 µM), OuIAD (50 µM), and S-adenosylmethionine (500 µM) were mixed together in buffer (100 mM HEPES, 50 mM NaCl, pH 8.0) to a final volume of 250 µL. The samples were illuminated by a 500 W halogen lamp for 1 h to photoreduce the [4Fe–4S] cluster of OuIAD-AE.70 All samples were loaded into EPR tubes with 4 mm outer diameter and 8 inch length (Wilmad LabGlass, 734-LPV-7), sealed, and frozen in N2 (l). Perpendicular mode X-band EPR spectra were recorded on a Bruker ElexSysE500 EPR instrument as previously described.54 All EPR assays were performed in triplicate. Negative controls, in which GRE, GRE activating enzyme, or SAM was excluded, were performed and no signal was observed in any of these samples.

5.3.7: HPLC assay for detection of indole-3-acetate and skatole from enzyme assays

Beverly Fu and I each performed these experiments. Assay mixtures were prepared by combining DTT (final concentration 10 mM), 5-deazariboflavin (200 µM), OuIAD-AE or CsIAD-AE (50 µM), OuIAD or CsIAD (50 µM), CsrSAM2 (50 µM; only with C. scatologenes constructs), and S-adenosylmethionine (500 µM) in anoxic buffer (100 mM HEPES, 50 mM NaCl, pH 8.0) in an anaerobic chamber containing 97% N2 and 3% H2 (Coy Laboratory Products). The samples were illuminated by a 500 W halogen lamp for 1 h prior to the addition of indole-3-acetate (I3A) (500 µM), where applicable. After incubating at room temperature for 3 h, the 50 µL reaction mixtures were taken out of the anaerobic chamber and quenched with 17 µL of 5% trifluoroacetic acid in acetonitrile. The samples were incubated on ice for 10 min and centrifuged twice at 16,100×g for 10 min. The supernatant was analyzed by HPLC on an Inspire C18 column (5 µm particle size, 50×4.6 mm) (Dikma Technologies). A 20 µL portion of each sample was injected onto the column. The flow rate was 1 mL min-1 using 1% TFA in water as mobile phase A and 1% TFA in acetonitrile as mobile phase B. The column was maintained at room temperature. The following gradient was applied: 0–5 min: 25% B isocratic, 5–8 min: 25–60% B, 8–10 min: 60% B isocratic, 10–10.5 min: 60–90% B, 10.5–15 min: 90% B isocratic, 15–15.5 min: 90–25% B, 15.5–20 min: 25% B isocratic. I3A and 3-methylindole (skatole) were detected by measuring the absorbance at 280 nm. The I3A and skatole standards were run at 500 µM.
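The gradient above can be written as a list of (time, % B) breakpoints with linear interpolation between them. The sketch below is an illustration for checking the program at any time point, not instrument method code. It assumes that each ramp is linear, and the function name is hypothetical.

```python
# Breakpoints (time in min, % mobile phase B) transcribed from the gradient above.
GRADIENT = [
    (0.0, 25.0), (5.0, 25.0),    # isocratic hold at 25% B
    (8.0, 60.0), (10.0, 60.0),   # ramp to 60% B, then hold
    (10.5, 90.0), (15.0, 90.0),  # ramp to 90% B, then hold
    (15.5, 25.0), (20.0, 25.0),  # return to 25% B and re-equilibrate
]


def percent_b(t_min: float) -> float:
    """Return % B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-20 min method")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole method")


if __name__ == "__main__":
    for t in (0.0, 6.5, 9.0, 10.25, 12.0, 18.0):
        print(f"{t:5.2f} min: {percent_b(t):.1f}% B")
```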

5.4: References

(1) Yoon, S. S.; Kim, E.-K.; Lee, W.-J. Functional genomic and metagenomic approaches to understanding gut microbiota–animal mutualism. Curr. Opin. Microbiol. 2015, 24, 38-46.

(2) McKenna, P.; Hoffmann, C.; Minkah, N.; Aye, P. P.; Lackner, A.; Liu, Z.; Lozupone, C. A.; Hamady, M.; Knight, R.; Bushman, F. D. The Macaque Gut Microbiome in Health, Lentiviral Infection, and Chronic Enterocolitis. PLoS Pathog. 2008, 4, e20.

(3) Dacheux, L.; Cervantes-Gonzalez, M.; Guigon, G.; Thiberge, J.-M.; Vandenbogaert, M.; Maufrais, C.; Caro, V.; Bourhy, H. A Preliminary Study of Viral Metagenomics of French Bat Species in Contact with Humans: Identification of New Mammalian Viruses. PLoS One 2014, 9, e87194.

(4) Lu, H.-P.; Wang, Y.-b.; Huang, S.-W.; Lin, C.-Y.; Wu, M.; Hsieh, C.-h.; Yu, H.-T. Metagenomic analysis reveals a functional signature for biomass degradation by cecal microbiota in the leaf-eating flying squirrel (Petaurista alborufus lena). BMC Genomics 2012, 13, 466.

(5) Lavery, T. J.; Roudnew, B.; Seymour, J.; Mitchell, J. G.; Jeffries, T. High Nutrient Transport and Cycling Potential Revealed in the Microbial Metagenome of Australian Sea Lion (Neophoca cinerea) Faeces. PLoS One 2012, 7, e36478.

(6) Bik, E. M.; Costello, E. K.; Switzer, A. D.; Callahan, B. J.; Holmes, S. P.; Wells, R. S.; Carlin, K. P.; Jensen, E. D.; Venn-Watson, S.; Relman, D. A. Marine mammals harbor unique microbiotas shaped by and yet distinct from the sea. Nat. Commun. 2016, 7, 10516.

(7) Keenan, S. W.; Engel, A. S.; Elsey, R. M. The alligator gut microbiome and implications for archosaur symbioses. Sci. Rep. 2013, 3, 2877.


(8) Roeselers, G.; Mittge, E. K.; Stephens, W. Z.; Parichy, D. M.; Cavanaugh, C. M.; Guillemin, K.; Rawls, J. F. Evidence for a core gut microbiota in the zebrafish. ISME J. 2011, 5, 1595.

(9) Regan, T.; Barnett, M. W.; Laetsch, D. R.; Bush, S. J.; Wragg, D.; Budge, G. E.; Highet, F.; Dainat, B.; de Miranda, J. R.; Watson, M.; Blaxter, M.; Freeman, T. C. Characterisation of the British honey bee metagenome. Nat. Commun. 2018, 9, 4995.

(10) Engel, P.; Martinson, V. G.; Moran, N. A. Functional diversity within the simple gut microbiota of the honey bee. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 11002-11007.

(11) Warnecke, F.; Luginbühl, P.; Ivanova, N.; Ghassemian, M.; Richardson, T. H.; Stege, J. T.; Cayouette, M.; McHardy, A. C.; Djordjevic, G.; Aboushadi, N.; Sorek, R.; Tringe, S. G.; Podar, M.; Martin, H. G.; Kunin, V.; Dalevi, D.; Madejska, J.; Kirton, E.; Platt, D.; Szeto, E.; Salamov, A.; Barry, K.; Mikhailova, N.; Kyrpides, N. C.; Matson, E. G.; Ottesen, E. A.; Zhang, X.; Hernández, M.; Murillo, C.; Acosta, L. G.; Rigoutsos, I.; Tamayo, G.; Green, B. D.; Chang, C.; Rubin, E. M.; Mathur, E. J.; Robertson, D. E.; Hugenholtz, P.; Leadbetter, J. R. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 2007, 450, 560-565.

(12) Looft, T.; Johnson, T. A.; Allen, H. K.; Bayles, D. O.; Alt, D. P.; Stedtfeld, R. D.; Sul, W. J.; Stedtfeld, T. M.; Chai, B.; Cole, J. R.; Hashsham, S. A.; Tiedje, J. M.; Stanton, T. B. In-feed antibiotic effects on the swine intestinal microbiome. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 1691-1696.

(13) Jami, E.; Mizrahi, I. Composition and Similarity of Bovine Rumen Microbiota across Individual Animals. PLoS One 2012, 7, e33306.

(14) Crespo-Piazuelo, D.; Estellé, J.; Revilla, M.; Criado-Mesas, L.; Ramayo-Caldas, Y.; Óvilo, C.; Fernández, A. I.; Ballester, M.; Folch, J. M. Characterization of bacterial microbiota compositions along the intestinal tract in pigs and their interactions and functions. Sci. Rep. 2018, 8, 12727.

(15) Brulc, J. M.; Antonopoulos, D. A.; Berg Miller, M. E.; Wilson, M. K.; Yannarell, A. C.; Dinsdale, E. A.; Edwards, R. E.; Frank, E. D.; Emerson, J. B.; Wacklin, P.; Coutinho, P. M.; Henrissat, B.; Nelson, K. E.; White, B. A. Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 1948-1953.

(16) Jami, E.; Israel, A.; Kotser, A.; Mizrahi, I. Exploring the bovine rumen bacterial community from birth to adulthood. ISME J. 2013, 7, 1069.

(17) Seshadri, R.; Leahy, S. C.; Attwood, G. T.; Teh, K. H.; Lambie, S. C.; Cookson, A. L.; Eloe-Fadrosh, E. A.; Pavlopoulos, G. A.; Hadjithomas, M.; Varghese, N. J.; Paez-Espino, D.; Hungate project, c.; Palevich, N.; Janssen, P. H.; Ronimus, R. S.; Noel, S.; Soni, P.; Reilly, K.; Atherly, T.; Ziemer, C.; Wright, A.-D.; Ishaq, S.; Cotta, M.; Thompson, S.; Crosley, K.; McKain, N.; Wallace, R. J.; Flint, H. J.; Martin, J. C.; Forster, R. J.; Gruninger, R. J.; McAllister, T.; Gilbert, R.; Ouwerkerk, D.; Klieve, A.; Jassim, R. A.; Denman, S.; McSweeney, C.; Rosewarne, C.; Koike, S.; Kobayashi, Y.; Mitsumori, M.; Shinkai, T.; Cravero, S.; Cucchi, M. C.; Perry, R.; Henderson, G.; Creevey, C. J.; Terrapon, N.; Lapebie, P.; Drula, E.; Lombard, V.; Rubin, E.; Kyrpides, N. C.; Henrissat, B.; Woyke, T.; Ivanova, N. N.; Kelly, W. J. Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection. Nat. Biotechnol. 2018, 36, 359-367.

(18) Xiao, L.; Estellé, J.; Kiilerich, P.; Ramayo-Caldas, Y.; Xia, Z.; Feng, Q.; Liang, S.; Pedersen, A. Ø.; Kjeldsen, N. J.; Liu, C.; Maguin, E.; Doré, J.; Pons, N.; Le Chatelier, E.; Prifti, E.; Li, J.; Jia, H.; Liu, X.; Xu, X.; Ehrlich, S. D.; Madsen, L.; Kristiansen, K.; Rogel-Gaillard, C.; Wang, J. A reference gene catalogue of the pig gut microbiome. Nat. Microbiol. 2016, 1, 16161.

(19) Mach, N.; Berri, M.; Estellé, J.; Levenez, F.; Lemonnier, G.; Denis, C.; Leplat, J.-J.; Chevaleyre, C.; Billon, Y.; Doré, J.; Rogel-Gaillard, C.; Lepage, P. Early-life establishment of the swine gut microbiome and impact on host phenotypes. Environ. Microbiol. Rep. 2015, 7, 554-569.

(20) McCormack, U. M.; Curião, T.; Buzoianu, S. G.; Prieto, M. L.; Ryan, T.; Varley, P.; Crispie, F.; Magowan, E.; Metzler-Zebeli, B. U.; Berry, D.; O'Sullivan, O.; Cotter, P. D.; Gardiner, G. E.; Lawlor, P. G. Exploring a Possible Link between the Intestinal Microbiota and Feed Efficiency in Pigs. Appl. Environ. Microbiol. 2017, 83, e00380-17.

(21) Stewart, R. D.; Auffret, M. D.; Warr, A.; Wiser, A. H.; Press, M. O.; Langford, K. W.; Liachko, I.; Snelling, T. J.; Dewhurst, R. J.; Walker, A. W.; Roehe, R.; Watson, M. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 2018, 9, 870.

(22) Zamaratskaia, G.; Squires, E. J. Biochemical, nutritional and genetic effects on boar taint in entire male pigs. Animal 2009, 3, 1508-1521.

(23) Lundström, K.; Matthews, K. R.; Haugen, J.-E. Pig meat quality from entire males. Animal 2009, 3, 1497-1507.

(24) Patterson, R. L. S. 5α-androst-16-ene-3-one:—Compound responsible for taint in boar fat. J. Sci. Food Agric. 1968, 19, 31-38.

(25) Walstra, P. Fattening of young boars: Quantification of negative and positive aspects. Livest. Prod. Sci. 1974, 1, 187-196.

(26) Kwan, T. K.; Orengo, C.; Gower, D. B. Biosynthesis of androgens and pheromonal steroids in neonatal porcine testicular preparations. FEBS Lett. 1985, 183, 359-364.

(27) Jensen, M. T.; Cox, R. P.; Jensen, B. B. 3-Methylindole (skatole) and indole production by mixed populations of pig fecal bacteria. Appl. Environ. Microbiol. 1995, 61, 3180-3184.

(28) Dehnhard, M.; Bernal-Barragan, H.; Claus, R. Rapid and accurate high-performance liquid chromatographic method for the determination of 3-methylindole (skatole) in faeces of various species. J. Chromatogr. B: Biomed. Sci. Appl. 1991, 566, 101-107.

(29) Constable, P. D.; Radostits, O. M.; Hinchcliff, K. W.; Done, S. H.; Gruenberg, W., Veterinary Medicine: A Textbook of the Diseases of Cattle, Horses, Sheep, Pigs and Goats. 11th ed.; Elsevier: St. Louis, MO, 2017.

(30) Hananeh, W. M.; Ismail, Z. B. Concurrent occurrence of acute bovine pulmonary edema and emphysema and endocardial fibroelastosis in cattle: A case history and literature review. Vet. World 2018, 11, 971-976.

(31) Carlson, J. R.; Yokoyama, M. T.; Dickinson, E. O. Induction of Pulmonary Edema and Emphysema in Cattle and Goats with 3-Methylindole. Science 1972, 176, 298-299.

(32) Benevenga, N. J.; Steele, R. D. Adverse Effects of Excessive Consumption of Amino Acids. Annu. Rev. Nutr. 1984, 4, 157-181.

(33) Bray, T. M.; Kirkland, J. B. The metabolic basis of 3-methylindole-induced pneumotoxicity. Pharmacol. Ther. 1990, 46, 105-118.

(34) Loneragan, G. H.; Gould, D. H.; Mason, G. L.; Garry, F. B.; Yost, G. S.; Lanza, D. L.; Miles, D. G.; Hoffman, B. W.; Mills, L. J. Association of 3-methyleneindolenine, a toxic metabolite of 3-methylindole, with acute interstitial pneumonia in feedlot cattle. Am. J. Vet. Res. 2001, 62, 1525-1530.

(35) Hammond, A.; Carlson; Breeze, R. Monensin and the prevention of tryptophan-induced acute bovine pulmonary edema and emphysema. Science 1978, 201, 153-155.

(36) Yokoyama, M. T.; Carlson, J. R.; Holdeman, L. V. Isolation and characteristics of a skatole-producing Lactobacillus sp. from the bovine rumen. Appl. Environ. Microbiol. 1977, 34, 837-842.

(37) Yokoyama, M. T.; Carlson, J. R. Production of Skatole and para-Cresol by a Rumen Lactobacillus sp. Appl. Environ. Microbiol. 1981, 41, 71-76.

(38) Whitehead, T. R.; Price, N. P.; Drake, H. L.; Cotta, M. A. Catabolic Pathway for the Production of Skatole and Indoleacetic Acid by the Acetogen Clostridium drakei, Clostridium scatologenes, and Swine Manure. Appl. Environ. Microbiol. 2008, 74, 1950-1953.

(39) Doerner, K. C.; Mason, B. P.; Kridelbaugh, D.; Loughrin, J. Fe(III) stimulates 3-methylindole and 4-methylphenol production in swine lagoon enrichments and Clostridium scatologenes ATCC 25775. Lett. Appl. Microbiol. 2009, 48, 118-124.

(40) Doerner, K. C.; Cook, K. L.; Mason, B. P. 3-Methylindole production is regulated in Clostridium scatologenes ATCC 25775. Lett. Appl. Microbiol. 2009, 48, 125-132.

(41) Li, X.; Jensen, R. L.; Højberg, O.; Canibe, N.; Jensen, B. B. Olsenella scatoligenes sp. nov., a 3-methylindole- (skatole) and 4-methylphenol- (p-cresol) producing bacterium isolated from pig faeces. Int. J. Syst. Evol. Microbiol. 2015, 65, 1227-1233.

(42) Li, X.; Højberg, O.; Noel, S. J.; Canibe, N.; Jensen, B. B. Draft Genome Sequence of Olsenella scatoligenes SK9K4T, a Producer of 3-Methylindole (Skatole) and 4-Methylphenol (p-Cresol), Isolated from Pig Feces. Genome Announcements 2016, 4.

(43) Selvaraj, B.; Buckel, W.; Golding, B. T.; Ullmann, G. M.; Martins, B. M. Structure and Function of 4-Hydroxyphenylacetate Decarboxylase and Its Cognate Activating Enzyme. J. Mol. Microbiol. Biotechnol. 2016, 26, 76-91.

(44) Selmer, T.; Andrei, P. I. p-Hydroxyphenylacetate decarboxylase from Clostridium difficile. Eur. J. Biochem. 2001, 268, 1363-1372.

(45) Buckel, W.; Golding, B. T. Radical Enzymes in Anaerobes. Annu. Rev. Microbiol. 2006, 60, 27-49.

(46) Martins, B. M.; Blaser, M.; Feliks, M.; Ullmann, G. M.; Buckel, W.; Selmer, T. Structural Basis for a Kolbe-Type Decarboxylation Catalyzed by a Glycyl Radical Enzyme. J. Am. Chem. Soc. 2011, 133, 14666-14674.

(47) Feliks, M.; Martins, B. M.; Ullmann, G. M. Catalytic Mechanism of the Glycyl Radical Enzyme 4-Hydroxyphenylacetate Decarboxylase from Continuum Electrostatic and QC/MM Calculations. J. Am. Chem. Soc. 2013, 135, 14574-14585.

(48) Vijh, A. K.; Conway, B. E. Electrode Kinetic Aspects of the Kolbe Reaction. Chem. Rev. 1967, 67, 623-664.

(49) Beller, H. R.; Rodrigues, A. V.; Zargar, K.; Wu, Y.-W.; Saini, A. K.; Saville, R. M.; Pereira, J. H.; Adams, P. D.; Tringe, S. G.; Petzold, C. J.; Keasling, J. D. Discovery of enzymes for toluene synthesis from anoxic microbial communities. Nat. Chem. Biol. 2018, 14, 451-457.

(50) Göker, M.; Held, B.; Lucas, S.; Nolan, M.; Yasawong, M.; Rio, T. G. D.; Tice, H.; Cheng, J.-F.; Bruce, D.; Detter, J. C.; Tapia, R.; Han, C.; Goodwin, L.; Pitluck, S.; Liolios, K.; Ivanova, N.; Mavromatis, K.; Mikhailova, N.; Pati, A.; Chen, A.; Palaniappan, K.; Land, M.; Hauser, L.; Chang, Y.-J.; Jeffries, C. D.; Rohde, M.; Sikorski, J.; Pukall, R.; Woyke, T.; Bristow, J.; Eisen, J. A.; Markowitz, V.; Hugenholtz, P.; Kyrpides, N. C.; Klenk, H.-P.; Lapidus, A. Complete genome sequence of Olsenella uli type strain (VPI D76D-27CT). Stand. Genomic Sci. 2010, 3, 76-84.

(51) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403-410.

(52) Zhu, Z.; Guo, T.; Zheng, H.; Song, T.; Ouyang, P.; Xie, J. Complete genome sequence of a malodorant-producing acetogen, Clostridium scatologenes ATCC 25775T. J. Biotechnol. 2015, 212, 19-20.

(53) Craciun, S.; Marks, J. A.; Balskus, E. P. Characterization of Choline Trimethylamine-Lyase Expands the Chemistry of Glycyl Radical Enzymes. ACS Chem. Biol. 2014, 9, 1408-1413.

(54) Levin, B. J.; Huang, Y. Y.; Peck, S. C.; Wei, Y.; Martínez-del Campo, A.; Marks, J. A.; Franzosa, E. A.; Huttenhower, C.; Balskus, E. P. A prominent glycyl radical enzyme in human gut microbiomes metabolizes trans-4-hydroxy-L-proline. Science 2017, 355, eaai8386.

(55) Yu, L.; Blaser, M.; Andrei, P. I.; Pierik, A. J.; Selmer, T. 4-Hydroxyphenylacetate Decarboxylases: Properties of a Novel Subclass of Glycyl Radical Enzyme Systems. Biochemistry 2006, 45, 9584-9592.

(56) Shisler, K. A.; Broderick, J. B. Glycyl radical activating enzymes: Structure, mechanism, and substrate interactions. Arch. Biochem. Biophys. 2014, 546, 64-71.

(57) Heider, J.; Szaleniec, M.; Martins, B. M.; Seyhan, D.; Buckel, W.; Golding, B. T. Structure and Function of Benzylsuccinate Synthase and Related Fumarate-Adding Glycyl Radical Enzymes. J. Mol. Microbiol. Biotechnol. 2016, 26, 29-44.

(58) Broderick, J. B.; Duffus, B. R.; Duschene, K. S.; Shepard, E. M. Radical S-Adenosylmethionine Enzymes. Chem. Rev. 2014, 114, 4229-4317.

(59) Grell, T. A. J.; Goldman, P. J.; Drennan, C. L. SPASM and Twitch Domains in S-Adenosylmethionine (SAM) Radical Enzymes. J. Biol. Chem. 2015, 290, 3964-3971.

(60) Wong, K. K.; Murray, B. W.; Lewisch, S. A.; Baxter, M. K.; Ridky, T. W.; Ulissi-DeMario, L.; Kozarich, J. W. Molecular properties of pyruvate formate-lyase activating enzyme. Biochemistry 1993, 32, 14102-14110.

(61) Broderick, J. B.; Henshaw, T. F.; Cheek, J.; Wojtuszewski, K.; Smith, S. R.; Trojan, M. R.; McGhan, R. M.; Kopf, A.; Kibbey, M.; Broderick, W. E. Pyruvate Formate-Lyase-Activating Enzyme: Strictly Anaerobic Isolation Yields Active Enzyme Containing a [3Fe–4S]+ Cluster. Biochem. Biophys. Res. Commun. 2000, 269, 451-456.

(62) Zheng, L.; Cash, V. L.; Flint, D. H.; Dean, D. R. Assembly of Iron-Sulfur Clusters: Identification of an iscSUA-hscBA-fdx gene cluster from Azotobacter vinelandii. J. Biol. Chem. 1998, 273, 13264-13272.

(63) Hänzelmann, P.; Hernández, H. L.; Menzel, C.; García-Serres, R.; Huynh, B. H.; Johnson, M. K.; Mendel, R. R.; Schindelin, H. Characterization of MOCS1A, an Oxygen-sensitive Iron-Sulfur Protein Involved in Human Molybdenum Cofactor Biosynthesis. J. Biol. Chem. 2004, 279, 34721-34732.

(64) Nachin, L.; Loiseau, L.; Expert, D.; Barras, F. SufC: an unorthodox cytoplasmic ABC/ATPase required for [Fe–S] biogenesis under oxidative stress. EMBO J. 2003, 22, 427-437.

(65) Wecksler, S. R.; Stoll, S.; Tran, H.; Magnusson, O. T.; Wu, S.-p.; King, D.; Britt, R. D.; Klinman, J. P. Pyrroloquinoline Quinone Biogenesis: Demonstration That PqqE from Klebsiella pneumoniae Is a Radical S-Adenosyl-L-methionine Enzyme. Biochemistry 2009, 48, 10151-10161.

(66) Akhtar, M. K.; Jones, P. R. Deletion of iscR stimulates recombinant clostridial Fe–Fe hydrogenase activity and H2-accumulation in Escherichia coli BL21(DE3). Appl. Microbiol. Biotechnol. 2008, 78, 853-862.

(67) Fontecave, M.; Ollagnier-de-Choudens, S. Iron–sulfur cluster biosynthesis in bacteria: Mechanisms of cluster assembly and transfer. Arch. Biochem. Biophys. 2008, 474, 226-237.

(68) Baba, T.; Ara, T.; Hasegawa, M.; Takai, Y.; Okumura, Y.; Baba, M.; Datsenko, K. A.; Tomita, M.; Wanner, B. L.; Mori, H. Construction of Escherichia coli K‐12 in‐frame, single‐gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2006, 2, 2006.0008.

(69) Cherepanov, P. P.; Wackernagel, W. Gene disruption in Escherichia coli: TcR and KmR cassettes with the option of Flp-catalyzed excision of the antibiotic-resistance determinant. Gene 1995, 158, 9-14.

(70) Henshaw, T. F.; Cheek, J.; Broderick, J. B. The [4Fe-4S]1+ Cluster of Pyruvate Formate-Lyase Activating Enzyme Generates the Glycyl Radical on Pyruvate Formate-Lyase: EPR-Detected Single Turnover. J. Am. Chem. Soc. 2000, 122, 8331-8332.

(71) Liu, D.; Wei, Y.; Liu, X.; Zhou, Y.; Jiang, L.; Yin, J.; Wang, F.; Hu, Y.; Nanjaraj Urs, A. N.; Liu, Y.; Ang, E. L.; Zhao, S.; Zhao, H.; Zhang, Y. Indoleacetate decarboxylase is a glycyl radical enzyme catalysing the formation of malodorant skatole. Nat. Commun. 2018, 9, 4224.

(72) Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C.; Thierer, T.; Ashton, B.; Meintjes, P.; Drummond, A. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012, 28, 1647-1649.

(73) Thomason, L. C.; Costantino, N.; Court, D. L. E. coli Genome Manipulation by P1 Transduction. Curr. Protoc. Mol. Biol. 2007, 79, 1.17.1-1.17.8.

(74) Datsenko, K. A.; Wanner, B. L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 6640-6645.

(75) Lanz, N. D.; Grove, T. L.; Gogonea, C. B.; Lee, K.-H.; Krebs, C.; Booker, S. J., Chapter Seven - RlmN and AtsB as Models for the Overproduction and Characterization of Radical SAM Proteins. In Methods Enzymol., Hopwood, D. A., Ed. Academic Press: 2012; Vol. 516, pp 125-152.
