Structure and Mechanism of the Formylglycine-generating Enzyme

By Mason J. Appel

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Molecular and Cell Biology in the Graduate Division of the University of California, Berkeley

Committee in charge: Professor Carolyn R. Bertozzi, Co-Chair Professor Judith P. Klinman, Co-Chair Professor Jack F. Kirsch Professor Matthew B. Francis

Fall 2017

Structure and Mechanism of the Formylglycine-Generating Enzyme © 2017 By Mason J. Appel

Abstract Structure and Mechanism of the Formylglycine-generating Enzyme By Mason J. Appel Doctor of Philosophy in Molecular and Cell Biology University of California, Berkeley Professor Carolyn R. Bertozzi, Co-Chair Professor Judith P. Klinman, Co-Chair The ability of enzymes to make rapid the conversion between chemical entities represents an enabling feature for life. The strategies by which enzymes achieve this feat are manifold in their details and physiological consequences. Though the central importance of biological catalysis has been known for well over a century, the true breadth of enzyme function is still beginning to be appreciated. A fundamental description of enzyme catalysis is essential to understanding biological processes, and is also necessary in order to realize the promise of engineered enzymes for transforming medicine and manufacturing. The formylglycine-generating enzyme (FGE) is responsible for activating sulfatase enzymes in aerobic organisms by catalyzing the post-translational modification of a cysteine residue to formylglycine. Formylglycine permits sulfatase activity by acting as a covalent cofactor. Thus, the diverse roles of sulfatases in both animal and microbial biology underscore the importance of FGE. However, early investigations made clear that this enzyme operates by a novel mechanism with a unique protein fold. In turn, elucidation of the details of this biochemical transformation have remained enigmatic for almost two decades. This dissertation describes the structural and biochemical interrogation of FGE in pursuit of understanding its catalytic mechanism. Chapter 1 places into context the physiological importance and biochemical curiosities of formylglycine and FGE. In Chapter 2, I present the first structural and spectroscopic characterization of copper-bound FGE, defining it as a novel enzyme with intriguing features related to function. Finally, Chapter 3 reveals detailed structural and kinetic proof for several critical steps of catalysis by FGE, allowing the formulation of a mechanistic framework for further investigation.

1

This dissertation is dedicated to my mother and father.

i

Structure and Mechanism of the Formylglycine-generating Enzyme

Table of Contents

List of Figures iv

List of Tables v

List of Schemes v

Acknowledgements vi

Chapter 1. Formylglycine, a posttranslationally generated residue with unique catalytic capabilities and biotechnology applications 1

Introduction 2

fGly is an essential posttranslation modification of Type I sulfatases 3

fGly participates directly in sulfate hydrolysis 5

FGE catalyzes fGly formation and is dysfunctional in MSD 8

Mammalian FGE activity is regulated by protein partners 9

FGE defines a novel protein fold 11

FGE is a unique cysteine-dependent oxidase/oxygenase 11

anSME catalyzes O2-independent fGly formation on Ser/Cys sulfatases in anaerobes 14

Some organisms have as yet unidentified fGly producing systems 16

fGly provides a genetically encoded, site-specific protein bioconjugation handle 16

Conclusion 19

References 19

Chapter 2. Structural and spectroscopic characterization of the formylglycine-generating enzyme defines a singular class of copper-dependent oxidase 29

Introduction 30

ii

Results 31

X-ray crystal structure of the copper-bound S. coelicolor FGE holoenzyme 31

Unique spectroscopic features of a transient Cu(II)-FGE complex 34

Insight into the mechanism of Cu(II)-FGE autoreduction 36

Redox preference of the resting enzyme 37

X-ray absorption spectroscopy of Cu(I)-scFGE 38

Discussion 40

Materials and Methods 46

References 50

Chapter 3. Insights into the mechanism of FGE: elucidation of a Cu(I)-coordinated enzyme-substrate complex, and discovery of an intermediate during catalysis 55

Introduction 56

Results 57

X-ray absorption spectroscopy of the enzyme substrate complex 57

Identification of selenocysteine as an alternative substrate for FGE 60

Measurement of a substrate deuterium in scFGE 64

Detection of an intermediate by stopped-flow UV/vis spectroscopy 66

Single-turnover kinetic analysis of FGE reactions 68

Discussion 70

Materials and Methods 75

References 81

Appendix 86

iii

List of Figures

Figure 1.1 Overview of Type I sulfatase structure and fGly-mediated catalysis 3

Figure 1.2 Regulation and trafficking of mammalian FGE by protein partners 10

Figure 1.3 FGE structure, overall reaction, and speculative mechanistic proposal 12

Figure 1.4 Mechanism of fGly biogenesis by the anaerobic sulfatase maturing enzyme 15

Figure 1.5 Principle and application of tag engineering 17

Figure 2.1 Crystal structure of copper-bound S. coelicolor FGE 32

Figure 2.2 Characterization of a transient Cu(II)-scFGE complex by UV/vis spectroscopy 34

Figure 2.3 The transient Cu(II)-scFGE complex is autoreduced to Cu(I)-bound scFGE 36

Figure 2.4 Oxidative dimerization of apo-FGE following Cu(II) addition 37

Figure 2.5 Single-turnover activity assays of Cu(I)- and Cu(II)-loaded scFGE 38

Figure 2.6 X-ray absorption spectra of Cu(I)-scFGE 39

Figure 2.7 A redox-dependent copper binding pathway 42

Figure 2.8 Structural comparison of diverse protein copper centers 44

Figure 2.9 Proposed mechanisms of FGE catalysis 45

Figure 3.1 Structure of potential FGE copper-oxygen complexes capable of performing H-atom abstraction from substrate 56

Figure 3.2 Structure of FGE peptide substrates used in this study 57

Figure 3.3 X-ray absorption spectra of the cuprous enzyme-substrate complex 58

Figure 3.4 The FGE copper center undergoes a shift in coordination environment upon substrate binding 60

Figure 3.5 Selenocysteine inhibits FGE activity toward the native cysteine substrate 62

Figure 3.6 pH-rate profiles for the conversion of cysteine and selenocysteine substrates to fGly 63

iv

Figure 3.7 Proposed model for catalytic discrimination between cysteine and selenocysteine substrates 64

Figure 3.8 Steady-state kinetic analysis of heavy and light FGE substrates 65

Figure 3.9 Detection of an intermediate by stopped-flow UV/vis absorption spectroscopy 67

Figure 3.10 Kinetic isotope effect observed for decay of the 420 nm intermediate 68

Figure 3.11 Single-turnover kinetic analysis of FGE substantiates the 420 nm intermediate 69

Figure 3.12 Revised mechanistic proposal for FGE 73

List of Tables

Table 2.1 Crystallography data collection and refinement statistics 33

Table 2.2 Cu(I)-scFGE EXAFS Fitting Results 40

Table 3.1 EXAFS Fitting Results for Cu(I)-scFGE and Cu(I)-scFGE-scP14 59

Table 3.2 Activity of scFGE toward cysteine and selenocysteine substrates 61

Table 3.3 Kinetic parameters of scFGE with H-scP14 and D2-scP14 substrates 65

List of Schemes

Scheme 3.1 Synthesis of Fmoc-Sec(bzl)-OH 61

Scheme 3.2 Hypothetical kinetic mechanism of FGE 70

v

Acknowledgements Completing graduate school under the mentorship of my advisor, Professor Carolyn R. Bertozzi, has been without a doubt the foremost privilege of my life. I thank her immensely, but it is inadequate; I owe Carolyn an innumerable debt of gratitude for her support over so many years. She welcomed me into her lab, and provided me with opportunities that I could have never imagined. She has been unfailingly optimistic, understanding, and inspiring. Even within my comparatively brief tenure in the lab, watching her perpetually strive, and usually reach, the next level has been remarkable. Her unparalleled intellect is matched by her penchant for facing new challenges, and her absolute dissatisfaction with the comfort offered by the status quo. Being among those who moved with the Bertozzi lab to its new home at Stanford University was an unexpected but ultimately enriching experience. In preparation for the move, Carolyn tirelessly worked to ensure that our institutional transition be as seamless as possible, and was more than sympathetic to the concerns that the move presented to myself and others. Though I missed my life at UC Berkeley, being welcomed into the community at Stanford has left me feeling very fortunate to have had two exceptional campuses to call home during graduate school. Of course, Carolyn’s fearless attitude made this possible. It is truly bittersweet to finish a momentous journey guided by, and in witness to, Carolyn’s passion, vision, and ambition. I would also like to thank my Co-Chair, Professor Judith P. Klinman for her care and mentorship beginning from our very first meeting. Pursuant to my interest in enzymes, Judith graciously invited me to join her group meetings every Monday morning. These self-effacing discussions exposed me to the profound problems of protein science, which also happen to be quite central to life and its origin. Her brilliance, enthusiasm, and creativity have, in no small part, inspired in me a deep captivation with the fundamental questions in biochemistry, and convinced me of the satisfaction achieved through pursuing their answers. Her kindness and generosity have been touching, and throughout my life, I hope to carry with me the same curiosity for Nature that she possesses. The completion of this dissertation would not have been possible without contributions from my collaborators. I thank Katlyn Meier for being a bright and talented scientist whose commitment has helped me surmount the greatest obstacles I have confronted in research. Her specialized knowledge and aptitude for applying it to new scientific problems has been remarkable. I feel very lucky to have worked with her, and under the guidance of her mentor Professor Ed Solomon. I also thank Julien Lafrance-Vanasse for his amazing contributions, and regret that our circumstances allowed only a short period to work together. His tenacity in countering the failures that frequently accompany a new endeavor lead to a hugely important breakthrough. I also acknowledge Professors Matthew Francis and Jack Kirsch, whose roles on my qualify examination and thesis committees have been invaluable. My Bertozzi lab cohort began as a group of new initiates to the 8th floor of Latimer Hall, and very soon became a group of dear friends. Our countless hours together inside, outside, and even on top of the lab have been filled with conversation, support, struggle, and celebration. So, I thank Ioannis Mountziaris, Francis Rodriguez-Rivera, David Spiciarich, and Nathan Yee. I look forward to their friendship for many years to come, and I could not have asked for a more

vi thoughtful group of people to spend these years with. I have also been proud to call many other incredible individuals friends as well as coworkers. Brian Belardi, Justin Kim, and Peyton Shieh have been shining examples of the talent, creativity, and, grit required to make advancements in research. They hold the search for knowledge, and the value of truth, in highest regard. I thank the three of them for their advice and friendship, and I will continue to strive to hold myself to the standards that they have set for themselves. Asia Avelino has been a critical advocate in helping me accomplish my goals. Her kindness and generosity as a friend have also had an enormous impact on my time in the Bertozzi lab. I also thank George Fei, my exceptional undergraduate trainee; it was a pleasure to be a small part of his academic trajectory. Many friends and colleagues have left a lasting impression on me: Doug, Lauren, Sam, Junwon, Brendan, and, Neil being among them. There is no shortage of truly superb people that I have encountered during graduate school, and those unmentioned have my utmost my respect and appreciation. Finally, I express my gratitude to my mother and father. They instilled in me the values and provided for me the opportunities that I have needed to pursue my path. Despite our great distance, my parents and sisters have remained closer to me than ever, and their belief in me has been a constant source of motivation.

vii

Chapter 1 Formylglycine, a posttranslationally generated residue with unique catalytic capabilities and biotechnology applications

Portions of this chapter were modified from: Appel, M.J.; Bertozzi C.R. ACS Chem. Biol. 2015, 10, 72-84.

1

Chapter 1. Formylglycine, a posttranslationally generated residue with unique catalytic capabilities and biotechnology applications

Introduction

Posttranslational modification (PTM) of canonical amino acid side chains is a mechanism for augmenting the chemical diversity of enzymatic catalysis. Many cofactors involved in fundamental metabolic transformations derive from protein backbone or side chain modifications.1 The novel functionalities created – redox moieties, electrophiles or metal chelators, for example – allow for catalytic mechanisms unattainable with canonical protein chemical groups. A relative newcomer to the family of PTM-derived catalytic cofactors is Cα- formylglycine (fGly, Figure 1.1A), a modification of Cys or Ser side chains that is essential for the activity of Type I sulfatases.2 Unlike the complex pathways for generating many of the peptide- and protein-derived quinocofactors,3 the biosynthesis of fGly is accomplished by only one enzymatic step. Astoundingly, Type I sulfatases achieve rate-enhancements (kcat/kuncat) of 1026 for alkylsulfate substrates, making them among the most efficient enzymes ever measured, and this can be attributed in part to the unique catalytic properties of fGly.4

Ubiquitous across all domains of life, sulfatases catalyze the hydrolysis of a vast array of natural and synthetic aryl- and alkylsulfate ester substrates. Three divergent classes of sulfatases have been identified, but the Type I family members are the most common and the only class found in eukaryotes. In aerobic organisms, Type I sulfatases become active when the formylglycine- generating enzyme (FGE) (also referred to as sulfatase-modifying factor 1, or SUMF1)5, 6 catalyzes the oxidation of cysteine to fGly. In humans, 17 sulfatases have been identified, of which 14 have been assigned specific activities in catabolism, signaling, and development.2, 7 Human sulfatases are initially translated into the endoplasmic reticulum (ER); some are retained there while others are targeted to the lysosome, the Golgi, or to the cell surface.7 Lysosomal sulfatases act on sulfated glycolipids (sulfatides) and glycosaminoglycans, and their activities are necessary for proper degradation of these glycosides. ER-resident sulfatases, most notably steroid sulfatase (STS; arylsulfatase C, ASC), regulate hormone levels by desulfation of inactive precursors such as dehydroepiandrosterone 3-sulfate and iodothyronine sulfate.2 Secreted sulfatases (Sulf1, Sulf2) modulate the sulfation level of cell-surface heparan sufate, thereby regulating signaling events critical for development and tumor progression.8 The disruption of individual sulfatases causes at least 8 pathologies in humans, including 6 lysosomal storage disorders (e.g. mucopolysaccaridoses, metachromatic leukodystrophy), the bone disease chondrodysplasia punctate type 1, and skin disorder X-linked ichthyosis.9 Deficiency in FGE causes multiple sulfatase deficiency (MSD), a fatal disorder marked by decreased activity of all sulfatases.10 Microbial sulfatases were historically thought to be utilized for scavenging environmental sulfur, but a growing body of work over the past decade has revealed a much more elaborate role in modulating endosymbiont and host-pathogen interactions by remodeling host sulfation.11 Given the breadth of research on sulfatase biology, we defer to a number of reviews for a thorough appraisal of sulfatase biochemistry and physiology,2, 12, 13, 14 the genetic basis of FGE and sulfatase disorders in humans, 9, 15, 16 and the pursuit of novel sulfatases for bioengineering applications.17

2

This review will focus the catalytic function of fGly and mechanisms by which enzymes from various organisms are thought to produce this PTM. Finally, we discuss the use of fGly’s aldehyde functionality as a chemical handle for site-specific protein chemical modification, a biotechnology application of fGly that has undergone recent commercial translation.

fGly is an essential posttranslation modification of Type I sulfatases

Type I sulfatases are the predominant mediators of sulfate ester hydrolysis in all domains of life. They are abundant, highly conserved, and require the fGly PTM for catalysis.2 Some sulfatases have been assigned defined biological substrates (e.g., sulfatases that act on the glycosaminoglycans chondroitin and heparan sulfate),16 while most, particularly from microbial sources, have not been characterized at this level of biochemical detail. Thus, many Type I sulfatases are annotated as arylsulfatases, simply because they catalyze the hydrolysis of colorimetric arylsulfate substrates in vitro.2

fGly is generated from a cysteine or serine residue embedded in a conserved active site-localized sequence motif, (C/S)XPXRXXXLTGR; cysteine predominates as the modification target in eukaryotes and aerobic microbes whereas cysteine or serine is found in anaerobes.15, 18 The original discovery of fGly was made during biochemical analysis of the fatal congenital illness Multiple Sulfatase Deficiency (MSD).19 The severe physiological defects associated with this condition reflected a combination of those associated with single sulfatase deficiencies.10 Analysis of MSD-derived cell lines revealed diminished activity across the Type I sulfatase family, despite the fact that protein levels of specific sulfatases appeared unaltered.10, 20 This observation led to speculation that MSD patients lacked a regulatory component or catalytic cofactor required for sulfatase activity. Importantly, cell-fusion complementation of MSD cells with those deficient in a single-sulfatase could restore activity of the deficient sulfatase, confirming the competence of individual MSD sulfatase genes.21 The inability of MSD cells to produce active recombinant sulfatases likewise suggested a defect in a factor required for activity of several sulfatase gene products.22

Direct biochemical characterization of human arylsulfatases A & B (ASA & ASB) revealed a tryptic peptide carrying an unknown modification. At the site where a cysteine residue was predicted, mass spectrometry analysis indicated a novel amino acid that formed an adduct with glycerol and, when treated with NaBH4, yielded a serine residue. fGly was thus identified for the first time as a PTM.19 Subsequently, MSD-derived sulfatases were shown to contain significantly lower amounts of fGly, but the complete loss of sulfatase activity is likely embryonic-lethal in humans.23 A

3

B

C

4

D

Figure 1.1. Hydration of fGly to fGly-diol, Pseudomonas aeruginosa arylsulfatase structure and active site architecture (PAS, PDB:1HDH), and proposed catalytic mechanisms of Type I sulfatase. (A) fGly is rapidly hydrated to a geminal diol, fGly-diol, and this form predominates in the sulfatase resting state. (B) fGly diol, yellow; Ca2+, green (C) Highlighted features: Ca2+, green; metal binding, magenta; substrate binding and catalytic, gray. (D) Following the binding of sulfate ester substrate, fGly-diol is activated for nucleophilic attack at sulfur by aspartate (Asp317). The sulfoenzyme is formed and desulfation may proceed by one of two pathways, hydrolysis at sulfur (SN2) or more probable, elimination from the remaining fGly-diol hydroxyl (E2), catalyzed by a histidine base (His115). Catalytic mechanism scheme adapted with alteration from reference (28).

fGly participates directly in sulfate ester hydrolysis

The correlation of fGly deficiency with loss of sulfatase activity implied a direct role for fGly in enzymatic activity. Absent further information, several modes of reactivity were proposed based on known chemistries of the aldehyde group – its electrophilicity as well as the nucleophilicity of the geminal diol formed by aldehyde hydration (Figure 1.1A). Clarity came from the analysis of several sulfatase crystal structures which support a role for fGly in covalent catalysis.24 The first sulfatase structure, that of human ASB, was solved by Guss and coworkers in 1997, and one year later the structure of ASA was solved by von Figura, Saenger and coworkers.25,26 A third human sulfatase structure, that of steroid sulfatase (ASC, STS) was later reported by Ghosh and coworkers.27 Notably, despite relatively low sequence homology, these sulfatases adopt a similar fold to alkaline phosphatases, consistent with their related catalytic functions.

The sulfatase structures resolved several uncertainties regarding fGly. First, the fGly geminal diol (fGly-diol) was detected in ASA, suggesting a role as catalytic nucleophile. Electron density

5

consistent with a sulfated fGly-diol, a putative intermediate in a covalent catalysis mechanism, was observed both in ASB and ASC.25,27 However, it is important to note that the presence of an artifactual phosphate ester instead could not be ruled out. Further evidence came from the 1.3 Å resolution structure of the P. aeruginosa arylsulfatase (PAS), which clearly showed electron density for the fGly-diol at full occupancy (Figure 1.1B).28 An earlier observation that fGly within native folded ASA and ASB is refractory to NaBH4 reduction is consistent with the presence of the fGly-diol rather than the fGly-aldehyde in the resting enzyme.19 In addition to fGly, the active sites of Type I sulfatases contain a divalent cation, usually Ca2+ but also Mg2+, and a side-chain hydrogen bonding network which coordinates fGly and substrate. The metal helps to bind and polarize the substrate, and the fGly-centered hydrogen bonding network provides for general acid-base catalysis. Three aspartates, one of which H-bonds with an fGly- diol hydroxyl, and a glutamine/asparagine constitute the metal-binding residues, leaving two open coordination sites for binding of sulfate and the second fGly-diol hydroxyl. Several acidic residues bind the substrate, and a histidine base is predicted to assist in catalysis. A concise comparison of the corresponding active site residues for seven (of eight total) fGly enzyme crystal structures is provided by Kraztner, Steinfeld and coworkers (Figure 1.1C).24 The first structure of human sulfamidase (N-sulfoglucosamine sulfohydrolase, SGSH) was recently reported and overlays closely with previous sulfatases, especially in the active site region, despite low overall sequence identity (between 19-25% across fGly enzymes that have been structurally characterized).24

The presence of an active site fGly-diol suggested possible catalytic mechanisms (Figure 1.1D). Guss (ASB)25 and von Figura (ASA)26 both proposed that one of the hydroxyls of the fGly-diol acts as a nucleophile for initial attack at the substrate sulfur atom, releasing the co- product and forming a sulfoenzyme intermediate. The details of their proposed pathways diverge thereafter, though both emphasize acid-base catalysis. In the latter case, deprotonation of the free remaining fGly-diol hydroxyl group by a conserved histidine would allow E2 elimination of 26 sulfate, generating a transient fGly-aldehyde in a scheme abbreviated “SN2-E2”. Guss and coworkers proposed instead that SN2 substitution at sulfur by hydroxide or water was responsible for the breakdown of the sulfoenzyme intermediate, which can be designated as the “SN2- SN2” pathway (Figure 1.1D).25 Consistent with the unequal roles of each hydroxyl group in the geminal diol in either proposed mechanism, only the S stereoisomer of the transesterified enzyme has been observed in crystal structures of ASB and PAS.25,29

Intuitively, one may suppose that the biosynthetic cost of installing the unusual fGly PTM is more than repaid because fGly enables a mechanism unavailable to Ser/Cys/Thr nucleophiles. Replacing cysteine with serine in recombinant sulfatases prevents the conversion to fGly by eukaryotic FGEs, and has provided an instructive probe to compare the two proposed mechanisms. In the case of ASA and ASB, the Cys-to-Ser mutant sulfatase will bind substrate and perform the initial S–O cleavage to form the serine sulfoenzyme and release the alcohol co- product. But, this adduct is a dead-end product that will not release sulfate even after extended 30 time periods. If the wild-type α-hydroxy sulfate ester intermediate were resolved by SN2-type attack of water at sulfur, and not by E2 eliminaton, the serine mutant would be expected to retain at least partial activity. Thus, the E2 pathway is presently favored. Sulfoenzyme formation is detected in only about 20% of the Cys-to-Ser sufatase mutant, however, and this complex is unable to form in crystals. These facts suggest that the fGly-diol contributes to accelerated

6

enzyme sulfation as well, either by the enhanced nucleophilicity of the diol hydroxyl group compared to a primary alcohol, or by specific hydrogen bonding features of the active site that are not replicated in the serine mutant.31 Since the serine mutant affects both breakdown and formation of the sulfoenzyme intermediate, its behavior alone is not sufficient to prove the elimination mechanism.

Further chemical and theoretical analysis has yielded a framework for understanding the significance of fGly and its likely reaction pathway. First, the uncatalyzed hydrolysis of the S–O bond of sulfate proceeds 106-fold slower than spontaneous hydrolysis of the P–O bond of 4 18 4 phosphate esters. The high barrier for the uncatalyzed reaction, with a t1/2 = 10 years, may explain the evolutionary significance of fGly’s novel catalytic strategy. Alkaline phosphatase hydrolyzes phosphate esters through a phosphoserine intermediate that releases inorganic phosphate by attack of a Zn+-activated hydroxide.32 This catalytic paradigm may be inadequate for sulfate ester S–O cleavage, or was otherwise impractical or unavailable to sulfatases during evolution, thus requiring the specialized functionality of fGly. Of the two proposed fGly catalyzed pathways, the SN2-E2 mechanism requires only one high energy barrier S–O bond cleavage. The rate of the initial SN2 step may be accelerated by the decreased pKa of the geminal 33 diol hydroxyl group (e.g., acetaldehyde hydrate has a pKa of 13.57 ) compared to a conventional hydroxyl group (e.g., 15.9334). A recent theoretical study has suggested that the initial nucleophilic attack of fGly-diol on substrate may be rate-determining, and not the subsequent 35 elimination. Perhaps inconsistent with this prediction, a measured βLG of 0 for Vmax but -0.86 for Vmax/KM for PAS suggests that the alkoxide leaving group is not involved in the rate- determining step, but only the first irreversible chemical step.36 For the desulfation step, hydrolysis of the sulfoenzyme intermediate by water in the SN2 pathway may be quite slow compared to E2, again for the reason of S–O bond cleavage. Experimentally, pre-steady state kinetics of the wild-type enzyme may permit the comparison of the relative sulfation/desulfation rates. Stereochemical characterization of reaction products can also provide valuable clues to mechanism. In principle, the SN2-E2 mechanism would result in net inversion at sulfate while SN2-SN2 would result in net retention. In an early study, prior to the discovery of fGly, interception of the sulfoenzyme intermediate generated using a chiral sulfate ester with an exogenous nucleophile produced a new sulfate ester with net retention of configuration.37 This observation is consistent with an initial SN2-like step in the sulfatase mechanism but does not inform the course of desulfation.

Other determinants of reactivity may explain the proposed preference of sulfatases for an E2 release pathway, and the differential specificity of sulfatases and phosphatases in general. A recent experimental and theoretical study by Krenske and coworkers affirms earlier theoretical 35 studies, which predict a much lower transition state barrier for E2 elimination compared to SN2 substitution to release sulfate (estimated ΔΔH‡ of 26.5 kcal/mol).36 This study estimates a ΔΔH‡ of only 12 kcal/mol for the corresponding phosphoenzyme intermediate, which may account for why fGly is found mostly in sulfatases, and not in alkaline phosphatase. Wild-type ASA is also inhibited by a prohibitively slow rate of elimination/hydrolysis of the corresponding phosphoenzyme intermediate at fGly.29 This is not an inherit characteristic of fGly enzymes, however. PAS can catalyze arylphosphate hydrolysis at a lower catalytic efficiency (kcat/KM) compared to arylsulfates (7.9 x 102 M-1 s-1 vs. 4.9 x 107 M-1 s-1), but with dramatic rate acceleration (kcat/kuncat) and catalytic proficiency ((kcat/KM)/kuncat) compared to the uncatalyzed

7

hydrolysis of arylphosphates; PAS is predicted to release phosphate by elimination as with sulfate.35, 38, 39 In the opposing case of promiscuous sulfatase activity in alkaline phosphatase, the source of large catalytic discrimination for phosphate esters over sulfate esters appears to be complex and only partially related to the difference in formal charge between the two substrates.40 Notably, a family of Type I sulfatase/alkaline phosphatase related fGly-dependent enzymes has been recently discovered and structurally characterized. It includes a phosphatase/phosphonatase from Rhizobium leguminosarum termed RlPMH,41 and an orthologous but highly promiscuous phosphate/phosphonate/sulfate/sulfonate ester hydrolase from Burkholderia caryophilli (BcPMH).42 RlPMH has a >103-fold preference for phosphonate monoesters/phosphate diesters over sulfate esters and displays burst-phase kinetics, indicating that formation of the phosphoenzyme intermediate is not rate-limiting.41 This enzyme family reflects the apparent catalytic versatility of fGly as tuned by protein context, as well as the shared ancestry of Type I sulfatases and alkaline phosphatases.

It is important to note that two other unrelated sulfatase families exist in some prokaryotes, but unlike Type I, they proceed with concomitant oxidation of the alcohol co-product to an aldehyde (Type II), or stereochemical inversion of the alcohol co-product (Type III).17 It is clear that such changes would not be suitable for the signaling or catabolic roles of eukaryotic sulfatases. Though further experimental evidence is needed to clarify remaining mechanistic uncertainties, it is clear that Type I sulfatases are among the most extraordinary biological catalysts known, with 26 4 rate enhancements (kcat/kuncat) as high as 10 , and undoubtedly warrant further studies.

FGE catalyzes fGly formation and is dysfunctional in MSD

In 2003, the gene encoding FGE (also termed sulfatase-modifying factor 1 or SUMF1) was identified as responsible for MSD by Ballabio6 and von Figura,5 using genetic and biochemical approaches, respectively. Von Figura and coworkers exploited FGE’s ability to bind tightly but not turnover a peptide containing its recognition motif with serine substituted for cysteine (SXPXR). By affinity enrichment of bovine testis extract, FGE was thus purified for the first time.5 Investigation of human and bacterial FGEs has revealed a new enzyme family defined by both a novel catalytic mechanism and protein fold. Mapping the mutations in SUMF1 alleles from MSD patients onto the hFGE crystal structure, as well as in vivo and in vitro dissection of mutant phenotypes, has provided a rationale for their deleterious effects. MSD mutations result in structural, catalytic, and intracellular stability defects.43, 44, 45

Prior to isolation of FGE, fGly formation was studied using intact microsomes and later with reticuloplasm, using an in vitro transcription/translation system.46,47 These studies indicated that fGly generation occurs cotranslationally, and prior to folding. These initial efforts suggested a dependence on thiol reductants, Ca2+ ions, and an alkaline pH optimum.48 Subsequently, it was observed that FGE activity is dependent on O2, presumably as a terminal electron acceptor (cf. 49 Figure 1.3C). Interestingly, O2 availability can limit sulfatase activity in vivo by decreasing FGE activity, and the decrease of sulfatase activity levels can mediate hypoxic response.50

8

Mammalian FGE activity is regulated by protein partners

Regulation of mammalian FGE activity occurs spatially and is mediated by protein-protein interactions in the ER and Golgi. Recent work has identified functional interactions with several proteins in the ER and along the secretory pathway including: ERp44, PDI, ERGIC-53, and pFGE among others.51, 52, 53, 54 FGE lacks a canonical ER retention sequence; instead, the N- terminal region of FGE (residues 34-88 following signal peptide cleavage) forms both non- covalent interactions and a disulfide linkage with ERp44, a trafficking protein that is the primary means for retrieval of FGE from the Golgi to the ER. In fact, this N-terminal region is also involved in activation of sulfatases by an unknown (ER-retention-independent) mechanism. The region contains a cysteine pair (Cys50 and Cys52) that form intermolecular disulfide bonds both with Erp44 and other FGE molecules. Only Cys52 is necessary for FGE activity in vivo, but not in vitro, 54 while both cysteines are dispensable for retrieval by ERp44 through noncovalent complex formation,51 though an alternate mutational study disputes this finding (Figure 1.2A).52

Protein disulfide isomerase (PDI) is a well-known ER-resident chaperone that allows nascent polypeptides to reach their preferred disulfide configuration by catalyzing fast exchange with its own redox active disulfide.55 PDI appears to be involved both in retaining FGE in the ER and promoting activity, and these functions are dependent on catalytically active PDI (Figure 1.2B).52 ERGIC-53 is an anterograde cargo binding protein which cycles between the ER and the Golgi.52 It functions to traffic FGE from the ER to the cis-Golgi, and also seems to protect it from proteasomal degradation (Figure 1.2C). Once in the cis-Golgi, most FGE is retrieved to the ER via ERp44, with the remaining fraction secreted from the cell.52 It is possible that competition with other ERp44 substrates may regulate FGE secretion by this process. Downstream in the secretory pathway, a fraction of FGE is N-terminally truncated and inactivated by furin and furin-like proteases (Figure 1.2D). Approximately 20-30% of secreted FGE avoids proteolytic cleavage and remains active.56 Secreted FGE can traffic back to the ER from the cell surface. Indeed, tissue-specific expression of FGE in FGE-null mice has been shown to activate sulfatases in non-transduced tissues, in this way acting as a paracrine agent. Uptake is dependent in part on the mannose receptor and FGE N-glycosylation (Figure 1.2D-F).57 Taken together, a model for the localization and activity of FGE has emerged (Figure 1.2). However, it remains of great interest to understand the significance of FGE secretion and reuptake to the ER, as this offers a route for enzyme replacement therapy for treatment of MSD.

9

Figure 1.2. Regulation and trafficking of mammalian FGE by protein partners. (A) ERp44 binds the N-terminal extension of FGE and delivers it back to the ER. (B) Protein disulfide isomerase (PDI) retains FGE in the ER through binding, and increases catalytic activity, perhaps by acting as a disulfide reductant. (C) ERGIC-53 transports FGE to the Golgi and prevents degradation by the proteasome. (D) A portion of FGE is trafficked to the cell surface and secreted through the secretory pathway, furin and other proteases cleave and thereby inactivate 70-80% of secreted FGE. (E) Secreted FGE binds mannose receptor (MR) through its N-glycan. (F) Extracellular FGE can be trafficked back to the ER in an active state. (G) pFGE interacts with FGE for an unknown purpose, but may chaperone unfolded sulfatases or mediate their interaction with FGE. Finally, the paralog of FGE (pFGE, encoded by the gene SUMF2) was identified by homology shortly after SUMF1 was identified, and its precise role is the subject of ongoing investigation.6 pFGE displays strong sequence and structural conservation (48% identity) with hFGE, is localized to the ER, and is transcribed at comparable levels to FGE in human tissues, but pFGE lacks FGE activity and is different by several features.58, 59, 60 pFGE lacks the active site cysteine

10

pair and functionally critical N-terminal region of FGE. Instead, pFGE has a C-terminal PGEL sequence, which is a noncanonical variant of the ER retention signal KDEL. Based on co- immunoprecipitation data, pFGE forms heterodimers and higher order oligomers with FGE and also interacts with sulfatases, though direct physical evidence for pFGE interactions has remained elusive (Figure 1.2G). Curiously, pFGE overexpression decreases sulfatase activation in vivo in a dose-dependent manner, but does not increase FGE retention in the ER. The meaning of these results is unclear, but it is possible that in appropriate stoichiometry, pFGE may act as a sulfatase-specific chaperone or otherwise mediate interaction of FGE with sulfatases.58, 60

FGE defines a novel protein fold

The first series of crystal structures of human FGE were reported in 2005 by Dierks et al., and FGE was shown to adopt a novel fold, the FGE-fold (Figure 1.3A).43, 61 Subsequently, the crystal structure of the S. coelicolor FGE (scFGE) was solved in our laboratory and overlays closely with hFGE, with an rmsd of 0.65 Å.49 The FGE-fold is defined by one domain of low secondary structure content, approximately 30% combined β-sheet and α-helix in both orthologs. hFGE binds two Ca2+ ions and scFGE just one; two structural disulfides and an N-glycosite are particular the human enzyme. A cysteine pair (C336 and C341), shown to be catalytically essential and separated by four residues in a loop, exists in a distribution of the reduced (dithiol, FGEred) and oxidized (disulfide, FGEox) forms based on both crystal structures, and by mass spectrometry (Figure 1.3A).62,43 A co-crystal structure of hFGE C336S with LCTPSRA substrate was obtained by Roeser et al in 2006 and defined a shallow substrate binding channel which can bind up to 6 substrate residues in the consensus peptide sequence in an extended conformation 63 (Figure 1.3B). Additional structures support the existence of a probable O2 binding pocket adjacent to the active site cysteines. The use of halide bound hFGE crystal structures to define the size and environment of this pocket predict that it could accommodate a negatively charged oxidative intermediate in complex with substrate.64 Homologs of FGE have been found in a class of bacteria and bacteriophage diversity-generating retroelement (DGR) variable proteins, which generate adaptive protein variation through an RNA intermediate.65 With an rmsd of 0.98 Å between the catalytic region of FGE and the Treponema denticola TvpA variable region, the FGE-fold has since been classified as a sub-type of the C-type lectin fold.66

FGE is a unique cysteine-dependent oxidase/oxygenase

Many biochemical transformations in both primary and secondary metabolism across aerobic organisms harness the reduction potential of O2. Reduction of molecular oxygen is thermodynamically favorable, but there is a high barrier to its reaction with singlet state organic compounds due to its triplet ground state.67 Most oxidases/oxygenases require reductants of 2+ 2+ sufficient strength, typically paramagnetic transition metals (mainly Fe & Cu ), and/or organic cofactors (flavins, pterins, quinones) which generate a stabilized radical following single electron transfer.68 Subsequent one or two electron transfer events are very favorable and rapid, producing potent O2-derived oxidants that may add in to substrate (oxygenase) or become further reduced to H2O or H2O2 (oxidase). Cofactorless oxidases/oxygenases have also been identified. They typically promote the direct oxidation of substrates, without the intermediacy of a metal or

11 cofactor, by targeting O2 to a stabilized carbanion at a specific position of an aryl substrate in a manner comparable to flavin reactivity.69

A

B

12

C

D

Figure 1.3. FGE structure, overall reaction, and speculative mechanistic proposal. (A) Crystal structure of human FGE (PDB:1Y1E), with active site cysteines (Cys336, Cys341; engaged in partial disulfide bond), the essential serine (Ser333), and Ca2+ ions (green) highlighted. (B) Close-up of human FGE Cys336S (PDB:2AFY) in complex with substrate as Cys341-LCTPSRA mixed disulfide, with substrate binding residues highlighted. (C) Overall reaction catalyzed by - FGE on generic CTPSR substrate. Given the 2e oxidation of cysteine to formylglycine, O2 is - reduced to H2O2. However, as has been proposed, the concomitant input of an additional 2e and protons would result in the production of 2 H2O instead. (D) Abbreviated summary of proposed 63 mechanism provided by Roeser et al. FGEox binds substrate through thiol-disulfide exchange with substrate at Cys341. Cys336 then reacts with O2 to generate a peroxysulfenic acid. Decomposition of this intermediate, and perhaps the input of 2 external electrons, would regenerate FGEox, release 1 H2O, and form a substrate cysteine . Base-catalyzed elimination of H2O from substrate sulfenic acid produces a transient thioaldehyde, which gives fGly upon hydrolysis.

FGE does not possess a canonical metal or organic redox cofactor and thus has been categorized as a cofactorless oxidase/oxygenase. But, there is no evidence that FGE utilizes any variation of the strategies above to catalyze reaction with O2. Instead, mechanistic proposals center on the essential redox-active C336/C341 pair located in the active site. The lability of these cysteines to irreversible oxidation by air or H2O2, creating cysteine sulfinic and sulfonic acids, suggests that 49,43 they may mediate reaction of O2 with the substrate cysteine side chain. Mutation of either of

13

these cysteines abrogates activity, and they are positioned closely to the substrate cysteine in a cocrystal structure where Cys336 (hFGE) is mutated to serine (Figure 1.3B).63 The isolation of this complex, and the observation of partially oxidized cysteine side chains in various crystal structures, formed the basis of the proposed mechanism shown in Figure 1.3D. This scheme invokes a disulfide bonded form of FGE (FGEox) as the enzyme’s resting state. Substrate binding is then followed by disulfide exchange, forming a covalent mixed disulfide bond between enzyme and substrate. Direct reaction of a free cysteine side chain at C336 with O2 is proposed to generate a reactive intermediate, a cysteine oxoform presumably, that then targets the substrate cysteine side chain.

In both the hFGE and scFGE apo crystal structures, Cys336 (hFGE numbering) displays additional electron density that can be modeled as a sulfenic acid, peroxysulfenic acid, free peroxide or water. Ser333 is likely critical for catalysis due its role in coordinating such an intermediate. A substrate cysteine sulfenic acid may be generated through the transfer of oxidative species from one of the above reactive derivatives. Elimination of H2O from sulfenic acid would then generate a thioaldehyde intermediate, which would hydrolyze to generate fGly and H2S (Figure 1.3D). Notably, in vitro biochemical studies have shown that an external reductant is required to achieve high substrate turnover (e.g. GSH or DTT).48 Thus, although FGE activates O2 in a cofactor-independent manner, a reductant may be required to regenerate the enzyme’s active state during each catalytic cycle, or at an intermediate stage.69 It is possible that hFGE relies on PDI as a reducing agent in vivo, but there is no direct evidence for this.52

anSME catalyzes O2-independent fGly formation on Ser/Cys sulfatases in anaerobes

Given fGly’s impressive catalytic capability, it is not surprising that an O2-independent fGly generating system also exists in anaerobic and facultative anaerobic microbes. A family of radical S-adenosylmethione (RS)-dependent dehydrogenases has been identified to fill this role. AtsB from Klebsiella pneumoniae was identified as an fGly generating enzyme by Dierks and coworkers in 1999, and was previously shown to be required for high activity of recombinant Klebsiella sulfatase in E. coli.70, 71 Curiously, the native substrate of this enzyme contains Ser and not Cys in a recognition motif that is otherwise identical to the FGE-dependent sulfatase sequences. Based on biochemical and taxonomic evidence, the discovery of a Cys-sulfatase- specific ortholog from Clostridium perfringens (cpe) led to the designation of a new enzyme family, the anaerobic sulfatase-maturating enzymes (anSME).18 anSMEcpe can also oxidize Ser- type substrates,72 suggesting a similar reaction pathway for both targeted side chains.73 AtsB, whose native substrates are Ser-type, is 4-fold more active with cysteine-based substrates in vitro. This preference for Cys may originate from increased thiol compared to alcohol reactivity at one or more proposed elementary steps. Spectroscopic characterization revealed that anSME, in addition to the canonical RS [4Fe-4S] cluster, contains two auxiliary [4Fe-4S] clusters as well (Aux I and Aux II).74 Drennan, Booker and coworkers recently reported the first crystal structure of anSMEcpe with and without bound substrate (1.6-1.8 Å resolution), showing both Aux clusters in the C-terminal SPASM domain with fully occupied coordination sites, unlike the RS cluster which contains an open Fe-binding site for substrate.75

14

Mechanistic investigation of anSMEs has revealed many details of this reaction (Figure 1.4). As with all RS enzymes, a canonical [4Fe-4S]+ cluster reductively cleaves S-adenosylmethionine (SAM) to generate Met and the 5'-deoxyadenosine radical (5'-dA.). Initial H. abstraction by 5'- . dA , as evidenced by substrate deuterium labeling and stereospecific Cβ-methyl substitution, is 73, 76 performed at the pro-(S)-Cβ-hydrogen of Cys/Ser in the consensus sequence. Deprotonation of the thiol/alcohol Cβ radical by an asparate side chain and charge donation into the radical promotes the second substrate oxidation step. The second one-electron oxidation would yield a thioaldehyde intermediate in the case of cysteine, which would hydrolyze to fGly, or would yield fGly directly in the case of serine. Based on the orientation of Aux I and Aux II in relation to the RS cluster, Aux I is the likely electron acceptor for the second oxidation step. Aux II is proximal to bulk solvent and may receive this electron from Aux I and donate it to a soluble carrier. This pathway accounts for the multiple turnover ability of anSME in the presence of flavodoxin without external reductants, meaning that recycling of the second substrate electron to reduce the RS cluster cannot occur directly from AuxI/II.75 Further experimentation will clarify the role of each Aux cluster. Despite the strong conservation between substrates of anSME and FGE, the way each enzyme binds substrate is distinct. FGE-substrate interaction hinges on a hydrophobic pocket for Pro binding and several hydrogen bond partners to Arg (in CXPXR), with only 4 of 12 H-bonds to the substrate backbone.63 However, anSME binds substrate primarily through the substrate backbone, accounting for 14 of 17 H-bonds; this explains the existence of alternative native anSME substrates containing CXAXR without Pro, though Arg side-chain binding is maintained.75

15

Figure 1.4. Mechanism of fGly biogenesis= by the anaerobic sulfatase maturing enzyme (anSME). The reduced radical S-adenosylmethionine (RS) [4Fe-4S]+ cluster first binds and reductively cleaves SAM to produce methionine (Met) and 5’-deoxyadenosyl radical (5’-dA.). . The 5’-dA abstracts the pro-(S)-Cβ-hydrogen from the conserved substrate cysteine or serine residue. Thiol deprotonation by an active site aspartyl side chain generates a transient radical anion, which is thought to be oxidized by auxiliary [4Fe-4S]2+ cluster I (Aux I). Auxiliary cluster II may oxidize Aux I and transfer the second substrate electron to a soluble carrier for regeneration of the reduced RS cluster.

Some organisms have as yet unidentified fGly producing systems

Yet a third means for producing fGly has been observed, with the responsible enzyme(s) remaining elusive. Recombinant expression of Type I sulfatases in E. coli has been shown to produce active enzymes with varying proportions of fGly modification.77 Berteau and coworkers have shown that a putative E. coli anSME ortholog was unnecessary for the conversion of recombinant sulfatases, and that an unidentified, O2-dependent, Cys-type fGly-generating enzyme must be responsible for installing the aldehyde. According to sequence data available at the time, as of 2007, approximately 10% of bacterial genomes were predicted to encode arylsulfatase genes without the co-occurrence of obvious FGE or anSME orthologs.78 This is also true for several fungi, and even for the metazoan Caenorhabditis elegans, which contains 3 sulfatase genes.12, 79 C. elegans has been shown to utilize sulfatases for heparan sulfate remodeling, as in humans.80 Thus, it is likely that these organisms all carry a related system for fGly production with important functional consequences. We believe that identification of the putative third type of fGly-generating enzyme may reveal new aspects of fGly biology.

fGly provides a genetically encoded, site-specific protein bioconjugation handle

During characterization of the sequence determinants of cotranslational FGE activity, it was reported that the minimal 5-residue motif, CXPXR, is efficiently converted when embedded in an otherwise nonnative sequence.81 This observation led us to speculate that the CXPXR sequence could be exploited as a means to genetically encode fGly residues in various proteins of interest. The aldehyde group, in addition to its unique catalytic nature within sulfatases, is also a reactive electrophile with orthogonal reactivity to canonical protein side chain functionalities. We proposed that classical carbonyl-based bioconjugation chemistries, such as oxime and hydrazine formation, could be used to covalently attach various cargo to such “aldehyde tagged” proteins in a site-specific manner. The use of FGE and its 5-residue motif to arm recombinant proteins with aldehyde “chemical handles” was first reduced to practice in E. coli expression systems.82 The limited endogenous FGE-like activity in E. coli was found to be saturated by overexpressed proteins, necessitating co-expression of ectopic FGE along with the protein of interest in order to achieve high yields of Cys-to-fGly conversion. The M. tuberculosis FGE was particularly useful in this regard, as the enzyme was found to have relaxed sequence specificity and tolerated single alanine substitutions within the CXPXR motif.83 Subsequently, the method was used to produce aldehyde tagged recombinant proteins in mammalian cell expression systems. As eukaryotic FGE is naturally ER localized, membrane-

16

associated and secreted proteins, both of which traffic through the secretory pathway, were found to be viable targets for aldehyde tagging.84, 85

As a site-specific protein modification method, the aldehyde tag has many advantages compared to other options.86, 87 It is operationally simple, requiring simple cloning of the CXPXR motif at the desired modification site and expression of the protein of interest in FGE-expressing cells. Once installed, the aldehyde can be reacted with aminooxy or hydrazide reagents to form the corresponding oxime and hydrazone conjugates, respectively (Figure 1.5A). This straightforward process has been used to site-specifically PEGylate proteins,82 to introduce biophysical probes at discrete locations for single molecule studies of protein dynamics,88, 89 to immobilize proteins and phage on solid materials,90, 91, 92 to construct artificially glycosylated proteins,93, 94 and to generate DNA-protein conjugates.95 The aldehyde tag method has also been used in conjunction with other bioconjugation reactions to generate chemically-fused protein heterodimers (Figure 1.5B).96

A

B

17

C

Figure 1.5. Principle and application of aldehyde tag engineering. (A) fGly is installed on a protein target and used as a bioconjugation handle with aminooxy, or modified Pictet-Spengler reagent functionalized molecules. (B) Bifunctional protein fusion of human growth hormone (hGH) and human IgG (hIgG) synthesized by aldehyde tagging each followed by Cu-free azide- alkyne cycloaddition (taken with permission from reference 96). (C) ADC produced site- specifically and with precise drug:antibody ratio, shown with hydrazino Pictet-Spengler ligation product, for recent example of Trastuzumab and Maytansine.102

The biopharma industry has a growing interest in chemically modified protein therapeutics. Antibody-drug conjugates (ADCs) in particular are highly represented in biopharma pipelines, largely for cancer indications.97 There is increasing awareness of the benefits of generating such constructs via site-specific attachment of drug molecules to the antibody scaffold.98, 99 This trend presents an opportunity for clinical applications of fGly-functionalized proteins. However, critical to the therapeutic performance of such constructs is stability of the drug-protein linkage. Oximes and hydrazones can undergo reversible hydrolysis at physiologically relevant pHs, an inherent liability to their use in ADCs. To address this issue, we developed a new C-C bond forming reaction of protein-associated fGly residues with alkoxyamine-functionalized indoles (1, Figure 1.5A), an adaptation of the classic Pictet-Spengler reaction that we termed the Pictet- Spengler ligation.100 A second-generation version, the hydrazino Pictet-Spengler ligation (2, Figure 1.5A), proceeds readily at neutral pH to give stable fGly conjugates.101 This chemistry has recently been employed to construct homogeneous ADCs comprising the antimitotic natural product maytansine linked to the anti-Her2 antibody Trastuzumab (Figure 1.5C).102 The use of fGly as a chemical handle for these ADCs enabled precise control over the site and stoichiometry of drug conjugation, thereby facilitating drug optimization.

18

Conclusion

Many questions remain regarding the molecular details of fGly biogenesis and its role in catalysis. But in the two decades since fGly was first discovered, sulfatases and the enzymes that activate them have become widely appreciated as lessons in enzyme evolution and catalytic strategy. As well, the orthogonal reactivity of the aldehyde group with respect to canonical protein functionality has enabled biotechnology applications of fGly that are approaching clinical translation. The scope of FGE as a tool for protein engineering has yet to be fully explored. Interesting questions abound, such as the amenability of the enzyme to targeted engineering for enhanced performance or altered substrate specificity. As well, the intriguing discovery of fGly outside the Type I sulfatase family as required for the phosphatase/phosphonatase RlPMH and multi-substrate hydrolase BcPMH, hints that fGly could be more widespread across microbial proteomes. And the search for additional FGE-like enzymes is far from over. Organisms as familiar as E. coli and C. elegans generate fGly on recombinant proteins and produce fGly- dependent sulfatases respectively, but their FGE-like machineries remain undefined. We anticipate a rich future for fundamental and applied studies of this small but powerful PTM.

References

1. Okeley, N. M.; van der Donk, W. A., Novel cofactors via post-translational modifications of enzyme active sites. Chem. Biol. 2000, 7 (7), R159-R171.

2. Hanson, S. R.; Best, M. D.; Wong, C.-H., Sulfatases: Structure, mechanism, biological activity, inhibition, and synthetic utility. Angew. Chem. Int. Ed. 2004, 43 (43), 5736- 5763.

3. Klinman, J. P.; Bonnot, F., Intrigues and intricacies of the biosynthetic pathways for the enzymatic quinocofactors: PQQ, TTQ, CTQ, TPQ, and LTQ. Chem. Rev. 2013.

4. Edwards, D. R.; Lohman, D. C.; Wolfenden, R., Catalytic proficiency: The extreme case of S–O cleaving sulfatases. J. Am. Chem. Soc. 2011, 134 (1), 525-531.

5. Dierks, T.; Schmidt, B.; Borissenko, L. V.; Peng, J.; Preusser, A.; Mariappan, M.; von Figura, K., Multiple sulfatase deficiency is caused by mutations in the gene encoding the human Cα-formylglycine generating enzyme. Cell 2003, 113 (4), 435-444.

6. Cosma, M. P.; Pepe, S.; Annunziata, I.; Newbold, R. F.; Grompe, M.; Parenti, G.; Ballabio, A., The multiple sulfatase deficiency gene encodes an essential and limiting factor for the activity of sulfatases. Cell 2003, 113 (4), 445-456.

7. Wiegmann, E. M.; Westendorf, E.; Kalus, I.; Pringle, T. H.; Lübke, T.; Dierks, T., Arylsulfatase K, a novel lysosomal sulfatase. J. Biol. Chem. 2013, 288 (42), 30019- 30028.

19

8. Buono, M.; Visigalli, I.; Bergamasco, R.; Biffi, A.; Cosma, M. P., Sulfatase modifying factor 1–mediated fibroblast growth factor signaling primes hematopoietic multilineage development. J. Exp. Med. 2010, 207 (8), 1647-1660.

9. Diez-Roux, G.; Ballabio, A., Sulfatases and human disease. Annu. Rev. Genomics Hum. Genet. 2005, 6 (1), 355-379.

10. Austin, J. H., Studies in metachromatic leukodystrophy: XII. multiple sulfatase deficiency. Arch. Neurol. 1973, 28 (4), 258-264.

11. Ulmer, J. E.; Vilén, E. M.; Namburi, R. B.; Benjdia, A.; Beneteau, J.; Malleron, A.; Bonnaffé, D.; Driguez, P.-A.; Descroix, K.; Lassalle, G.; Le Narvor, C.; Sandström, C.; Spillmann, D.; Berteau, O., Characterization of glycosaminoglycan (GAG) sulfatases from the human gut symbiont Bacteroides thetaiotaomicron reveals the first GAG- specific bacterial endosulfatase. J. Biol. Chem. 2014, 289 (35), 24289-24303.

12. Sardiello, M.; Annunziata, I.; Roma, G.; Ballabio, A., Sulfatases and sulfatase modifying factors: an exclusive and promiscuous relationship. Hum. Mol. Genet. 2005, 14 (21), 3203-17.

13. Bojarová, P.; Williams, S. J., Sulfotransferases, sulfatases and formylglycine-generating enzymes: a sulfation fascination. Curr. Opin. Chem. Biol. 2008, 12 (5), 573-581.

14. Ghosh, D., Human sulfatases: A structural perspective to catalysis. Cell. Mol. Life Sci. 2007, 64 (15), 2013-2022.

15. von Figura, K.; Schmidt, B.; Selmer, T.; Dierks, T., A novel protein modification generating an aldehyde group in sulfatases: its role in catalysis and disease. BioEssays 1998, 20 (6), 505-10.

16. Dierks, T.; Schlotawa, L.; Frese, M. A.; Radhakrishnan, K.; von Figura, K.; Schmidt, B., Molecular basis of multiple sulfatase deficiency, mucolipidosis II/III and Niemann-Pick C1 disease - Lysosomal storage disorders caused by defects of non-lysosomal proteins. Biochim. Biophys. Acta 2009, 1793 (4), 710-25.

17. Toesch, M.; Schober, M.; Faber, K., Microbial alkyl- and aryl-sulfatases: mechanism, occurrence, screening and stereoselectivities. Appl. Microbiol. Biotechnol. 2014, 98 (4), 1485-1496.

18. Berteau, O.; Guillot, A.; Benjdia, A.; Rabot, S., A new type of bacterial sulfatase reveals a novel maturation pathway in prokaryotes. J. Biol. Chem. 2006, 281 (32), 22464-22470.

19. Schmidt, B.; Selmer, T.; Ingendoh, A.; Von Figura, K., A novel amino-acid modification in sulfatases that is defective in Multiple Sulfatase Deficiency. Cell 1995, 82 (2), 271- 278.

20

20. Fluharty, A. L.; Stevens, R. L.; Davis, L. L.; Shapiro, L. J.; Kihara, H., Presence of arylsulfatase A (ARS A) in multiple sulfatase deficiency disorder fibroblasts. Am. J. Hum. Genet. 1978, 30 (3), 249-55.

21. Horwitz, A. L., Genetic complementation studies of multiple sulfatase deficiency. Proc. Natl. Acad. Sci. U. S. A. 1979, 76 (12), 6496-9.

22. Rommerskirch, W.; von Figura, K., Multiple sulfatase deficiency: catalytically inactive sulfatases are expressed from retrovirally introduced sulfatase cDNAs. Proc. Natl. Acad. Sci. U. S. A. 1992, 89 (7), 2561-2565.

23. Schlotawa, L.; Ennemann, E. C.; Radhakrishnan, K.; Schmidt, B.; Chakrapani, A.; Christen, H. J.; Moser, H.; Steinmann, B.; Dierks, T.; Gartner, J., SUMF1 mutations affecting stability and activity of formylglycine generating enzyme predict clinical outcome in multiple sulfatase deficiency. Eur. J. Hum. Genet. 2011, 19 (3), 253-61.

24. Sidhu, N. S.; Schreiber, K.; Propper, K.; Becker, S.; Uson, I.; Sheldrick, G. M.; Gartner, J.; Kratzner, R.; Steinfeld, R., Structure of sulfamidase provides insight into the molecular pathology of mucopolysaccharidosis IIIA. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2014, 70 (5), 1321-1335.

25. Bond, C. S.; Clements, P. R.; Ashby, S. J.; Collyer, C. A.; Harrop, S. J.; Hopwood, J. J.; Guss, J. M., Structure of a human lysosomal sulfatase. Structure 1997, 5 (2), 277-289.

26. Lukatela, G.; Krauss, N.; Theis, K.; Selmer, T.; Gieselmann, V.; von Figura, K.; Saenger, W., Crystal structure of human arylsulfatase A: the aldehyde function and the metal ion at the active site suggest a novel mechanism for sulfate ester hydrolysis. Biochemistry 1998, 37 (11), 3654-64.

27. Hernandez-Guzman, F. G.; Higashiyama, T.; Pangborn, W.; Osawa, Y.; Ghosh, D., Structure of human estrone sulfatase suggests functional roles of membrane association. J. Biol. Chem. 2003, 278 (25), 22989-22997.

28. Boltes, I.; Czapinska, H.; Kahnert, A.; von Bülow, R.; Dierks, T.; Schmidt, B.; von Figura, K.; Kertesz, M. A.; Usón, I., 1.3 Å Structure of arylsulfatase from Pseudomonas aeruginosa establishes the catalytic mechanism of sulfate ester cleavage in the sulfatase family. Structure 2001, 9 (6), 483-491.

29. Chruszcz, M.; Laidler, P.; Monkiewicz, M.; Ortlund, E.; Lebioda, L.; Lewinski, K., Crystal structure of a covalent intermediate of endogenous human arylsulfatase A. J. Inorg. Biochem. 2003, 96 (2–3), 386-392.

30. Recksiek, M.; Selmer, T.; Dierks, T.; Schmidt, B.; von Figura, K., Sulfatases, trapping of the sulfated enzyme intermediate by substituting the active site formylglycine. J. Biol. Chem. 1998, 273 (11), 6096-6103.

21

31. von Bulow, R.; Schmidt, B.; Dierks, T.; von Figura, K.; Uson, I., Crystal structure of an enzyme-substrate complex provides insight into the interaction between human arylsulfatase A and its substrates during catalysis. J. Mol. Biol. 2001, 305 (2), 269-77.

32. Cleland, W. W.; Hengge, A. C., Enzymatic mechanisms of phosphate and sulfate transfer. Chem. Rev. 2006, 106 (8), 3252-3278.

33. Bell, R. P.; Onwood, D. P., Acid strengths of the hydrates of formaldehyde, acetaldehyde and chloral. Trans. Faraday Soc. 1962, 58 (0), 1557-1561.

34. Murto, J., Nucleophilic reactivity of alkoxide ions toward 2,4-dinitro-fluorobenzene and the acidity of . Acta Chem. Scand. 1964, 18 (5), 1043-1053.

35. Marino, T.; Russo, N.; Toscano, M., Catalytic mechanism of the arylsulfatase promiscuous enzyme from Pseudomonas Aeruginosa. Chem. - Eur. J. 2013, 19 (6), 2185- 2192.

36. Williams, S. J.; Denehy, E.; Krenske, E. H., Experimental and theoretical insights into the mechanisms of sulfate and sulfamate ester hydrolysis and the end products of Type I sulfatase inactivation by aryl sulfamates. J. Org. Chem. 2014, 79 (5), 1995-2005.

37. Chai, C. L.; Loughlin, W. A.; Lowe, G., The stereochemical course of sulphuryl transfer catalysed by arylsulphatase II from Aspergillus oryzae. Biochem. J. 1992, 287, 805-12.

38. Olguin, L. F.; Askew, S. E.; O’Donoghue, A. C.; Hollfelder, F., Efficient catalytic promiscuity in an enzyme superfamily: An arylsulfatase shows a rate acceleration of 1013 for phosphate monoester hydrolysis. J. Am. Chem. Soc. 2008, 130 (49), 16547-16555.

39. Luo, J.; van Loo, B.; Kamerlin, S. C., Examining the promiscuous phosphatase activity of Pseudomonas aeruginosa arylsulfatase: a comparison to analogous phosphatases. Proteins 2012, 80 (4), 1211-26.

40. Andrews, L. D.; Zalatan, J. G.; Herschlag, D., Probing the origins of catalytic discrimination between phosphate and sulfate monoester hydrolysis: Comparative analysis of alkaline phosphatase and protein tyrosine phosphatases. Biochemistry 2014, 53 (43), 6811-6819.

41. Jonas, S.; van Loo, B.; Hyvönen, M.; Hollfelder, F., A new member of the alkaline phosphatase superfamily with a formylglycine nucleophile: Structural and kinetic characterisation of a phosphonate monoester hydrolase/phosphodiesterase from Rhizobium leguminosarum. J. Mol. Biol. 2008, 384 (1), 120-136.

42. van Loo, B.; Jonas, S.; Babtie, A. C.; Benjdia, A.; Berteau, O.; Hyvönen, M.; Hollfelder, F., An efficient, multiply promiscuous hydrolase in the alkaline phosphatase superfamily. Proc. Natl. Acad. Sci. U. S. A. 2010, 107 (7), 2740-2745.

43. Dierks, T.; Dickmanns, A.; Preusser-Kunze, A.; Schmidt, B.; Mariappan, M.; von Figura, K.; Ficner, R.; Rudolph, M. G., Molecular basis for multiple sulfatase deficiency and

22

mechanism for formylglycine generation of the human formylglycine-generating enzyme. Cell 2005, 121 (4), 541-552.

44. Annunziata, I.; Bouche, V.; Lombardi, A.; Settembre, C.; Ballabio, A., Multiple sulfatase deficiency is due to hypomorphic mutations of the SUMF1 gene. Hum. Mutat. 2007, 28 (9), 928.

45. Schlotawa, L.; Radhakrishnan, K.; Baumgartner, M.; Schmid, R.; Schmidt, B.; Dierks, T.; Gartner, J., Rapid degradation of an active formylglycine generating enzyme variant leads to a late infantile severe form of multiple sulfatase deficiency. Eur. J. Hum. Genet. 2013, 21 (9), 1020-1023.

46. Dierks, T.; Schmidt, B.; von Figura, K., Conversion of cysteine to formylglycine: a protein modification in the endoplasmic reticulum. Proc. Natl. Acad. Sci. U. S. A. 1997, 94 (22), 11963-8.

47. Dierks, T.; Lecca, M. R.; Schmidt, B.; von Figura, K., Conversion of cysteine to formylglycine in eukaryotic sulfatases occurs by a common mechanism in the endoplasmic reticulum. FEBS Lett. 1998, 423 (1), 61-65.

48. Fey, J.; Balleininger, M.; Borissenko, L. V.; Schmidt, B.; von Figura, K.; Dierks, T., Characterization of posttranslational formylglycine formation by luminal components of the endoplasmic reticulum. J. Biol. Chem. 2001, 276 (50), 47021-8.

49. Carlson, B. L.; Ballister, E. R.; Skordalakes, E.; King, D. S.; Breidenbach, M. A.; Gilmore, S. A.; Berger, J. M.; Bertozzi, C. R., Function and structure of a prokaryotic formylglycine-generating enzyme. J. Biol. Chem. 2008, 283 (29), 20117-20125.

50. Bhattacharyya, S.; Tobacman, J. K., Hypoxia reduces arylsulfatase B activity and silencing arylsulfatase B replicates and mediates the effects of hypoxia. PLoS One 2012, 7 (3), e33250.

51. Mariappan, M.; Radhakrishnan, K.; Dierks, T.; Schmidt, B.; von Figura, K., ERp44 mediates a thiol-independent retention of formylglycine-generating enzyme in the endoplasmic reticulum. J. Biol. Chem. 2008, 283 (10), 6375-83.

52. Fraldi, A.; Zito, E.; Annunziata, F.; Lombardi, A.; Cozzolino, M.; Monti, M.; Spampanato, C.; Ballabio, A.; Pucci, P.; Sitia, R.; Cosma, M. P., Multistep, sequential control of the trafficking and function of the multiple sulfatase deficiency gene product, SUMF1 by PDI, ERGIC-53 and ERp44. Hum. Mol. Genet. 2008, 17 (17), 2610-2621.

53. Gande, S. L.; Mariappan, M.; Schmidt, B.; Pringle, T. H.; von Figura, K.; Dierks, T., Paralog of the formylglycine-generating enzyme – retention in the endoplasmic reticulum by canonical and noncanonical signals. FEBS J. 2008, 275 (6), 1118-1130.

54. Mariappan, M.; Gande, S. L.; Radhakrishnan, K.; Schmidt, B.; Dierks, T.; von Figura, K., The non-catalytic N-terminal extension of formylglycine-generating enzyme is required

23

for its biological activity and retention in the endoplasmic reticulum. J. Biol. Chem. 2008, 283 (17), 11556-11564.

55. Wilkinson, B.; Gilbert, H. F., Protein disulfide isomerase. Biochim. Biophys. Acta 2004, 1699 (1–2), 35-44.

56. Ennemann, E. C.; Radhakrishnan, K.; Mariappan, M.; Wachs, M.; Pringle, T. H.; Schmidt, B.; Dierks, T., Proprotein convertases process and thereby inactivate formylglycine-generating enzyme. J. Biol. Chem. 2013, 288 (8), 5828-5839.

57. Zito, E.; Buono, M.; Pepe, S.; Settembre, C.; Annunziata, I.; Surace, E. M.; Dierks, T.; Monti, M.; Cozzolino, M.; Pucci, P.; Ballabio, A.; Cosma, M. P., Sulfatase modifying factor 1 trafficking through the cells: from endoplasmic reticulum to the endoplasmic reticulum. EMBO J. 2007, 26 (10), 2443-53.

58. Mariappan, M.; Preusser-Kunze, A.; Balleininger, M.; Eiselt, N.; Schmidt, B.; Gande, S. L.; Wenzel, D.; Dierks, T.; von Figura, K., Expression, localization, structural, and functional characterization of pFGE, the paralog of the Cα-formylglycine-generating enzyme. J. Biol. Chem. 2005, 280 (15), 15173-9.

59. Dickmanns, A.; Schmidt, B.; Rudolph, M. G.; Mariappan, M.; Dierks, T.; von Figura, K.; Ficner, R., Crystal structure of human pFGE, the paralog of the Calpha-formylglycine- generating enzyme. J. Biol. Chem. 2005, 280 (15), 15180-7.

60. Zito, E.; Fraldi, A.; Pepe, S.; Annunziata, I.; Kobinger, G.; Di Natale, P.; Ballabio, A.; Cosma, M. P., Sulphatase activities are regulated by the interaction of sulphatase- modifying factor 1 with SUMF2. EMBO Rep. 2005, 6 (7), 655-60.

61. Roeser, D.; Dickmanns, A.; Gasow, K.; Rudolph, M. G., De novo calcium/sulfur SAD phasing of the human formylglycine-generating enzyme using in-house data. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2005, 61 (8), 1057-1066.

62. Preusser-Kunze, A.; Mariappan, M.; Schmidt, B.; Gande, S. L.; Mutenda, K.; Wenzel, D.; von Figura, K.; Dierks, T., Molecular characterization of the human Cα- formylglycine-generating Enzyme. J. Biol. Chem. 2005, 280 (15), 14900-14910.

63. Roeser, D.; Preusser-Kunze, A.; Schmidt, B.; Gasow, K.; Wittmann, J. G.; Dierks, T.; von Figura, K.; Rudolph, M. G., A general binding mechanism for all human sulfatases by the formylglycine-generating enzyme. Proc. Natl. Acad. Sci. U. S. A. 2006, 103 (1), 81-86.

64. Roeser, D.; Schmidt, B.; Preusser-Kunze, A.; Rudolph, M. G., Probing the oxygen- binding site of the human formylglycine-generating enzyme using halide ions. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2007, 63 (5), 621-627.

65. Alayyoubi, M.; Guo, H.; Dey, S.; Golnazarian, T.; Brooks, Garrett A.; Rong, A.; Miller, Jeffery F.; Ghosh, P., Structure of the essential diversity-generating retroelement protein

24

bAvd and its functionally important interaction with reverse transcriptase. Structure 2013, 21 (2), 266-276.

66. Le Coq, J.; Ghosh, P., Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement. Proc. Natl. Acad. Sci. U. S. A. 2011, 108 (35), 14649-53.

67. Ho, R. N.; Liebman, J.; Valentine, J., Overview of the energetics and reactivity of oxygen. In Active Oxygen in Chemistry, Foote, C.; Valentine, J.; Greenberg, A.; Liebman, J., Eds. Springer Netherlands: 1995; Vol. 2, pp 1-23.

68. Klinman, J. P., How do enzymes activate oxygen without inactivating themselves? Acc. Chem. Res. 2007, 40 (5), 325-333.

69. Fetzner, S.; Steiner, R., Cofactor-independent oxidases and oxygenases. Appl. Microbiol. Biotechnol. 2010, 86 (3), 791-804.

70. Murooka, Y.; Ishibashi, K.; Yasumoto, M.; Sasaki, M.; Sugino, H.; Azakami, H.; Yamashita, M., A sulfur- and tyramine-regulated Klebsiella aerogenes operon containing the arylsulfatase (atsA) gene and the atsB gene. J. Bacteriol. 1990, 172 (4), 2131-2140.

71. Szameit, C.; Miech, C.; Balleininger, M.; Schmidt, B.; von Figura, K.; Dierks, T., The iron sulfur protein AtsB is required for posttranslational formation of formylglycine in the Klebsiella sulfatase. J. Biol. Chem. 1999, 274 (22), 15375-81.

72. Benjdia, A.; Subramanian, S.; Leprince, J.; Vaudry, H.; Johnson, M. K.; Berteau, O., Anaerobic sulfatase-maturating enzymes, first dual substrate radical S- adenosylmethionine enzymes. J. Biol. Chem. 2008, 283 (26), 17815-26.

73. Grove, T. L.; Ahlum, J. H.; Qin, R. M.; Lanz, N. D.; Radle, M. I.; Krebs, C.; Booker, S. J., Further Characterization of Cys-Type and Ser-Type Anaerobic Sulfatase Maturating Enzymes Suggests a Commonality in the Mechanism of Catalysis. Biochemistry 2013, 52 (17), 2874-2887.

74. Grove, T. L.; Lee, K. H.; St Clair, J.; Krebs, C.; Booker, S. J., In vitro characterization of AtsB, a radical SAM formylglycine-generating enzyme that contains three [4Fe-4S] clusters. Biochemistry 2008, 47 (28), 7523-38.

75. Goldman, P. J.; Grove, T. L.; Sites, L. A.; McLaughlin, M. I.; Booker, S. J.; Drennan, C. L., X-ray structure of an AdoMet radical activase reveals an anaerobic solution for formylglycine posttranslational modification. Proc. Natl. Acad. Sci. U. S. A. 2013.

76. Benjdia, A.; Leprince, J.; Sandström, C.; Vaudry, H.; Berteau, O., Mechanistic investigations of anaerobic sulfatase-maturating enzyme: Direct Cβ H-Atom abstraction catalyzed by a radical AdoMet enzyme. J. Am. Chem. Soc. 2009, 131 (24), 8348-8349.

25

77. Dierks, T.; Miech, C.; Hummerjohann, J.; Schmidt, B.; Kertesz, M. A.; von Figura, K., Posttranslational formation of formylglycine in prokaryotic sulfatases by modification of either cysteine or serine. J. Biol. Chem. 1998, 273 (40), 25560-4.

78. Benjdia, A.; Dehò, G.; Rabot, S.; Berteau, O., First evidences for a third sulfatase maturation system in prokaryotes from E. coli aslB and ydeM deletion mutants. FEBS Lett. 2007, 581 (5), 1009-1014.

79. Landgrebe, J.; Dierks, T.; Schmidt, B.; von Figura, K., The human SUMF1 gene, required for posttranslational sulfatase modification, defines a new gene family which is conserved from pro- to eukaryotes. Gene 2003, 316 (0), 47-56.

80. Townley, R. A.; Bülow, H. E., Genetic analysis of the heparan modification network in Caenorhabditis elegans. J. Biol. Chem. 2011, 286 (19), 16824-16831.

81. Frese, M. A.; Dierks, T., Formylglycine aldehyde Tag--protein engineering through a novel post-translational modification. ChemBioChem 2009, 10 (3), 425-7.

82. Carrico, I. S.; Carlson, B. L.; Bertozzi, C. R., Introducing genetically encoded into proteins. Nat. Chem. Biol. 2007, 3 (6), 321-322.

83. Rush, J. S.; Bertozzi, C. R., New aldehyde tag sequences identified by screening formylglycine generating enzymes in vitro and in vivo. J. Am. Chem. Soc. 2008, 130 (37), 12240-12241.

84. Wu, P.; Shui, W.; Carlson, B. L.; Hu, N.; Rabuka, D.; Lee, J.; Bertozzi, C. R., Site- specific chemical modification of recombinant proteins produced in mammalian cells by using the genetically encoded aldehyde tag. Proc. Natl. Acad. Sci. U. S. A. 2009, 106 (9), 3000-3005.

85. Rabuka, D.; Rush, J. S.; deHart, G. W.; Wu, P.; Bertozzi, C. R., Site-specific chemical protein conjugation using genetically encoded aldehyde tags. Nat. Protoc. 2012, 7 (6), 1052-1067.

86. Rabuka, D., Chemoenzymatic methods for site-specific protein modification. Curr. Opin. Chem. Biol. 2010, 14 (6), 790-796.

87. Rashidian, M.; Dozier, J. K.; Distefano, M. D., Enzymatic labeling of proteins: Techniques and approaches. Bioconjugate Chem. 2013, 24 (8), 1277-1294.

88. Shi, X.; Jung, Y.; Lin, L.-J.; Liu, C.; Wu, C.; Cann, I. K. O.; Ha, T., Quantitative fluorescence labeling of aldehyde-tagged proteins for single-molecule imaging. Nat. Methods 2012, 9 (5), 499-503.

89. Ghoneim, M. K.; Spies, M., Direct correlation of DNA binding and single protein domain motion via dual illumination fluorescence microscopy. Nano Lett. 2014.

26

90. Cho, H.; Jaworski, J., A portable and chromogenic enzyme-based sensor for detection of abrin poisoning. Biosens. Bioelectron. 2014, 54 (0), 667-673.

91. Kwak, E.-A.; Jaworski, J., Controlled surface immobilization of viruses via site-specific enzymatic modification. J. Mater. Chem. B 2013, 1 (28), 3486-3493.

92. Cho, H.; Jaworski, J., Enzyme directed formation of un-natural side-chains for covalent surface attachment of proteins. Colloids Surf., B 2014, 122 (0), 846-850.

93. Hudak, J. E.; Yu, H. H.; Bertozzi, C. R., Protein glycoengineering enabled by the versatile synthesis of aminooxy glycans and the genetically encoded aldehyde tag. J. Am. Chem. Soc. 2011, 133 (40), 16127-35.

94. Smith, E. L.; Giddens, J. P.; Iavarone, A. T.; Godula, K.; Wang, L.-X.; Bertozzi, C. R., Chemoenzymatic Fc glycosylation via engineered aldehyde tags. Bioconjugate Chem. 2014, 25 (4), 788-795.

95. Liang, S. I.; McFarland, J. M.; Rabuka, D.; Gartner, Z. J., A modular approach for assembling aldehyde-tagged proteins on DNA scaffolds. J. Am. Chem. Soc. 2014, 136 (31), 10850-10853.

96. Hudak, J. E.; Barfield, R. M.; de Hart, G. W.; Grob, P.; Nogales, E.; Bertozzi, C. R.; Rabuka, D., Synthesis of heterobifunctional protein fusions using copper-free click chemistry and the aldehyde tag. Angew. Chem. Int. Ed. 2012, 51 (17), 4161-4165.

97. Chari, R. V. J.; Miller, M. L.; Widdison, W. C., Antibody–drug conjugates: An emerging concept in cancer therapy. Angew. Chem. Int. Ed. 2014, 53 (15), 3796-3827.

98. Junutula, J. R.; Raab, H.; Clark, S.; Bhakta, S.; Leipold, D. D.; Weir, S.; Chen, Y.; Simpson, M.; Tsai, S. P.; Dennis, M. S.; Lu, Y.; Meng, Y. G.; Ng, C.; Yang, J.; Lee, C. C.; Duenas, E.; Gorrell, J.; Katta, V.; Kim, A.; McDorman, K.; Flagella, K.; Venook, R.; Ross, S.; Spencer, S. D.; Lee Wong, W.; Lowman, H. B.; Vandlen, R.; Sliwkowski, M. X.; Scheller, R. H.; Polakis, P.; Mallet, W., Site-specific conjugation of a cytotoxic drug to an antibody improves the therapeutic index. Nat. Biotechnol. 2008, 26 (8), 925-932.

99. Agarwal, P.; Bertozzi, C. R., Site-specific antibody-drug conjugates: the nexus of bioorthogonal chemistry, protein engineering, and drug development. Bioconjugate Chem. 2014.

100. Agarwal, P.; van der Weijden, J.; Sletten, E. M.; Rabuka, D.; Bertozzi, C. R., A Pictet- Spengler ligation for protein chemical modification. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (1), 46-51.

101. Agarwal, P.; Kudirka, R.; Albers, A. E.; Barfield, R. M.; de Hart, G. W.; Drake, P. M.; Jones, L. C.; Rabuka, D., Hydrazino-Pictet-Spengler ligation as a biocompatible method for the generation of stable protein conjugates. Bioconjugate Chem. 2013, 24 (6), 846- 851.

27

102. Drake, P. M.; Albers, A. E.; Baker, J.; Banas, S.; Barfield, R. M.; Bhat, A. S.; de Hart, G. W.; Garofalo, A. W.; Holder, P.; Jones, L. C.; Kudirka, R.; McFarland, J.; Zmolek, W.; Rabuka, D., Aldehyde tag coupled with HIPS chemistry enables the production of ADCs conjugated site-specifically to different antibody regions with distinct in vivo efficacy and PK outcomes. Bioconjugate Chem. 2014, 25 (7), 1331-1341.

28

Chapter 2 Structural and spectroscopic characterization of the formylglycine-generating enzyme defines a singular class of copper-dependent oxidase

29

Chapter 2. Structural and spectroscopic characterization of the formylglycine-generating enzyme defines a singular class of copper-dependent oxidasea

Introduction

The formylglycine-generating enzyme (FGE) is responsible for the co- or posttranslational activation of Type I sulfatases in eukaryotes and aerobic microbes.1 This is accomplished by oxidation of a sulfatase active site cysteine to Cα-formylglycine (fGly), a critical cofactor in the hydrolysis of sulfate ester substrates. Previously, FGE was thought to catalyze this reaction using a novel oxidase/monooxygenase mechanism without the assistance of a metal or organic cofactor (i.e., a cofactor-independent mechanism).2 However, it has recently been reported that it is in fact a copper-dependent metalloenzyme; two active site cysteine residues have been implicated in copper binding.3-5 This evidence strongly suggests that FGE represents a novel class of monocopper oxidases.6 Despite this revelation, structural characterization of an authentic Cu- FGE complex has not been reported. Other major aspects of this transformation remain a mystery, including the manner in which reaction with O2 at the monocopper center is initiated, and how reactive intermediates are directed for oxidation of substrate to fGly. Of relevance to human health, dysfunction of FGE results in a rare congenital disease caused by global decrease of sulfatase function, Multiple Sulfatase Deficiency (MSD).7, 8 Also, owing to the promiscuity of FGE, unique biotechnology applications are accessible by the site-specific introduction of fGly into recombinant proteins as a bioconjugation handle.9 Therefore, elucidation of the of FGE is of great interest and could reveal a new paradigm of bioinorganic catalysis.

Prior to 2015, biochemical studies and X-ray crystal structures which provide mechanistic 8, 10 information regarding FGE were reported. FGE requires O2 for activity, and the 11 stoichiometry of O2 consumption was reported to be 1:1 with substrate. The active site is comprised of a critical cysteine pair separated by five residues in a structured loop. Both crystallographic and mass spectrometry data show that this pair is redox-active depending on 12, 10 applied potential and, exists in the dithiol (FGEred) and disulfide (FGEox) oxidation states. They are also sensitive to irreversible oxidation to sulfonic acid following exposure to elevated pH or H2O2 during crystallization. Curiously, electron density adjacent to Cys336 may be ambiguously modeled as H2O, sulfenic acid, or peroxysulfenic acid and is observed in crystals from both human and Streptomyces coelicolor FGE. These data suggested that the labile active- site cysteines are directly involved in catalysis. Mutation of Cys336 to Ser allows the co- crystallization of an enzyme-substrate complex in which Cys341 is engaged in a mixed disulfide bond with the substrate cysteine. Presumed to be a mimic of the Michaelis complex formed following thiol-disulfide interchange between FGEox and substrate, a speculative catalytic cycle was proposed by Roeser et al.13 However, a critical discovery by Holder et. al. in 2015 which FGE as a Cu-dependent enzyme.3 This report demonstrated both activation and stoichiometric Cu binding in S. coelicolor and human FGEs. Despite crystallographic and biochemical studies of FGE over the preceding decade, the involvement of Cu remained unknown, perhaps owing to a non-canonical Cu binding

a Katlyn K. Meier, Julien Lafrance-Vanasse, and Hyeongtaek Lim contributed to work presented in this chapter. 30

site, and the saturation of available copper during recombinant expression prior to isolation. Indeed, initial investigation of this phenomenon by Knop et. al. questioned the mechanism of 4 -17 activation by copper. Further studies confirmed high affinity (Kd ~ 10 M) copper binding dependent on both active site cysteine residues.5 Additionally, during the preparation of this chapter, publication of FGE crystal structures bound to catalytically inactive cuprous (Ag(I)) and cupric (Cd(II)) mimics have recently provided structural confirmation of a linear di-cysteine binding motif reminiscent of known Cu(I) binding proteins and chaperones.14 The authors propose that a conformational change observed between Ag(I)- and Cd(II)- bound FGE represents a catalytically relevant Cu(II)-ligated intermediate state. The native metal-bound crystal structure has remained elusive, likely owing to the difficulty of preparing stably metallated Cu-FGE crystals. Here, we present the first X-ray crystal structure of Cu-bound Streptomyces coelicolor FGE (scFGE) at a resolution of 2.2 Å. This represents the first holoenzyme structure of a copper oxidase which utilizes a linear cuprous bis(thiolate) ligand system (C272, C277) superficially similar to well-characterized Cu(I) transporters, chaperones, and binding/electron transfer proteins.15-17 Additionally, we report the first electronic absorption and electron paramagnetic resonance features of a transient cupric-FGE complex, providing evidence that FGE can bind both copper redox states with the same active site. An autoreduction phenomenon converts Cu(II)-scFGE to Cu(I)-scFGE. However, Cu(I)-scFGE alone is the catalytically competent resting enzyme. By directly interrogating Cu(I)-scFGE using X-ray absorption spectroscopy, the linear 2-coordinate bis(thiolate) model observed in crystallo was authenticated. Together, these results represent the first structural characterization of copper-bound FGE, in both redox states. Finally, we speculate on the structural and biochemical determinants that distinguish the catalytic requirements of FGE from disparate copper proteins and which inspire plausible reaction mechanisms. In light of the fascinating biology,18 biotechnology applications,19 and unique enzymology of FGE,3-5, 14 these findings provide an essential framework for expanding knowledge of biological O2 catalysis and copper metalloenzyme chemistry.

Results

X-ray crystal structure of the copper-bound S. coelicolor FGE holoenzyme Owing to its biochemical tractability and previously reported apo X-ray crystal structure,10 the S. coelicolor FGE (scFGE) was chosen for structural studies with the reconstituted holoenzyme. Using the copper activation procedure described previously,3 a crystallization screen was performed around the conditions which generated the apo enzyme crystal.10 Initial screens with pre-metallated enzyme, as well as supplementation with copper salts, failed to yield copper- containing crystals. This suggested that copper may be released during crystallization conditions, or that a disulfide active-site conformation which excluded copper entirely was a prerequisite for crystallization. However, successful crystallization of the C277A mutant (data not shown) contradicted the latter explanation. Instead, an irreversible copper dissociation mechanism accounts for the absence of bound copper among crystallization trials. Following copper binding, the extended time period of crystal growth could allow the dissociation of copper, made irreversible by oxidation of the putative cysteine ligands and yielding apo crystals only. The

31

apparent instability of Cu-FGE is consistent with the recent biochemical findings of Knop & Seebeck et. al.5 A

B

Figure 2.1. Crystal structure of copper-bound S. coelicolor FGE. (A) Cu-scFGE holoenzyme crystal structure (green) closely aligns to the apo enzyme structure (purple, PDB 2Q17), RMSD = 0.8 Å. Shown with Cu (brown sphere) and structural Ca (green sphere). (B) Copper bound at the FGE active site, ligated to Cys272 (Cu–S1 2.2 Å) and Cys277 (Cu–S2 2.3 Å), and shown with adjacent active site residues. The electron density map is overlaid as gray mesh (2Fo-Fc, 1.5 σ).

32

Table 2.1. Crystallography data collection and refinement statistics

Cu-scFGE Data collection Space group P3121 Cell dimensions a=b, c (Å) 140.0, 217.1 Resolution (Å) 46-2.11 (2.4-2.11) Completeness (%) 97.13

Refinement No. reflections 137788 Rwork / Rfree 0.2128/0.2530 No. atoms Protein 11086 Ligand/ion 10 Water 642 R.m.s. deviations Bond lengths (Å) 0.014 Bond angles (°) 1.198

With this dissociation mechanism in mind, a specific multi-solution crystal soaking procedure was employed. In brief, crystals were harvested, reduced with dithiothreitol (DTT) treatment, soaked with CuSO4, and lastly, soaked in cryoprotectant for storage in liquid N2. This sequence resulted in the first crystal structure of FGE with copper, its native and sole functional metal cofactor (Figure 2.1). Holoenzyme crystals diffracted to a resolution of 2.2 Å, and phases were determined by molecular replacement using apo-scFGE. During the completion of this work, an Ag(I)-bound FGE crystal structure from the species Thermomonospora curvata was published (PDB 5NXL).14 The coordination environment of the copper center is isostructural to the inactive Ag(I)-FGE and features a linear, 2-coordinate metal binding site comprised of both active site 16 cysteines, superficially similar to cuprous trafficking proteins (Figure 2.1B). The S1-[Cu]-S2 angle is 171°, and S–Cu bond distances are 2.2 Å and 2.3 Å for C272 and C277, respectively, with their accuracy limited by resolution. Aside from side chain rotations of Tyr44 and Arg279, separated from the active site by distances of 5-10 Å, the apo and holoenzyme structures align closely with RMSD = 0.8 Å (Figure 2.1A). Data collection and refinement statistics are listed in Table 2.1. Though the redox state of copper in crystallo was not explicitly known, Cu(I) is inferred from the 2-coordinate linear ligand geometry and the propensity for copper photoreduction during X-ray diffraction data collection at 77K.20-21 This crystal structure is particularly revelatory because this binding motif is absent from previously described copper 6 oxidases/oxygenases and must confer reactivity that is uniquely suited to the catalysis of O2- dependent fGly formation.

33

Unique spectroscopic features of a transient Cu(II)-FGE complex Upon obtaining a crystallographic model of the copper active site, structural and biochemical interrogation of solution holoenzyme was pursued using UV/vis and electron paramagnetic resonance spectroscopy (EPR). These efforts revealed the first Cu(II)-FGE complex and its characteristic UV/vis features (Figure 2.2). When Cu(II) salts were mixed with apo-scFGE in the absence of external reductants, a vivid blue color developed after several seconds. This color corresponds to at least two broad absorbance features with maxima at 348 and 380 nm. These features saturate in intensity at approximately 1:1 Cu(II):FGE, and possess a molar absorptivity of 4940 M-1 cm-1 at 380 nm (Figure 2.2A). Curiously, solutions of Cu(II)-scFGE returned to transparency after incubation for 10-60 min. Stopped-flow UV/vis spectroscopy was employed to measure the precise formation and decay kinetics of this newly reported optically-active species. Anaerobically prepared apo-scFGE was mixed with one equivalent of Cu(NO3)2, both in Tris pH 9.0 buffer (Figure 2.2B, left panel). Broad absorbance features in the region of 320-450 nm form immediately and maximize in intensity by 175 s. Features decay to baseline at approximately 17 min. But, these features are absent when an identical experiment is performed using the inactive scFGE C272A/C277A double mutant (Figure 2.2B, right panel). Therefore, Cu(II) binding occurs at the active site and forms a transient Cys272-S-Cu(II)-S-Cys277 complex analogous to that of the crystal structure. However, the presence of additional ligands from solvent or protein is not discernible from these data. The kinetics of the 380 nm absorbance feature were fit to a double-exponential (Equation 1) using nonlinear regression.

(1) Y = A1-e-k1t + Ae-k2t + C

-1 -1 The observed rate constants were k1 = 0.02 s for formation, and k2 = 0.001 s for decay, at 100 µM Cu(II)-scFGE.

A

34

B

Figure 2.2. Characterization of a transient Cu(II)-scFGE complex by UV/vis spectroscopy. (A) Titration of 300 µM apo-scFGE with Cu(II) by addition of increasing molar equivalents of Cu(NO3)2 followed by immediate measurement (left panel). The apparent molar absorptivity of -1 -1 the Cu(II)-scFGE complex is ε380 = 4940 M cm (right panel). (B) Kinetics of the Cu(II)- scFGE complex measured by stopped-flow UV/vis spectroscopy under anaerobic conditions. Solutions containing 200 µM of the apo enzyme were mixed with 200 µM Cu(NO3)2, both in 50 mM Tris 9.0, 0.5 M NaCl, 10% glycerol. The WT enzyme (left panel) exhibits absorbance maxima 348 nm and 380 nm. Growth of these features from 1 s (black) to 175 s (red) is indicated by gray spectra. Disappearance of these features from 175 s (red) to 1000 s (blue) is indicated by green spectra. The active-site double mutant FGE C272A/C277A lacked Cu(II)-dependent absorbance features (right panel). All spectra were collected at 4 °C.

Given its instability, the Cys272-S-Cu(II)-S-Cys277 complex was reasoned to decay by either – release of bound copper to solution, or 1e reduction to form an optically transparent Cys272-S- Cu(I)-S-Cys277 complex. To address the validity of the former mechanism, a copper binding time course was examined. One equivalent of Cu(II) was combined manually with apo-scFGE under anaerobic conditions and time-point aliquots were immediately buffer exchanged to remove unbound Cu(II). FGE samples were assayed for total copper content after buffer exchange. Over one hour, the Cu:FGE ratio declined from 0.99 to 0.9 (Figure 2.3A). This decrease was minor compared to the total loss of UV/vis signal during the analogous stopped-flow experiment (Figure 2.2B). Therefore, the disappearance of the optically-active Cys272-S-Cu(II)-S-Cys277 complex is not attributed to copper release. To test the likelihood of a reductive decay pathway, X-band electron paramagnetic resonance (EPR) spectra were collected during a parallel time course (Figure 2.3B). The EPR time course shows the presence of at least two closely related paramagnetic copper species forming within 86 s (black trace) and their near-disappearance over approximately 1 hour (green trace). Experimental constraints on the mixing dead time prevented capture of the initial binding phase. Qualitatively, the EPR spectra contains a mixture of at least 3 species. Two of these species, with similar g values and hyperfine coupling constants, appear to be sulfur-ligated copper (purple arrows). Additionally, the intense feature near g = 2.00 may be tentatively assigned as an organic radical. Despite these complexities, the EPR time course demonstrates the disappearance of EPR-active copper with comparable kinetics to the decay of Cu(II)-scFGE absorbance features. The loss of EPR-active copper is consistent with the

35

autoreduction of Cu(II)-scFGE to Cu(I)-scFGE. As expected from prior studies, direct binding of + – Cu(I) to scFGE was also possible. Anaerobic addition of a single equivalent of Cu(MeCN)4 PF6 , followed by buffer exchange, yielded stable holoenzyme with >0.98:1 Cu:FGE (data not shown). A B

Figure 2.3. The transient Cu(II)-scFGE complex is autoreduced to Cu(I)-bound scFGE. (A) The fraction of bound Cu over time following addition of Cu(NO3)2 to apo-scFGE, under anaerobic conditions. Time-point aliquots were removed and buffer exchanged prior to measurement of FGE-bound copper by biquinoline assay. Error bars represent the standard deviation of three replicates. (B) X-band EPR quantification of total Cu(II) following addition of Cu(NO3)2 to apo- scFGE under anaerobic conditions. Each spectra represents a single time point aliquot (black trace = 86 s, green trace = 1 hr) removed from a solution of Cu(II)-scFGE and flash frozen in liquid N2 for analysis. Purple arrows indicate the hyperfine-coupling features of two similar copper species. The intense feature near g =2.00 may be ascribed to an organic radical.

Insight into the mechanism of Cu(II)-FGE autoreduction Typically, biological copper is transferred to nascent copper enzymes from cuprous carriers.17 Therefore, the function of the transient Cu(II)-scFGE species may be related to a specific catalytic context as opposed to a native metallation process. It was for this reason especially that information detailing the mechanism of autoreduction was desired. In binding reactions containing only enzyme and Cu(II), protected from O2, we sought to first identify the origin of electrons that reduce Cu(II) to Cu(I). Considering the facile oxidation of thiols by Cu(II),22 enzyme cysteines were predicted to act as terminal electron donors for autoreduction. In addition to the C272 and C277 copper ligands, scFGE possess 3 isolated and surface-exposed cysteine residues (C193, C237, C301). To determine if Cu(II) autoreduction accompanies a change in the thiol-disulfide redox balance, anaerobic reconstitution reactions were analyzed by non-reducing

36

SDS-PAGE (Figure 2.4). Therein, Cu(II) treatment accompanies time-dependent formation of disulfide mediated FGE homodimers and trimers. Observation of this effect does not inform the individual steps of Cu(II) autoreduction, but underscores the apparent instability of Cu(II)- scFGE. Though apo-scFGE binds Cu(II) rapidly, it forms a complex with sufficiently high reduction potential to scavenge FGE-derived electrons. We posit that the organic radical present in Cu(II)-scFGE EPR spectra may originate from active site or surface exposed cysteinyl- radicals which form as intermediates of copper autoreduction.

Figure 2.4. Oxidative dimerization of apo-FGE following Cu(II) addition. One equivalent of Cu(NO3)2 was added to apo-scFGE anaerobically, and time point aliquots were removed and alkylated with N-ethylmaleimide in SDS to prevent further thiol-disulfide exchange. Aliquots were analyzed by non-reducing SDS-PAGE 1. apo enzyme control; 2-7. 1, 2, 5, 10, 30, and 60 min following Cu(II) addition.

Redox preference of the resting enzyme The catalytic properties of each copper oxidation state have not been addressed directly in prior studies. Though the activity of Cu(I)-loaded FGE has been demonstrated under steady-state conditions,4-5 our data prove that apo-FGE binds both Cu(I) and Cu(II). Cu(II)-scFGE is unstable over a period of minutes, but the possibility that it is catalytically competent in the presence of substrate and O2 is kinetically feasible. The observed rate-constant for the decay of Cu(II)-scFGE is more than 2 orders of magnitude lower than that of the turnover number previously reported for scFGE (0.001 s-1 vs. 0.3 s-1).3 Further, the oxidation state of the resting enzyme places specific constraints on the mechanistic pathway available for substrate oxidation. For this reason, testing explicitly the catalytic ability of each Cu-FGE species was of fundamental importance. Single-turnover reaction conditions, containing a ratio of 1:1 FGE:substrate, were employed to compare the activity of Cu(I)- and Cu(II)-scFGE during the first turnover. These assays used a 15-mer peptide substrate (Cys-scP15) containing a 13 residue recognition motif from an S. coelicolor sulfatase. To account for the inherent instability of the Cu(II)-scFGE complex, the addition of 1 equivalent of Cu(NO3)2 to apo-scFGE was performed immediately prior to each reaction time course. After 2 minutes of reaction time, Cu(I)-scFGE achieves nearly 70% conversion of substrate to fGly, compared to only 10% conversion by Cu(II)-scFGE (Figure 2.5).

37

A 5 minute pretreatment with a 200-fold excess of DTT (2mM) has no effect on the rate of product formation for Cu(I)-scFGE, but results in a substantial activation of Cu(II)-scFGE. These data authoritatively define Cu(I)-scFGE as the resting enzyme. It is notable that the inclusion of DTT does not further activate the pre-reduced enzyme for a single turnover, as during multi- turnover conditions.4 This effect provides clarification; the input of external electrons from a soluble donor is not required to resolve an intermediate, but instead occurs only after the chemical steps leading to product formation.

80

70

60

50 [fGly] (μM) 40

30

20 Cu(I)-scFGE Cu(I)-scFGE + 2 mM DTT 10 Cu(II)-scFGE Cu(II)-scFGE + 2 mM DTT 0 0 20 40 60 80 100 120 140 time (sec) Figure 2.5. Single-turnover activity assays of Cu(I)- and Cu(II)-loaded scFGE. Reactions were initiated by the addition of FGE, in either copper redox state, to a final concentration of 100 µM in solutions containing 100 µM substrate peptide (DNP-GGICTPARASLLTGQ-NH2) in 50 mM Tris 7.5, 0.5 M NaCl, 10% glycerol (v/v) buffer at 4 °C. Reactions were performed in the presence or absence of the reducing agent DTT.

X-ray absorption spectroscopy of Cu(I)-scFGE Our biochemical data define the importance of the Cu(I)-scFGE complex as the catalytic state, a structural model of which was provided by X-ray crystallography. However, the fraction of any oxidized copper, and the distribution of alternative coordination states can be obscured during the collection of X-ray diffraction data.23 Particularly, the 2-coordinate crystallographic model of Cu-FGE cannot definitively rule out the existence of a third ligand to copper in solution. For FGE, such a ligand could be contributed by solvent, and the abundance of a 3-coordinate complex may be controlled by equilibria dependent on temperature or pH effects. Several water molecules, bound 3-4 Å from Cu, and H-bonded to active-site side chains, could be candidates for a labile third ligand. This ambiguity restricts the utility of extrapolation from the crystal structure, and can be eliminated by further structural interrogation of the catalytic cuprous complex. Unfortunately, the closed shell d10 configuration of Cu(I) is silent by EPR or optical

38

methods. Therefore, X-ray absorption spectroscopy (XAS) was selected for characterizing Cu(I)- FGE. XAS permits interrogation of oxidation state, coordination number, ligand identity, and bond distances in Cu(I) complexes.24 The Cu(I)-scFGE holoenzyme was prepared anaerobically and found to be stable at a concentration of 1 mM in 50 mM Tris 9.0, 0.5 M NaCl, 10% glycerol. Buffer pH 9.0 was selected because it is close to the maxima in the pH-rate profile,25 which may have important effects on the ligand occupancy. First, the Cu K-edge X-ray absorption spectrum of a frozen solution of Cu(I)-scFGE was examined (Figure 2.6A). The presence of an absorption feature in the region of 8984 eV distinguishes Cu(I) from Cu(II) complexes. This K-edge feature has been assigned as the Cu (1s  4p) transition, and its intensity is indicative of coordination number, among other properties.26 The normalized intensity of this feature in the Cu(I)-scFGE complex is close to 1.0, corresponding to the presence of 2-coordinate Cu(I).27 For reference, the Cu(I)- scFGE absorption spectrum is plotted with those of model 2- and 3-coordinate cuprous complexes (Figure 2.6A). A

B C

39

Figure 2.6. X-ray absorption spectra of Cu(I)-scFGE. (A) Cu K-edge XAS spectra of Cu(I)- scFGE, shown for comparison with model 2- and 3-coordinate cuprous complexes. (B) Cu K- edge spectra and (C) EXAFS data (inset) and non-phase-shift-corrected Fourier transforms for Cu(I)-FGE (black) and autoreduced Cu(II)-scFGE  Cu(I)-scFGE (blue). All samples were prepared anaerobically in 50 mM Tris 9.0, 0.5 M NaCl, 10% glycerol buffer.

Extended X-ray absorption fine structure (EXAFS) spectra are collected energy levels beyond the edge region, where rapid oscillations in absorption intensity correspond to constructive and destructive scattering of X-ray excited photoelectrons by ligand atoms.28 EXAFS data (Figure 2.6C) was used, starting with initial parameters derived from the crystal structure, to estimate bond length and identity of ligand atoms in Cu(I)-scFGE. The best fit was determined to fully corroborate the crystal structure of the holoenzyme: 2 sulfur ligands with an average S–Cu bond length of 2.14 Å (summary, Table 2.2). It was also of interest to characterize the end-point copper configuration of the autoreduced enzyme. After decay to transparency (~1 hr), an anaerobic Cu(II)-scFGE  Cu(I)-scFGE sample was flash frozen and likewise subjected to XAS/EXAFS. Comparing absorption data for both complexes, it is concluded that autoreduction produces an identical cuprous enzyme complex as direct Cu(I) binding to FGE (Figure 2.6B & C). Table 2.2. Cu(I)-scFGE EXAFS Fitting Results

a b 2 2 c d Complex Fit # CN/Path R(Å) σ (Å ) ΔE0(eV) Error F

Cu(I)-scFGE 1A 1 Cu-S 2.14 67 -13.68 0.41 1B 2 Cu-S 2.14 390 -13.79 0.29 1C 3 Cu-S 2.14 641 -14.14 0.34

Cu(II)-scFGE  Cu(I)-scFGE 2A 1 Cu-S 2.13 97 -15.75 0.43 (autoreduced) 2B 2 Cu-S 2.13 420 -15.60 0.30 2C 3 Cu-S 2.13 669 -15.82 0.34 aCN is coordination number. bThe estimated standard deviations in R are ±0.02 Å. cThe σ2 values are 5 d 6 2 6 2 1/2 multiplied by 10 . The error F is given by [∑k (χexptl – χcalcd) /∑k χexptl ] . Errors in coordination numbers are ±25% and those in the identity of the scatterer Z are ±1. The best acceptable fit result for each complex is emphasized with the bold and italic font.

Discussion

Structural characterization of the FGE holoenzyme This study reveals the first X-ray crystal structure of the FGE holoenzyme containing copper, its native metal cofactor. The Cu binding site matches that of the Ag(I) inhibited crystal (PDB 5NXL). Single-turnover activity of Cu(I)-scFGE and XAS/EXAFS data together validate the 2-

40

coordinate X-ray structure as a genuine representation of the catalytically competent resting state copper configuration. The similarity of this Cu-binding motif to well-known Cu(I) binding proteins is intriguing because FGE has instead adopted it to enable copper-O2 catalysis. There is no sequence homology between previously described cuprous binding proteins and FGE.15-17 Though some cuprous proteins, for example the copper-responsive transcriptional activator CueR, bind copper with a seemingly identical linear bis(thiolate) complex (Figure 2.8), a notable discriminating feature is the Cβ1-S1-[Cu]-S2-Cβ2 torsion angle (φ). In FGE, φ = 19°, whereas in the crystal structure of CueR, φ = 108° (PDB 1Q05). In model 2-coordinate digonal bis(thiolate) cuprous complexes, φ values of 90° (CueR-like) are estimated to be favored by a small margin of 4 29 -21 kJ/mol compared to 0 or 180° (FGE-like). CueR binds Cu(I) with an affinity (Kd ~10 M) approximately 4 orders of magnitude higher than FGE.30 Although several additional energetic features are required to account for the large discrepancy between affinities, φ may at least be a telling correlate of functional divergence. Contrary to a static role in ligating Cu(I), the unique deviation of this torsion angle in FGE can be viewed as one indication that its copper site has instead evolved to perform the dynamic requirements of oxidase/oxygenase catalysis, such as the accommodation of higher coordination states.

Discovery of a transient Cu(II)-FGE complex The discovery and characterization of a non-catalytic, or perhaps pre-catalytic, Cu(II)-scFGE complex is also described. The kinetics of Cu(II)-dependent UV/vis and EPR features demonstrate the instability of this complex, which decays by undergoing autoreduction via electron scavenging from protein thiols. Owing to the softer character of the Cu(I) cation compared to Cu(II), sulfur ligands are known to confer a high reduction potential for the Cu(I)/Cu(II) couple.31 Therefore, Cu binding and XAS data provide a coherent model for a redox process that all the while maintains bound metal, but converts Cu(II)-scFGE to a stable Cu(I)- scFGE conformation (Figure 2.7).

41

Figure 2.7. A redox-dependent copper binding pathway. Cu(II) binding forms a high reduction potential active-site intermediate which is subsequently autoreduced to Cu(I)-scFGE. Electrons ultimately originate from the oxidation of protein thiol groups. Direct binding of Cu(I) is not accompanied by the appearance of spectroscopic features, and forms a catalytically active complex when exposed to substrate and O2.

Here we more deeply consider the properties of the Cu(II) complex, and the potential utility this complex for further functional insight into catalysis. In the Cu(I)-scFGE crystal structure, Cβ–S– Cu bond angles are 102°/89° for Cys272/Cys277, respectively. The ligand configuration of Cu(II)-scFGE is assumed to be similar with respect to the presence of sulfur ligands at minimum, because of the elimination of Cu(II) binding and absorbance features in the C272A/C277A mutant. This Cβ–S–Cu bonding configuration is reminiscent of bis(thiolate) ligation in Cu(II)- substituted horse liver alcohol dehydrogenase (HLADH), and permits substantial σ overlap between one pair of sulfur 3p(π) orbitals and the copper d-orbitals.32 Thus, the UV/vis spectroscopic features of Cu(II)-scFGE are tentatively assigned as a pair of SCu2+ ligand-to- metal charge transfer (LMCT) bands contributed from each S ligand. However, it is noted that Cu(II)-scFGE lacks the lower-energy LMCT bands originating from the orthogonal pair of sulfur 3p(π) orbitals. This is taken to mean that the true geometry of the Cu(II)-scFGE complex may differ from that of Cu(II)-HLADH/Cu(I)-scFGE. But, it is yet unclear if the ligand environment of the transient Cu(II)-scFGE is analogous to that of a recently reported inactive Cd(II)-FGE crystal structure (PDB 5NYY).14 In this structure, Cd(II) binding rotates the side chain of Cys277 (S. coelicolor numbering) and rearranges to a tetrahedral 4-coordinate structure, with additional bound water and acetate ligands from solvent. The authors propose that this complex may be 14 analogous to an intermediate in catalysis containing bound substrate and O2. The true relevance

42

of either Cu(II)-scFGE, as prepared and characterized here, or Cd(II)-FGE remains unknown. Its relative instability notwithstanding, the Cu(II)-scFGE complex is a valuable tool for probing the identity of intermediates because it contains the native metal. For instance, X-ray data collection immediately following Cu(II) addition to apo-scFGE crystals, with beamline intensity modulated to limit photoreduction,21 could capture a high-resolution structure of Cu(II)-scFGE. This result would determine the relevance of the Cd(II) tetrahedral configuration in describing alternative coordination states involved in turnover. Likewise, it is possible that Cu(II)-scFGE may be stabilized to reduction and amenable to further structural studies by trapping with substrate and a superoxide mimic, e.g. CN–.33 The formation of a dead-end complex would be impressive validation of a 4-coordinate complex implicated in H-atom abstraction (vide infra).5, 14

FGE is a structurally distinct monocopper oxidase The FGE resting enzyme is structurally analogous to cuprous binding proteins by way of metal binding site, but it shares catalytic behavior, and likely mechanistic features, with the mononuclear copper oxidases/oxygenases.34 In this way, FGE is uniquely situated between two prominent areas of biological copper functionality. To our knowledge, FGE is the sole copper- dependent O2-activating enzyme that oxidizes a substrate carbon adjacent to a thiol moiety. For this reason, understanding the details of catalysis by FGE is also of special interest. Several classes of O2-activating mononuclear (and non-coupled binuclear) copper enzymes serve as models for substrate oxidation pathways. Galactose oxidase (GAO) catalyzes the 2e– oxidation of D-galactose to D-galacto-hexodialdose directly from a Cu(II)/cofactor radical pair through an inner-sphere bound complex, only then binding and reducing O2 to H2O2 to regenerate the resting 2e– oxidized active site.35 Lytic polysaccharide monooxygenase (LPMO), in the vicinity of substrate, first undergoes reduction to Cu(I) and then binds O2 to potentially form a Cu(II)– superoxo or, depending on the input of external electrons, a Cu(II)-oxyl intermediate. At this stage, C–H abstraction/hydroxylation at the anomeric carbon of a carbohydrate substrate occurs without direct ligation of substrate at copper.36 Peptidylglycine-hydroxylating monooxygenase (PHM) and related family members likewise catalyze substrate hydroxylation, but have a second non-coupled copper center for the purpose of precisely-tuned electron transfer to the first copper site.37-38 The distinct copper coordination sites of FGE, CueR, LPMO and GAO are represented in Figure 2.8.

43

Figure 2.8. Structural comparison of diverse protein copper centers. High-resolution structures of Cu sites are displayed for: FGE (this study), LPMO (2YOX), CueR (1Q05), and GAO (1GOF).

Several biochemical details must be taken into account prior to applying the reactivity paradigms of the aforementioned copper enzymes to FGE. Prior to the identification of copper as a cofactor 11 for FGE, a stoichiometry of ~1.17:1 O2:fGly was measured. Without care to reconstitute FGE with copper prior to use, these measurements were presumably performed using mostly inactive apo enzyme containing a small amount of copper-bound active enzyme. However, a ratio of oxygen consumption close to one with product was reproduced in our holoenzyme preparations (0.94:1 O2:fGly, data not shown).The single-turnover data in this study make clear two additional constraints: (1) the catalytic competency of the cuprous enzyme state, eliminating a GAO-like ping-pong mechanism; and (2) the requirement for an external electron donor only after the chemical steps leading to product formation are complete. Therefore, two oxidase reaction mechanisms can be envisioned (Figure 2.9). Both mechanisms rely on two successive 1e– oxidation steps to convert substrate thiol to thioaldehyde, forming fGly by hydrolysis off- enzyme. In either case, the powerful observation of a kinetic isotope effect on the pro-(R)-β- hydrogen of the substrate cysteine (kH/kD = 3.0-3.7) dictates that H-atom abstraction is the rate- determining step.4 Considering the availability of only a single electron from the Cu(I) site prior to this step, it is currently difficult to conceive of an oxidant responsible for H-atom abstraction other than a Cu(II)–superoxo complex. As alluded to prior, this species has been invoked for several copper enzymes, though its direct identification in a protein remains an outstanding challenge.6

44

Figure 2.9. Proposed mechanisms of FGE catalysis. Two divergent reaction pathways are considered, and distinguished by the configuration of bound substrate. Bonding of the substrate sulfur to the copper center constitutes an inner-sphere oxidation mechanism (shaded box), whereas an alternative pathway relies only on the non-covalent formation of an enzyme-substrate complex prior to O2 binding and activation. Of most pressing importance for FGE specifically is determining the manner in which substrate binds. In the first pathway, substrate binds non-covalently to FGE and adjacent to the copper center. Binding would promote O2-activation at the Cu(I) site to form the reactive 3-coordinate Cu(II)–superoxo complex. Alternatively, an inner-sphere pathway would involve direct ligation of the substrate sulfur to Cu(I) prior to O2 involvement; this mechanism predicts a 4-coordinate

45

Cu(II)–superoxo intermediate.14 The inner-sphere mechanism would be novel among monocopper oxidases/oxygenases but would rationalize the need for an atypical copper site. Though such a complex may be prove intractable to crystallization, our ability to access the Cu- FGE crystal is encouraging. Moreover, demonstrating here that FGE is suitable for study by XAS offers the prospect of testing this model directly without requiring a crystalline sample. The viability of this intermediate could be tested by combining Cu(I)-scFGE and the peptide substrate anaerobically, as it should afford the 3-coordinate complex if protected from O2. It must also be stated that it has not been categorically disproven that FGE operates instead as a monooxygenase, but it would be challenging to test for oxygen incorporation into the fGly 18 product directly by an O2 labeling experiment. The rapid aldehyde-diol equilibrium of fGly in solution would lead to scrambling of labeled product. A monooxygenase mechanism is presumably burdened by the obligation of a 2e– oxidized intermediate comprised of a geminal thiol-alcohol species. The ability of this intermediate to eliminate hydrogen sulfide without further oxidation is unclear. Regardless, the characterization reported herein begins to provide a coherent mechanistic framework for understanding catalysis by this singular enzyme.

Materials and Methods

Molecular biology and protein purification A Streptomyces coelicolor FGE (scFGE) construct was previously cloned into the pET151 expression vector.10 Point mutants were generated using the Q5 site-directed mutagenesis kit (New England Biolabs). Expression and purification of FGE were performed as follows. Single clones of Escherichia coli BL21 DE3 cells (New England Biolabs) transformed with pET151- scFGE were used to inoculate 250 mL of LB with ampicillin and grown for 16 hours at 37 ºC. Terrific broth containing ampicillin and glycerol was inoculated at a ratio of 40:1 with overnight culture and grown to OD 0.4–0.6 at 37 ºC. Cultures were cooled to 18 ºC and induced with isopropyl-α-thiogalactoside at a final concentration of 0.1-1 mM. Following 18-20 hours of growth, cells were harvested by centrifugation at 5000 x g for 30 minutes at 4 ºC. Cell pellets were resuspended with lysis buffer (0.5 M NaCl, 50 mM Tris-HCl pH 8.0, 25 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine hydrochloride, 10% glycerol (v/v)) at approximately 5 mL per 1 g of cell pellet, 1:10000 Pierce Universal Nuclease (v/v) (Thermo Fisher Scientific), and with 1 EDTA-free protease inhibitor cocktail tablet (Roche) per 50 mL of resuspension. The cell suspension was first homogenized to a single-cell suspension with a Dounce homogenizer. Then cell lysis was accomplished by 3 passes through an Emulsiflex C3 homogenizer (Avestin) at 15 kPa. Lysate was cleared by centrifugation at 20,000 x g for 45 min at 4 ºC. The supernatant was applied to a 5 mL HisTrap HP Ni-NTA column (GE Life Science) connected to an Akta Pure FPLC (GE Healthcare Life Sciences). Flow through was monitored by A280 and washed with lysis buffer until absorbance reached baseline. His6-tagged protein was eluted with a gradient of 0-50% Elution buffer (Lysis buffer containing 0.5 M imidazole) over 10 column volumes. The N-terminal His6 tag was removed from combined fractions by cleavage with Tobacco Etch Virus (TEV) protease at a ratio of 0.02-0.05 (w/w) to FGE overnight at 4 ºC, followed by TEV removal with a subtractive Ni-NTA column. TEV protease was isolated in-house using a published 39 procedure. His6-free FGE was concentrated and further purified by gel-filtration

46

chromatography on a Superdex 75 HiLoad column (GE Healthcare Life Sciences) in storage buffer (50 mM Tris-HCl pH 7.5, 0.5 M NaCl, 1 mM dithiothreitol (DTT), 10% glycerol (v/v)). The concentration of purified proteins was determined by absorbance at 280 nm under denaturing conditions, using the method of Gill and von Hippel.40 Typical final purity was greater than or equal to 90% and the final yield was approximately 30 mg per 1 L of E. coli culture. Proteins were aliquoted, flash frozen with liquid N2, and stored at -80 ºC until use.

Crystallography Reconstituted Cu-FGE was crystallized under similar conditions as described previously.10 Crystals were grown by hanging-drop vapor diffusion at 23 °C for one week using a 2uL:2uL mixture of protein solution (8 mg/ml in 50 mM Tris 7.5, 0.5 M NaCl, 10% glycerol, 1 mM DTT) and precipitant buffer (100 mM Tris 8.0, 3.2 M ammonium formate, 0.3% octyl-beta-glucoside (w/v), 3.2% 2-butanol (v/v)) in Easy-Xtal trays (Qiagen). Crystals were then harvested and soaked, and flash-frozen in liquid nitrogen prior to data collection. Specifically, Cu-FGE crystals were soaked in 1mM DTT (in precipitant buffer) for 15 minutes, then soaked in CuCl2 (in precipitant buffer) for 30 minutes, and finally, backsoaked and cryoprotected in 20% glycerol (in precipitant buffer). Diffraction data were collected from single crystals at SSRL BL7-1 (SLAC, Menlo Park, CA) using an ADSC Quantum-315 charge-coupled device (Area Detector Systems, Poway, CA). All data sets were processed with XDS and AIMLESS. The structures were solved by molecular replacement using PHENIX.41 The structures were subjected to iterative rounds of refinement PHENIX REFINE, and model building using COOT (Crystallographic Object- Oriented Toolkit).42 Final model statistics were calculated with PHENIX and MolProbity.43 The figures were visualized using PyMOL (Schrödinger).

Peptide synthesis Peptide substrates were synthesized manually on Fmoc-Rink Amide MBHA low loading resin (EMD Millipore) using commercially available Nα-Fmoc protected amino acid monomers. Peptide coupling reactions used the reagent COMU in DMF.44 A 2-10 fold molar excess of amino acid monomer was used in each coupling step with reaction times of 30 minutes up to overnight with agitation by N2 gas. Fmoc groups were deprotected by treatment with 20% piperidine in DMF for 15 min. All peptides used herein were amidated at the C-terminus, and the N-terminus either acetylated with acetic anhydride, or derivatized with the chromophore 2,4- dinitrobenzene by reaction with 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent). Following peptide synthesis, peptides were cleaved from solid phase and side chains globally deprotected using a cocktail containing 94:2.5:2.5:1 trifluoroacetic acid (TFA)/H2O/ethane dithiol/triisopropylsilane for 3 h under N2. The cleavage mixture was filtered to remove resin, concentrated under vacuum, and triturated by dropwise addition to diethyl ether (Et2O) at 4 °C. Precipitated crude peptide was collected by centrifugation and washed twice more with Et2O prior to dissolution in H2O and lyophilization. Crude peptides were purified using reverse-phase chromatography with a gradient of 20-30% acetonitrile (MeCN)/water + 0.1% TFA over 60 min on a Microsorb 100-5 C18 Dynamax 250 x 21.4 mm column (Agilent Technologies). Peptides were characterized by HPLC-UV/vis to be of greater than 98% purity.

47

Peptide characterization by high-resolution mass spectrometry Substrate peptides were sequenced using liquid chromatography-electrospray ionization spectrometry as follows. Purified peptides were diluted to 0.5 µM with 0.1 % formic acid. 1 uL (500 fmol) of each peptide was injected on an Acclaim PepMap RSLC C18 column (Thermo Fisher Scientific) using a Dionex UtiliMate 3000 HPLC (Thermo Fisher Scientific) at a flow rate of 0.3 µL/min, connected in-line to an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific). Instrument method parameters were as follows: MS1 resolution: 60,000 at 400 m/z; scan range: 300−1500 m/z. The most abundant ions (top speed at n=3 seconds) were subjected to higher-energy collision induced dissociation (HCD), collision induced dissociation (CID), and electron transfer dissociation (ETD). HCD parameters: 30% normalized collision energy; CID parameters: activation q 0.25, 35% normalized collision energy; ETD parameters: calibrated ETD reaction times, AGC reagent target at 1e5. For all MS2s, isolation width was set at 2 m/z. Dynamic exclusion was enabled with a repeat count of 3, a repeat duration of 10 seconds, and an exclusion duration of 10 seconds. Data analysis was accomplished using Xcalibur software (Thermo Fisher Scientific). All peptide sequences were manually validated (see Appendix).

Anaerobic reconstitution of FGE with Cu(I) or Cu(II)

Solutions of enzyme and buffer were transferred into an anaerobic glovebox under N2 atmosphere to equilibrate overnight. To remove DTT and adjust pH, FGE was exchanged from storage buffer into 50 mM Tris-HCl pH 9.0, 0.5 M NaCl, 10% glycerol (v/v) using pre- equilibrated Bio-Spin 6 buffer exchange spin columns (Bio-Rad Laboratories). In order to determine the extent of Cu(II) autoreduction, FGE was mixed with 1 equivalent of Cu(NO3)2 in pH 9.0 buffer and distributed into EPR sample tubes. Sealed time point samples were removed from the glovebox and immediately frozen in liquid N2 for analysis. A parallel set of time point samples was buffer exchanged using Bio-Spin 6 columns to remove unbound Cu(II) and removed from the glovebox for total copper quantification by colorimetric biquinoline assay as described previously.45 For loading with Cu(I), exchanged protein was instead reconstituted at 4 °C in pH 9.0 buffer with 1:1 FGE:Cu(I) at a final concentration of 0.5 mM each using a freshly + – prepared 10 mM Cu(MeCN)4 PF6 stock in anhydrous MeCN. A second buffer exchange step removed unbound Cu(I) and MeCN. FGE was concentrated as necessary using 10 kDa molecular weight cutoff spin filters (EMD Millipore). This procedure yielded Cu(I)-scFGE with 98-99 mol % copper.

Stopped-flow UV/vis spectroscopy

Solutions of apo-scFGE and Cu(NO3)2 were prepared in the glovebox under N2 atmosphere in 50 mM Tris-HCl pH 9.0, 0.5 M NaCl, 10% glycerol (v/v) buffer. To observe Cu(II) binding to apo- scFGE, Syringe 1 contained 200 µM apo-scFGE (WT or C272A/C277A) and Syringe 2 contained 200 µM Cu(NO3)2, both in pH 9.0 buffer. An SX-20 stopped-flow UV/vis spectrophotometer (Applied Photophysics) was used to collect UV/vis spectra with a photodiode

48

array detector following a mixing dead time of 7 ms. All stopped-flow UV/vis experiments were performed at 4 °C, maintained by a chilled recirculating bath.

Enzyme activity assays Discontinuous assays for FGE activity were performed as previously described.3 Peptide stock concentrations were measured by absorbance using the published extinction coefficient at 350 -1 -1 46 nm for N-dinitrophenyl (DNP)-peptides, ε350 = 15800 M cm . All reactions were done in 1.5 mL tubes on a temperature controlled Thermomixer R (Eppendorf) with shaking at 1000 RPM. Single turnover reactions were run at 4 °C and contained 50 mM Tris 7.5, 0.5 M NaCl, 10% glycerol (v/v), 100 µM DNP-GGICTPARASLLTGQ-NH2 (DNP-P15), and additional additives as noted. Single turnover reactions were initiated by addition of 100 µM scFGE. To directly test the activity of Cu(II)-loaded scFGE, 1 equivalent of Cu(NO3)2 was added to apo-scFGE, the mixture briefly mixed, and added to single turnover reaction solutions. This was done immediately prior to each Cu(II)-scFGE single turnover reaction. All reactions were manually quenched to a final concentration of 100 mM HCl and 100 uL was injected on an Agilent 1200 HPLC connected in-line to a diode array detector. A C18 poroshell column (Agilent Technologies) was used to resolve product (fGly peptide) and starting material (Cys peptide) with a gradient of 0-50% MeCN/water + 0.1% TFA. Percent conversion was calculated by integrating product and starting material absorbance at 350 nm.

Electron paramagnetic resonance spectroscopy X-band EPR spectra were obtained with a Bruker EMX spectrometer, an ER 041 XG microwave bridge, and an ER4116DM cavity. A sample temperature of 77 K was maintained using a liquid nitrogen finger dewar. EPR settings were as follows: Freq. ≈ 9.6 GHz, Power ≈ 10 mW, Mod. Amp. = 4.00 G. All spectra were averaged over 10 scans. EPR spin quantitation of the paramagnetic copper content was performed using a 0.945mM AAS Cu standard solution, Specpure (purchased from Alfa Aesar) in MES (pH 6.0) and 40% glycerol. EPR spectra were simulated using the SpinCount simulation software developed by Professor Michael Hendrich of Carnegie Mellon University (www.chem.cmu.edu/groups/hendrich/facilities/index.html).

X-ray absorption spectroscopy Solutions of Cu(I)-scFGE and autoreduced Cu(II)-scFGE were prepared anaerobically as described in text. The samples were loaded into delrin XAS cells with 38 μm Kapton windows, removed from the glovebox, and immediately frozen and stored in liquid N2 until analysis. The Cu K-edge X-ray absorption spectroscopy (XAS) was conducted at the Stanford Synchrotron Radiation Lightsource (SSRL) on the focused 16-pole, 2 T wiggler side-station beam line 9-3 under storage ring parameters of 3 GeV and ~500 mA. A Si(220) double-crystal monochromator was used for energy selection. A Rh-coated pre-monochromator mirror was used for harmonic rejection and vertical collimation, and a cylindrical Rh-coated post-monochromator mirror was used for beam focusing. The samples were maintained at a constant temperature of ~10 K during data collection using an Oxford Instruments CF 1208 liquid helium cryostat. A Canberra 100-

49

element Ge monolith solid-state detector and soller slits equipped with a Ni filter were used to collect Cu Kα fluorescence data. Internal energy calibration was accomplished by simultaneous measurement of the transmission of a Cu foil placed between two ionization chambers situated after the sample. The first inflection point of the foil spectrum was assigned to 8980.3 eV. The extended X-ray absorption fine structure (EXAFS) data are reported to k = 12.8 Å-1 in order to avoid interference from the Zn K-edge. Photodamage was not observed during the course of data collection. The EXAFS data presented here are 5-scan average spectrum collected over three fresh sample positions and 12-scan average spectrum collected over six fresh sample positions for Cu(I)-FGE and autoreduced Cu(II)-scFGE. Background subtraction and normalization of the data were performed using PySpline (Stanford Linear Accelerator Center). The data were processed by fitting a second-order polynomial to the pre-edge region and subtracting this from the entire spectrum as a background. A three-region polynomial spline of orders 2, 3, and 3 was used to model the smoothly decaying post-edge region. The data were normalized by scaling the spline function to an edge jump of 1.0 at 9000 eV. The EXAFS curve-fitting analysis program OPT in EXAFSPAK (George, G. N. Stanford Synchrotron Radiation Laboratory: Stanford, CA, 2000) was used to fit the EXAFS data. The theoretical phase and amplitude functions were generated by FEFF (version 7.0, University of Washington). A starting structural model was obtained from the crystal structure. During the fitting process, the bond distance (R) and the mean-square thermal and static deviation in R (σ2), which is related to the Debye-Waller factor, were allowed to vary. The threshold energy (E0), the point at which the photoelectron wave vector k is 0, was also allowed to vary but was constrained as a common value for all 2 components in a given fit. The amplitude reduction factor (S0 ) was fixed to a value of 1.0 and the coordination numbers (N) were systematically varied to achieve the best fit to the EXAFS data and their Fourier transforms (FTs). Use of the Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science and Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515. The SSRL Structural Molecular Biology Program is supported by the DOE Office of Biological and Environmental Research, and by the National Institutes of Health, National Institute of General Medical Sciences (including P41GM103393). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of NIGMS or NIH.

References

1. Appel, M. J.; Bertozzi, C. R., Formylglycine, a post-translationally generated residue with unique catalytic capabilities and biotechnology applications. ACS Chem. Biol. 2015, 10 (1), 72-84.

2. Fetzner, S.; Steiner, R., Cofactor-independent oxidases and oxygenases. Appl. Microbiol. Biotechnol. 2010, 86 (3), 791-804.

50

3. Holder, P. G.; Jones, L. C.; Drake, P. M.; Barfield, R. M.; Bañas, S.; de Hart, G. W.; Baker, J.; Rabuka, D., Reconstitution of formylglycine-generating enzyme with copper(II) for aldehyde tag conversion. J. Biol. Chem. 2015, 290 (25), 15730-15745.

4. Knop, M.; Engi, P.; Lemnaru, R.; Seebeck, F. P., In vitro reconstitution of formylglycine- generating enzymes requires copper(I). ChemBioChem 2015, 16 (15), 2147-2150.

5. Knop, M.; Dang, T. Q.; Jeschke, G.; Seebeck, F. P., Copper is a cofactor of the formylglycine-generating enzyme. ChemBioChem 2017, 18 (2), 161-165.

6. Solomon, E. I.; Heppner, D. E.; Johnston, E. M.; Ginsbach, J. W.; Cirera, J.; Qayyum, M.; Kieber-Emmons, M. T.; Kjaergaard, C. H.; Hadt, R. G.; Tian, L., Copper active sites in biology. Chem. Rev. 2014, 114 (7), 3659-853.

7. Cosma, M. P.; Pepe, S.; Annunziata, I.; Newbold, R. F.; Grompe, M.; Parenti, G.; Ballabio, A., The multiple sulfatase deficiency gene encodes an essential and limiting factor for the activity of sulfatases. Cell 2003, 113 (4), 445-456.

8. Dierks, T.; Schmidt, B.; Borissenko, L. V.; Peng, J.; Preusser, A.; Mariappan, M.; von Figura, K., Multiple sulfatase deficiency is caused by mutations in the gene encoding the human Cα-formylglycine generating enzyme. Cell 2003, 113 (4), 435-444.

9. Rabuka, D.; Rush, J. S.; deHart, G. W.; Wu, P.; Bertozzi, C. R., Site-specific chemical protein conjugation using genetically encoded aldehyde tags. Nat. Protoc. 2012, 7 (6), 1052-1067.

10. Carlson, B. L.; Ballister, E. R.; Skordalakes, E.; King, D. S.; Breidenbach, M. A.; Gilmore, S. A.; Berger, J. M.; Bertozzi, C. R., Function and structure of a prokaryotic formylglycine-generating enzyme. J. Biol. Chem. 2008, 283 (29), 20117-20125.

11. Peng, J.; Alam, S.; Radhakrishnan, K.; Mariappan, M.; Rudolph, M. G.; May, C.; Dierks, T.; von Figura, K.; Schmidt, B., Eukaryotic formylglycine-generating enzyme catalyses a monooxygenase type of reaction. FEBS J. 2015, 282 (17), 3262-3274.

12. Dierks, T.; Dickmanns, A.; Preusser-Kunze, A.; Schmidt, B.; Mariappan, M.; von Figura, K.; Ficner, R.; Rudolph, M. G., Molecular basis for multiple sulfatase deficiency and mechanism for formylglycine generation of the human formylglycine-generating enzyme. Cell 2005, 121 (4), 541-552.

13. Roeser, D.; Preusser-Kunze, A.; Schmidt, B.; Gasow, K.; Wittmann, J. G.; Dierks, T.; von Figura, K.; Rudolph, M. G., A general binding mechanism for all human sulfatases by the formylglycine-generating enzyme. Proc. Natl. Acad. Sci. U. S. A. 2006, 103 (1), 81-86.

14. Meury, M.; Knop, M.; Seebeck, F. P., Structural basis for copper-oxygen mediated C-H bond activation by the formylglycine-generating enzyme. Angew. Chem. Int. Ed. Engl. 2017, 56 (28), 8115-8119.

51

15. Rubino, J. T.; Franz, K. J., Coordination chemistry of copper proteins: How nature handles a toxic cargo for essential function. J. Inorg. Biochem. 2012, 107 (1), 129-143.

16. Boal, A. K.; Rosenzweig, A. C., Structural biology of copper trafficking. Chem. Rev. 2009, 109 (10), 4760-4779.

17. Huffman, D. L.; O'Halloran, T. V., Function, Structure, and Mechanism of Intracellular Copper Trafficking Proteins. Annu. Rev. Biochem 2001, 70 (1), 677-701.

18. Dierks, T.; Schlotawa, L.; Frese, M. A.; Radhakrishnan, K.; von Figura, K.; Schmidt, B., Molecular basis of multiple sulfatase deficiency, mucolipidosis II/III and Niemann-Pick C1 disease - Lysosomal storage disorders caused by defects of non-lysosomal proteins. Biochim. Biophys. Acta 2009, 1793 (4), 710-25.

19. York, D.; Baker, J.; Holder, P. G.; Jones, L. C.; Drake, P. M.; Barfield, R. M.; Bleck, G. T.; Rabuka, D., Generating aldehyde-tagged antibodies with high titers and high formylglycine yields by supplementing culture media with copper(II). BMC Biotechnol. 2016, 16 (1), 1-11.

20. Carugo, O.; Carugo, K. D., When X-rays modify the protein structure: radiation damage at work. Trends Biochem. Sci 2005, 30 (4), 213-219.

21. Gudmundsson, M.; Kim, S.; Wu, M.; Ishida, T.; Momeni, M. H.; Vaaje-Kolstad, G.; Lundberg, D.; Royant, A.; Ståhlberg, J.; Eijsink, V. G. H.; Beckham, G. T.; Sandgren, M., Structural and electronic snapshots during the transition from a Cu(II) to Cu(I) metal center of a lytic polysaccharide monooxygenase by X-ray photoreduction. J. Biol. Chem. 2014, 289 (27), 18782-18792.

22. Smith, R. C.; Reed, V. D.; Hill, W. E., Oxidation of thiols by copper(II). Phosphorus, Sulfur, and Silicon and the Related Elements 1994, 90 (1-4), 147-154.

23. Strange, R. W.; Feiters, M. C., Biological X-ray absorption spectroscopy (BioXAS): a valuable tool for the study of trace elements in the life sciences. Curr. Opin. Struct. Biol. 2008, 18 (5), 609-616.

24. Yano, J.; Yachandra, V. K., X-ray absorption spectroscopy. Photosynth. Res. 2009, 102 (2), 241.

25. Fey, J.; Balleininger, M.; Borissenko, L. V.; Schmidt, B.; von Figura, K.; Dierks, T., Characterization of posttranslational formylglycine formation by luminal components of the endoplasmic reticulum. J. Biol. Chem. 2001, 276 (50), 47021-8.

26. Kau, L. S.; Spira-Solomon, D. J.; Penner-Hahn, J. E.; Hodgson, K. O.; Solomon, E. I., X- ray absorption edge determination of the oxidation state and coordination number of copper. Application to the type 3 site in Rhus vernicifera laccase and its reaction with oxygen. J. Am. Chem. Soc. 1987, 109 (21), 6433-6442.

52

27. Solomon, E. I.; Lowery, M. D.; Lacroix, L. B.; Root, D. E., Electronic absorption spectroscopy of copper proteins. In Methods Enzymol., James, F. R.; Bert, L. V., Eds. Academic Press: 1993; Vol. Volume 226, pp 1-33.

28. Penner-Hahn, J. E., Characterization of “spectroscopically quiet” metals in biology. Coord. Chem. Rev. 2005, 249 (1), 161-177.

29. Pushie, M. J.; Zhang, L.; Pickering, I. J.; George, G. N., The fictile coordination chemistry of cuprous-thiolate sites in copper chaperones. Biochim. Biophys. Acta 2012, 1817 (6), 938-947.

30. Changela, A.; Chen, K.; Xue, Y.; Holschen, J.; Outten, C. E.; O'Halloran, T. V.; Mondragón, A., Molecular basis of metal-ion selectivity and zeptomolar sensitivity by CueR. Science 2003, 301 (5638), 1383-1387.

31. Belle, C.; Rammal, W.; Pierre, J.-L., Sulfur ligation in copper enzymes and models. J. Inorg. Biochem. 2005, 99 (10), 1929-1936.

32. Farrar, J. A.; Formicka, G.; Zeppezauer, M.; Thomson, A. J., Magnetic and optical properties of copper-substituted alcohol dehydrogenase: a bisthiolate copper (II) complex. Biochem. J. 1996, 317 (2), 447-456.

33. Carugo, K. D.; Battistoni, A.; Carrì, M. T.; Polticelli, F.; Desideri, A.; Rotilio, G.; Coda, A.; Bolognesi, M., Crystal structure of the cyanide-inhibited Xenopus laevis Cu, Zn superoxide dismutase at 98 K. FEBS Lett. 1994, 349 (1), 93-98.

34. Klinman, J. P., Mechanisms whereby mononuclear copper proteins functionalize organic substrates. Chem. Rev. 1996, 96 (7), 2541-2562.

35. Whittaker, J. W., The radical chemistry of galactose oxidase. Arch. Biochem. Biophys. 2005, 433 (1), 227-239.

36. Meier, K. K.; Jones, S. M.; Kaper, T.; Hansson, H.; Koetsier, M. J.; Karkehabadi, S.; Solomon, E. I.; Sandgren, M.; Kelemen, B., Oxygen activation by Cu LPMOs in recalcitrant carbohydrate polysaccharide conversion to monomer sugars. Chem. Rev. 2017, Article ASAP.

37. Cowley, R. E.; Tian, L.; Solomon, E. I., Mechanism of O2 activation and substrate hydroxylation in noncoupled binuclear copper monooxygenases. Proceedings of the National Academy of Sciences 2016, 113 (43), 12035-12040.

38. Klinman, J. P., The copper-enzyme family of dopamine β-monooxygenase and peptidylglycine α-hydroxylating monooxygenase: Resolving the chemical pathway for substrate hydroxylation. J. Biol. Chem. 2006, 281 (6), 3013-3016.

39. Tropea, J. E.; Cherry, S.; Waugh, D. S., Expression and purification of soluble His(6)- tagged TEV protease. Methods in molecular biology (Clifton, N.J.) 2009, 498, 297-307.

53

40. Gill, S. C.; von Hippel, P. H., Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem. 1989, 182 (2), 319-26.

41. Adams, P. D.; Grosse-Kunstleve, R. W.; Hung, L.-W.; Ioerger, T. R.; McCoy, A. J.; Moriarty, N. W.; Read, R. J.; Sacchettini, J. C.; Sauter, N. K.; Terwilliger, T. C., PHENIX: building new software for automated crystallographic structure determination. Acta Crystallographica Section D 2002, 58 (11), 1948-1954.

42. Emsley, P.; Cowtan, K., Coot: model-building tools for molecular graphics. Acta Crystallographica Section D 2004, 60 (12 Part 1), 2126-2132.

43. Davis, I. W.; Leaver-Fay, A.; Chen, V. B.; Block, J. N.; Kapral, G. J.; Wang, X.; Murray, L. W.; Arendall, I. I. I. W. B.; Snoeyink, J.; Richardson, J. S.; Richardson, D. C., MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 2007, 35 (suppl_2), W375-W383.

44. El-Faham, A.; Funosas, R. S.; Prohens, R.; Albericio, F., COMU: A safer and more effective replacement for benzotriazole-based uronium coupling reagents. Chemistry – A European Journal 2009, 15 (37), 9404-9416.

45. Felsenfeld, G., The determination of cuprous ion in copper proteins. Arch. Biochem. Biophys. 1960, 87 (2), 247-251.

46. Sanger, F., The terminal peptides of insulin. Biochem. J. 1949, 45 (5), 563-574.

54

Chapter 3 Insights into the mechanism of FGE: elucidation of a Cu(I)-coordinated enzyme-substrate complex, and discovery of an intermediate during catalysis

55

Chapter 3. Insights into the mechanism of FGE: elucidation of a Cu(I)-coordinated enzyme-substrate complex, and discovery of an intermediate during catalysisa

Introduction

Prior investigation of FGE has defined the structural and spectroscopic features of its copper center. As described in Chapter 2, these efforts have revealed the first Cu-bound crystal structure of FGE, the electronic absorption and EPR features of a transient cupric complex, and the functional consequences of Cu(I)/Cu(II) binding. However, the structure of the productive substrate binding conformation, and the nature of reactive Cu/O2 intermediates responsible for substrate oxidation (Figure 3.1) remain key unanswered questions in pursuit of a mechanistic framework for understanding catalysis by FGE.

Figure 3.1. Structure of potential FGE copper-oxygen complexes capable of performing H-atom abstraction from substrate.

Herein, we report dynamic changes in the coordination state of the FGE copper-center, as measured by X-ray absorption spectroscopy. These findings demonstrate a shift from 2- coordinate linear in the resting enzyme to 3-coordinate trigonal-planar in the enzyme-substrate complex. Importantly, the catalytic competency of the Cu(I)-scFGE:substrate complex upon exposure to O2 positions it on the reaction coordinate. Strikingly, conversion of this complex to product is correlated with the formation of a transient intermediate. This species was identified by its distinct absorbance at 420 nm using stopped-flow UV/vis spectroscopy. Further, the kinetic behavior of this optical feature upon deuterium substitution confirms that it corresponds to a catalytic intermediate occurring prior to, or at, the rate-determining H-atom abstraction step of substrate oxidation. Collectively, these data lead to the putative assignment of the intermediate as a cupric-superoxo complex, potentially the first identified in an active enzyme system. Though further interrogation of the electronic structure of this intermediate is required, this result may represent a critical advancement in the understanding of not only FGE, but also the greater class of O2-activating copper enzymes. Finally, this study inspired a revised proposal of FGE’s catalytic mechanism, now supported at several steps by experimental evidence (Figure 3.12).

a Katlyn K. Meier and Hyeongtaek Lim contributed to work presented in this chapter. 56

Results

X-ray absorption spectroscopy of the enzyme substrate complex The nature of the binding mode of the cysteinyl substrate at the FGE copper center has been the subject of predictions,1-2 but it has hitherto remained unsubstantiated. Experimental and computational studies have identified a shallow binding groove, adjacent to the active site, for recognition of the substrate peptide in an extended linear conformation.3-4 These studies were performed using apo-FGE but are still informative in that they orient the substrate thiol toward the two active site cysteine residues (Cys272/Cys277) now known to be the sole protein ligands of the copper cofactor. Two plausible reaction pathways are distinguished by the structure of the FGE:peptide complex at and prior to the H-atom abstraction step. Specifically at question is whether the substrate thiolate is merely situated in the vicinity of the copper center (outer-sphere pathway), or is directly bonded to it (inner-sphere pathway). The latter mechanism would anticipate major determinants of reactivity for the copper center governed by the copper- substrate bonding interaction. Therefore, a description of the substrate binding mode(s) in the reconstituted holoenzyme is of central importance.

Figure 3.2. Structure of FGE peptide substrates used in this study. Note, the text uses the names “Cys-scP14” and “H-scP14” to refer to the same peptide for contextual clarity. The labile pro- (R)-β-hydrogen is highlighted in red. X-ray absorption spectroscopy (XAS) provides a means to interrogate the ligand environment at Cu(I) directly in the presence of substrate, effectively resolving the question of inner- or outer- sphere reaction mechanisms. The structure and reactivity of the Cu-loaded S. coelicolor FGE (scFGE) has been extensively characterized, including by XAS, making it an ideal model for these studies. Two additional experimental factors were of concern. First, since FGE catalysis is O2-dependent, it was reasoned that Cu(I)-FGE and the peptide substrate may form a stable complex, in either pathway, that is strictly unreactive under anaerobic conditions. Second, since X-ray absorption profiles reflect the sum of all species present in the sample, the substrate must have sufficient affinity for Cu(I)-FGE to yield high concentrations relative to free enzyme. The peptide Ac-ICTPARASLLTGQY-NH2 (scP14) was selected as a substrate for this purpose; it is comprised of 13 residues from an S. coelicolor sulfatase sequence (NCBI Reference Sequence: WP_011031740.1), with the addition of a C-terminal tyrosine for facile quantification (Figure 3.2). After mixing 1:1 Cu(I)-scFGE:scP14 at 1 mM each under anaerobic atmosphere, samples were loaded into sample cells, flash frozen, and analyzed by XAS. X-ray near edge absorption

57

spectra (XANES) display a decrease in the intensity of the 8984 eV feature (the Cu (1s  4p) transition) of this complex compared to Cu(I)-scFGE. This suggests a change in coordination number from 2 to 3 for at least a sizable fraction of the Cu(I)-scFGE:scP14 sample (Figure 3.3A). Furthermore, fitting of XAFS data (Figure 3.3B) for this complex is consistent with 3 S– Cu bonds (summary, Table 3.1). The 2-coordinate resting enzyme was measured to have an average S–Cu bond distance of 2.14 Å; this distance increases to 2.22 Å in the 3-coordinate enzyme-substrate complex. These data represent the first structural evidence for a substrate binding mode relying on inner-sphere coordination to Cu. Moreover, the catalytic competency of this complex (vide infra) proves that it does, indeed, lay on the reaction coordinate. A B

Figure 3.3. X-ray absorption spectra of the cuprous enzyme-substrate complex. Cu K-edge XAS spectra (A), and EXAFS data (inset) and non-phase-shift-corrected Fourier transforms (B) for Cu(I)-FGE (black) and Cu(I)-FGE:scP14 (red).

58

Table 3.1. EXAFS Fitting Results for Cu(I)-scFGE and Cu(I)-scFGE-scP14

a b 2 2 c d Complex Fit # CN/Path R(Å) σ (Å ) ΔE0(eV) Error F

Cu(I)-FGE 1A 1 Cu-S 2.14 67 -13.68 0.41 1B 2 Cu-S 2.14 390 -13.79 0.29 1C 3 Cu-S 2.14 641 -14.14 0.34 Cu(I)-FGE:scP14 2A 2 Cu-S 2.22 363 -10.50 0.36 2B 3 Cu-S 2.22 580 -11.30 0.30 2C 2 Cu-S 2.17 258 -13.33 0.27 1 Cu-S 2.28 -61 2D 2 Cu-S 2.22 364 -11.68 0.35 1 Cu-O 2.56 1292 2E 4 Cu-S 2.22 771 -11.60 0.32 aCN is coordination number. bThe estimated standard deviations in R are ±0.02 Å. cThe σ2 values are 5 d 6 2 6 2 1/2 multiplied by 10 . The error F is given by [∑k (χexptl – χcalcd) /∑k χexptl ] . Errors in coordination numbers are ±25% and those in the identity of the scatterer Z are ±1. The best acceptable fit result for each complex is emphasized with the bold and italic font.

The previously reported Cu-FGE crystal structure, enhanced by information from XAS, forms the basis of more detailed structural descriptions of the Cu(I)-FGE and Cu(I)-FGE:substrate copper centers by computation. First, molecular docking simulations were employed to identify substrate binding conformations; short peptides (e.g. ICTPAR) were docked against the active- site and peptide binding pocket. Catalytically feasible conformations were encouraged by restriction of the substrate sulfur atom to movement within a 5 Å sphere centered at the copper center. Subsequently, the resting enzyme crystal structure and the substrate-docked model were subjected to geometry-optimization using density functional theory (DFT). These calculations demonstrate that the FGE copper center shifts from 2-coordinate digonal (resting enzyme) to 3- coordinate trigonal planar upon substrate association (Figure 3.4). Importantly, these structures reveal clues about changes to the electronic structure at copper that alter O2 reactivity along the reaction coordinate, addressed in the discussion section of this text.

59

A B

Figure 3.4. The FGE copper center undergoes a shift in coordination environment upon substrate binding. Geometry-optimized DTF structures of the (A) 2-coordinate linear Cu(I) resting enzyme shown with adjacent residues, and, (B) the 3-coordinate trigonal planar Cu(I) enzyme-substrate (shown as ICTP peptide) complex (right panel).

Identification of selenocysteine as an alternative substrate for FGE Catalytic promiscuity, the ability of enzymes to act on non-native substrates with a rate enhancement (kcat/kuncat) over unity, provides valuable insight into protein evolution and, as relevant here, an avenue of inquiry for understanding the determinants of enzyme reactivity and mechanism.5-7 Thus, it was of great interest to determine if FGE can accept alternative substrates, beginning with peptides containing isostructural residues in place of cysteine. It is well-known that FGE cannot catalyze the conversion of serine substrates to fGly.8 It has also been reported that for the peptide substrates typically used in in vitro activity assays, serine variants do not strongly inhibit FGE, though this effect may differ with peptide length and/or type of FGE.9-10 This may be rationalized by the inability of the serine hydroxyl group to ligate the FGE copper center, which would not only prevent the formation of a productive inner-sphere complex for O2- activation, but would also increase the binding energy of association. Selenocysteine (Sec, U) is a non-canonical amino acid that is isostructural to cysteine. Compared to cysteine, its increased polarizability contributes to a reduced pKa and decreased reduction potential, resulting in differences between their metal affinities and energetic preferences for reaction pathways.11 These unique properties confer its essentiality in critical metal binding and redox-catalytic roles in multiple enzyme classes. Sec is installed ribosomally and encoded in mRNAs by the presence of an in-frame UGA codon upstream of a specific sequence element, SECIS. Translational intervention by SECIS, a cis-acting stem-loop RNA, recruits an elongation factor responsible for directing Sec-tRNA to the nascent polypeptide.12-13 The distinct properties of Sec are integral to the catalytic capability of these enzymes; substitution of Sec with Cys typically results in diminished activity by several log-fold.13 There are no native sulfatase substrates of FGE known to encode Sec. But for the purpose of this study, and unlike serine, it was reasoned that the selenolate group could directly ligate the FGE copper center, an interaction that may promote catalysis for Sec substrates.

60

Scheme 3.1. Synthesis of Fmoc-Sec(bzl)-OH.

To access a Sec substrate, a Nα-Fmoc-(Se-benzyl)-L-selenocysteine monomer was prepared for solid-phase peptide synthesis, according to a procedure reported previously (Scheme 3.1).14 Synthetic Sec (Sec-scP15) and Cys (Cys-scP15) peptides were comprised of the 13 residue sulfatase motif, with the addition of an N-terminal Gly-Gly linker and a chromophore (Figure 3.2). The inclusion of an N-terminal dinitrophenyl (DNP) chromophore for these substrates allowed highly sensitive fGly detection by HPLC analysis. Initial reaction assays demonstrated that Sec-scP15 is indeed converted to fGly-scP15 by scFGE, the identical product of Cys-scP15 turnover. Under standardized conditions, scFGE displays comparable activity for Cys-scP14 and Cys-scP15, but converts Sec-scP15 with an activity of over two orders of magnitude lower (Table 3.2). This is the first discovery of an alternative FGE substrate with respect to substitution at the reactive cysteinyl position.

Table 3.2. Activity of scFGE toward cysteine and selenocysteine substrates. Activity is reported as the rate of fGly production after 10 min under the following reaction conditions: 1 µM scFGE, 100 µM substrate, 25 mM triethanolamine (TEAM) pH 9.0, 25 °C.

substrate activity (µM min-1)

Cys-scP14 12.5 ± 0.5

Cys-scP15 9.1 ± 0.1

Sec-scP15 0.07 ± 0

An alternative substrate is useful for revealing discriminating aspects of an enzyme mechanism if the efficiency of catalysis (kcat) in particular is altered. In lieu of complete kinetic characterization of Sec-scP15, the binding propensity of this peptide was measured indirectly via a competition experiment. The initial rate of Cys-scP14 turnover, at a concentration ~10-fold

61

greater than its KM (Table 3.3), was measured in the presence of increasing concentrations of -2 Sec-scP15. It is noted that the krel ~10 of Sec-scP15 meant that no formation of fGly-scP15 was detectable during the initial rate regime of Cys-scP14 reactions. At a concentration of 60 µM Cys-scP14, approximately 20-fold inhibition was measured with 100 µM Sec-scP15. To determine the IC50, initial rate data were fit to Equation 1 using nonlinear regression (Figure 3.5). vmaxvmin (1) v = v min 1 + Under these assay conditions, inhibition of Cys-scP14 turnover by Sec-scP15 corresponds to an IC50 of 3.8 ± 0.7 µM (0.1 µM Cu(I)-scFGE). The mechanism of inhibition is understood to be competitive, i.e. by reversible binding at the active site, because turnover of Sec-scP15 can be accomplished following prolonged reaction times. The fraction of competitive binding interactions of Sec-scP15 with FGE contain those of selenocysteine as well as those corresponding to structural differences between the scP14 and scP15 peptide scaffold at the N- terminus. Therefore, these data are not taken to represent an absolute measure of Sec-scP15 affinity. More conservatively, competitive inhibition by Sec-scP15 demonstrates that impairment of selenocysteine turnover originates mostly from a decreased rate of catalysis, and not binding, compared to Cys-scP14/Cys-scP15.

Figure 3.5. Selenocysteine inhibits FGE activity toward the native cysteine substrate. The initial rate (vo) of Cys-scP14 (60 µM) turnover was measured in the presence of increasing concentrations of Sec-scP15 (0.1-100 µM). The IC50 for FGE inhibition under these conditions is 3.8 ± 0.7 µM Sec-scP15. Within the initial rate regime of Cys-scP14, formation of the cognate -2 fGly product from Sec-scP15 (krel ~10 ) was not observed. Prior investigation has revealed an alkaline pH preference for FGE activity, but the origin of its pH dependence is unknown.15 There are no general bases present near the copper center; as such, the discovery of an inner-sphere binding mode places special emphasis on the ionization state of free substrate. Among other strategies, the identification of selenocysteine as an alternative

62

substrate for FGE supplies a means to probe pH effects. If the pH-rate profile in large part reflects the pKa of substrate, and assuming an identical mechanism, the pH dependence would be 16 subject to change based on its measurement with thiol (pKa, Cys = 8.3) or selenol (pKa, Sec = 5.2)16 substrates. Activity was measured for Cys-scP14 and Sec-scP15 across a pH range of 6.0- 10.0 and normalized separately to the maximum activity of each substrate (Figure 3.6). The pH- rate profiles were fit to Equation 2, with dependence on a single pKa, using nonlinear regression. v (2) v = max 1 +

This permitted estimation of the pKa values influencing the pH-rate dependence of each substrate: pKa, Sec-scP15 = 5.5 ± 0.1, and pKa, Cys-scP14 = 8.1 ± 0.1. Compellingly, these values are quite consistent with the pKas of free substrate at the position of the reactive side chain, 5.2 and 17 8.3 respectively. It is noted that DTT has a pKa of 9.2, and steady-state turnover may be influenced by the proportion of DTT thiolate available to initiate a reduction step.18 Regardless, these data suggest that the pKa of substrate is of much greater significance; Sec-scP15 reactions do not decrease in activity until the pH drops below 6.5, within 1 unit of the selenolate pKa. Thus, across a working pH range of at least 6.0-10.0, the pH-rate dependence appears to correspond to the solution concentration of ionized substrate. Ultimately, this relationship can be absolutely confirmed by examining the pH dependence of kcat/KM for each substrate.

Figure 3.6. pH-rate profiles for the conversion of cysteine and selenocysteine substrates to fGly. Substrates were assayed in 25 mM TEAM buffer pH 6.0-10.0, 2 mM DTT at 25 °C. Assays contained 100 µM substrate and 1 µM scFGE (Cys-scP14), or 10 µM scFGE (SecP15). The activity (µM min-1) of scFGE toward Cys-scP14 (squares) Sec-scP15 (triangles) at each pH was normalized to the maximum activity for each substrate, and does not reflect the relative activity between Cys-scP14/Sec-scP15. Unlike the radical-S-adenosylmethionine enzymes that produce fGly from peptide substrates in anaerobes,19 FGE cannot convert serine residues, nor bind them with high affinity; here, selenocysteine is converted slowly, but it has considerable binding affinity. Collectively, these

63

data support the requirement for a 3-coordinate intermediate, with anionic ligands, in order to promote reactivity of the cuprous FGE complex toward O2 (Figure 3.7).

Figure 3.7. Proposed model for catalytic discrimination between cysteine and selenocysteine substrates. Cu(I)-FGE undergoes a shift from 2- to 3-coordinate upon binding of either the thiolate or selenolate. The pre-activated enzyme-substrate complexes can then generate fGly upon addition of O2. However, turnover is significantly impaired in the case of selenocysteine, owing to decreased reactivity at the cuprous center for the Cu(I)-FGE:selenocysteine complex toward O2 activation.

Measurement of a substrate deuterium kinetic isotope effect in S. coelicolor FGE Existing steady-state kinetic data for FGE was examined to assess the likelihood that a copper- oxygen species may accumulate at sufficient quantities to allow detection. First, the rate of reaction with respect to O2 concentration was considered. In an earlier study, though KM(O2) was not specifically determined, it was reported that reaction rates saturated at approximately 105 20 µM O2 for human FGE under steady-state conditions. This result is promising because it indicates that association of enzyme with O2 is not rate-limiting under experimentally-accessible 21 O2 concentrations. In addition, selective removal of the pro-(R)-β-hydrogen of the substrate cysteine was measured to have an observed kinetic isotope effect (KIE) of kH/kD = 3.7 for wild- type FGE from the thermophilic bacteria Thermomonospora curvata.18 A normal, primary KIE such as this indicates that H-atom abstraction from substrate is, at least in part, rate-limiting.22 This is a critical feature for further investigating the mechanism of substrate oxidation by pursuing the detection of catalytic intermediates; it also provides a convenient test of candidate species. Ultimately, in order for a transient species to be plausible for consideration as an intermediate, it must display kinetic behavior consistent with the KIE upon isotope substitution. As such, determination of the equivalent KIE for scFGE was undertaken. First, protium- and deuterium- containing substrates were selected for measuring the KIE in scFGE. Though H-atom abstraction was previously determined to be 100% stereoselective to the pro-(R)-β-hydrogen of substrate cysteine, dideuterium substitution at the β-carbon of cysteine was preferred for synthetic ease. Contribution from a secondary isotope effect, reporting on the degree of hybridization change at the β-carbon in the transition state, may alter the magnitude of 22 KIEobs, but this effect is typically minor in the absence of tunneling. Thus, it was reasoned that dideuterium substitution would provide a suitable substrate for KIE determination.

64

Figure 3.8. Steady-state kinetic analysis of heavy and light FGE substrates. Steady-state kinetic analysis of Cu(I)-scFGE with H-scP14 (squares) and D2-scP14 (triangles) substrates. Initial rates were fit to the Michaelis-Menten equation by nonlinear regression. Kinetic parameters are reported in Table 3.3.

Table 3.3. Kinetic parameters of scFGE with H-scP14 and D2-scP14 substrates. Observed kinetic isotope effects (KIEs) are reported for kcat (kcat,H/kcat,D2) and kcat/KM ((kcat,H / KM,H)/( kcat,D2/KM,D2)). Kinetic parameters were measured in 25 mM TEAM buffer, pH 9.0 at 25 °C. Standard errors of the fit are listed.

-1 -1 -1 substrate kcat (min ) KM (µM ) kcat/KM (M s ) KIE (kcat) KIE (kcat/KM) H-scP14 16.2 ± 0.6 6.0 ± 0.8 (4.5 ± 0.6) x 104 3.0 ± 0.1 3.0 ± 1.3 4 D2-scP14 5.5 ± 0.1 6.1 ± 0.6 (1.5 ± 0.2) x 10

Two substrate isotopomers, IC(3,3-H2)TPARASLLTGQY (H-scP14) and IC(3,3- 2 H2)TPARASLLTGQY (D2-scP14), were synthesized (Figure 3.2). Substrates were subjected to steady-state kinetic analysis with wild-type Cu(I)-loaded scFGE under previously published 1 reaction conditions to determine kinetic parameters (Figure 3.8). H-scP14 and D2-scP14 were found to be identical in KM (within error) but differ by several fold in kcat. Concordance between the KM values is in agreement with negligible isotope-sensitivity in steps prior to the first

committed step. The KIE of kcat, (kcat,H/kcat,D2) = 3.0 ± 0.1, and the KIE of kcat/KM ((kcat,H / KM,H)/( kcat,D2/KM,D2)) = 3.0 ± 1.3 (Table 3.3). These values are smaller than KIEobs (kH/kD) = 3.7 for T. curvata FGE with monodeuterated substrate. This difference may reflect contribution from an inverse secondary KIE during H-atom abstraction due to the additional geminal deuterium label. It is notable that a more active mutant variant of T. curvata FGE exhibited a KIEobs of (kH/kD) = 3.0 with monodeuterated substrate.18 More likely then, is that the range of KIEs measured for

65

FGE originate from disparate contributions of isotope-insensitive steps to kobs between protein variants, as opposed to a change in the intrinsic KIE for H-atom abstraction.23 Regardless, the H- scP14/D2-scP14 KIEobs confirms the suitability of scFGE in general for pursuing the detection of copper-oxygen intermediates responsible for C–H activation, and this substrate pair in particular for evaluating the authenticity of candidate species.

Detection of an intermediate by stopped-flow UV/vis spectroscopy Initiation and monitoring of enzymatic reactions by stopped-flow UV/vis absorption spectroscopy was chosen as an initial strategy to search for transient species during catalysis. The following concerns were taken into account in designing a two-syringe mixing experiment that would maximize the chance of success. First, XAS data confirms the formation of a 3- coordinate Cu(I)-scFGE:scP14 complex and its stability in the absence of O2. This suggests that the complex may be pre-activated for O2 binding and activation. Therefore single-turnover reactions containing anaerobically pre-formed Cu(I)-scFGE:scP14 and initiated by mixing with excess O2 were selected. The scP14 substrates (Figure 3.2) were suitable for this study with the advantage that they are silent in the visible region as well as in the near-UV region above 280 nm. Stopped-flow spectra of Cu(I)-scFGE:H-scP14 (syringe 1, anaerobic) and O2-saturated buffer (syringe 2) were monitored from 7 ms to 18 minutes. Strikingly, a narrow absorbance feature at 420 nm grows immediately and fully decays by 20 seconds, reaching an intensity maximum at ~1 s. Two other broad absorbance features at 330 nm and 560 nm begin to grow shortly afterwards, but do not decay over the entire time course (Figure 3.9A). Control reactions with Cu(I)-scFGE or H-scP14 (syringe 1, anaerobic), and O2-saturated buffer (syringe 2), completely lacked the 420 nm absorbance feature (Figure 3.9B & C). Therefore, the 420 nm transient was selected as a species of interest.

A B

66

C

Figure 3.9. Detection of an intermediate by stopped-flow UV/vis absorption spectroscopy. (A) Stopped-flow spectra were recorded upon mixing anaerobically pre-formed Cu(I)-scFGE:H- scP14 (100 µM each, final) with O2-saturated buffer (~660 µM, final). The first phase displays growth of a 420 nm absorbance feature starting from 7 ms (black) and reaching a maximum at 0.8 s (red). Spectra collected between 7 ms to 0.8 s are shown in light gray. The second phase displays decay of the 420 nm feature but growth of a poorly resolved species with features at 330 nm and 560 nm, reaching a maximum at 30 s (blue). Spectra collected between 0.8 s to 30 s are shown in light blue. Control spectra were recorded by mixing (B) Cu(I)-scFGE + O2 or, (C) scP14 + O2. All stopped-flow reactions were done in 50 mM Tris pH 9.0, 0.5 M NaCl, 10% glycerol at 4 °C.

In order to test whether the 420 nm species corresponded to a catalytic intermediate, an equivalent stopped-flow UV/vis experiment was performed by mixing Cu(I)-scFGE:D2-scP14 with O2-saturated buffer. The spectra of H-scP14 and D2-scP14 reactions were qualitatively identical with respect to the absorbance bands present. However, the kinetics of the 420 nm species were altered. At present, the absence of additional knowledge, such as the molar absorptivity of this species and the simultaneous rate of product formation, prevents construction of a speciation plot to accurately inform a kinetic mechanism. Instead, the normalized 420 nm absorbance traces (Figure 3.10) were fit to a double-exponential containing growth and decay terms (Equation 3) using nonlinear regression.

(3) Y = A1-e-k1t + Ae-k2t + C

67

Figure 3.10. Kinetic isotope effect observed for decay of the 420 nm intermediate. Kinetic traces at 420 nm for H-scP14 and D2-scP14, recorded during analogous stopped-flow UV/vis experiments.

This yielded observed first-order rate constants for formation (k1) and decay (k2) of the 420 nm -1 -1 -1 -1 species for H-scP14 (k1,H = 4.4 s , k2,H = 0.23 s ) and D2-scP14 (k1,D2 = 4.0 s , k2,D2 = 0.05 s ) -1 -1 substrates. The concordance of k2,H (0.23 s ) and kcat (0.27 s ) for H-scP14 is intriguing, and implies that decay of the 420 nm species may accompany turnover. Also from these apparent rate constants, observed KIEs can be calculated for growth (k1,H/k1,D2) = 1.1 and decay (k2,H/k2,D2) = 4.6. The susceptibility of the 420 nm decay rate exclusively to a normal primary KIE can be compared to (kcat,H/kcat,D2) = 3.0. As alluded to prior, the KIE of kcat is a lower limit for the intrinsic KIE of the isotope-sensitive chemical step. Therefore (k2,H/k2,D2) may more closely reflect the intrinsic KIE for H-atom abstraction. Together, these data directly implicate the 420 nm species as an intermediate occurring at, or prior to, the rate-limiting C–H activation step of catalysis.

Single-turnover kinetic analysis of FGE reactions Though the sensitivity of the apparent decay rate of the 420 nm intermediate to deuterium substitution strongly suggests its legitimacy as an intermediate occurring during catalysis, its behavior must also comply with the kinetics of product formation in an identical time regime. It is not possible to monitor the concentration of the fGly product optically during a stopped-flow time course. Instead, single-turnover FGE reactions were performed using rapid-mixing followed by chemical quench.

68

Figure 3.11. Single-turnover kinetic analysis of FGE substantiates the 420 nm intermediate. Anaerobic pre-formed Cu(I)-scFGE:H-scP14 complex (final concentration of 375 µM) was combined with O2-saturated buffer (final concentration of ~1.1 mM) using a rapid-mixing instrument at 4 °C, pH 9.0. Time-point samples from 36 ms to 14 s were quenched by ejection into a vial containing HCl, and analyzed for product formation by HPLC. Graph shows overlay of product formation (black circles), and stopped-flow UV/vis kinetic trace at 420 nm (red trace) collected during parallel time courses. Product formation data was fit to a double-exponential (black trace). Errors shown are the standard deviation of 2 or more replicates. A mixing strategy analogous to the stopped-flow experiments was used. Cu(I)-scFGE and H- scP14 were combined anaerobically, and single-turnover reactions were initiated by rapid- mixing of this complex with O2-saturated buffer. Following a mixing period of 36 ms to 14 s, reactions were chemically quenched by ejection in HCl (Figure 3.11). It is noted that the magnitude of the quenching dead time is unknown, and may contribute to systematic error at early time points. The rate of product (fGly) formation was fit to a double-exponential (Equation 4) using nonlinear regression.

(4) Y = A1-e-λ1t + B(1-e-λ2t) + C

-1 This yielded observed first-order rate constants of λ1 = 1.8 s for the first phase (35 ms to 1 s), -1 and λ2 = 0.03 s for the second phase (1 s to 14 s). Caution is taken in interpreting the meaning of these apparent rate constants with respect to a kinetic mechanism in the absence of additional data detailing their concentration dependence. However, considering that a ratio of 1:1 Cu(I)- scFGE:scP14 was present in reaction mixtures, the following description is plausible. First, owing to a significant but unknown fraction of pre-formed Cu(I)-scFGE:scP14 complex when combined with O2, the absolute rate of the first phase is fast and corresponds to the rate of association of O2 with Cu(I)-scFGE:scP14 in addition to the rate of the chemical step of Cu(I)- -1 scFGE:scP14:O2 conversion to product. A value of λ1 = 1.8 s (4 °C) is greater than the expected -1 24 lower limit for this rate constant (kcat = 0.27 s , 25 °C). Once the fraction of pre-formed

69

complex is consumed by reaction with O2, the absolute rate of the second phase is slow. This may instead reflect a rate-limiting association step of remaining free Cu(I)-scFGE with scP14 prior to H-atom abstraction. A simplified kinetic mechanism (Scheme 3.2) was considered to describe these events.

Scheme 3.2. Hypothetical kinetic mechanism of FGE. Critically, single-turnover kinetic data also provides an essential test of the catalytic relevance of the 420 nm intermediate. The 420 nm stopped-flow kinetic trace was overlaid with the single- turnover progress curve during an identical time scale (Figure 3.11). Though an absolutely quantitative comparison between the stopped-flow data and rapid chemical quench data is limited by the inherent experimental error in collecting the latter data at early time points, and on a separate instrument, a semi-quantitative comparison is highly informative. Specifically, in agreement with the prediction that the 420 nm species is a catalytic intermediate, its growth (k1,H -1 -1 = 4.4 s ) proceeds at a higher rate than that of product formation (λ1 = 1.8 s ) during the first phase of the single-turnover reaction. This is interpreted to mean that during the initial phase of the reaction, fast formation of the intermediate in turn leads to a maximum turnover rate. More broadly, single-turnover kinetic analysis of FGE reactions fully supports the 420 nm transient species as a bona fide catalytic intermediate.

Discussion

FGE exploits an inner-sphere substrate binding strategy to control catalysis A major finding revealed by this study is the spectroscopic identification of a catalytically competent 3-coordinate Cu(I)-FGE:substrate complex. Perhaps by virtue of this binding configuration, selenocysteine was discovered to be a promiscuous substrate of FGE, likewise -2 converted to fGly, but at a dramatically reduced rate (krel ~10 ). The requirement for this complex, as well as the absence of an active-site general base, also explains a substrate-specific pH-rate profile for FGE. Tight-coupling of O2-activation and turnover are suggested by the inertness of the resting enzyme to O2 in the absence of substrate, illustrated by stopped-flow UV/vis. As with other mononuclear copper oxidases/oxygenases,25 FGE appears to enforce substrate-dependent O2 reactivity. Unlike the noncoupled binuclear copper oxygenases (e.g. PHM, DβM, TβM),26 or the mononuclear copper lytic polysaccharide monooxygenase (LPMO),27 the FGE resting enzyme contains a stable Cu(I) center and is in fact unstable in the cupric form.

Together, these experimental observations inspire the formulation of a general paradigm for catalysis at this structurally and functionally singular copper center. First, the stability of the cuprous enzyme is a function of the significant stabilization of Cu(I) by strong σ overlap, provided by the character and geometry of the bis(thiolate) ligands in a linear 2-coordinate complex.28 However, the copper complex undergoes a structural change through inner-sphere

70

bonding of the cysteinyl thiolate from substrate, which is initially positioned in the vicinity of the copper center by noncovalent interactions between FGE and the surrounding substrate peptide sequence motif. This bonding interaction is favorable owing to the soft nature of both Cu(I) and the third thiolate ligand. The 2- to 3-coordinate transition increases the negative charge of the complex from -1 to -2 and adjusts the geometry of the Cys272/Cys277 protein ligands to accommodate a trigonal planar enzyme-substrate complex. Instead of the requirement for a Cu(II) to Cu(I) redox process to initiate catalysis in canonical monocopper oxygenases, FGE uses a coordination/structural rearrangement instead. Thus, it is asserted that this rearrangement primes the cuprous center for O2 binding and activation by significantly decreasing the Cu(I)/Cu(II) reduction potential in the 3-coordinate intermediate. This is consistent with the known reactivity of synthetic 3-coordinate cuprous complexes,29 and it is expected to be further enhanced for FGE by the preponderance of negative charge conferred by three thiolate ligands in particular. But, reactivity of the trigonal-planar complex is finely tuned, and the presence of a softer selenolate ligand contributed by the Sec substrate results in greater Cu(I) stabilization,30 and therefore a higher Cu(I)/Cu(II) reduction potential, compared to the thiolate provided by the native Cys substrate. It is predicted that direct determination of the 1e– reduction potentials of each FGE copper complex described herein will substantiate this reactivity paradigm.

The nature of the catalytic intermediate The identification of the 420 nm transient represents an exciting breakthrough in the study of mononuclear copper O2-activating enzymes. This species was kinetically validated to correspond to an intermediate during FGE catalysis occurring at, or prior to, the rate-limiting H-atom abstraction step. Absent an unlikely Fenton-like O2-activation mechanism, only the Cu(II)- superoxo, Cu(II)-hydroperoxo, or Cu(II)-oxyl species constitute viable oxidants for C–H activation (Figure 3.1). However, compared to divergent copper oxidases/oxygenases, structural and biochemical data place stringent constraints on the types of copper-oxygen species accessible to FGE. Starting from the 3-coordinate cuprous complex, only a single electron is available for O2-activation prior to H-atom abstraction. Thus, the 420 nm intermediate is tentatively assigned as a Cu(II)-superoxo complex. This species is purported to have been identified in a crystal structure of peptidylglycine- hydroxylating monooxygenase (PHM) soaked in a slow-reacting substrate.31 Thus far, this crystal structure represents the only direct evidence for an enzymatic Cu(II)-superoxo species. Unfortunately, an in crystallo Cu(II)-superoxo can be assigned by electron density alone, and thus the PHM adduct cannot be subject to further study. Ultimately such an intermediate must be investigated using physical techniques which directly report on electronic structure, as with isolable Cu(II)-superoxo complexes identified in model compounds.32 A series of synthetic Cu(II)-superoxo complexes of particular relevance to FGE have been synthesized and studied.33 These model compounds utilize a tridentate nitrogen ligand scaffold to bind copper, and react with O2 to form a tetrahedral Cu(II)-superoxo intermediate capable of performing H-atom abstraction from aliphatic substrates. These complexes exhibit an intense (ε = 4200 M-1 cm-1) absorption band at 397 nm, comparable to the 420 nm intermediate of FGE. It is noted that the 420 nm intermediate appears to lack correlation with lower intensity d-d absorption bands, perhaps owing to FGE’s use of a distinct tris(thiolate) copper site. We predict that continued study will reveal in FGE the first Cu(II)-superoxo complex identified in an actively turning over enzyme, an immensely important milestone in the study of biological O2 catalysis.

71

A model of the FGE catalytic cycle A mechanistic proposal consistent with the results of this study is shown in Figure 3.12, and is largely corroborative of predictions provided by Seebeck and coworkers.2, 9 First, the prediction that the substrate binds directly at copper via inner-sphere coordination of the substrate thiolate is verified experimentally by our data. Furthermore, owing to the stable formation of the 3- coordinate Cu(I)-scFGE:substrate (2) complex under air-free conditions, a kinetic mechanism whereby substrate coordination precedes O2 binding and activation is favored. After the formation of (2), the presence of an open coordination site promotes O2 binding, which may or may not be reversible.34 H-atom abstraction of the pro-(R)-β-hydrogen from the substrate cysteine is the rate-determining step, occurring by proton-coupled electron transfer (PCET) from substrate to an activated oxygen species, the Cu(II)–OO·– intermediate (3). The favorability of this step would be enhanced by stabilization of a substrate radical adjacent to a sulfur bonded to copper.35 This would generate a Cu(II)–OOH with ligation of the substrate-centered radical (4). The breakdown of such an intermediate via homolytic cleavage of the S-Cu bond and recombination with the Cβ radical would generate a Cu(I)–OOH intermediate and a free substrate thioaldehyde (5). Thioaldehydes are predicted to be high energy intermediates which hydrolyze rapidly,36 resulting in the final fGly product. The specific mechanism and role of external reductants, historically DTT for in vitro reactions,20 remains unclear but must occur following product formation. Given that stoichiometric DTT oxidation has been measured during turnover 18 – of FGE, 4e must ultimately reduce O2 to H2O. Meury & Seebeck et. al. have proposed that the Cu(I)-OOH (5) intermediate is resolved by first transferring an oxidative equivalent to one of the enzyme’s sulfur ligands; the formation of an S-O intermediate would accompany a change in the ligand occupancy at Cu(I) and could be reduced directly by an external electron source.2 This sequence is plausible in that it is superficially reminiscent of the inactivation pathway of a Met to Cys ligand point mutant studied in tyramine β-monooxygenase, and it also provides a chemical rationale for the specific requirement of thiol-containing reducing agents.37 This proposal may be verified experimentally by chemical trapping or by spectroscopic evidence of such a complex using the EXAFS methods detailed in this report. The direct 2e– reduction of the distal O-atom of the Cu(I)-OOH from a soluble carrier seems to offer a simpler alternative, but assumes that a sufficiently high reduction potential at copper forestalls a detrimental Cu(II)/·OH forming pathway.

72

Figure 3.12. Revised mechanistic proposal for FGE.

The requirement of a novel active site for aerobic fGly production Though our model neatly rationalizes the behavior of the FGE active site with respect to reactivity, it also raises a broader question: why does FGE utilize such an unprecedented mononuclear copper center to catalyze oxidase chemistry? To our knowledge, no other monocopper enzyme binds substrate and a reactive O2 intermediate simultaneously at the metal center, as FGE appears to do. For instance, galactose oxidase catalyzes the 2e–oxidation of D- galactose to D-galacto-hexodialdose directly from a Cu(II)/cofactor radical pair through an inner-sphere complex bound at the substrate hydroxyl, and only then reacts with O2 as a terminal electron acceptor to regenerate the oxidized resting state.38 But, in the vicinity of substrate, lytic polysaccharide monooxygenase (LPMO) first undergoes reduction to Cu(I), then binds and – activates O2 to form a predicted Cu(II)–OO· intermediate. At this stage, C–H abstraction from

73

the anomeric carbon of the polysaccharide substrate proceeds without direct ligation of substrate copper.27 By virtue of the unique susceptibility of FGE’s thiol substrate to overoxidation,39 we assert that an intermediate such as the 4-coordinate complex (3) is required for a controlled reaction pathway to produce fGly at a Cu/O2 site. The H-atom abstraction step illustrates especially why inner-sphere coordination of the substrate cysteine is crucial in two ways. Firstly, the rate of PCET to remove the first e– from substrate is dependent on donor-acceptor wavefunction overlap, which is highly sensitive to distance and relative orientation.40 Secondly, were the intermediate (4) to contain a carbon-centered radical adjacent to a thiol/thiolate whose reactivity was not suppressed by ligation to copper, radical migration to sulfur could make the intermediate susceptible to damaging off-pathway reactions.41 This is a concern made more relevant by FGE’s shallow, solvent-exposed active site. Thus, constraining the substrate by coordination at the metal site ensures optimum donor-acceptor distance and also prevents the quenching of substrate radical prior to the second oxidation step. Taken together, these factors may explain why FGE has coopted a Cu(I) “binding-like” site to do catalysis. Briefly, a corollary to the mechanism of isopenicillin N synthase (IPNS), a non-heme iron(II)-dependent oxidase, suggests the generalizability of this strategy for biological alpha-thiol H-atom abstraction. As an intermediate during the oxidative cyclization of a tripeptide precursor to isopenicillin, IPNS forms an analogous thioaldehyde at cysteine via two 1e– transfer steps from an inner-sphere ligated Fe- thiolate complex. The thiolate ligand also affords a unique reaction pathway to IPNS, making – favorable the 1e reduction of O2 at Fe(II), resulting in H-atom abstraction from an Fe(III)- superoxo.42-43 It is noted that these principles apply to conversion of Cys to fGly only in FGE, involving O2-activation at a copper center. Radical SAM-type fGly producing enzymes in anaerobic organisms operate on a separate energetic and mechanistic trajectory compatible with both serine and cysteine substrates.44 Future studies It is of the utmost importance to obtain a spectroscopic description of the 420 nm intermediate to validate our mechanistic proposal. Thus, it is proposed that the rapid-mixing approach, described herein to assay turnover, will likewise permit the isolation of frozen samples containing significant quantities of the intermediate. Data collected by EPR33 and resonance Raman spectroscopy45 will be used to assess whether the spin state and O–O stretching frequency of these samples truly support the assignment of a Cu(II)-superoxo species. That being the case, the formation of a Cu(II)- then Cu(I)-hydroperoxo complex is made quite likely by inference. The reduction pathway for the Cu(I)-hydroperoxo intermediate is then open for interrogation by the methods developed in this study. For instance, the coordination environment of the cuprous enzyme following a single turnover, and in absence of reducing agents, can be investigated using EXAFS. This data will determine whether the Cu(I)-hydroperoxo is sufficiently stable to await direct reduction by a soluble 2e– carrier, or instead transfers an oxidative equivalent to a cysteine ligand. The apparent suitability of FGE for interrogation by these approaches is stunning and fortuitous; it is expected that FGE will provide a provocative model system for also understanding the intrigues of analogous copper enzymes.

74

Materials and Methods

Molecular biology and protein purification A Streptomyces coelicolor FGE (scFGE) construct was previously cloned into the pET151 expression vector.46 Expression and purification of FGE were performed as follows. Single clones of Escherichia coli BL21 DE3 cells (New England Biolabs) transformed with pET151- scFGE were used to inoculate 250 mL of LB with ampicillin and grown for 16 hours at 37 ºC. Terrific broth containing ampicillin and glycerol was inoculated at a ratio of 40:1 with overnight culture and grown to OD 0.4–0.6 at 37 ºC. Cultures were cooled to 18 ºC and induced with isopropyl-α-thiogalactoside at a final concentration of 0.1-1 mM. Following 18-20 hours of growth, cells were harvested by centrifugation at 5000 x g for 30 minutes at 4 ºC. Cell pellets were resuspended with lysis buffer (0.5 M NaCl, 50 mM Tris-HCl pH 8.0, 25 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine hydrochloride, 10% glycerol (v/v)) at approximately 5 mL per 1 g of cell pellet, 1:10000 Pierce Universal Nuclease (v/v) (Thermo Fisher Scientific), and with 1 EDTA-free protease inhibitor cocktail tablet (Roche) per 50 mL of resuspension. The cell suspension was first homogenized to a single-cell suspension with a Dounce homogenizer. Then cell lysis was accomplished by 3 passes through an Emulsiflex C3 homogenizer (Avestin) at 15 kPa. Lysate was cleared by centrifugation at 20,000 x g for 45 min at 4 ºC. The supernatant was applied to a 5 mL HisTrap HP Ni-NTA column (GE Life Science) connected to an Akta Pure FPLC (GE Healthcare Life Sciences). Flow through was monitored by A280 and washed with lysis buffer until absorbance reached baseline. His6-tagged protein was eluted with a gradient of 0-50% Elution buffer (Lysis buffer containing 0.5 M imidazole) over 10 column volumes. The N-terminal His6 tag was removed from combined fractions by cleavage with Tobacco Etch Virus (TEV) protease at a ratio of 0.02-0.05 (w/w) to FGE overnight at 4 ºC, followed by TEV removal with a subtractive Ni-NTA column. TEV protease was isolated in-house using a published 47 procedure. His6-free FGE was concentrated and further purified by gel-filtration chromatography on a Superdex 75 HiLoad column (GE Healthcare Life Sciences) in storage buffer (50 mM Tris-HCl pH 7.5, 0.5 M NaCl, 1 mM dithiothreitol (DTT), 10% glycerol (v/v)). The concentration of purified proteins was determined by absorbance at 280 nm under denaturing conditions, using the method of Gill and von Hippel.48 Typical final purity was greater than or equal to 90% and the final yield was approximately 30 mg per 1 L of E. coli culture. Proteins were aliquoted, flash frozen with liquid N2, and stored at -80 ºC until use.

Synthetic procedures

Nα-Fmoc-(Se-benzyl)-L-selenocysteine was prepared for solid-phase peptide synthesis according to literature procedure with minor modifications.14 See 1H, 13C NMR in Appendix.

Nα-Fmoc-L-serine-OMe (1): Nα-Fmoc-L-serine (30.6 mmol) was dissolved in 120 mL anhydrous DMF under N2, and 1.1 molar eq. of anhydrous N,N-diisopropylethylamine was added. The solution was cooled to 0 °C, and 1.1 molar eq. of methyl iodide was added dropwise over 20 min. The reaction was allowed to warm to 25 °C and stirred overnight. Reaction (120 mL) was transferred to a separatory funnel containing 1% HCl (200 mL) and EtOAc (160 mL). The product was extracted with EtOAc (2 x 80 mL) and dried over MgCl2, and the solvent

75

removed under vacuum. 1 was obtained with 71% yield as a yellow solid and used without further purification.

1 H NMR (400 MHz, CDCl3) δ 7.81 – 7.74 (m, 2H), 7.65 – 7.57 (m, 2H), 7.45 – 7.37 (m, 2H), 7.32 (tt, J = 7.5, 1.3 Hz, 2H), 5.72 (d, J = 7.7 Hz, 1H), 4.43 (d, J = 7.6 Hz, 3H), 4.23 (t, J = 6.8 13 Hz, 1H), 4.06 – 3.88 (m, 2H), 3.80 (s, 3H), 2.17 (s, 1H); C NMR (101 MHz, CDCl3) δ 127.90, 120.15 (partial spectra).

Nα-Fmoc-(O-tosyl)-L-serine-OMe (2): 1 (20.3 mmol) was combined in a reaction vessel with 3 molar eq. of p-toluenesulfonyl chloride and dissolved in 50 mL anhydrous pyridine under N2. Reaction (50 mL) was stirred at 0 °C for 8 hours, then transferred to a separatory funnel containing Et2O (200 mL). The organic phase was washed with H2O (2 x 200 mL) and 1 % HCl (2 x 200 mL) and dried over MgSO4. The crude product was concentrated under vacuum and purified by flash chromatography on silica gel (0.3:1 EtOAc/hexanes to 0.5:1 EtOAc/hexanes) to yield 2 (59%).

1 H NMR (400 MHz, CDCl3) δ 7.77 (dd, J = 10.3, 7.8 Hz, 5H), 7.59 (t, J = 7.9 Hz, 2H), 7.45 – 7.38 (m, 3H), 7.37 – 7.30 (m, 3H), 7.29 (s, 1H), 7.27 (s, 1H), 5.64 (d, J = 7.9 Hz, 1H), 4.58 (dt, J = 7.9, 3.1 Hz, 1H), 4.44 (dd, J = 10.3, 3.0 Hz, 1H), 4.39 – 4.32 (m, 2H), 4.27 (dd, J = 10.4, 7.3 13 Hz, 1H), 4.19 (t, J = 7.3 Hz, 1H), 3.74 (s, 3H), 2.36 (s, 3H); C NMR (101 MHz, CDCl3) δ 155.68, 145.41, 143.84, 143.72, 141.43, 132.37, 130.10, 128.16, 127.97, 127.29, 125.35, 125.27, 120.17, 69.16, 67.65, 53.54, 53.25, 47.11, 21.74.

Nα-Fmoc-(Se-benzyl)-L-selenocysteine-OMe (3): Dibenzyl diselenide (9.9 mmol) was dissolved in 15 mL THF under N2 and cooled to 0 °C. 1.2 molar eq. of LiBEt3H in 1.0 M THF was added dropwise over 15 min, then stirred for an additional 15 min at 0 °C. By cannula, 0.67 molar eq. of 2 in 15 mL anhydrous THF was transferred to the reaction vessel and stirred for 2 hours at 0 °C. The reaction was quenched by dropwise addition of acetic acid at 0 °C until no further bubbling was observed, then transferred to a separatory funnel with H2O (100 mL) and EtOAc (150 mL). The organic phase was washed with 0.5% HCl (2 x 100 mL), dried over MgSO4, and concentrated under vacuum. The crude product was purified by flash chromatography on silica gel (0.2:1 EtOAc/hexanes, isocractic) to yield 3 (92%).

1 H NMR (400 MHz, CDCl3) δ 7.75 (dd, J = 7.6, 2.6 Hz, 2H), 7.59 (dd, J = 7.5, 4.5 Hz, 2H), 7.39 (td, J = 7.5, 3.8 Hz, 2H), 7.32 (dd, J = 2.9, 1.2 Hz, 1H), 7.30 (t, J = 1.9 Hz, 1H), 7.28 (t, J = 1.5 Hz, 1H), 7.21 – 7.17 (m, 1H), 5.53 (d, J = 8.1 Hz, 1H), 4.66 (dt, J = 8.0, 5.1 Hz, 1H), 4.40 (d, J = 7.1 Hz, 2H), 4.22 (t, J = 7.1 Hz, 1H), 3.76 (s, 2H), 3.74 (s, 3H), 2.98 – 2.87 (m, 2H); 13C NMR (101 MHz, CDCl3) δ 143.81, 141.44, 129.04, 128.77, 127.88, 127.23, 127.16, 125.22, 120.15, 67.28, 53.89, 52.85, 47.23, 28.12, 25.89.

Nα-Fmoc-(Se-benzyl)-L-selenocysteine (4): 3 (1.8 mmol) was dissolved in 1,2-dichloroethane (12 mL), 2 molar eq. of trimethyltin hydroxide was added, and the reaction was allowed to reflux for 2 hours under N2. The crude product was purified by flash chromatography on silica gel (0.3:1:0.01 EtOAc/hexanes/AcOH to 0.35:1:0.01 EtOAc/hexanes/AcOH) to yield 4 as a white solid (74%).

1 H NMR (400 MHz, CDCl3): δ 7.74 (dd, J = 7.6, 2.9 Hz, 2H), 7.59 (d, J = 7.1 Hz, 2H), 7.38 (td, J = 7.2, 3.9 Hz, 2H), 7.32 – 7.27 (m, 2H), 7.25 (d, J = 5.6 Hz, 3H), 7.21 – 7.17 (m, 1H), 5.51 (d, J = 7.9 Hz, 1H), 4.66 (q, J = 5.9 Hz, 1H), 4.40 (d, J = 7.0 Hz, 2H), 4.21 (t, J = 7.1 Hz, 1H), 3.78

76

13 (s, 2H), 2.95 (d, J = 5.1 Hz, 2H); C NMR (101 MHz, CDCl3) δ 177.29, 141.44, 129.06, 128.80, 127.91, 127.25, 125.21, 120.16, 67.40, 47.21, 28.28, 20.86; MS (ESI): Calculated [M+H]+ = 482.0871, found 482.1. Peptide synthesis: Peptide substrates were synthesized manually on Fmoc-Rink Amide MBHA low loading resin (EMD Millipore) using commercially available Nα-Fmoc protected amino acid monomers, or the Nα-Fmoc-(Se-benzyl)-L-selenocysteine (Fmoc-Sec(bzl)-OH) monomer. A 3,3- D2-Fmoc-L-cys(S-trityl)-OH monomer with an isotopic enrichment of 99.6% (Cambridge Isotope Laboratories) was used to synthesize D2-scP14. Peptide coupling reactions used the reagent COMU in DMF.49 A 2-10 fold molar excess of amino acid monomer was used in each coupling step with reaction times of 30 minutes up to overnight with agitation by N2 gas. Fmoc groups were deprotected by treatment with 20% piperidine in DMF for 15 min. All peptides used herein were amidated at the C-terminus, and the N-terminus was either acetylated with acetic anhydride or derivatized with the chromophore 2,4-dinitrobenzene by reaction with 1-fluoro-2,4- dinitrobenzene (Sanger’s reagent). Following peptide synthesis, peptides were cleaved from solid phase and side chains globally deprotected using a cocktail containing 94:2.5:2.5:1 TFA/H2O/ethane dithiol/triisopropylsilane for 3 h under N2. The cleavage mixture was filtered to remove resin, concentrated under vacuum, and triturated by dropwise addition to Et2O at 4 °C. Precipitated crude peptide was collected by centrifugation and washed twice more with diethyl ether prior to dissolution in H2O and lyophilization. Removal of the Sec-benzyl group required an additional two-step deprotection procedure,14 performed with modification. First, crude Sec(bzl)-containing peptides were treated with 20 molar eq. of 2,2′-dithiobis(5-nitropyridine) (DTNP) in 2% thioanisole/TFA. This cleanly removed the benzyl group, yielding the corresponding selenosulfide protected peptide. To reduce the selenosulfide to selenocysteine, Sec(TNP) peptides were treated with 1 M tris(2-carboxyethyl)phosphine hydrochloride immediately prior to HPLC purification. Crude peptides were purified using reverse-phase chromatography with a gradient of 20-30% MeCN/water + 0.1% TFA over 60 min on a Microsorb 100-5 C18 Dynamax 250 x 21.4 mm column (Agilent Technologies). Peptides were characterized by HPLC-UV/vis to be of greater than 98% purity.

Peptide characterization by high-resolution mass spectrometry All substrate peptides used herein were sequenced using liquid chromatography-electrospray ionization spectrometry. Purified peptides were diluted to 0.5 µM with 0.1 % formic acid. 1 uL (500 fmol) of each peptide was injected on an Acclaim PepMap RSLC C18 column (Thermo Fisher Scientific) using a Dionex UtiliMate 3000 HPLC (Thermo Fisher Scientific) at a flow rate of 0.3 µL/min, connected in-line to an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific). Instrument method parameters were as follows: MS1 resolution: 60,000 at 400 m/z; scan range: 300−1500 m/z. The most abundant ions (top speed at n=3 seconds) were subjected to higher-energy collision induced dissociation (HCD), collision induced dissociation (CID), and electron transfer dissociation (ETD). HCD parameters: 30% normalized collision energy; CID parameters: activation q 0.25, 35% normalized collision energy; ETD parameters: calibrated ETD reaction times, AGC reagent target at 1e5. For all MS2s, isolation width was set at 2 m/z. Dynamic exclusion was enabled with a repeat count of 3, a repeat duration of 10 seconds, and an exclusion duration of 10 seconds. Data analysis was accomplished using Xcalibur software

77

(Thermo Fisher Scientific). All peptide sequences were manually validated; see annotated high- resolution mass spectra in Appendix.

Anaerobic reconstitution of FGE with Cu(I)

Solutions of enzyme and buffer were transferred into an anaerobic glovebox under N2 atmosphere to equilibrate overnight. To remove DTT and adjust pH, FGE was exchanged from storage buffer into 50 mM Tris-HCl pH 9.0, 0.5 M NaCl, 10% glycerol (v/v) using pre- equilibrated Bio-Spin 6 buffer exchange spin columns (Bio-Rad Laboratories). Exchanged protein was instead reconstituted at 4 °C in pH 9.0 buffer with 1:1 FGE:Cu(I) at a final + – concentration of 0.5 mM each using a freshly prepared 10 mM Cu(MeCN)4 PF6 stock in anhydrous MeCN. A second buffer exchange step removed unbound Cu(I) and MeCN. FGE was concentrated as necessary using 10 kDa molecular weight cutoff spin filters (EMD Millipore). This procedure yielded Cu(I)-scFGE with 98-99 mol % copper as measured by the colorimetric biquinoline assay. 50

Stopped-flow UV/vis spectroscopy Solutions of 1:1 Cu(I)-scFGE:substrate (ES) complex (200 µM each) were formed by mixing in the glovebox under N2 atmosphere in 50 mM Tris-HCl pH 9.0, 0.5 M NaCl, 10% glycerol (v/v) buffer. ES complex (Syringe 1) and O2-saturated pH 9.0 buffer (at 25 °C, approximately 1320 µM, Syringe 2) were loaded into gastight glass syringes (Hamilton) and fitted with sealed 3-way stopcocks to contain dissolved N2 and O2 respectively. Syringes 1 and 2 were transferred into a glovebox under argon atmosphere containing an SX-20 stopped-flow UV/vis spectrophotometer (Applied Photophysics), and mounted onto the instrument. UV/vis spectra were collected with a photodiode array detector following a mixing deadtime of approximately 5 ms. All stopped-flow UV/vis experiments were performed at 4 °C, maintained by a chilled recirculating bath.

Rapid freeze-quench sample preparation

In order to remove O2 from the rapid mixing instrument, the solvent lines were incubated with a solution of Na2S2O4 for 45 minutes. Excess Na2S2O4 was removed by flushing with gastight syringes of air-free 50 mM Tris-HCl pH 9.0, 0.5 M NaCl, 10% glycerol (v/v) buffer. Next, anaerobic Cu(I)-scFGE:scP14 complex (1:1, 750 µM each) and O2-saturated buffer (at 4 °C, approximately 2.19 mM) were prepared separately in pH 9.0 buffer, loaded into gastight syringes, and mounted on a computer-controlled RQF-3 rapid mixing instrument (Kintek Corporation). For single-turnover kinetic assays, syringes 1 and 2 were combined to react for a programmed mixing time and ejected into a vessel containing a quench solution of HCl at a final concentration of 100 mM. Product formation in acid-quenched reactions was measured by HPLC. All rapid mixing experiments were performed at 4 °C, maintained by a chilled recirculating bath.

78

Enzyme activity assays Peptide stock concentrations were measured by absorbance using published extinction -1 -1 - coefficients: H-scP14/D2-scP14, ε280 = 1280 M cm ; Cys-scP15/Sec-scP15, ε350 = 15800 M 1cm-1.51-52 All reactions were done in 1.5 mL tubes on a temperature controlled Thermomixer R (Eppendorf) with shaking at 1000 RPM. Steady-state activity assays were run as previously described at 25 °C and contained: 25 mM triethanolamine (TEAM) buffer pH 9.0, 2-90 µM H- scP14/D2-scP14 peptides, 2 mM DTT, and 0.1 µM Cu(I)-loaded scFGE. To determine pH-rate profile, activity was measured as the amount of product formed after 10 min in reactions containing: 1 µM scFGE, 100 µM substrate, 25 mM TEAM buffer pH 6.0-10.0, 2 mM DTT at 25 °C. All reactions were quenched to a final concentration of 100 mM HCl and 100 uL was injected on an Agilent 1200 HPLC connected in-line to a diode array detector. A C18 poroshell column (Agilent Technologies) was used to resolve product (fGly peptide) and starting material (Cys peptide) with a gradient of 0-50% MeCN/water + 0.1% TFA. Percent conversion was calculated by integrating product and starting material absorbance at 280 nm. For determination of kcat and KM, the linear initial reaction rates (an average of 5 time point aliquots done in at least duplicate) for each substrate concentration were fit to the Michaelis-Menten equation using SigmaPlot software (Systat): Vmax*[S] vo = KM + S Molecular docking and density functional theory computations The Maestro 11 modeling environment (Schrödinger) was used to generate a docked structure of scFGE containing Cu and a short peptide substrate (Ac-LCTPAR-NH2) with the Glide docking tool. Calculations on Cu(I)-scFGE were performed with the B3LYP functional and 6-311g supplemented with two sets of d functions on heavy atoms and 1 set of p functions on hydrogens as implemented in Gaussian 09. Optimizations were performed in water using the Gaussian keyword ‘scrf=(solvent=water)’ in the input line. The starting model was prepared using the crystal structure of Cu-scFGE. The optimization calculations included residues in the primary coordination sphere [Cys272, Cys277, and Cu] and secondary coordination sphere [Asp274, Arg279, Trp234, Ser269, Tyr276]. All residues were truncated at the Cα-carbons, and were capped with methyl groups to maintain the charge of the model. All second sphere residues were frozen in space, with the exception of the protons of Ser6269 and Tyr276. The two active site Cys residues were allowed to optimize, with the methyl caps fixed in space to mimic the constraints of the protein backbone. DFT calculations of the Cu(I)-scFGE:substrate docked complex were performed with the B3LYP functional and 6-311g(2d,p) as implemented in Gaussian 09. Optimizations were performed in water using the Gaussian keyword ‘scrf=(solvent=water)’ in the input line. The active site cysteine residues (Cys272, Cys277), copper atom, and 6-mer peptide substrate (Ac-LCTPAR-NH2) were included. The active site Cys residues were truncated at the Cα-carbons, which were then converted to methyl caps. One YH- atom from each of the methyl groups of the active site Cys residues were frozen in space to mimic the constraints of the protein backbone. All residues except the substrate Cys were frozen in space and not allowed to freely optimize.

79

X-ray absorption spectroscopy Solutions of Cu(I)-scFGE and Cu(I)-scFGE:scP14 were prepared anaerobically as described in text. The samples were loaded into delrin XAS cells with 38 μm Kapton windows, removed from the glovebox, and immediately frozen and stored in liquid N2 until analysis. The Cu K-edge X- ray absorption spectroscopy (XAS) was conducted at the Stanford Synchrotron Radiation Lightsource (SSRL) on the focused 16-pole, 2 T wiggler side-station beam line 9-3 under storage ring parameters of 3 GeV and ~500 mA. A Si(220) double-crystal monochromator was used for energy selection. A Rh-coated pre-monochromator mirror was used for harmonic rejection and vertical collimation, and a cylindrical Rh-coated post-monochromator mirror was used for beam focusing. The samples were maintained at a constant temperature of ~10 K during data collection using an Oxford Instruments CF 1208 liquid helium cryostat. A Canberra 100-element Ge monolith solid-state detector and soller slits equipped with a Ni filter were used to collect Cu Kα fluorescence data. Internal energy calibration was accomplished by simultaneous measurement of the transmission of a Cu foil placed between two ionization chambers situated after the sample. The first inflection point of the foil spectrum was assigned to 8980.3 eV. The extended X-ray absorption fine structure (EXAFS) data are reported to k = 12.8 Å-1 in order to avoid interference from the Zn K-edge. Photodamage was not observed during the course of data collection. The EXAFS data presented here are 5-scan average spectrum collected over three fresh sample positions and 12-scan average spectrum collected over six fresh sample positions for Cu(I)-FGE and Cu(I)-scFGE:scP14. Background subtraction and normalization of the data were performed using PySpline (Stanford Linear Accelerator Center). The data were processed by fitting a second-order polynomial to the pre-edge region and subtracting this from the entire spectrum as a background. A three-region polynomial spline of orders 2, 3, and 3 was used to model the smoothly decaying post-edge region. The data were normalized by scaling the spline function to an edge jump of 1.0 at 9000 eV. The EXAFS curve-fitting analysis program OPT in EXAFSPAK (George, G. N. Stanford Synchrotron Radiation Laboratory: Stanford, CA, 2000) was used to fit the EXAFS data. The theoretical phase and amplitude functions were generated by FEFF (version 7.0, University of Washington). A starting structural model was obtained from the crystal structure. During the fitting process, the bond distance (R) and the mean-square thermal and static deviation in R (σ2), which is related to the Debye-Waller factor, were allowed to vary. The threshold energy (E0), the point at which the photoelectron wave vector k is 0, was also allowed to vary but was constrained as a common value for all components in a given fit. 2 The amplitude reduction factor (S0 ) was fixed to a value of 1.0 and the coordination numbers (N) were systematically varied to achieve the best fit to the EXAFS data and their Fourier transforms (FTs). Use of the Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science and Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515. The SSRL Structural Molecular Biology Program is supported by the DOE Office of Biological and Environmental Research, and by the National Institutes of Health, National Institute of General Medical Sciences (including P41GM103393). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of NIGMS or NIH.

80

References

1. Holder, P. G.; Jones, L. C.; Drake, P. M.; Barfield, R. M.; Bañas, S.; de Hart, G. W.; Baker, J.; Rabuka, D., Reconstitution of formylglycine-generating enzyme with copper(II) for aldehyde tag conversion. J. Biol. Chem. 2015, 290 (25), 15730-15745.

2. Meury, M.; Knop, M.; Seebeck, F. P., Structural basis for copper-oxygen mediated C-H bond activation by the formylglycine-generating enzyme. Angew. Chem. Int. Ed. Engl. 2017, 56 (28), 8115-8119.

3. Roeser, D.; Preusser-Kunze, A.; Schmidt, B.; Gasow, K.; Wittmann, J. G.; Dierks, T.; von Figura, K.; Rudolph, M. G., A general binding mechanism for all human sulfatases by the formylglycine-generating enzyme. Proc. Natl. Acad. Sci. U. S. A. 2006, 103 (1), 81-86.

4. Rush, J. S.; Bertozzi, C. R., New aldehyde tag sequences identified by screening formylglycine generating enzymes in vitro and in vivo. J. Am. Chem. Soc. 2008, 130 (37), 12240-12241.

5. O'Brien, P. J.; Herschlag, D., Catalytic promiscuity and the evolution of new enzymatic activities. Chem. Biol. 1999, 6 (4), R91-R105.

6. Hult, K.; Berglund, P., Enzyme promiscuity: mechanism and applications. Trends Biotechnol. 2007, 25 (5), 231-8.

7. Khersonsky, O.; Tawfik, D. S., Enzyme promiscuity: A mechanistic and evolutionary perspective. Annu. Rev. Biochem 2010, 79 (1), 471-505.

8. Appel, M. J.; Bertozzi, C. R., Formylglycine, a post-translationally generated residue with unique catalytic capabilities and biotechnology applications. ACS Chem. Biol. 2015, 10 (1), 72-84.

9. Knop, M.; Dang, T. Q.; Jeschke, G.; Seebeck, F. P., Copper is a cofactor of the formylglycine-generating enzyme. ChemBioChem 2017, 18 (2), 161-165.

10. Dierks, T.; Schmidt, B.; Borissenko, L. V.; Peng, J.; Preusser, A.; Mariappan, M.; von Figura, K., Multiple sulfatase deficiency is caused by mutations in the gene encoding the human Cα-formylglycine generating enzyme. Cell 2003, 113 (4), 435-444.

11. Reich, H. J.; Hondal, R. J., Why Nature Chose Selenium. ACS Chemical Biology 2016, 11 (4), 821-841.

12. Gonzalez-Flores Jonathan, N.; Shetty Sumangala, P.; Dubey, A.; Copeland Paul, R., The molecular biology of selenocysteine. In BioMolecular Concepts, 2013; Vol. 4, p 349.

81

13. Johansson, L.; Gafvelin, G.; Arnér, E. S. J., Selenocysteine in proteins—properties and biotechnological use. Biochimica et Biophysica Acta (BBA) - General Subjects 2005, 1726 (1), 1-13.

14. Schroll, A. L.; Hondal, R. J.; Flemer, S., The use of 2,2′-dithiobis(5-nitropyridine) (DTNP) for deprotection and diselenide formation in protected selenocysteine-containing peptides. J. Pept. Sci. 2012, 18 (3), 155-162.

15. Fey, J.; Balleininger, M.; Borissenko, L. V.; Schmidt, B.; von Figura, K.; Dierks, T., Characterization of posttranslational formylglycine formation by luminal components of the endoplasmic reticulum. J. Biol. Chem. 2001, 276 (50), 47021-8.

16. Huber, R. E.; Criddle, R. S., Comparison of the chemical properties of selenocysteine and selenocystine with their sulfur analogs. Arch. Biochem. Biophys. 1967, 122 (1), 164-73.

17. Shaked, Z. e.; Szajewski, R. P.; Whitesides, G. M., Rates of thiol-disulfide interchange reactions involving proteins and kinetic measurements of thiol pKa values. Biochemistry 1980, 19 (18), 4156-4166.

18. Knop, M.; Engi, P.; Lemnaru, R.; Seebeck, F. P., In vitro reconstitution of formylglycine- generating enzymes requires copper(I). ChemBioChem 2015, 16 (15), 2147-2150.

19. Grove, T. L.; Ahlum, J. H.; Qin, R. M.; Lanz, N. D.; Radle, M. I.; Krebs, C.; Booker, S. J., Further characterization of cys-type and ser-type anaerobic sulfatase maturating enzymes suggests a commonality in the mechanism of catalysis. Biochemistry 2013.

20. Peng, J.; Alam, S.; Radhakrishnan, K.; Mariappan, M.; Rudolph, M. G.; May, C.; Dierks, T.; von Figura, K.; Schmidt, B., Eukaryotic formylglycine-generating enzyme catalyses a monooxygenase type of reaction. FEBS J. 2015, 282 (17), 3262-3274.

21. Bollinger, J. M.; Krebs, C., Stalking intermediates in oxygen activation by iron enzymes: Motivation and method. J. Inorg. Biochem. 2006, 100 (4), 586-605.

22. Roston, D.; Islam, Z.; Kohen, A., Isotope effects as probes for enzyme catalyzed hydrogen-transfer reactions. Molecules 2013, 18 (5), 5543.

23. Francis, K.; Kohen, A., Standards for the reporting of kinetic isotope effects in enzymology. Perspectives in Science 2014, 1 (1), 110-120.

24. Johnson, K. A.; Begley, T. P., Transient state enzyme kinetics. In Wiley Encyclopedia of Chemical Biology, John Wiley & Sons, Inc.: 2007.

25. Klinman, J. P., The copper-enzyme family of dopamine β-monooxygenase and peptidylglycine α-hydroxylating monooxygenase: Resolving the chemical pathway for substrate hydroxylation. J. Biol. Chem. 2006, 281 (6), 3013-3016.

82

26. Solomon, E. I.; Heppner, D. E.; Johnston, E. M.; Ginsbach, J. W.; Cirera, J.; Qayyum, M.; Kieber-Emmons, M. T.; Kjaergaard, C. H.; Hadt, R. G.; Tian, L., Copper active sites in biology. Chem. Rev. 2014, 114 (7), 3659-853.

27. Meier, K. K.; Jones, S. M.; Kaper, T.; Hansson, H.; Koetsier, M. J.; Karkehabadi, S.; Solomon, E. I.; Sandgren, M.; Kelemen, B., Oxygen activation by Cu LPMOs in recalcitrant carbohydrate polysaccharide conversion to monomer sugars. Chem. Rev. 2017, Article ASAP.

28. Belle, C.; Rammal, W.; Pierre, J.-L., Sulfur ligation in copper enzymes and models. J. Inorg. Biochem. 2005, 99 (10), 1929-1936.

29. Sorrell, T. N.; Malachowski, M. R., Mononuclear three-coordinate copper(I) complexes: synthesis, structure, and reaction with carbon monoxide. Inorg. Chem. 1983, 22 (13), 1883-1887.

30. Shoshan, M. S.; Lehman, Y.; Goch, W.; Bal, W.; Tshuva, E. Y.; Metanis, N., Selenocysteine containing analogues of Atx1-based peptides protect cells from copper ion toxicity. Organic & Biomolecular Chemistry 2016, 14 (29), 6979-6984.

31. Prigge, S. T.; Eipper, B. A.; Mains, R. E.; Amzel, L. M., Dioxygen binds end-on to mononuclear copper in a precatalytic enzyme complex. Science 2004, 304 (5672), 864- 867.

32. Ginsbach, J. W.; Peterson, R. L.; Cowley, R. E.; Karlin, K. D.; Solomon, E. I., Correlation of the electronic and geometric structures in mononuclear copper(II) superoxide complexes. Inorg. Chem. 2013, 52 (22), 12872-12874.

33. Kunishita, A.; Kubo, M.; Sugimoto, H.; Ogura, T.; Sato, K.; Takui, T.; Itoh, S., Mononuclear copper(II)−superoxo complexes that mimic the structure and reactivity of the active centers of PHM and DβM. J. Am. Chem. Soc. 2009, 131 (8), 2788-2789.

34. Klinman, J. P., How do enzymes activate oxygen without inactivating themselves? Acc. Chem. Res. 2007, 40 (5), 325-333.

35. Frey, P. A.; Hegeman, A. D.; Reed, G. H., Free radical mechanisms in enzymology. Chem. Rev. 2006, 106 (8), 3302-3316.

36. Kida, Y.; Class, C. A.; Concepcion, A. J.; Timko, M. T.; Green, W. H., Combining experiment and theory to elucidate the role of supercritical water in sulfide decomposition. PCCP 2014, 16 (20), 9220-9228.

37. Osborne, R. L.; Zhu, H.; Iavarone, A. T.; Hess, C. R.; Klinman, J. P., Inactivation of Met471Cys tyramine β-monooxygenase results from site-specific cysteic acid formation. Biochemistry 2012, 51 (38), 7488-7495.

38. Whittaker, J. W., The radical chemistry of galactose oxidase. Arch. Biochem. Biophys. 2005, 433 (1), 227-239.

83

39. Poole, L. B., The basics of thiols and cysteines in redox biology and chemistry. Free Radical Biol. Med. 2015, 80 (Supplement C), 148-157.

40. Hammes-Schiffer, S., Proton-coupled electron transfer: Moving together and charging forward. J. Am. Chem. Soc. 2015, 137 (28), 8860-8871.

41. Vandeputte, A. G.; Reyniers, M.-F.; Marin, G. B., Kinetic modeling of hydrogen abstractions involving sulfur radicals. ChemPhysChem 2013, 14 (16), 3751-3771.

42. Randall, C. R.; Zang, Y.; True, A. E.; Que, L.; Charnock, J. M.; Garner, C. D.; Fujishima, Y.; Schofield, C. J.; Baldwin, J. E., X-ray absorption studies of the ferrous active site of isopenicillin N synthase and related model complexes. Biochemistry 1993, 32 (26), 6664- 6673.

43. Costas, M.; Mehn, M. P.; Jensen, M. P.; Que, L., Dioxygen activation at mononuclear nonheme iron active sites: Enzymes, models, and intermediates. Chem. Rev. 2004, 104 (2), 939-986.

44. Goldman, P. J.; Grove, T. L.; Sites, L. A.; McLaughlin, M. I.; Booker, S. J.; Drennan, C. L., X-ray structure of an AdoMet radical activase reveals an anaerobic solution for formylglycine posttranslational modification. Proc. Natl. Acad. Sci. U. S. A. 2013.

45. Woertink, J. S.; Tian, L.; Maiti, D.; Lucas, H. R.; Himes, R. A.; Karlin, K. D.; Neese, F.; Würtele, C.; Holthausen, M. C.; Bill, E.; Sundermeyer, J.; Schindler, S.; Solomon, E. I., Spectroscopic and computational studies of an end-on bound superoxo-Cu(II) complex: Geometric and electronic factors that determine the ground state. Inorg. Chem. 2010, 49 (20), 9450-9459.

46. Carlson, B. L.; Ballister, E. R.; Skordalakes, E.; King, D. S.; Breidenbach, M. A.; Gilmore, S. A.; Berger, J. M.; Bertozzi, C. R., Function and structure of a prokaryotic formylglycine-generating enzyme. J. Biol. Chem. 2008, 283 (29), 20117-20125.

47. Tropea, J. E.; Cherry, S.; Waugh, D. S., Expression and purification of soluble His(6)- tagged TEV protease. Methods in molecular biology (Clifton, N.J.) 2009, 498, 297-307.

48. Gill, S. C.; von Hippel, P. H., Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem. 1989, 182 (2), 319-26.

49. El-Faham, A.; Funosas, R. S.; Prohens, R.; Albericio, F., COMU: A safer and more effective replacement for benzotriazole-based uronium coupling reagents. Chemistry – A European Journal 2009, 15 (37), 9404-9416.

50. Felsenfeld, G., The determination of cuprous ion in copper proteins. Arch. Biochem. Biophys. 1960, 87 (2), 247-251.

51. Edelhoch, H., Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 1967, 6 (7), 1948-1954.

84

52. Sanger, F., The terminal peptides of insulin. Biochem. J. 1949, 45 (5), 563-574.

85

Appendix High-resolution mass spectra, and 1H, 13C NMR spectra

86

87

88

89

90

91

92

93

94

95

96

97

98