BIOSYNTHETIC ENGINEERING OF PLANTAZOLICIN NATURAL PRODUCTS

BY

CAITLIN D. DEANE

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Chemistry in the Graduate College of the University of Illinois at Urbana-Champaign, 2016

Urbana, Illinois

Doctoral Committee:

Associate Professor Douglas A. Mitchell, Chair
Professor Wilfred A. van der Donk
Professor William W. Metcalf
Professor Huimin Zhao

ABSTRACT

Plantazolicin (PZN) is a ribosomally synthesized and post-translationally modified peptide (RiPP) that exhibits extraordinarily narrow-spectrum antibacterial activity towards the causative agent of anthrax, Bacillus anthracis. During PZN biosynthesis, a cyclodehydratase catalyzes cyclization of cysteine, serine, and threonine residues in the PZN precursor peptide (BamA) to azolines. A dehydrogenase then oxidizes most of these azolines to thiazoles and (methyl)oxazoles. The final biosynthetic steps consist of leader peptide removal and dimethylation of the nascent N-terminus. To simultaneously establish the structure-activity relationship of PZN and the substrate tolerance of the biosynthetic pathway, an Escherichia coli expression strain was engineered to heterologously produce PZN analogs. Seventy-two variant BamA peptides were screened by mass spectrometry to assess post-translational modification and export by E. coli, from which 29 PZN variants were detected. The modifying enzymes were exquisitely selective, installing heterocycles only at predefined positions within the precursor peptides while leaving neighboring residues unmodified. Nearly all substitutions at positions normally harboring heterocycles prevented maturation of a PZN variant. No variants containing additional heterocycles were detected, although several peptide sequences yielded multiple PZN variants as a result of varying oxidation states of select residues. Eleven PZN variants were produced in sufficient quantity to facilitate purification and assessment of their antibacterial activity, providing insight into the structure-activity relationship of PZN. Using heterologously expressed and purified heterocycle synthetase, the BamA peptide was processed in vitro in a manner concordant with the pattern of post-translational modification found in the naturally occurring compound. 
Using a suite of BamA-derived peptides, including amino acid substitutions as well as contracted and expanded substrate variants, the substrate tolerance of the heterocycle synthetase was elucidated in vitro, and the residues crucial for leader peptide binding were identified. Despite increased promiscuity compared to what was previously observed in E. coli, the synthetase retained selectivity in cyclization of unnatural peptides only at positions which correspond to those cyclized in the natural product. A cleavage site was subsequently introduced to facilitate leader peptide removal, yielding mature PZN variants after enzymatic or chemical dimethylation. In addition, we report the isolation and characterization of two novel PZN-like natural products whose existence was predicted by genome sequencing.


ACKNOWLEDGEMENTS

This undertaking would not have been possible without the help and support of many people. First and foremost, thank you to my adviser, Doug Mitchell, for accepting me into his lab, supplying direction and encouragement, keeping me on track, sending me to conferences, and celebrating my (and others’) accomplishments. Thank you to my thesis committee for their guidance and support. Special thanks to Wilfred van der Donk for his leadership of the CBI program and for extremely valuable career assistance. In addition, thank you to the UIUC Department of Chemistry for generous financial support. Utmost thanks to the entire Mitchell lab, past and present, for assistance in all manner of activities ranging from chemistry to cat-sitting. Special thanks to all my co-authors for invaluable scientific expertise and experimental contributions. I literally could not have done this without you! To the undergraduates I mentored, Agnieszka Maniak and Alice Lin, thank you both for being such a pleasure to work with. Thanks to Kyle Dunbar, Courtney Cox, Joel Melby, and Katie Molohon for so ably paving the way for those who came after them. Thanks to Team PZN, Katie Molohon and Patricia Blair, for always making subgroups interesting. Thanks to Tucker Maxson for being there throughout this whole crazy process, starting with day one of our summer rotation. Extra-special thanks to Brandon Burkhart for innumerable contributions, both scientific and otherwise. Your willingness to celebrate or commiserate as needed made all that time in lab more enjoyable (especially on Saturdays). Thanks to all my teachers throughout the years, notably Matt Hall, Brian Seed, and Brian Gilbert, for their inspiration and help in getting me to where I am today. 
Finally, the greatest of thanks to Brett Bobysud, Alysia Painter, and my parents, Bob and Norma Deane, for their unflagging love, support, and encouragement, even when (perhaps especially when) they didn’t understand what exactly I was doing.


To Dad


TABLE OF CONTENTS

CHAPTER 1: Lessons learned from the transformation of natural product discovery to a genome-driven endeavor
  1.1 Introduction
  1.2 Lesson 1: Structure prediction
  1.3 Lesson 2: Accurate sequence annotation
  1.4 Lesson 3: Continued study of model organisms
  1.5 Lesson 4: Avoiding genome size bias
  1.6 Lesson 5: Genetic manipulation
  1.7 Lesson 6: Heterologous expression
  1.8 Lesson 7: Potential engineering of natural product analogs
  1.9 Outlook
  1.10 Conclusion
  1.11 References

CHAPTER 2: Engineering unnatural variants of plantazolicin through codon reprogramming
  2.1 Introduction
  2.2 Heterologous production of PZN
  2.3 Design and evaluation of BamA mutants
  2.4 Substrate tolerance at heterocyclized positions
  2.5 Substrate tolerance at non-cyclized positions
  2.6 Pathway tolerance for variations in substrate length
  2.7 Production of other PZN-class natural products
  2.8 Antibacterial activity of PZN analogs
  2.9 Methods
  2.10 References


CHAPTER 3: In vitro biosynthesis and substrate tolerance of the plantazolicin family of natural products
  3.1 Introduction
  3.2 In vitro reconstitution of the PZN heterocycle synthetase
  3.3 Substrate tolerance of the PZN synthetase in vitro
  3.4 Leader peptide recognition and binding by the PZN synthetase
  3.5 Production of novel PZN analogs
  3.6 Methods
  3.7 Sequences
  3.8 References

APPENDIX A: HIV inhibitors block streptolysin S production
  A.1 Introduction
  A.2 Evidence for the role of SagE as a protease
  A.3 Aspartyl protease inhibitors block SagA proteolysis
  A.4 HIV protease inhibitors block SLS production
  A.5 Structure-activity relationships
  A.6 Biosynthesis inhibition of other TOMMs
  A.7 Target validation studies
  A.8 Discussion
  A.9 Summary and Outlook
  A.10 Methods
  A.11 References


CHAPTER 1: Lessons learned from the transformation of natural product discovery to a genome-driven endeavor¹

1.1 Introduction

Natural products have played a significant role in medicine, and fittingly, the discovery of novel natural products continues to hold great promise for the development of new drugs (1). Forward natural product discovery relies on the presence of an observable phenotype or chemical property, such as biological activity, color, or a known mass, which can be tracked through successive rounds of isolation. Successful “grind and find” isolation of a natural product can be followed by identification of the genes responsible for the biosynthesis of the compound of interest. This method was remarkably productive in its early years but has met with increasing frustration: because many of the easily accessible natural products have already been isolated, rediscovery of known compounds has become commonplace (2). Meanwhile, the increasing ease of DNA sequencing has led to an explosion of fully sequenced genomes and identified biosynthetic pathways over the last 20 years. In many cases, the biosynthetic genes responsible for well-known and long-studied natural products have been identified only recently (Figure 1.1). Furthermore, the elucidation of biosynthetic pathways for natural products is by no means complete, as a number of important compounds (e.g., morphine, paclitaxel) still have major gaps in what is known of their biosynthesis (Figure 1.1). Biosynthetic genes for plant natural products are often particularly difficult to identify because plants do not commonly cluster the responsible genes as bacteria and fungi often do, and plant genomes are much larger than those of bacteria and fungi (3). That is not to say the task is easy in lower organisms, as history has shown that characterization of bacterial and fungal pathways requires immense time and effort. 
A noteworthy example is the bottromycin family, whose biosynthetic gene cluster was identified by four independent groups in 2012, 55 years after the first report of its isolation (4-7). Likewise, the genes responsible for the production of the thiopeptides remained unknown until they were reported nearly simultaneously by four independent research groups in 2009 (8-11). In the case of thiostrepton, a well-known thiopeptide, this report came 53 years after its initial isolation.

¹Reprinted with permission from Deane, C. D., and Mitchell, D. A. (2014) J. Ind. Microbiol. Biotechnol. 41, 315–331. Copyright 2014 Springer.


Figure 1.1. Timeline of traditional “forward” natural product discovery, highlighting the frequently decades-long gap between isolation of a novel natural product and identification of its biosynthetic genes. A date above the arrow indicates when isolation of the indicated natural product was first reported, with the date of structure elucidation (if different) in parentheses. A date below the arrow indicates when the biosynthetic genes were first identified for the natural product, with the years since isolation in parentheses. This figure is not meant to be exhaustive but instead to provide a representative sample of natural products spanning multiple biosynthetic classes, biological targets, and producing organisms. NRP, non-ribosomal peptide; RiPP, ribosomally synthesized and post-translationally modified peptide; *, majority of biosynthetic pathway remains unknown.

With the advent of inexpensive, massively parallel sequencing, a new route to discover natural products has emerged. The genomic revolution has fueled a shift in methods for identifying natural products from the traditional, phenotype-driven “forward” procedure to a genotype-driven “reverse” discovery process, in which identification of the biosynthetic genes precedes and informs isolation of the natural product. With the availability of the Streptomyces coelicolor and Streptomyces avermitilis genome sequences (12, 13), it became clear that even some highly studied strains whose biosynthetic capabilities had appeared to be exhausted harbored a surprising number of previously unknown biosynthetic gene clusters. For example, only three natural product gene clusters had been identified on the S. coelicolor chromosome prior to the completion of its genome sequence: those for actinorhodin (14, 15), prodiginine (16), and calcium-dependent antibiotic (17). Genome sequencing of S. coelicolor revealed a number of “cryptic” gene clusters without known associated natural products (sometimes called orphan gene clusters). In total, S. coelicolor carries the potential for 29 structurally complex natural products (Figure 1.2) (13, 18). Many of these biosynthetic gene clusters remain cryptic to the present day, over a decade later, indicating that the biosynthetic capabilities of this and other organisms have yet to be fully understood.

Figure 1.2. Schematic representation of the linear 8.7 Mb Streptomyces coelicolor A3(2) chromosome, highlighting the locations of all known natural product biosynthetic gene clusters. Names in brackets indicate putative compounds for which no predicted structure appears in the literature (18).

The number of complete bacterial genome sequences available in the National Center for Biotechnology Information (NCBI) database increased approximately 25-fold, to over 2500, between 2003 and 2013. With the ready availability of sequenced genomes, particularly those from bacteria, it is increasingly possible to identify putative biosynthetic genes, use their sequence to predict the structure and properties of the potential product, and use those predictions to guide efficient isolation and characterization. One interesting example is bacillaene, whose structure eluded scientists due to its chemical instability (19). The genome sequence of the bacillaene producer led to identification of the protein sequences responsible for its biosynthesis, which in turn enabled a structural prediction for bacillaene that guided purification based on the predicted physical properties. Ultimately, this procedure was successful, and the structure was solved (20, 21). In addition, the prototypical reverse-discovered natural product, coelichelin, had its structure predicted with a reasonably high degree of accuracy several years prior to its isolation (22, 23). Table 1.1 presents an extensive list of natural products that have been reverse discovered. For more detailed discussion of many of these natural products, we direct the reader to a number of reviews on this topic (24-26), in addition to those contained in the current issue of this journal. Many of the natural products listed in this table were discovered in 2008 or later, reflecting the exponential rise in available genome sequences. A number of these natural products are from well-studied model organisms (S. coelicolor, Aspergillus nidulans). Most are from bacteria, consistent with the larger number of sequenced bacterial genomes relative to fungi and plants.

Table 1.1. Reverse-discovered natural products, organized by year of reported isolation. In cases where more than one species is known to produce a particular natural product, only the strain from which the compound was first isolated is listed.

Name | Class | Producer | Year isolated (year predicted, if different) | Genome sequenced | Reference(s)
hopene | Polyketide | Streptomyces coelicolor A3(2) | 2000 | 2002 | (13, 65)
collinone | Polyketide | Streptomyces collinus DSM 2012 | 2001 | n/a | (66)
geosmin | Terpenoid | Streptomyces coelicolor A3(2) | 2003 | 2002 | (13, 67)
desferrioxamine | Other | Streptomyces coelicolor A3(2) | 2004 | 2002 | (13, 68)
halstoctacosanolides | Polyketide | Streptomyces halstedii HC34 | 2004 | n/a | (69, 70)
SapB | RiPP | Streptomyces coelicolor A3(2) | 2004 | 2002 | (13, 71)
tetrahydroxynaphthalene | Polyketide | Streptomyces coelicolor A3(2) | 2004 | 2002 | (13, 72)
thalianol | Terpenoid | Arabidopsis thaliana | 2004 | 2000 | (73, 74)
aurafurons | Polyketide | Stigmatella aurantiaca DW4/3-1 | 2005 | 2011 | (75, 76)
coelichelin | NRP | Streptomyces coelicolor A3(2) | 2005 (2000) | 2002 | (13, 22, 23)
ECO-02301 | Polyketide | Streptomyces aizunensis NRRL B-11277 | 2005 | n/a | (77)
myxochromides S | NRP | Stigmatella aurantiaca DW4/3-1 | 2005 | 2011 | (76, 78)
aspoquinolones | Alkaloid | Aspergillus nidulans | 2006 | 2005 | (43, 79)
bacillaene | Polyketide/NRP | Bacillus amyloliquefaciens FZB42 | 2006 | 2007 | (20, 21, 45)
DKxanthenes | Polyketide/NRP | Myxococcus xanthus DK1050 | 2006 | 2006 | (80, 81)
ECO-501 | Polyketide | Amycolatopsis orientalis ATCC 43491 | 2006 | n/a | (82)
germicidins | Polyketide | Streptomyces coelicolor A3(2) | 2006 | 2002 | (13, 83)
haloduracin | RiPP | Bacillus halodurans C-125 | 2006 | 2000 | (54, 84, 85)
penocin A | RiPP | Pediococcus pentosaceus ATCC 25745 | 2006 | 2006 | (86, 87)
terrequinone A | NRP | Aspergillus nidulans | 2006 | 2005 | (43, 88)
trichamide | RiPP | Trichodesmium erythraeum IMS101 | 2006 | 2006 | (89)
aeruginosides | NRP | Planktothrix agardhii CYA126/8 | 2007 | Draft | (90)
aspyridones | Polyketide/NRP | Aspergillus nidulans | 2007 | 2005 | (43, 47)
CBS-40 | Polyketide | Streptomyces sp. CB2544 | 2007 | n/a | (91)
orfamide A | NRP | Pseudomonas fluorescens Pf-5 | 2007 | 2005 | (92, 93)
aerucyclamide C | RiPP | Microcystis aeruginosa PCC7806 | 2008 | Draft | (94)
albaflavenone | Terpenoid | Streptomyces coelicolor A3(2) | 2008 | 2002 | (13, 95)
capistruin | RiPP | Burkholderia thailandensis E264 | 2008 | 2005 | (35, 96)
diazepinomicin (ECO-4601) | Other | Micromonospora sp. | 2008 | n/a | (97)
emericellamide | Polyketide/NRP | Aspergillus nidulans | 2008 | 2005 | (43, 98)
2-methylisoborneol | Terpenoid | Streptomyces coelicolor A3(2) | 2008 | 2002 | (13, 99)
microcyclamide 7806A, B | RiPP | Microcystis aeruginosa PCC7806 | 2008 | Draft | (100)
thailandamides | Polyketide | Burkholderia thailandensis E264 | 2008 | 2005 | (96, 101)
vibi A-K | RiPP | Viola biflora | 2008 | n/a | (102)
asperfuranone | Polyketide | Aspergillus nidulans | 2009 | 2005 | (42, 43)
atrochrysone | Polyketide | Aspergillus terreus | 2009 | Draft | (103)
emodin | Polyketide | Aspergillus nidulans | 2009 | 2005 | (43, 104)
F9775 A/B (blecanoric acid) | Polyketide | Aspergillus nidulans | 2009 | 2005 | (43, 104, 105)
lecanoric acid | Polyketide | Aspergillus nidulans | 2009 | 2005 | (43, 106)
lichenicidin | RiPP | Bacillus licheniformis | 2009 | 2004 | (55, 107-109)
monodictyphenone | Polyketide | Aspergillus nidulans | 2009 | 2005 | (43, 104)
Mra4, 5 | RiPP | Melicytus ramiflorus | 2009 | n/a | (110)
nygerone A | Polyketide/NRP | Aspergillus niger CBS 513.88 | 2009 | 2007 | (111, 112)
orsellinic acid | Polyketide | Aspergillus nidulans | 2009 | 2005 | (43, 105, 106)
anacyclamides | RiPP | Anabaena sp. 90 | 2010 | 2012 | (113, 114)
aspernidine | Alkaloid | Aspergillus nidulans | 2010 | 2005 | (43, 115)
aureusimine | NRP | Staphylococcus aureus | 2010 | 2008 | (27, 28, 46)
Bsa | RiPP | Staphylococcus aureus | 2010 | 2004 | (116, 117)
coelimycin P1 | Polyketide | Streptomyces coelicolor A3(2) | 2010 (2007) | 2002 | (13, 118, 119)
csypyrone B1 | Polyketide | Aspergillus oryzae | 2010 | 2005 | (120, 121)
diorcinol | Polyketide | Aspergillus nidulans | 2010 | 2005 | (43, 105)
erythrochelin | NRP | Saccharopolyspora erythraea NRRL 2338 | 2010 | 2007 | (122-124)
gerfelin | Polyketide | Aspergillus nidulans | 2010 | 2005 | (43, 105)
Globa A, B | RiPP | Gloeospermum blakeanum | 2010 | n/a | (125)
microcin H47 | RiPP | Escherichia coli (various strains) | 2010 | n/a | (126)
microcin M | RiPP | Escherichia coli (various strains) | 2010 | n/a | (126)
microviridin L | RiPP | Microcystis aeruginosa NIES843 | 2010 | 2007 | (127, 128)
pneumococcin | RiPP | Streptococcus pneumoniae R6 | 2010 | 2001 | (129, 130)
prochlorosins | RiPP | Prochlorococcus marinus MIT9313 | 2010 | 2003 | (40, 131)
TP-1161 | RiPP | Nocardiopsis sp. TFS65-07 | 2010 | n/a | (132)
venezuelin | RiPP | Streptomyces venezuelae ATCC10712 | 2010 | 2011 | (133, 134)
austinol | Polyketide/terpenoid | Aspergillus nidulans | 2011 | 2005 | (43, 135)
desmethylbassianin A | Polyketide | Beauveria bassiana | 2011 | Draft | (136)
grisemycin | RiPP | Streptomyces griseus IFO 13350 | 2011 (2010) | 2008 | (137-139)
(iso)flavipucine | Polyketide/NRP | Aspergillus terreus | 2011 | Draft | (140)
koranimine | NRP | Bacillus spp. | 2011 | n/a | (141)
plantazolicin | RiPP | Bacillus amyloliquefaciens FZB42 | 2011 (2008) | 2007 | (30, 31, 45)
stambomycins | Polyketide | Streptomyces ambofaciens ATCC23877 | 2011 | Draft | (49)
thailandepsin | Polyketide/NRP | Burkholderia thailandensis E264 | 2011 | 2005 | (96, 142)
alternariol | Polyketide | Aspergillus nidulans | 2012 | 2005 | (43, 143)
astexin-1 | RiPP | Asticcacaulis excentricus | 2012 | 2010 | (38)
azanigerones | Polyketide | Aspergillus niger | 2012 | Draft | (144)
burkholderic acid | Polyketide/NRP | Burkholderia thailandensis E264 | 2012 | 2005 | (96, 145)
catenulipeptin | RiPP | Catenulispora acidiphila DSM 44928 | 2012 | 2009 | (146, 147)
cichorine | Polyketide | Aspergillus nidulans | 2012 | 2005 | (43, 143)
curvopeptin | RiPP | Thermomonospora curvata | 2012 | 2011 | (148, 149)
elgicin | RiPP | Paenibacillus elgii B69 | 2012 | Draft | (150)
fusarielins F-H | Polyketide | Gibberella zeae | 2012 | Draft | (151)
geobacillin | RiPP | Geobacillus thermodenitrificans NG80-2 | 2012 | 2007 | (152, 153)
luminmycin | Polyketide/NRP | Photorhabdus luminescens subsp. laumondii TT01 | 2012 | 2003 | (154, 155)
malleilactone | Polyketide | Burkholderia thailandensis E264 | 2012 | 2005 | (48, 96)
O-methyldiaporthin | Polyketide | Aspergillus oryzae | 2012 | 2005 | (121, 156)
pre-shamixanthone | Polyketide | Aspergillus nidulans | 2012 | 2005 | (43, 157)
rhizopodin | Polyketide/NRP | Stigmatella aurantiaca Sg a15 | 2012 | Draft | (158)
caulosegnins | RiPP | Caulobacter segnis | 2013 | 2010 | (36)
flavipeptin | RiPP | Kribbella flavida | 2013 | 2010 | (159, 160)
flavopeptins | NRP | Streptomyces flavogriseus ATCC 33331 | 2013 | 2011 | (161)
fumicycline A/neosartoricin A | Polyketide | Aspergillus fumigatus | 2013 | 2005 | (162-164)
neosartoricins B-D | Polyketide | Trichophyton tonsurans | 2013 | 2012 | (165, 166)
thailanstatins | Polyketide/NRP | Burkholderia thailandensis MSMB43 | 2013 | Draft | (167)

In this review, we focus on seven lessons for the community to keep in mind during the reverse discovery of natural products: 1) structure prediction, 2) accurate annotation, 3) continued study of model organisms, 4) avoiding genome size bias, 5) genetic manipulation, 6) heterologous expression, and 7) potential engineering of analogs. To illustrate the utility of these lessons, we highlight five natural products from the recent literature (aureusimine, plantazolicin, astexin-1, the prochlorosins, and asperfuranone; Figure 1.3) representing three prominent natural product classes: polyketides, non-ribosomal peptides (NRPs), and ribosomally synthesized and post-translationally modified peptides (RiPPs), as well as a variety of producing organisms (bacteria and fungi).

Figure 1.3. Biosynthetic gene clusters (A) and structures (B) of select reverse-discovered natural products. Gene clusters in panel A are shown to scale and are shown with the name of the organism in which they were first identified.

1.2 Lesson 1: Structure prediction

Access to a predicted structure facilitated the isolation of the aureusimines, founding members of the pyrazinone class of natural products, by enabling mass spectrometry-guided isolation. These compounds are biosynthesized by a non-ribosomal peptide synthetase (NRPS) pathway in the human pathogen Staphylococcus aureus (Figure 1.3A) (27, 28). The aureusimine biosynthetic gene cluster was discovered by genome mining to identify possible NRPSs that are highly conserved among sequenced S. aureus strains. Homologous gene clusters have been found in over 50 S. aureus strains as well as other human pathogenic Staphylococcus species (27). Prior to its isolation, the structure of aureusimine A was predicted based on the sequence of the NRPS gene and the previously established amino acid specificities for NRPS adenylation domains. The presence of a putative reductase domain at the C-terminus of the S. aureus NRPS was also factored into the structural prediction, as it indicated that the dipeptide was likely released from the synthetase as an aldehyde with the potential to spontaneously cyclize. This prediction was confirmed with the solved structures of aureusimines A and B (Figure 1.3B) (27). Structural prediction and mass spectrometry-guided isolation also proved useful in the isolation of plantazolicin, a member of the thiazole/oxazole-modified microcin (TOMM) subclass of RiPP natural products (29). TOMM biosynthesis is characterized by the post-translational modification of ribosomally synthesized precursor peptides to generate thiazol(in)e and (methyl)oxazol(in)e heterocycles from the side chains of select cysteine, serine, and threonine residues (30). The gene cluster for plantazolicin (Figure 1.3A) was identified in 2008 during a search for genes with homology to the biosynthetic gene clusters responsible for the production of streptolysin S and microcin B17, which are TOMMs from Streptococcus pyogenes and Escherichia coli, respectively (30). As with other RiPPs, the sequence of a predicted precursor peptide gene enabled mass spectrometry-guided separation to isolate plantazolicin from organic surface extracts of modified strains of B. amyloliquefaciens FZB42 (31) and aided significantly in the elucidation of its structure (Figure 1.3B) (32, 33). A concluding example of the utility of structure prediction during natural product discovery is provided by coelichelin, a prominent early example of a reverse-discovered natural product, which had its structure predicted several years prior to its isolation from S. coelicolor (22). Although the initial structure was not entirely correct (23), the ability to predict structure based on genetic sequence is playing a growing role in reverse natural product discovery, whether by providing insight into possible extraction conditions or by simplifying identification by mass spectrometry.
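Mass spectrometry-guided isolation of a TOMM depends on knowing what mass to look for once the precursor sequence is in hand. The following is a minimal sketch of that bookkeeping, assuming only the standard monoisotopic changes for the modifications named above: each cyclodehydration (azoline formation) removes one H2O, each dehydrogenation (azoline to azole) removes H2, and each methylation adds CH2. The function name and the example counts are illustrative and are not taken from the original work.

```python
# Monoisotopic mass changes (Da) for the modifications discussed in the text.
H2O = 18.010565  # lost per cyclodehydration (azoline formation)
H2 = 2.015650    # lost per dehydrogenation (azoline -> azole)
CH2 = 14.015650  # gained per methylation (H replaced by CH3)


def expected_mass_shift(n_cyclodehydrations, n_dehydrogenations, n_methylations=0):
    """Return the mass shift (Da) relative to the unmodified core peptide.

    Illustrative helper: only an azoline can be oxidized to an azole, so the
    number of dehydrogenations cannot exceed the number of cyclodehydrations.
    """
    if n_dehydrogenations > n_cyclodehydrations:
        raise ValueError("cannot oxidize more heterocycles than were formed")
    return (
        -n_cyclodehydrations * H2O
        - n_dehydrogenations * H2
        + n_methylations * CH2
    )


if __name__ == "__main__":
    # Hypothetical peptide: 3 azolines formed, 2 oxidized to azoles, 2 methylations.
    shift = expected_mass_shift(3, 2, 2)
    print(f"expected shift: {shift:.4f} Da")
```

Adding the returned shift to the calculated mass of the unmodified core peptide gives a candidate mass to search for in the spectrum; partially oxidized species can be enumerated by varying the dehydrogenation count.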

1.3 Lesson 2: Accurate sequence annotation

Regardless of how many genome sequences are available, they are useless without interpretation. Complete and accurate sequence annotation also plays a crucial role in the efficient reverse discovery of natural products. RiPP precursor peptide genes, in particular, are typically short and are often unannotated in published genomes (29, 34), complicating their identification during genome mining. For example, the open reading frame (ORF) encoding the plantazolicin precursor peptide was unannotated at the time of the initial report (30). The same has also proven to be the case for the precursors of several lasso peptides, an emerging subclass of RiPP natural products (35, 36). Despite the potential difficulty of identifying unannotated precursor genes, the discovery of the lasso peptide astexin-1 through a precursor-guided screen provides a promising avenue for future genome mining in the pursuit of novel RiPP natural products. Lasso peptides are characterized by the post-translational formation of a polypeptide loop structure with a covalent bond between the N-terminus of the precursor peptide and a specific aspartate or glutamate side chain elsewhere on the peptide. The C-terminal end of the precursor peptide is then threaded through the loop and often held in place by the side chains of bulky residues in the tail region (29). The resulting threaded structure typically confers unique heat stability on the lasso peptides (37). Genome mining for RiPP biosynthetic gene clusters commonly focuses on identifying homologs of a known modifying enzyme, as was the case for the lasso peptide capistruin (35), the prochlorosins, and plantazolicin. However, in the case of astexin-1, a genetic screen for precursor peptides was employed. This screen identified short ORFs, some unannotated, containing amino acid patterns consistent with the sequences of known lasso peptides. When a potential lasso peptide precursor was found, nearby regions of the genome were then searched for homology to the maturation enzymes (38). This endeavor resulted in the identification of 79 putative lasso peptide gene clusters distributed across nine bacterial phyla and one archaeal phylum (38). The identification of precursor peptide genes located outside of the primary gene cluster further underscores the advantage of searching widely for all possible biosynthetic genes during reverse natural product discovery (39). 
Such was the case for the prochlorosins, a group of lanthipeptide RiPPs produced by some marine cyanobacteria species, most notably Prochlorococcus marinus (40). The hallmark of lanthipeptide biosynthesis is the formation of (methyl)lanthionine rings between select cysteine and dehydroalanine or dehydrobutyrine residues, which arise from the dehydration of serine and threonine, respectively. Dehydration and lanthionine ring installation on the ribosomal precursor peptide may be catalyzed by one or more biosynthetic enzymes (29). The gene clusters responsible for prochlorosin biosynthesis (Figure 1.3A) were identified in P. marinus and related species through a bioinformatic search for bifunctional lanthipeptide synthetase homologs (40) and, independently, through the homology of prochlorosin precursor peptides to the Nif11 nitrogen-fixing proteins (41). However, only seven of the 29 putative precursor peptide genes are clustered with the lanthipeptide synthetase gene (40), contrary to the norm for biosynthetic genes in bacteria and fungi.
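The precursor-guided screen described above for astexin-1 (short ORFs filtered by an amino acid pattern) can be sketched in a few lines. This is a simplified illustration and not the published method of reference 38: it scans only the forward strand, the length limits are arbitrary defaults, and the motif used in the usage example is a hypothetical placeholder rather than a validated lasso peptide signature.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def short_orfs(dna, min_aa=20, max_aa=80):
    """Yield (start_index, peptide) for Met-to-stop ORFs in the three forward frames."""
    dna = dna.upper()
    for frame in range(3):
        protein = translate(dna[frame:])
        for match in re.finditer(r"M[^*]*\*", protein):
            peptide = match.group()[:-1]  # drop the stop symbol
            if min_aa <= len(peptide) <= max_aa:
                yield frame + 3 * match.start(), peptide


def candidate_precursors(dna, motif, min_aa=20, max_aa=80):
    """Keep only short ORFs whose peptide contains the given regex motif."""
    return [
        (pos, pep) for pos, pep in short_orfs(dna, min_aa, max_aa) if motif.search(pep)
    ]


if __name__ == "__main__":
    # Hypothetical motif: Gly, then 6-8 residues, then Asp or Glu.
    motif = re.compile(r"G.{6,8}[DE]")
    toy_dna = "ATGAAAGGT" + "GCT" * 7 + "GATTGGTAA"
    print(candidate_precursors(toy_dna, motif, min_aa=10))
```

In a real screen, each hit would then be followed by a search of the neighboring genomic region for homologs of the maturation enzymes, as the text describes; the reverse strand would also need to be scanned.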

1.4 Lesson 3: Continued study of model organisms

The sequencing of the S. coelicolor genome, which brought to light the silent majority of biosynthetic gene clusters, also provides a reminder not to discount continued investigation into well-studied model organisms during the search for new natural products. These strains may contain the genetic potential to synthesize many more natural products than are superficially detectable. The fungus Aspergillus nidulans (42) was a highly studied model organism even prior to the sequencing of its genome, and since that time, it has been determined that A. nidulans FGSC A4 contains 24 PKS genes and 3 PKS-like genes (43, 44). Several pairs of PKS genes are adjacent to each other (Figure 1.3A), indicating a strong possibility that the pairs might work in concert, though A. nidulans was not previously known to produce any polyketides arising from the combined action of two PKS enzymes. The producer of plantazolicin, B. amyloliquefaciens FZB42, also produces a number of other natural products (45), including the polyketide/NRP hybrid bacillaene, which likewise is a reverse-discovered natural product (20, 21).

1.5 Lesson 4: Avoiding genome size bias

One common assumption in the natural products field has been that organisms with small genomes are unlikely to produce structurally complex natural products. Indeed, bacteria with larger genomes, such as S. coelicolor (8.7 Mb), are recognized as prolific producers of secondary metabolites (13). However, species with small genomes should not be ignored in the search for natural products, as exemplified by the isolation of structurally complex compounds from organisms whose genomes are a fraction of the size of the traditionally recognized “natural product powerhouses”. These species with smaller genomes, such as S. aureus (2.8 Mb) (46), are presumed to be less able to devote genome space to large biosynthetic gene clusters such as NRPSs, making the identification of a single NRPS gene that occupies 0.25% of the S. aureus genome even more surprising (27). Maintenance of multiple precursor genes and the ability to modify them with a single enzyme enables organisms with compact genomes, such as prochlorosin producer P. marinus (2.4 Mb), to produce a variety of structurally complex molecules without the need for large biosynthetic gene clusters. Indeed, the precursor hypervariability of the prochlorosin system endows P. marinus MIT9313 with the potential to produce as many natural products as S. coelicolor, which bears a genome nearly four times as large (13, 40).
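The size comparisons in this section follow from simple arithmetic on the genome sizes quoted above. The short sketch below works through them; the helper name is illustrative, and the inputs are the rounded figures given in the text, so the results are approximate.

```python
def fraction_to_bp(genome_mb, percent):
    """Convert a percentage of a genome (size in Mb) to base pairs."""
    return genome_mb * 1e6 * percent / 100


# Genome sizes (Mb) as quoted in the text.
S_COELICOLOR_MB = 8.7
S_AUREUS_MB = 2.8
P_MARINUS_MB = 2.4

# A single NRPS gene occupying 0.25% of the S. aureus genome is roughly 7 kb.
nrps_bp = fraction_to_bp(S_AUREUS_MB, 0.25)

# S. coelicolor versus P. marinus: a ratio of about 3.6, i.e. "nearly four times".
size_ratio = S_COELICOLOR_MB / P_MARINUS_MB

if __name__ == "__main__":
    print(f"NRPS gene length: about {nrps_bp / 1000:.1f} kb")
    print(f"S. coelicolor / P. marinus genome size: about {size_ratio:.1f}x")
```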

1.6 Lesson 5: Genetic manipulation

In many instances, natural products are difficult to detect under laboratory cultivation, due to low production levels or masking by the presence of other, more abundant, natural products. In such cases, genetic manipulation may be used to activate these silent gene clusters. The reverse discovery of the polyketide asperfuranone (Figure 1.3B) illustrates how silent gene clusters can be activated via promoter engineering within the native host. In this case, the native promoter for a predicted transcriptional activator gene within the PKS cluster (Figure 1.3A) was replaced with an inducible promoter (42). As asperfuranone was not produced to any significant extent during laboratory cultivation, this compound would have been unlikely to be identified via any phenotype-driven effort. Other silent biosynthetic gene clusters that have been successfully activated by promoter engineering include those responsible for producing aspyridone (47), malleilactone (48), and stambomycin (49). A limitation of this strategy is that it requires a genetically tractable host. Another common genetic manipulation used for natural product discovery is biosynthetic gene disruption. Arguably, the most powerful method to establish gene function is to evaluate the phenotypes of the parent (wild-type) and the genetic deletion (mutant) strains. This reverse genetics method is often employed during the genome-guided discovery of natural products. The phenotype compared between parent and mutant is typically either the metabolite profile (usually obtained by mass spectrometry) or the bioactivity of chromatographic fractions. The detection and isolation of plantazolicin demonstrated a secondary use of such a method to identify a novel compound whose production had been masked by other, more abundant compounds. 
When production of all known bioactive compounds was eliminated, the remaining bioactivity indicated that at least one additional biosynthetic gene cluster had yet to be identified (31, 50).
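The comparison of metabolite profiles between parent and deletion strains described above can be sketched in a few lines. This is only a minimal illustration, assuming each profile has already been reduced to a list of m/z values; the function name, the 10 ppm tolerance, and the peak lists are hypothetical and do not come from any of the cited studies.

```python
def peaks_lost_in_mutant(wild_type, mutant, tol_ppm=10.0):
    """Return wild-type m/z values that have no mutant peak within tol_ppm.

    Peaks that disappear upon gene deletion are candidate products of the
    deleted biosynthetic gene(s).
    """
    lost = []
    for mz in sorted(wild_type):
        window = mz * tol_ppm / 1e6  # absolute tolerance at this m/z
        if not any(abs(mz - other) <= window for other in mutant):
            lost.append(mz)
    return lost


# Hypothetical peak lists (illustrative values only, not measured data).
wild_type_peaks = [301.1410, 455.2903, 812.4021]
mutant_peaks = [301.1412, 812.4019]

print(peaks_lost_in_mutant(wild_type_peaks, mutant_peaks))
```

A real workflow would also have to account for retention time, signal intensity, and replicate cultures; the sketch only captures the underlying logic of looking for signals present in the parent but absent in the mutant.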

1.7 Lesson 6: Heterologous expression

In cases where genetic manipulation in the native host is not feasible, the transfer of biosynthetic gene clusters to a genetically tractable host has the potential to facilitate compound production. The biosynthetic gene cluster for the lasso peptide astexin-1 (Figure 1.3A) was identified in the freshwater α-proteobacterium Asticcacaulis excentricus, but no compound with the expected mass could be detected after laboratory cultivation of this organism, necessitating an alternative expression method (38). Astexin-1 was successfully expressed, purified, and structurally characterized using E. coli as a heterologous host (Figure 1.3B) (37, 38). Following extensive variation of culture conditions, a recent study succeeded in producing astexin-1 from A. excentricus, albeit at low levels compared to those from E. coli (37). Likewise, successful high-level production of a number of prochlorosins required heterologous expression in E. coli and reconstitution in vitro, ultimately enabling structural confirmation of these natural products (Figure 1.3B) (40, 51). For further discussion and examples of heterologous expression, the interested reader is directed to other reviews specifically focused on this topic, also in this Special Issue (52, 53).

1.8 Lesson 7: Potential engineering of natural product analogs

Beyond simply providing a novel natural product, identification of new biosynthetic gene clusters also has the potential to facilitate rational engineering of unnatural natural product variants. The production of prochlorosins from P. marinus MIT9313 is notable among lanthipeptides because of the exceptionally high number of precursor peptides (29 in total) (40). While it is not unusual for a lanthipeptide gene cluster to harbor more than one precursor peptide, such clusters typically require multiple lanthionine synthetases to modify all of the precursor peptides (54, 55). In contrast, 17 prochlorosins from P. marinus MIT9313 have been successfully biosynthesized both in vitro and in vivo using a single lanthipeptide synthetase (40). The potential for a single lanthipeptide synthetase to effect modification of up to 29 precursor peptides suggests that the prochlorosin synthetase is remarkably promiscuous, making it a prime candidate for engineering novel lanthipeptides.

In addition to the natural combinatorial biosynthesis of prochlorosin and its potential use in engineering, several other examples presented in this review have shown promise for the future engineering of natural product variants. The NRPS responsible for aureusimine production has been heterologously expressed and used to explore the pyrazinone assembly line using alternate substrates (56, 57). The enzymes responsible for asperfuranone biosynthesis have also provided a tractable system for the engineering of unnatural polyketides by domain swapping, an endeavor which has historically been challenging due to the complex nature of the requisite megasynthase enzymes (58). The heterologous expression of plantazolicin in E. coli has also been achieved and exploited using precursor peptide replacement to generate unnatural variants (59). In summary, the thorough understanding of biosynthetic gene clusters afforded by reverse natural product discovery has the potential to expedite the production of unnatural variants for applications in medicinal chemistry and structure-activity relationship studies.

1.9 Outlook

The sequencing and annotation of new genomes will continue to provide ample opportunity for the discovery of new natural products through genotype-based methods. It is possible that natural product discovery will eventually reach a point of diminishing returns from the investigation of new species, but the steadily increasing rate of reverse natural product discovery (Table 1.1) indicates that the field is still far from that point. While it is relatively straightforward to identify new putative PKS and NRPS clusters through sequence homology, other classes of natural products will require more sophisticated methods. For RiPP natural products, bioinformatics tools such as BAGEL (60) are useful in identifying genes for potential novel compounds, but less well-characterized and yet-to-be-discovered classes of natural products will require significant future investigation. The genes responsible for the biosynthesis of new or poorly understood natural products may currently be annotated as hypothetical proteins or domains of unknown function. The use of tools to group these unknown genes together by sequence homology (61), coupled with improved annotation in sequenced genomes, may allow for entirely new biosynthetic pathways to be discovered.

As the field of reverse natural product discovery moves forward, we anticipate an increasing dependence on structure prediction to facilitate isolation of interesting natural products and to provide starting points for structure elucidation. Structure prediction for new RiPP natural products is largely enabled by identification of the precursor peptide sequence, while programs such as ClustScan (62) utilize known domain specificities to predict the structures of the products of novel PKS and NRPS gene clusters. While not strictly a method for reverse natural product discovery, peptidogenomics may also be helpful in this endeavor by identifying new peptide natural products and linking those to their respective biosynthetic genes (63).
Perhaps most widely useful, however, are comprehensive tools such as antiSMASH (antibiotics and Secondary Metabolite Analysis Shell), which incorporate information about many natural product classes beyond the comparatively well-studied PKS, NRPS, and RiPP systems. As the collective knowledge of the natural product discovery field increases, the predictive power of these tools will likewise increase and create a positive feedback loop to further enable the future isolation of novel compounds.

Natural product discovery will also be aided by the development of new genetic tools. To set a lofty goal, it would be of enormous utility to have a universal, user-friendly DNA manipulation platform for deleting genes of interest. Not only would this platform greatly enable deletion-guided approaches to natural product discovery, but the whole of biology would benefit immensely from the development of a simple method for constructing genetic knockouts applicable to a broad range of organisms. If such a technique were developed and freely shared with the community, research groups focused on the chemical aspects of natural product research would be able to construct useful strains without specialized knowledge in genetic manipulation.

Perhaps the most significant hurdle to isolating reverse-discovered natural products is that many gene clusters are silent (expression is undetectable) under laboratory cultivation. In such cases, no novel natural product is readily observable in extracts from cultures of the organism harboring the biosynthetic genes of interest. In order to isolate the compound in question, the genes responsible for its production must be activated. A number of methods have been developed to activate silent clusters, including variation of culture conditions, heterologous expression, promoter engineering, induction with gamma-butyrolactones, and co-culturing with other organisms to stimulate production, among others (24, 25, 44, 64).
One persistent issue with the use of heterologous hosts is the potential for inefficient export of the natural product, which may be addressed in part by the development of a wider range of genetically tractable heterologous hosts. These tools will help to streamline the identification and production of natural products from gene clusters that are difficult, if not impossible, to express under laboratory culture conditions.
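The idea that a precursor peptide sequence provides a starting point for RiPP structure prediction can be illustrated with a simple mass calculation. The sketch below is not taken from any of the tools cited in this section. It assumes that the core peptide sequence is known and that the modifications of interest are cyclodehydration (loss of water), dehydrogenation (loss of two hydrogens), and methylation (addition of CH2); the function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, lost on each cyclodehydration
H2 = 2.015650       # two hydrogens, lost on each azoline-to-azole oxidation
CH2 = 14.015650     # net gain per methylation


def linear_peptide_mass(sequence):
    """Monoisotopic mass of the unmodified, linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def modified_mass(sequence, cyclodehydrations=0, dehydrogenations=0,
                  methylations=0):
    """Predicted monoisotopic mass after the given numbers of modifications."""
    return (linear_peptide_mass(sequence)
            - cyclodehydrations * WATER
            - dehydrogenations * H2
            + methylations * CH2)


# Hypothetical core peptide (illustrative only, not a real precursor).
core = "GCTS"
print(f"linear:   {linear_peptide_mass(core):.4f} Da")
print(f"one azole: {modified_mass(core, 1, 1):.4f} Da")
```

Predicted masses of this kind can then be compared against masses observed by mass spectrometry to decide which candidate signals merit isolation.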

1.10 Conclusion

Rather than relying on the observation of a phenotype, reverse (genotype-driven) natural product discovery bases isolation of the compound(s) of interest on knowledge of the associated biosynthetic genes. The examples of reverse-discovered natural products presented in this review represent several biosynthetic classes, producing organisms, and biological targets. It is plausible that none of the discussed examples would have been discovered by the traditional phenotype-driven method, owing either to unknown biological activity or to activity so selective as to preclude detection during bioassay-guided isolation. Perhaps more significantly, most of these examples, as well as others listed in Table 1.1, required some measure of genetic manipulation (promoter engineering, comparison of genetic deletion strains, heterologous expression, etc.) in order to produce useful levels of the natural product of interest. In some cases, a low level of production was masked by other, more prominent compounds, while in others, the biosynthetic gene cluster of interest was entirely silent despite extensive modification of culture conditions. The emergence of novel natural products even from well-studied model organisms indicates that the potential for production of structurally complex compounds remains largely untapped, and the constantly increasing number of publicly available genomes will certainly provide a surplus of avenues for natural product discovery.

1.11 References

1. Newman, D. J., and Cragg, G. M. (2012) Natural products as sources of new drugs over the 30 years from 1981 to 2010. J. Nat. Prod. 75, 311-335. 2. Baltz, R. H. (2006) Marcel Faber Roundtable: is our antibiotic pipeline unproductive because of starvation, constipation or lack of inspiration? J. Ind. Microbiol. Biotechnol. 33, 507-513. 3. Winzer, T., Gazda, V., He, Z., Kaminski, F., Kern, M., Larson, T. R., Li, Y., Meade, F., Teodor, R., Vaistij, F. E., Walker, C., Bowser, T. A., and Graham, I. A. (2012) A Papaver somniferum 10-gene cluster for synthesis of the anticancer alkaloid noscapine. Science 336, 1704-1708. 4. Crone, W. J. K., Leeper, F. J., and Truman, A. W. (2012) Identification and characterisation of the gene cluster for the anti-MRSA antibiotic bottromycin: expanding the biosynthetic diversity of ribosomal peptides. Chem. Sci. 3, 3516-3521. 5. Gomez-Escribano, J. P., Song, L. J., Bibb, M. J., and Challis, G. L. (2012) Posttranslational beta-methylation and macrolactamidination in the biosynthesis of the bottromycin complex of ribosomal peptide antibiotics. Chem. Sci. 3, 3522-3525. 6. Hou, Y., Tianero, M. D., Kwan, J. C., Wyche, T. P., Michel, C. R., Ellis, G. A., Vazquez-Rivera, E., Braun, D. R., Rose, W. E., Schmidt, E. W., and Bugni, T. S. (2012) Structure and biosynthesis of the antibiotic bottromycin D. Org. Lett. 14, 5050-5053. 7. Huo, L., Rachid, S., Stadler, M., Wenzel, S. C., and Muller, R. (2012) Synthetic biotechnology to study and engineer ribosomal bottromycin biosynthesis. Chem. Biol. 19, 1278-1287. 8. Kelly, W. L., Pan, L., and Li, C. (2009) Thiostrepton biosynthesis: prototype for a new family of bacteriocins. J. Am. Chem. Soc. 131, 4327-4334. 9. Liao, R., Duan, L., Lei, C., Pan, H., Ding, Y., Zhang, Q., Chen, D., Shen, B., Yu, Y., and Liu, W. (2009) Thiopeptide biosynthesis featuring ribosomally synthesized precursor peptides and conserved posttranslational modifications. Chem. Biol. 16, 141-147.


10. Wieland Brown, L. C., Acker, M. G., Clardy, J., Walsh, C. T., and Fischbach, M. A. (2009) Thirteen posttranslational modifications convert a 14-residue peptide into the antibiotic thiocillin. Proc. Natl. Acad. Sci. U S A 106, 2549-2553. 11. Morris, R. P., Leeds, J. A., Naegeli, H. U., Oberer, L., Memmert, K., Weber, E., LaMarche, M. J., Parker, C. N., Burrer, N., Esterow, S., Hein, A. E., Schmitt, E. K., and Krastel, P. (2009) Ribosomally synthesized thiopeptide antibiotics targeting elongation factor Tu. J. Am. Chem. Soc. 131, 5946-5955. 12. Omura, S., Ikeda, H., Ishikawa, J., Hanamoto, A., Takahashi, C., Shinose, M., Takahashi, Y., Horikawa, H., Nakazawa, H., Osonoe, T., Kikuchi, H., Shiba, T., Sakaki, Y., and Hattori, M. (2001) Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites. Proc. Natl. Acad. Sci. U S A 98, 12215-12220. 13. Bentley, S. D., et al. (2002) Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417, 141-147. 14. Malpartida, F., and Hopwood, D. A. (1984) Molecular cloning of the whole biosynthetic pathway of a Streptomyces antibiotic and its expression in a heterologous host. Nature 309, 462-464. 15. Malpartida, F., and Hopwood, D. A. (1986) Physical and genetic characterisation of the gene cluster for the antibiotic actinorhodin in Streptomyces coelicolor A3(2). Mol. Gen. Genet. 205, 66-73. 16. Feitelson, J. S., Malpartida, F., and Hopwood, D. A. (1985) Genetic and biochemical characterization of the red gene cluster of Streptomyces coelicolor A3(2). J. Gen. Microbiol. 131, 2431-2441. 17. Chong, P. P., Podmore, S. M., Kieser, H. M., Redenbach, M., Turgay, K., Marahiel, M., Hopwood, D. A., and Smith, C. P. (1998) Physical identification of a chromosomal locus encoding biosynthetic genes for the lipopeptide calcium-dependent antibiotic (CDA) of Streptomyces coelicolor A3(2). Microbiology 144, 193-199. 18. 
Craney, A., Ahmed, S., and Nodwell, J. (2013) Towards a new science of secondary metabolism. J. Antibiot. 66, 387–400. 19. Patel, P. S., Huang, S., Fisher, S., Pirnik, D., Aklonis, C., Dean, L., Meyers, E., Fernandes, P., and Mayerl, F. (1995) Bacillaene, a novel inhibitor of procaryotic protein synthesis produced by Bacillus subtilis: production, taxonomy, isolation, physico-chemical characterization and biological activity. J. Antibiot. 48, 997-1003. 20. Chen, X. H., Vater, J., Piel, J., Franke, P., Scholz, R., Schneider, K., Koumoutsi, A., Hitzeroth, G., Grammel, N., Strittmatter, A. W., Gottschalk, G., Sussmuth, R. D., and Borriss, R. (2006) Structural and functional characterization of three polyketide synthase gene clusters in Bacillus amyloliquefaciens FZB 42. J. Bacteriol. 188, 4024-4036. 21. Butcher, R. A., Schroeder, F. C., Fischbach, M. A., Straight, P. D., Kolter, R., Walsh, C. T., and Clardy, J. (2007) The identification of bacillaene, the product of the PksX megacomplex in Bacillus subtilis. Proc. Natl. Acad. Sci. U S A 104, 1506-1509. 22. Challis, G. L., and Ravel, J. (2000) Coelichelin, a new peptide siderophore encoded by the Streptomyces coelicolor genome: structure prediction from the sequence of its non- ribosomal peptide synthetase. FEMS Microbiol. Lett. 187, 111-114. 23. Lautru, S., Deeth, R. J., Bailey, L. M., and Challis, G. L. (2005) Discovery of a new peptide natural product by Streptomyces coelicolor genome mining. Nat. Chem. Biol. 1, 265-269.


24. Challis, G. L. (2008) Genome mining for novel natural product discovery. J. Med. Chem. 51, 2618-2628. 25. Gross, H. (2009) Genomic mining – a concept for the discovery of new bioactive natural products. Curr. Opin. Drug Discov. Devel. 12, 207-219. 26. Velasquez, J. E., and van der Donk, W. A. (2011) Genome mining for ribosomally synthesized natural products. Curr. Opin. Chem. Biol. 15, 11-21. 27. Wyatt, M. A., Wang, W., Roux, C. M., Beasley, F. C., Heinrichs, D. E., Dunman, P. M., and Magarvey, N. A. (2010) Staphylococcus aureus nonribosomal peptide secondary metabolites regulate virulence. Science 329, 294-296. 28. Zimmermann, M., and Fischbach, M. A. (2010) A family of pyrazinone natural products from a conserved nonribosomal peptide synthetase in Staphylococcus aureus. Chem. Biol. 17, 925-930. 29. Arnison, P. G., et al. (2013) Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 30, 108-160. 30. Lee, S. W., Mitchell, D. A., Markley, A. L., Hensler, M. E., Gonzalez, D., Wohlrab, A., Dorrestein, P. C., Nizet, V., and Dixon, J. E. (2008) Discovery of a widely distributed toxin biosynthetic gene cluster. Proc. Natl. Acad. Sci. U S A 105, 5879-5884. 31. Scholz, R., Molohon, K. J., Nachtigall, J., Vater, J., Markley, A. L., Sussmuth, R. D., Mitchell, D. A., and Borriss, R. (2011) Plantazolicin, a novel microcin B17/streptolysin S-like natural product from Bacillus amyloliquefaciens FZB42. J. Bacteriol. 193, 215-224. 32. Kalyon, B., Helaly, S. E., Scholz, R., Nachtigall, J., Vater, J., Borriss, R., and Sussmuth, R. D. (2011) Plantazolicin A and B: structure elucidation of ribosomally synthesized thiazole/oxazole peptides from Bacillus amyloliquefaciens FZB42. Org. Lett. 13, 2996-2999. 33. Molohon, K. J., Melby, J. O., Lee, J., Evans, B. S., Dunbar, K. L., Bumpus, S. B., Kelleher, N. L., and Mitchell, D. A. 
(2011) Structure determination and interception of biosynthetic intermediates for the plantazolicin class of highly discriminating antibiotics. ACS Chem. Biol. 6, 1307-1313. 34. Melby, J. O., Nard, N. J., and Mitchell, D. A. (2011) Thiazole/oxazole-modified microcins: complex natural products from ribosomal templates. Curr. Opin. Chem. Biol. 15, 369-378. 35. Knappe, T. A., Linne, U., Zirah, S., Rebuffat, S., Xie, X., and Marahiel, M. A. (2008) Isolation and structural characterization of capistruin, a lasso peptide predicted from the genome sequence of Burkholderia thailandensis E264. J. Am. Chem. Soc. 130, 11446- 11454. 36. Hegemann, J. D., Zimmermann, M., Xie, X., and Marahiel, M. A. (2013) Caulosegnins I- III: a highly diverse group of lasso peptides derived from a single biosynthetic gene cluster. J. Am. Chem. Soc. 135, 210-222. 37. Zimmermann, M., Hegemann, J. D., Xie, X., and Marahiel, M. A. (2013) The astexin-1 lasso peptides: biosynthesis, stability, and structural studies. Chem. Biol. 20, 558-569. 38. Maksimov, M. O., Pelczer, I., and Link, A. J. (2012) Precursor-centric genome-mining approach for lasso peptide discovery. Proc. Natl. Acad. Sci. U S A 109, 15223-15228. 39. Haft, D. H. (2009) A strain-variable bacteriocin in Bacillus anthracis and Bacillus cereus with repeated Cys-Xaa-Xaa motifs. Biol. Direct 4, 15. 40. Li, B., Sher, D., Kelly, L., Shi, Y., Huang, K., Knerr, P. J., Joewono, I., Rusch, D., Chisholm, S. W., and van der Donk, W. A. (2010) Catalytic promiscuity in the biosynthesis


of cyclic peptide secondary metabolites in planktonic marine cyanobacteria. Proc. Natl. Acad. Sci. U S A 107, 10430-10435. 41. Haft, D. H., Basu, M. K., and Mitchell, D. A. (2010) Expansion of ribosomally produced natural products: a nitrile hydratase- and Nif11-related precursor family. BMC Biol. 8, 70. 42. Chiang, Y. M., Szewczyk, E., Davidson, A. D., Keller, N., Oakley, B. R., and Wang, C. C. (2009) A gene cluster containing two fungal polyketide synthases encodes the biosynthetic pathway for a polyketide, asperfuranone, in Aspergillus nidulans. J. Am. Chem. Soc. 131, 2965-2970. 43. Galagan, J. E., et al. (2005) Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature 438, 1105-1115. 44. Sanchez, J. F., Somoza, A. D., Keller, N. P., and Wang, C. C. (2012) Advances in Aspergillus secondary metabolite research in the post-genomic era. Nat. Prod. Rep. 29, 351-371. 45. Chen, X. H., Koumoutsi, A., Scholz, R., Eisenreich, A., Schneider, K., Heinemeyer, I., Morgenstern, B., Voss, B., Hess, W. R., Reva, O., Junge, H., Voigt, B., Jungblut, P. R., Vater, J., Sussmuth, R., Liesegang, H., Strittmatter, A., Gottschalk, G., and Borriss, R. (2007) Comparative analysis of the complete genome sequence of the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42. Nat. Biotechnol. 25, 1007-1014. 46. Baba, T., Bae, T., Schneewind, O., Takeuchi, F., and Hiramatsu, K. (2008) Genome sequence of Staphylococcus aureus strain Newman and comparative analysis of staphylococcal genomes: polymorphism and evolution of two major pathogenicity islands. J. Bacteriol. 190, 300-310. 47. Bergmann, S., Schumann, J., Scherlach, K., Lange, C., Brakhage, A. A., and Hertweck, C. (2007) Genomics-driven discovery of PKS-NRPS hybrid metabolites from Aspergillus nidulans. Nat. Chem. Biol. 3, 213-217. 48. Biggins, J. B., Ternei, M. A., and Brady, S. F. 
(2012) Malleilactone, a polyketide synthase-derived virulence factor encoded by the cryptic secondary metabolome of Burkholderia pseudomallei group pathogens. J. Am. Chem. Soc. 134, 13192-13195. 49. Laureti, L., Song, L., Huang, S., Corre, C., Leblond, P., Challis, G. L., and Aigle, B. (2011) Identification of a bioactive 51-membered macrolide complex by activation of a silent polyketide synthase in Streptomyces ambofaciens. Proc. Natl. Acad. Sci. U S A 108, 6258-6263. 50. Chen, X. H., Scholz, R., Borriss, M., Junge, H., Mogel, G., Kunz, S., and Borriss, R. (2009) Difficidin and bacilysin produced by plant-associated Bacillus amyloliquefaciens are efficient in controlling fire blight disease. J. Biotechnol. 140, 38-44. 51. Tang, W., and van der Donk, W. A. (2012) Structural characterization of four prochlorosins: a novel class of lantipeptides produced by planktonic marine cyanobacteria. Biochemistry 51, 4271-4279. 52. Ikeda, H., Shin-ya, K., and Omura, S. (2014) Genome mining of the Streptomyces avermitilis genome and development of genome-minimized hosts for heterologous expression of biosynthetic gene clusters. J. Ind. Microbiol. Biotechnol. 41, 233-250. 53. Gomez-Escribano, J. P., and Bibb, M. J. (2014) Heterologous expression of natural product biosynthetic gene clusters in Streptomyces coelicolor: from genome mining to manipulation of biosynthetic pathways. J. Ind. Microbiol. Biotechnol. 41, 425-431.


54. McClerren, A. L., Cooper, L. E., Quan, C., Thomas, P. M., Kelleher, N. L., and van der Donk, W. A. (2006) Discovery and in vitro biosynthesis of haloduracin, a two-component lantibiotic. Proc. Natl. Acad. Sci. U S A 103, 17243-17248. 55. Begley, M., Cotter, P. D., Hill, C., and Ross, R. P. (2009) Identification of a novel two- peptide lantibiotic, lichenicidin, following rational genome mining for LanM proteins. Appl. Environ. Microbiol. 75, 5451-5460. 56. Wyatt, M. A., Mok, M. C., Junop, M., and Magarvey, N. A. (2012) Heterologous expression and structural characterisation of a pyrazinone natural product assembly line. Chembiochem 13, 2408-2415. 57. Wyatt, M. A., and Magarvey, N. A. (2013) Optimizing dimodular nonribosomal peptide synthetases and natural dipeptides in an Escherichia coli heterologous host. Biochem. Cell Biol. 91, 203-208. 58. Liu, T., Chiang, Y. M., Somoza, A. D., Oakley, B. R., and Wang, C. C. (2011) Engineering of an "unnatural" natural product by swapping polyketide synthase domains in Aspergillus nidulans. J. Am. Chem. Soc. 133, 13314-13316. 59. Deane, C. D., Melby, J. O., Molohon, K. J., Susarrey, A. R., and Mitchell, D. A. (2013) Engineering unnatural variants of plantazolicin through codon reprogramming. ACS Chem. Biol. 8, 1998-2008. 60. de Jong, A., van Hijum, S. A., Bijlsma, J. J., Kok, J., and Kuipers, O. P. (2006) BAGEL: a web-based bacteriocin genome mining tool. Nucl. Acids Res. 34, W273-279. 61. Dunbar, K. L., Melby, J. O., and Mitchell, D. A. (2012) YcaO domains use ATP to activate amide backbones during peptide cyclodehydrations. Nat. Chem. Biol. 8, 569-575. 62. Starcevic, A., Zucko, J., Simunkovic, J., Long, P. F., Cullum, J., and Hranueli, D. (2008) ClustScan: an integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures. Nucl. Acids Res. 36, 6882-6892. 63. Kersten, R. D., Yang, Y. L., Xu, Y., Cimermancic, P., Nam, S. 
J., Fenical, W., Fischbach, M. A., Moore, B. S., and Dorrestein, P. C. (2011) A mass spectrometry-guided genome mining approach for natural product peptidogenomics. Nat. Chem. Biol. 7, 794-802. 64. Takano, E. (2006) Gamma-butyrolactones: Streptomyces signalling molecules regulating antibiotic production and differentiation. Curr. Opin. Microbiol. 9, 287-294. 65. Poralla, K., Muth, G., and Hartner, T. (2000) Hopanoids are formed during transition from substrate to aerial hyphae in Streptomyces coelicolor A3(2). FEMS Microbiol. Lett. 189, 93-95. 66. Martin, R., Sterner, O., Alvarez, M. A., de Clercq, E., Bailey, J. E., and Minas, W. (2001) Collinone, a new recombinant angular polyketide antibiotic made by an engineered Streptomyces strain. J. Antibiot. 54, 239-249. 67. Cane, D. E., and Watt, R. M. (2003) Expression and mechanistic analysis of a germacradienol synthase from Streptomyces coelicolor implicated in geosmin biosynthesis. Proc. Natl. Acad. Sci. U S A 100, 1547-1551. 68. Barona-Gomez, F., Wong, U., Giannakopulos, A. E., Derrick, P. J., and Challis, G. L. (2004) Identification of a cluster of genes that directs desferrioxamine biosynthesis in Streptomyces coelicolor M145. J. Am. Chem. Soc. 126, 16282-16283. 69. Tohyama, S., Eguchi, T., Dhakal, R. P., Akashi, T., Otsuka, M., and Kakinuma, K. (2004) Genome-inspired search for new antibiotics. Isolation and structure determination of new


28-membered polyketide macrolactones, halstoctacosanolides A and B, from Streptomyces halstedii HC34. Tetrahedron 60, 3999-4005. 70. Tohyama, S., Kakinuma, K., and Eguchi, T. (2006) The complete biosynthetic gene cluster of the 28-membered polyketide macrolactones, halstoctacosanolides, from Streptomyces halstedii HC34. J. Antibiot. 59, 44-52. 71. Kodani, S., Hudson, M. E., Durrant, M. C., Buttner, M. J., Nodwell, J. R., and Willey, J. M. (2004) The SapB morphogen is a lantibiotic-like peptide derived from the product of the developmental gene ramS in Streptomyces coelicolor. Proc. Natl. Acad. Sci. U S A 101, 11448-11453. 72. Austin, M. B., Izumikawa, M., Bowman, M. E., Udwary, D. W., Ferrer, J. L., Moore, B. S., and Noel, J. P. (2004) Crystal structure of a bacterial type III polyketide synthase and enzymatic control of reactive polyketide intermediates. J. Biol. Chem. 279, 45162-45174. 73. Fazio, G. C., Xu, R., and Matsuda, S. P. (2004) Genome mining to identify new plant triterpenoids. J. Am. Chem. Soc. 126, 5678-5679. 74. Arabidopsis Initiative, The. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815. 75. Kunze, B., Reichenbach, H., Muller, R., and Hofle, G. (2005) Aurafuron A and B, new bioactive polyketides from Stigmatella aurantiaca and Archangium gephyra (Myxobacteria). Fermentation, isolation, physico-chemical properties, structure and biological activity. J. Antibiot. 58, 244-251. 76. Huntley, S., Hamann, N., Wegener-Feldbrugge, S., Treuner-Lange, A., Kube, M., Reinhardt, R., Klages, S., Muller, R., Ronning, C. M., Nierman, W. C., and Sogaard- Andersen, L. (2011) Comparative genomic analysis of fruiting body formation in Myxococcales. Mol. Biol. Evol. 28, 1083-1097. 77. McAlpine, J. B., Bachmann, B. O., Piraee, M., Tremblay, S., Alarco, A. M., Zazopoulos, E., and Farnet, C. M. 
(2005) Microbial genomics as a guide to drug discovery and structural elucidation: ECO-02301, a novel antifungal agent, as an example. J. Nat. Prod. 68, 493- 496. 78. Wenzel, S. C., Kunze, B., Hofle, G., Silakowski, B., Scharfe, M., Blocker, H., and Muller, R. (2005) Structure and biosynthesis of myxochromides S1-3 in Stigmatella aurantiaca: evidence for an iterative bacterial type I polyketide synthase and for module skipping in nonribosomal peptide biosynthesis. Chembiochem 6, 375-385. 79. Scherlach, K., and Hertweck, C. (2006) Discovery of aspoquinolones A-D, prenylated quinoline-2-one alkaloids from Aspergillus nidulans, motivated by genome mining. Org. Biomol. Chem. 4, 3517-3520. 80. Meiser, P., Bode, H. B., and Muller, R. (2006) The unique DKxanthene secondary metabolite family from the myxobacterium Myxococcus xanthus is required for developmental sporulation. Proc. Natl. Acad. Sci. U S A 103, 19128-19133. 81. Goldman, B. S., Nierman, W. C., Kaiser, D., Slater, S. C., Durkin, A. S., Eisen, J. A., Ronning, C. M., Barbazuk, W. B., Blanchard, M., Field, C., Halling, C., Hinkle, G., Iartchuk, O., Kim, H. S., Mackenzie, C., Madupu, R., Miller, N., Shvartsbeyn, A., Sullivan, S. A., Vaudin, M., Wiegand, R., and Kaplan, H. B. (2006) Evolution of sensory complexity recorded in a myxobacterial genome. Proc. Natl. Acad. Sci. U S A 103, 15200-15205. 82. Banskota, A. H., McAlpine, J. B., Sorensen, D., Ibrahim, A., Aouidate, M., Piraee, M., Alarco, A. M., Farnet, C. M., and Zazopoulos, E. (2006) Genomic analyses lead to novel


CHAPTER 2: Engineering unnatural variants of plantazolicin through codon reprogramming¹

2.1 Introduction

The clinical deployment of antibiotics revolutionized medical practice in the 20th century, but beginning in the late 1960s, the development of new classes of antibiotics slowed drastically while bacteria continued to acquire resistance (1). Natural products have historically been the most valuable source of antibiotics and still today present a promising avenue to combat the growing problem of infections caused by antibiotic-resistant bacteria. Indeed, of the new antibacterial compounds approved by the FDA between 1981 and 2010, 66% were natural products or natural product-derived (2). In recent years, the increased ease of bacterial genome sequencing has facilitated the development of new strategies to tap into unexplored regions of natural product chemical space (3).

Natural product biosynthesis can potentially be “reprogrammed” to produce novel variants through genetic manipulation and/or supplying alternate substrates. Natural product engineering in modular biosynthetic pathways, such as polyketide synthases or non-ribosomal peptide synthetases, remains difficult due to the complex nature of the multi-domain megasynthases (4-7). In contrast, the reprogramming of ribosomally synthesized and post-translationally modified peptides (RiPPs) is straightforward, as the natural product structure is directly encoded by the sequence of the precursor peptide gene (8). Thus, site-directed mutagenesis leads to a predictable outcome in the resultant natural product, provided the modification enzymes tolerate the unnatural sequence. Precursor peptide replacement in vivo has been used to study multiple RiPPs, most notably the lantibiotic nisin (9-11), leading to the identification of multiple variants with increased activity. Similar approaches have also been applied to the study of other lanthipeptides, including lacticin 3147 (12, 13), mersacidin (14), lichenicidin (15), and actagardine A (16).
This strategy has also garnered recent attention outside of the lanthipeptide family, where it has been successfully employed to engineer novel variants and explore the substrate requirements for the precursor peptides of several thiopeptides (17-21), cyanobactins (22), and microcin J25 (23).

The thiazole/oxazole-modified microcins (TOMMs) (8) comprise one RiPP subfamily characterized by thiazoline, (methyl)oxazoline, and the corresponding 2-electron oxidized azoles, deriving from cysteines, serines, and threonines (Figure 2.1) (24). A trimeric enzymatic complex of a cyclodehydratase (C and D proteins) and an FMN-dependent dehydrogenase (B protein) catalyzes heterocycle formation and oxidation (Figure 2.1c) (24, 25). Plantazolicin (PZN), a TOMM produced by the soil bacterium Bacillus amyloliquefaciens FZB42, displays a unique polyheterocyclic structure. After heterocycle formation, the unmodified N-terminal (leader peptide) region of the precursor peptide is proteolytically removed, likely by the putative peptidase (E protein) encoded in the gene cluster (Figure 2.1a) (26, 27). Lastly, two methyl groups are installed onto the new N-terminus of the core peptide by an S-adenosylmethionine (SAM)-dependent methyltransferase (L protein) to yield the mature natural product (Figure 2.1d), which is then predicted to be exported by a dedicated transporter (G and H proteins), also encoded within the gene cluster (26, 27).

PZN exhibits highly discriminating antibacterial activity against the causative agent of anthrax, Bacillus anthracis (27). The spectrum of PZN activity excludes all Gram-negative strains and most Gram-positive strains tested thus far. Exploration within the closely related B. cereus subgroup revealed B. anthracis as the only strain significantly susceptible to PZN (27). To date, similar biosynthetic gene clusters have been identified in four other bacterial species, indicating that these species have (or had at one time) the genetic capacity to produce PZN-like compounds, although only B. pumilus has been shown to do so during laboratory cultivation (27). The precursor peptides in these species display a high degree of sequence similarity to the PZN precursor peptide (BamA), including many conserved residues in the core peptide (Figure 2.1b) (27). However, the role that these conserved residues play in the posttranslational modification of PZN and their importance in the bioactivity of the mature compound remained unknown. The current study sought to simultaneously address the structure-activity relationship of PZN and establish the substrate requirements for the biosynthetic pathway using heterologous expression of PZN analogs in E. coli.

¹ Reprinted with permission from Deane, C. D., Melby, J. O., Molohon, K. J., Susarrey, A. R., and Mitchell, D. A. (2013) ACS Chem. Biol. 8, 1998-2008. Copyright 2013 American Chemical Society. J.O.M. performed MS/MS analysis; K.J.M. and A.R.S. provided fosmids.


Figure 2.1. Genetics and enzymology of PZN biosynthesis. (a) The 10 kb PZN biosynthetic gene cluster from Bacillus amyloliquefaciens FZB42. (b) The precursor peptides for PZN and potential analogs from other species exhibit remarkable similarity. Bam, B. amyloliquefaciens FZB42; Bpum, B. pumilus ATCC 7061; Cms, Clavibacter michiganensis subsp. sepedonicus; Cur, Corynebacterium urealyticum DSM 7109; Blin, Brevibacterium linens BL2. Color-coding (predicted for Cms, Cur, and Blin): green, Nα,Nα-dimethylarginine; red, thiazoles; blue, (methyl)oxazoles; brown, methyloxazoline; *, peptidase cleavage site. (c) Posttranslational modification of cysteine, serine, and threonine residues in TOMM biosynthesis proceeds by cyclodehydration (mass loss of 18 Da) and subsequent oxidation (additional mass loss of 2 Da). (d) The chemical structure of PZN. Color-coding is identical to panel (b).


2.2 Heterologous production of PZN

Although B. amyloliquefaciens FZB42 exhibits natural competency (28), its inability to replicate plasmids, coupled with the low transformation efficiency of integrative plasmids, necessitated the development of a heterologous system for facile expression and screening of PZN variants. Genomic DNA from the RS32 strain of B. amyloliquefaciens, in which the precursor peptide gene (bamA) is insertionally disrupted by a spectinomycin resistance cassette (26), was digested and ligated into a DNA vector backbone to generate a fosmid library in E. coli. A fosmid bearing the ΔbamA PZN biosynthetic gene cluster (Figure 2.2) was identified through PCR screening of this library. The ΔbamA PZN cluster fosmid was under rhamnose-dependent copy control in E. coli (29, 30), while the PZN biosynthetic genes were under control of the native FZB42 promoters. To determine whether the fosmid-borne PZN biosynthetic enzymes were capable of modifying BamA supplied in trans, the PZN precursor peptide gene, bamA, was cloned into the E. coli expression vector pBAD24 with an N-terminal fusion to maltose-binding protein (MBP) using standard methods. Expression of bamA from this construct was under the control of the arabinose-inducible promoter, Para (31).

Figure 2.2. Heterologous production of PZN in E. coli via trans-complementation. Shown are MALDI mass spectra of methanolic surface extracts from E. coli containing a fosmid bearing the PZN cluster ΔbamA both alone (top, blue) and with the pBAD24-MBP-BamA plasmid construct grown in the absence (middle, green) and presence (bottom, red) of 10 mM arabinose. These data indicate the successful rescue of PZN production by the plasmid-borne, MBP-tagged bamA gene. Although not strictly quantitative, the intensity of ions is a crude measure of analyte abundance, as these spectra were acquired with identical laser power and number of laser shots, and are shown to scale. The small but detectable amount of PZN produced in the absence of arabinose (middle, green spectrum) is likely due to the leakiness of the Para promoter. Omission of rhamnose from the culture produced data indistinguishable from the “fosmid alone” spectrum (blue).


Following growth in Luria-Bertani (LB) medium supplemented with rhamnose and arabinose, PZN was obtained from the cell surface of the E. coli host by non-lytic methanolic extraction. Successful production and export of mature PZN was detected by high-performance liquid chromatography (HPLC) and matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-MS) analysis (Figures 2.2-2.3), albeit at ~5% of the 200-300 µg L⁻¹ output from the native B. amyloliquefaciens producer (27), i.e., roughly 10-15 µg L⁻¹. It is plausible that the lower production level stems from an inefficient usage of B. amyloliquefaciens promoters by E. coli. E. coli cells containing the ΔbamA PZN fosmid produced PZN only when complemented with pBAD24-MBP-BamA and induced with rhamnose and arabinose (Figure 2.2). As short peptides have poor half-lives in E. coli when lacking a fusion partner (32, 33), inclusion of MBP was necessary for detection of PZN produced from plasmid-borne BamA. When the MBP tag was omitted, heterologous production of PZN was below the detection limit (<20 pg) in the crude cell surface extract (data not shown).

Figure 2.3. Purification of heterologously expressed PZN. (a) UV chromatogram (266 nm) from HPLC purification of PZN produced from E. coli bearing the PZN cluster ΔbamA fosmid and the pBAD24-MBP-BamA plasmid construct. Dashed lines indicate the fraction collected for isolation of purified PZN. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a. The m/z 1354 species derives from methyloxazoline hydrolysis, yielding a reinstated threonine residue (27).


To provide an internal standard for evaluating the production level of PZN variants, a second fosmid was constructed bearing the PZN biosynthetic gene cluster from B. amyloliquefaciens FZB42 with the precursor peptide gene (bamA) intact. MALDI-MS and HPLC analysis of methanolic surface extracts from E. coli bearing this fosmid indicated production and export of mature PZN, even in the absence of the pBAD24-MBP-BamA construct (Figure 2.4).

Figure 2.4. Production of PZN from E. coli containing the entire PZN cluster fosmid (including bamA). (a) MALDI-MS of a methanolic surface extract from E. coli containing a fosmid bearing the entire PZN cluster grown in LB supplemented with 10 mM rhamnose and 10 mM arabinose, indicating heterologous production of PZN. (b) Single-ion monitoring of m/z 1336 during LC-MS analysis of the surface extract shown in panel a. The peak at 25.55 minutes co-elutes with an authentic PZN standard (data not shown).

2.3 Design and evaluation of BamA mutants

To assess the ability of the PZN posttranslational pathway to accept alternate BamA substrates, nine residues in the core region of BamA were targeted for mutagenesis, based on the sequence similarity among the potential PZN producers (Figure 2.1b, Figure 2.5). This subset of the precursor peptide was selected to (i) include the most-conserved as well as the least-conserved residues; (ii) cover both heterocyclized and non-heterocyclized positions; and (iii) span the length of the precursor peptide core region. Rather than employ saturation mutagenesis, directed mutant panels were designed at eight of these positions (Figure 2.5). Each panel contained conservative and non-conservative mutations in order to sufficiently explore the available chemical space without generating an unnecessarily large number of mutants. Mutants were prepared by site-directed mutagenesis using primers with rationally chosen degenerate codons at select positions of pBAD24-MBP-BamA. Following mutagenesis, polyclonal plasmid collections were transformed into competent E. coli cells bearing the complete PZN gene cluster fosmid.

A medium-throughput method was employed to screen clones in groups of 20-40 for production of unnatural PZN analogs. Individual clones were cultured, PZN biosynthetic gene expression was induced with rhamnose and arabinose, and unnatural PZNs were isolated by non-lytic methanolic surface extraction, conditions that preclude isolation of intracellular contaminants. MALDI-MS was used to screen the surface extracts for unnatural PZN variants (Figure 2.6), and the positive clones were subjected to DNA sequencing (Figure 2.7). In addition, a selection of plasmids that did not produce a PZN variant was also sequenced to identify which precursor peptides were not processed by the posttranslational modifying enzymes (Figure 2.7). The confirmed DNA sequence and the observed masses enabled deduction of the number and type of posttranslational modifications for each variant (Figure 2.6). Using data from MALDI-MS analysis, comparison of the peak heights of wild-type PZN (fosmid-borne BamA, m/z 1336) and mutant PZN (plasmid-borne mutant BamA, variable mass) yielded a qualitative measure of production levels of the PZN variants (Figure 2.7). This preliminary assessment of production level enabled determination of which variants most warranted large-scale expression and purification, circumventing the time- and labor-intensive process of doing so with every variant.
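The deduction of modification state from sequence and observed mass can be illustrated with a short Python sketch. It is not the analysis procedure used in this work: the function name `candidate_modifications` and the parameters `max_sites` and `tol` are hypothetical, and the sketch assumes that the mass changes from leader peptide removal and N-terminal dimethylation have already been accounted for, so that the remaining mass deficit arises only from cyclodehydration (loss of H2O) and oxidation (loss of H2).

```python
# Monoisotopic masses (Da) lost per modification step
H2O = 18.010565  # one cyclodehydration (Cys/Ser/Thr -> azoline)
H2 = 2.015650    # one oxidation (azoline -> azole)


def candidate_modifications(mass_deficit, max_sites, tol=0.02):
    """Return (heterocycles, azoles) pairs whose summed mass loss matches.

    mass_deficit: mass lost relative to the unmodified core peptide (Da),
                  after correcting for any other modifications (assumption).
    max_sites:    number of Cys/Ser/Thr residues available for cyclization.
    tol:          allowed absolute deviation in Da (illustrative value).
    """
    hits = []
    for n_cyc in range(max_sites + 1):
        # Only a residue that has been cyclized can subsequently be oxidized.
        for n_ox in range(n_cyc + 1):
            loss = n_cyc * H2O + n_ox * H2
            if abs(loss - mass_deficit) <= tol:
                hits.append((n_cyc, n_ox))
    return hits


# Wild-type-like processing: ten heterocycles, nine of them oxidized to azoles
print(candidate_modifications(10 * H2O + 9 * H2, max_sites=10))
```

Restricting the search to the heterocyclizable residues present in the confirmed sequence is what keeps the assignment tractable. At nominal-mass resolution, nine oxidations (about 18 Da) are close to one cyclodehydration, so the sequence constraint and a tight mass tolerance both matter.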

Figure 2.5. Design of site-directed PZN mutant panels. Mutagenesis was performed using primers containing the indicated degenerate codons, resulting in a panel of plasmids containing bamA genes with selected positions individually replaced by the designated amino acids. N = any nucleotide (A, C, G, or T); B = C, G, or T; K = G or T; M = A or C; S = C or G; V = A, C, or G; Y = C or T.
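As an illustration of how a degenerate codon maps to a mutant panel, the following Python sketch expands a three-letter degenerate codon into the amino acids it encodes, using the ambiguity codes defined in the caption and the standard genetic code. The function name `expand_codon` and the example codons are hypothetical and are not taken from Figure 2.5.

```python
from itertools import product

# Nucleotide ambiguity codes from the caption, plus the four unambiguous bases
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "K": "GT", "M": "AC", "S": "CG",
    "V": "ACG", "Y": "CT", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G at each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(degenerate):
    """Map each encoded amino acid ('*' = stop) to the codons that specify it."""
    encoded = {}
    for bases in product(*(IUPAC[ch] for ch in degenerate.upper())):
        codon = "".join(bases)
        encoded.setdefault(CODON_TABLE[codon], []).append(codon)
    return encoded


# Hypothetical example codon (not from Figure 2.5)
print(sorted(expand_codon("KYT")))
```

For comparison, the fully randomized codon NNK expands to 32 codons covering all 20 amino acids plus one stop codon, which shows why a targeted degenerate codon yields a smaller panel than saturation mutagenesis.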


Figure 2.6. Representative MALDI mass spectra of PZN analogs. (a) Methanolic surface extracts from E. coli containing the PZN cluster ΔbamA fosmid plus BamA mutant plasmids. Color-coding: red, I34X mutants; blue, T40X mutants; green, F41X mutants. As indicated by the peak labels and the lack of m/z 1336 in each sample, production of PZN variants is only possible from plasmid-borne mbp-bamA. The m/z 1336 signal in the T40C spectrum corresponds to a variant containing a thiazole at position 40. The peaks at m/z 1324 and 1340 in the F41C spectrum correspond to oxidized C-terminal Cys (R-SOxH, x = 2-3). Other labeled peaks correspond to PZN variants with wild-type-like modifications. (b) Same as panel a, but with the full PZN cluster fosmid plus BamA mutant plasmids. Wild-type PZN (m/z 1336) appears in every sample.


Figure 2.7. Summary of PZN variant production. PZN variants were heterologously expressed in E. coli prior to methanolic surface extraction and MALDI-MS analysis. Using this method, 29 point mutants of BamA were successfully converted to PZN variants (listed above core sequence), while 43 were not detected (listed below core sequence). Colored outlines indicate modifications to the altered residue as specified in the figure legend. Of the 29 analogs produced above the limit of detection (>20 pg in crude surface extracts), 10 were produced at levels greater than or equal to that of wild-type PZN (bold outlines), as assessed by relative peak heights during MALDI-MS analysis. The T40C variant produced two analogs, one with a thiazole at position 40 and the other with a thiazoline.

Assessment of the 72 mutants of BamA resulted in 29 clones that successfully produced and exported PZN variants within our detection limit (Figure 2.7). In this in vivo production system, a mutant precursor peptide must be successfully accepted as a substrate by multiple steps of the biosynthetic pathway (cyclodehydration, dehydrogenation, and leader peptidolysis), in addition to export, in order to yield a detectable product. Failure of any of these processes to accept an unnatural precursor peptide substrate led to an absence of detectable PZN variant. It is worth noting that such a result would not likely arise from failure at the methylation step of the biosynthetic pathway, as desmethylPZN can be successfully exported by B. amyloliquefaciens (27).

Of the 29 clones that successfully produced mature PZN variants, 26 had masses consistent with “wild-type-like” modifications: nine azole rings, one azoline ring, leader peptidolysis N-terminal to Arg28, and N-terminal dimethylation (Figure 2.1d). Three exceptions, PZN-T40A, -T40I, and -T40V, all produced MALDI-MS data consistent with the presence of one fewer heterocyclizable residue (nine azoles, no azolines) than wild-type PZN, as expected (Figure 2.7). Three other clones, I34M, T40C, and F41C, each yielded one or more modifications over “wild-type-like” processing, consistent with varying oxidation states at the mutated position (Figures 2.6a, 2.8-2.10).

Figure 2.8. High resolution, Fourier transform mass spectrometric (FTMS) analysis of PZN-T40C variants. (a) Structure, predicted fragments, and the calculated monoisotopic masses for PZN-T40C (deca-azole variant). One of the fragments was derived from multiple bond cleavages, denoted by the superscript. (b) Same as panel a, but for PZN-T40C (thiazoline variant). The thiazoline moiety is shown in orange. (c) Collision-induced dissociation of PZN-T40C (deca-azole variant, C62H65N17O12S3, calc. m/z 1336.4239; expt. m/z 1336.4243, error 0.2 ppm) was consistent with the predicted structure. (d) Collision-induced dissociation of PZN-T40C (thiazoline variant, C62H67N17O12S3, calc. m/z 1338.4395; expt. m/z 1338.4395, error 0.0 ppm) was consistent with the predicted structure, including localization of the thiazoline to position 40.
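The two PZN-T40C species in Figure 2.8 differ by a single dehydrogenation (thiazoline versus thiazole), so their calculated m/z values should differ by the mass of H2. The sketch below checks this using the values quoted in the caption. The monoisotopic mass of hydrogen is a standard constant that is not stated in the source, and the function name and tolerance are hypothetical choices.

```python
# Monoisotopic mass of a hydrogen atom (standard value; not given in the caption).
H_MONO = 1.00782503

# Calculated m/z values quoted in the Figure 2.8 caption (singly charged ions).
CALC_MZ = {
    "T40C (thiazole, deca-azole)": 1336.4239,
    "T40C (thiazoline)": 1338.4395,
}


def dehydrogenation_count(mz_reduced, mz_oxidized, tol=0.005):
    """Estimate how many H2 units separate two singly charged species.

    Returns None when the mass difference is not a whole number of H2
    units within the tolerance (in m/z units).
    """
    diff = mz_reduced - mz_oxidized
    n = round(diff / (2 * H_MONO))
    if abs(diff - n * 2 * H_MONO) <= tol:
        return n
    return None


if __name__ == "__main__":
    n = dehydrogenation_count(
        CALC_MZ["T40C (thiazoline)"], CALC_MZ["T40C (thiazole, deca-azole)"]
    )
    print("H2 units separating the two T40C species:", n)
```

The same comparison fails for pairs of species that differ by something other than oxidation state, which is why the function returns None rather than forcing the nearest integer.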

Figure 2.9. Fourier transform mass spectrometric (FTMS) analysis of PZN-F41C. (a) Structure, predicted fragments, and calculated monoisotopic masses for a sulfonic acid derivative of PZN-F41C. One of the fragments was derived from multiple bond cleavages, denoted by the superscript. (b) Collision-induced dissociation of PZN-F41C (sulfonic acid derivative, C57H65N17O16S3, calc. m/z 1340.4036; expt. m/z 1340.4035, error 0.1 ppm) permitted localization of the sulfonic acid moiety to position 41.
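The mass errors quoted in these FTMS captions are reported in parts per million. As a minimal sketch, the helper below (a hypothetical name, not part of any instrument software) computes the signed ppm error from an observed and a calculated m/z, using the PZN-F41C values from the Figure 2.9 caption as input.

```python
def ppm_error(observed_mz, calculated_mz):
    """Signed mass error in parts per million, relative to the calculated m/z."""
    return (observed_mz - calculated_mz) / calculated_mz * 1e6


if __name__ == "__main__":
    # Values from the Figure 2.9 caption (sulfonic acid derivative of PZN-F41C).
    error = ppm_error(observed_mz=1340.4035, calculated_mz=1340.4036)
    print(f"PZN-F41C mass error: {abs(error):.1f} ppm")
```

Because the captions report m/z to four decimal places, the ppm values recomputed this way are limited by that rounding and may differ slightly from errors calculated on unrounded instrument data.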

Figure 2.10. Purification of heterologously-expressed PZN-I34M. (a) UV chromatogram (266 nm) from HPLC purification of PZN-I34M produced from E. coli engineered to produce the I34M analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-I34M. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a. The m/z 1370 species derives from air oxidation of the methionine side chain to a sulfoxide.

Notably, the mutant bamA sequences differed by three nucleotides at most (out of 1394 bp, including MBP tag) and did not contain any rare E. coli codons (34). Thus, the expression levels of the MBP-BamA peptides were expectedly comparable (Figure 2.11). The lack of detectable PZN variants for some mutants is attributed to the selectivity of the biosynthetic pathway and not to poor expression of the precursor peptide. Ten of the 29 variants which were produced in the greatest quantity (Figure 2.7), as defined by ion intensity in MALDI-MS relative to that of the PZN internal standard (fosmid-borne bamA), were purified by HPLC (Figures 2.10, 2.12-2.20). Subsequent tandem, high-resolution Fourier transform mass spectrometry (FTMS) data were consistent with the predicted structures (Figures 2.8, 2.9, 2.21-2.29). Eight of the ten PZN variants that were produced at “high” levels in E. coli harbored mutations at positions in PZN that do not receive side chain modifications: Ile34, Ile35, and Phe41 (Figure 2.7). This production level trend suggested that the biosynthetic pathway was more tolerant to variation at non-cyclized positions relative to cyclized positions. The two remaining high production variants, PZN-S38C and -T40C (Figure 2.7), both replace a Ser/Thr with the heterocycle-competent Cys. In each case, the increased production level of the PZN variant was consistent with the increased nucleophilicity of the Cys side chain, which typically enhances the rate of cyclodehydration (35-37).

Figure 2.11. Protein expression of select MBP-BamA mutants compared to wild-type levels. MBP tagged proteins were purified by amylose affinity chromatography from E. coli bearing the PZN cluster fosmid and pBAD24-MBP-BamA, following growth with arabinose (to induce expression from pBAD24) and rhamnose (to induce greater copy number of the fosmid). Equal volumes of purified protein extracts were analyzed by SDS-polyacrylamide gel electrophoresis and detected by Coomassie staining.

Figure 2.12. Purification of heterologously-expressed PZN-I34T. (a) UV chromatogram (266 nm) from HPLC purification of PZN-I34T produced from E. coli engineered to produce the I34T analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-I34T. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.13. Purification of heterologously-expressed PZN-I34V. (a) UV chromatogram (266 nm) from HPLC purification of PZN-I34V produced from E. coli engineered to produce the I34V analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-I34V. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.14. Purification of heterologously-expressed PZN-I35V. (a) UV chromatogram (266 nm) from HPLC purification of PZN-I35V produced from E. coli engineered to produce the I35V analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-I35V. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.15. Purification of heterologously-expressed PZN-S38C. (a) UV chromatogram (266 nm) from HPLC purification of PZN-S38C produced from E. coli engineered to produce the S38C analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-S38C. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.16. Purification of heterologously-expressed PZN-T40C. (a) UV chromatogram (266 nm) from HPLC purification of PZN-T40C produced from E. coli engineered to produce the T40C analog of PZN. Dashed lines indicate the fractions collected for isolation of purified PZN-T40C variants. The peak at 23 min corresponds to the T40C variant containing a thiazoline at position 40, while the peak at 25.5 min corresponds to the variant containing a thiazole at position 40. (b) MALDI-MS spectra of the fractions denoted by dashed lines in panel a.

Figure 2.17. Purification of heterologously-expressed PZN-F41A. (a) UV chromatogram (266 nm) from HPLC purification of PZN-F41A produced from E. coli engineered to produce the F41A analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-F41A. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.18. Purification of heterologously-expressed PZN-F41L. (a) UV chromatogram (266 nm) from HPLC purification of PZN-F41L produced from E. coli engineered to produce the F41L analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-F41L. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.19. Purification of heterologously-expressed PZN-F41V. (a) UV chromatogram (266 nm) from HPLC purification of PZN-F41V produced from E. coli engineered to produce the F41V analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-F41V. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.20. Purification of heterologously-expressed PZN-F41Y. (a) UV chromatogram (266 nm) from HPLC purification of PZN-F41Y produced from E. coli engineered to produce the F41Y analog of PZN. Dashed lines indicate the fraction collected for isolation of purified PZN-F41Y. (b) MALDI-MS spectrum of the fraction denoted by dashed lines in panel a.

Figure 2.21. Mass spectrometric analysis of PZN-I34M. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-I34M. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-I34M. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.22. Mass spectrometric analysis of PZN-I34T. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-I34T. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-I34T. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.23. Mass spectrometric analysis of PZN-I34V. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-I34V. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-I34V. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.24. Mass spectrometric analysis of PZN-I35V. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-I35V. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-I35V. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.25. Mass spectrometric analysis of PZN-S38C. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-S38C. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-S38C. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.26. Mass spectrometric analysis of PZN-F41A. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-F41A. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-F41A. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.27. Mass spectrometric analysis of PZN-F41L. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-F41L. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-F41L. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.28. Mass spectrometric analysis of PZN-F41V. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-F41V. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-F41V. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

Figure 2.29. Mass spectrometric analysis of PZN-F41Y. (a) High resolution, Fourier Transform MS was used to structurally characterize PZN-F41Y. A m/z scan showed an ion in the 1+ charge state with the observed monoisotopic m/z value. (b) After isolating the peak observed in panel a, collision-induced dissociation produced the given spectrum. (c) Predicted fragments and their calculated monoisotopic m/z for observed ions from MS/MS in panel b were consistent with the predicted structure of PZN-F41Y. Some of the observed fragments are derived from multiple bond cleavages, denoted by superscripts.

2.4 Substrate tolerance at heterocyclized positions

As the majority of analyzed mutant sequences failed to yield mature PZN variants, the PZN biosynthetic enzymes were less promiscuous in vivo than expected based on studies of other TOMMs (17, 35, 37, 38). Cys29 and Cys31, which are invariant throughout the PZN class (Figure 2.1b) (27), were critical for posttranslational processing of BamA. No mutation at position 29 or 31 led to detectable PZN product formation in vivo (Figure 2.7). Even conservative mutations of Cys to Ser/Thr, or to the azol(in)e mimic Pro (38), failed to produce mature analogs (Figure 2.7), demonstrating an unusual chemoselectivity for cysteine at these positions (35-37, 39). Multiple attempts to isolate intermediates in the posttranslational cascade through utilization of the MBP tag via amylose affinity purification, coupled with MALDI-MS, were unsuccessful. For both accepted and unaccepted sequences, only the full-length unmodified BamA peptide could be detected in cell lysates. Identical results were obtained even when the arabinose concentration was lowered (serial dilutions to a final concentration of 3 µM) in an attempt to prevent any processed peptide signal from being overwhelmed by newly synthesized BamA (31). This result implies that failure to cyclize a particular residue aborts downstream modification required for maturation and export, and that cyclodehydration is the first committed step in PZN maturation, but this hypothesis will require further confirmation.

In contrast to the invariant Cys at positions 29 and 31 (Figure 2.7), the PZN biosynthetic pathway tolerated conservative, heterocyclizable mutations at Thr33 and Ser38, but allowed more non-conservative mutations at Thr40 (Figure 2.7). Consistent with the variation at these positions in other PZN-like precursor peptides (Figure 2.1b), all other heterocyclizable residues substituted at these three positions resulted in full (“wild-type-like”) maturation of the PZN analog, including cyclodehydration of the substituted residue (Figure 2.7). The absence of β-nucleophile-containing amino acids at these positions disrupts the contiguous polyheterocyclic structure of PZN. In the case of Thr33 and Ser38, we posit that this disruption prevents the formation of downstream heterocycles, as observed for other TOMM natural products (37, 40).
However, the relative promiscuity of the cyclodehydratase was markedly increased at Thr40, as several non-cyclizable, nonpolar residues (Ala, Val, Ile) were tolerated by the biosynthetic pathway, in addition to other heterocyclizable residues (Cys, Ser). The above-described intolerance towards mutations at Cys29 and Cys31, contrasted with the promiscuity at Thr40, suggests N- to C-terminal processing for PZN, which is the canonical direction of RiPP modification (36, 37, 41). Alternatively, if position 40 is not the last position to be cyclodehydrated, other modifications of PZN certainly do not depend on position 40 being cyclized, unlike the case for positions 29 and 31. In native PZN, Thr40 is converted to a methyloxazoline (Figure 2.1d) and is the only heterocycle to evade dehydrogenation (27). Initial MALDI-MS results found that the PZN-T40S and -T40C variants each contained an azoline, presumably at position 40 (Figure 2.7). Given that thiazolines hydrolyze under mildly acidic conditions much more slowly than the corresponding (methyl)oxazolines (42, 43), we wondered if differential hydrolysis would be useful to rapidly assess the azoline content of natural products (44). Upon exposing native PZN, -T40S, and -T40C to mild aqueous acid, only T40C was hydrolytically stable under the conditions employed (Figure 2.30). Intriguingly, when the T40C analog was produced in the strain bearing the PZN ΔbamA fosmid, the absence of wild-type PZN (m/z 1336) permitted observation of a fully oxidized variant (deca-azole, m/z 1336), which was unreactive towards acid hydrolysis (Figure 2.30). FTMS/MS analysis confirmed the proposed structures (Figure 2.8).

Figure 2.30. Hydrolytic stability of PZN analogs varied at Thr40. In naturally occurring PZN, Thr40 exists as a methyloxazoline. (a) MALDI-MS spectra from methanolic surface extracts of E. coli engineered to produce native PZN, -T40S, and -T40C. (b) Same as panel a, after treatment with 0.2% HCl for 4 h at 23 °C to induce hydrolysis. Under these conditions, the (methyl)oxazolines were virtually quantitatively hydrolyzed while the T40C thiazoline was minimally hydrolyzed. The deca-azole variant contains a thiazole at position 40, which was unreactive towards hydrolysis.

2.5 Substrate tolerance at non-cyclized positions

As Thr40 could be substituted with noncyclizable residues, the substrate tolerance at naturally occurring noncyclizable residues was evaluated. Ile34 and Phe41 both lie on the C-terminal side of a contiguous polyazol(in)e moiety (Figure 2.1d), prompting the question of whether an additional heterocycle could be formed at these positions. While substitution with cyclizable residues was tolerated at these positions, the side chains did not undergo cyclodehydration (Figure 2.7), demonstrating that the PZN cyclodehydratase is remarkably regioselective. These structural assignments were supported by MS/MS data on the PZN-F41C and -I34T analogs (Figures 2.9, 2.22). The free thiol of PZN-F41C was clearly susceptible to oxidation, as indicated by the MS-based detection of the sulfinic (R-SO2H) and sulfonic (R-SO3H) acid derivatives (Figures 2.6a, 2.9). Installation of a stop codon at position 41, however, failed to produce the expected PZN truncation (Figure 2.7), suggesting that the biosynthetic pathway required the presence of a C-terminal flanking residue for a cyclized position, as previously observed for other TOMMs (37, 39).

In the case of PZN-I34T, the newly installed Thr at position 34 could not be definitively proven to remain uncyclized, due to the difficulty of obtaining dissociation between heterocycles during MS/MS (Figure 2.22). However, the data were consistent with the presence of five fully oxidized heterocycles in the N-terminal half of the molecule, as seen in wild-type PZN (27). Thus, if the substituted Thr was cyclized, it would require the omission of a normally installed heterocycle, which seems unlikely. Although the PZN biosynthetic enzymes could tolerate PZN-I34S/T, the pathway was unable to fully process PZN-I34C (Figure 2.7). It is possible that cyclodehydration of this newly installed Cys resulted in an intermediate that was incompatible with downstream processing. Unfortunately, all attempts to isolate potential biosynthetic intermediates of PZN-I34C resulted in the detection of only the full-length unmodified peptide (data not shown). Indirect support for the incompatibility of cyclization from PZN-I34C comes from variant -I34P, the only other assessed mutant at position 34 that did not produce PZN (Figure 2.7). As proline is known to mimic azol(in)e heterocycles, the intolerance towards PZN-I34P suggests the inability of the biosynthetic pathway to cope with the lack of conformational flexibility at this position (37, 38).
Among the four residues with unmodified side chains in the core sequence of wild-type BamA, the PZN biosynthetic pathway was least tolerant to mutation at Ile35 and Arg28 (Figure 2.7). Only conservative mutations were tolerated at Ile35: Ala, Val, and the sterically similar Asn (Figure 2.7). Many of the substitutions tolerated at Ile34 did not yield detectable mature PZN analogs when probed at Ile35, including Gly, aromatic residues, and polar residues. This trend was consistent with the flanking residue requirements observed with other TOMM cyclodehydratases (37, 39), where the constraints for the residue in the –1 (N-terminal) position flanking a heterocycle are stricter than for the residue in the +1 (C-terminal) position. In PZN, with its two contiguous polyazol(in)e moieties, Ile35 occupies the –1 position for the C-terminal ring system, while Ile34 and Phe41 lie in +1 positions. By extension, Arg28 (Nα,Nα-dimethylarginine in mature PZN) is in the –1 position for the N-terminal ring system (Figure 2.1d). The most conservative mutation possible, R28K, was not tolerated by the PZN biosynthetic pathway, and as such, did not garner further attention (Figure 2.7). Conceivably, the failure to process the R28K mutant was due to its proximity to the leader peptidase cleavage site, thus inhibiting leader peptide cleavage and export. Numerous attempts to detect intracellular PZN intermediates with intact leader peptides by affinity chromatography were unsuccessful. Post-proteolysis, the methylation state of PZN should not affect export, as desmethylPZN was successfully exported in a ∆bamL strain of B. amyloliquefaciens (26, 27).

2.6 Pathway tolerance for variations in substrate length

In the continued interest of exploring the substrate tolerance of the PZN biosynthetic pathway, several BamA analogs were constructed to expand the PZN sequence by one residue in the middle of each polyazol(in)e moiety, between the two central Ile residues, and at the C-terminus (Figure 2.31a). Production of PZN variants based on these peptides was assessed as described above. BamA peptides expanded by an internal alanine residue were not processed. However, expansion of the peptide by a single residue at the C-terminus was tolerated in all cases tested, with the exception of proline (Figure 2.31b). In addition to these expanded substrates, a contracted version of BamA was constructed by deletion of Ser38, potentially reducing the C-terminal polyazole system from five heterocycles to four. However, this substrate also eluded processing. Taken together with the failure to process substrates truncated at positions 35 and 41 (Figure 2.7), these data indicated that one or more of the PZN biosynthetic enzymes have precise length requirements as to the arrangement of heterocyclized residues within the BamA precursor peptide. In this regard, PZN biosynthesis stands in stark contrast to several other characterized TOMM biosynthetic pathways (18, 37, 39).

Figure 2.31. Tolerance of the PZN biosynthetic pathway towards expanded substrates. (a) PZN variants were heterologously expressed in E. coli prior to methanolic surface extraction and MALDI-MS analysis. Peptides elongated by a single residue at the C-terminus were modified and exported, with the exception of Pro. Internally expanded peptides were not tolerated by the biosynthetic pathway. Color-coding: same as in Fig. 3. (b) MALDI-MS of methanolic surface extracts of E. coli engineered to produce PZN with one residue added at the C-terminus. Addition of Ala, Gln, Glu, or Gly to the C-terminus of PZN led to the appearance of a +43 Da species. Although this species remains to be characterized, a +43 mass shift is consistent with carbamylation (45-48). Peaks at m/z 1336 correspond to wild-type PZN (dashed line). Peaks at m/z 1363 and 1385 derive from the E. coli host and are not related to PZN.

2.7 Production of other PZN-class natural products

Although four other publicly available bacterial genomes encode PZN-like biosynthetic gene clusters (Figure 2.1), to date, only Bacillus pumilus has been shown to produce PZN during laboratory cultivation (27). The amino acid sequences of the core peptides from B. pumilus ATCC 7061 and B. amyloliquefaciens FZB42 are identical (Figure 2.1b) (27), rendering it necessary to look to the other potential PZN producers in order to determine if E. coli could biosynthesize non-Bacillus PZN compounds. As substrate binding is mediated by the leader sequence (38, 49-51), the PZN biosynthetic enzymes from B. amyloliquefaciens were unlikely to accept significantly different leader peptides as substrates. Indeed, attempts to heterologously produce the Clavibacter michiganensis subsp. sepedonicus (Cms) encoded PZN (via pBAD24-MBP-CmsA) using the FZB42 biosynthetic pathway (fosmid-borne) proved unsuccessful. To circumvent the potential failure of leader peptide recognition during E. coli heterologous expression, precursor peptide chimeras were generated that consisted of the leader peptide sequence from the native producer (FZB42) and the core peptide sequence from other family members, yielding BamA-CmsA, BamA-CurA, and BamA-BlinA (Cur, Corynebacterium urealyticum DSM 7109; Blin, Brevibacterium linens BL2; Figure 2.32). Similar strategies employing precursor peptide chimeras have been previously used in other RiPP systems to enable the processing of unnatural core peptides (37, 38). Surprisingly, none of the products from the PZN chimeras were detected, although production of the unmodified peptides was verified by MALDI-MS (Figure 2.33). The inability to detect PZN-class natural products derived from the chimeric precursor peptides, despite their overall similarity to BamA/PZN, provided further evidence for the extraordinary specificity exhibited by the FZB42 biosynthetic pathway.

Figure 2.32. Sequence alignment of PZN chimera peptides. The BamA leader peptide was used to replace the naturally occurring sequences for other members of the PZN class of natural products. Green, predicted dimethylation; blue, predicted heterocycles; red, mutation made to BamA-CmsA chimera; *, leader peptide cleavage site (assumed for chimeric peptides). Bam, B. amyloliquefaciens; Cms, Clavibacter michiganensis subsp. sepedonicus; Cur, Corynebacterium urealyticum; Blin, Brevibacterium linens.

Figure 2.33. Expression of PZN chimera peptides. MBP-tagged proteins were purified by amylose affinity chromatography from E. coli bearing the PZN cluster fosmid and pBAD24-MBP-BamA-CmsA, MBP-BamA-CurA, and MBP-BamA-BlinA, following growth with arabinose and rhamnose. After cleavage by TEV protease, the purified proteins were analyzed by MALDI-MS, which indicated that the peptides had not been processed by the PZN biosynthetic enzymes. The experimentally determined (Exp) versus calculated (Calc) masses deviate due to the measured masses falling outside of reflector mode calibration range (900-3700 Da).

To shed further light on the inability of the chimeric peptides to produce PZN variants in E. coli, efforts were focused on the PZN cluster from C. michiganensis. This cluster bears the highest degree of sequence conservation to FZB42 (aside from B. pumilus) among: (i) the precursor peptide (51/80; % identity / % similarity), (ii) the heterocycle-forming proteins (B: 57/87, C: 41/77, D: 57/76; % identity / % similarity), (iii) the other modifying enzymes (E: 16/55, L: 25/60), and (iv) the export apparatus (G: 33/71, H: 19/59) (27, 52, 53). In an effort to render the BamA-CmsA chimera more similar to BamA, two constructs were generated. These included the point mutant BamA-CmsA-S32T and truncation of the C-terminal Gly, BamA-CmsA-G42term (Figure 2.32). Unexpectedly, these altered BamA-CmsA sequences also failed to produce PZN analogs, indicating that neither of these positions was individually responsible for the failure of BamA-CmsA to be tolerated by the FZB42 pathway. The previously analyzed mutants BamA-I34V and -S38T both yielded mature PZN analogs (Figure 2.7), indicating that the Val34 or Thr38 residues found in CmsA (Figure 2.32) were also not responsible for processing failure. It is not obvious how the two residues in BamA-CmsA flanking Thr38 (i.e., Thr37 and Cys39) could pose an obstacle to posttranslational modification. Trp41 also seems an unlikely problem, given that PZN variants at position Phe41 were successfully biosynthesized with a wide variety of amino acid replacements (Figure 2.7). Taken together, these data indicate that the failure of the PZN pathway to process the BamA-CmsA chimera was not due to a single residue difference in the precursor peptide, but instead to the cumulative effect of several amino acid changes.

2.8 Antibacterial activity of PZN analogs

Of the 29 PZN variants detected, ten were produced in sufficient quantity (Figure 2.7) to permit assessment of their antibacterial activity against B. anthracis Sterne. These ten analogs, as well as the wild-type BamA sequence, were expressed in conjunction with the PZN ΔbamA fosmid, extracted in methanol, and purified by HPLC (Figures 2.10, 2.12-2.20), each yielding 3-5 µg of purified PZN analog per 15 cm diameter plate of LB agar. The minimum inhibitory concentrations (MICs) of the purified PZN analogs were then determined by microbroth dilution assay against B. anthracis Sterne (Table 2.1). The activity of heterologously produced wild-type PZN was in good agreement with the published value for PZN from the native host (27). The analogs exhibited a range of antibacterial potencies, although no analogs were observed to have increased potency against B. anthracis.

Table 2.1. Antibacterial activity of PZN analogs against B. anthracis Sterne.

Variant              MIC(a) (µg mL–1),     MIC (µg mL–1),
                     heterocycle at 40     Thr at 40
PZN                  2                     32(b)
I34M                 32                    >32
I34T                 8                     >32
I34V                 8                     >32
I35V                 16                    32
S38C                 32                    >32
T40C (thiazoline)    8                     N.D.(c)
T40C (thiazole)      32                    N/A(d)
F41A                 8                     32
F41L                 4                     >32
F41V                 16                    32
F41Y                 >32                   >32

(a) MIC = minimum inhibitory concentration. (b) Ref. (27). (c) N.D. = not determined. (d) N/A = not applicable.
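The fold changes in potency discussed for Table 2.1 follow directly from dividing each variant's MIC by that of wild-type PZN. The sketch below transcribes the "heterocycle at 40" column and computes these ratios. Values reported as ">32" are treated as lower bounds, and the function and variable names are hypothetical.

```python
# MIC values (µg/mL) against B. anthracis Sterne, transcribed from the
# "heterocycle at 40" column of Table 2.1. ">32" is kept as a lower bound.
MIC = {
    "PZN": "2", "I34M": "32", "I34T": "8", "I34V": "8", "I35V": "16",
    "S38C": "32", "T40C (thiazoline)": "8", "T40C (thiazole)": "32",
    "F41A": "8", "F41L": "4", "F41V": "16", "F41Y": ">32",
}


def fold_reduction(variant, reference="PZN"):
    """Return (fold, is_lower_bound) for a variant's MIC relative to the reference."""
    raw = MIC[variant]
    is_lower_bound = raw.startswith(">")
    fold = float(raw.lstrip(">")) / float(MIC[reference])
    return fold, is_lower_bound


if __name__ == "__main__":
    for name in MIC:
        fold, bound = fold_reduction(name)
        prefix = ">" if bound else ""
        print(f"{name:20s} {prefix}{fold:g}-fold")
```

Because MICs from two-fold serial dilutions are only resolved to a factor of two, ratios computed this way are best read as approximate ranks of potency rather than precise measurements.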

As expected, conservative mutations to the non-cyclized Ile residues (I34V, I34T, I35V) resulted in relatively modest reductions (four- to eight-fold) in antibacterial potency (Table 2.1). In contrast, PZN-I34M, which lacks a β-methyl side chain substituent, resulted in a 16-fold reduction in potency, indicating that this region of the PZN structure does contribute to the anti-B. anthracis activity. Similarly, relatively conservative mutations to the C-terminal Phe (F41A, F41L, F41V) also led to modest reductions (two- to eight-fold) in potency. Surprisingly, a conservative mutation at this position, F41Y, abolished the antibacterial activity of PZN (Table 2.1). The similarly conservative S38C mutation (replaces an oxazole with a thiazole) also resulted in a 16-fold reduction in activity (Table 2.1). As these moieties are isosteric, the altered activity of the PZN analog must arise from electronic effects. Analysis of PZN-T40C yielded two analogs differing in heterocycle oxidation at the penultimate position (Figure 2.8). These species were separable by HPLC (Figure 2.16) and tested for anti-B. anthracis activity. PZN-T40C(thiazoline) resulted in a four-fold reduction in anti-B. anthracis potency, while the presence of the fully oxidized, “deca-azole” T40C(thiazole) resulted in a 16-fold reduction in activity (Table 2.1). These data indicate that the posttranslational status of position 40 plays an important role in the activity of PZN. These results concur with the previously noted 16-fold loss of activity upon hydrolysis of the Thr40 methyloxazoline in native PZN (Table 2.1) (27). As expected, none of the oxazoline-containing PZN variants assayed in this study exhibited significant anti-B. anthracis activity (MIC <32 µg mL–1) when hydrolyzed (Table 2.1).
Analogs lacking a heterocycle at position 40 (PZN-T40A, -T40I, and -T40V; Figure 2.7) were not isolated in sufficient quantity (>10 µg) to facilitate purification and subsequent determination of antibacterial potency. However, based on the results for hydrolyzed PZN (reinstated Thr at position 40) and T40C, it is likely that these mutations would also severely reduce, if not abolish, activity. In accord with the observation that desmethylPZN had no measurable activity (27), it is clear that multiple regions of PZN, including both N- and C-termini, as well as the internal Ile residues and the polyazole(in)e moieties, collectively contribute to the observed anti-B. anthracis activity.
In conclusion, we have developed a facile system for the heterologous expression of PZN in E. coli, which enabled the production of analogs from variant precursor peptides. Using this system, we have elucidated the scope of substrate tolerance in the PZN biosynthetic pathway and found it to be much less permissive than previously characterized RiPPs. From PZN analogs that were successfully produced, we have also provided significant insight into the structure-activity relationship for the PZN pharmacophore. This heterologous expression system has the potential to be applied to the study of any RiPP natural product by facilitating production of analogs in E. coli.

2.9 Methods

2.9.1 Genomic DNA extraction and fosmid construction

Genomic DNA (gDNA) for fosmid construction was prepared from Bacillus amyloliquefaciens strains FZB42 and RS32 (26) using cells grown in 40 mL Luria-Bertani (LB) broth [for RS32, supplemented with spectinomycin (90 μg mL–1), chloramphenicol (7 μg mL–1), and erythromycin (1 μg mL–1)] for 16 h at 37 °C. Harvested cells were resuspended in 9.5 mL TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) with 20 mg lysozyme (Sigma) and incubated for 20 min at 37 °C. Then, 0.5 mL of 10% (w/v) sodium dodecyl sulfate and 50 μL of 20 mg mL–1 proteinase K (New England Biolabs) were added, and the mixture was incubated at 37 °C for 1 h. After the addition of 1.8 mL of 5 M aqueous NaCl and 1.5 mL of 10% (w/v) aqueous cetyl trimethylammonium bromide solution, the sample was incubated at 65 °C for 20 min. The resulting lysate was sequentially extracted with buffer-saturated phenol/chloroform/isoamyl alcohol (25:24:1) and chloroform/isoamyl alcohol (24:1). Genomic DNA was precipitated from the aqueous layer by addition of 0.1 volumes of 5 M NaCl and 0.7 volumes of isopropanol, spooled onto a glass rod, washed three times with 70% (v/v) EtOH and once with neat EtOH, dried, and resuspended in water. Fosmid construction was performed as previously reported (29). In this heterologous expression system, supplementing the growth medium with rhamnose induces the fosmid copy number to increase approximately twenty-fold (from 1-2 copies per cell to 25-50). Fosmid libraries were selected using chloramphenicol (12.5 μg mL–1) for FZB42 or chloramphenicol (12.5 μg mL–1) and spectinomycin (90 μg mL–1) for RS32. Positive clones were screened by PCR for the presence of the PZN biosynthetic cluster using primers for bamF and bamL (Table 2.2). The gDNA sequence from RS32 covered on the PZN ΔbamA cluster fosmid spans RBAM_007350 through RBAM_007570 (nucleotides 719,228 to 745,050). The gDNA sequence from FZB42 covered on the full PZN cluster fosmid spans RBAM_007190 through RBAM_007560 (partial) (nucleotides 708,082 to 743,407).


2.9.2 Cloning and mutagenesis

The gene encoding the BamA precursor peptide was amplified by PCR from B. amyloliquefaciens FZB42 genomic DNA using Taq polymerase and primers “BamHI BamA fw” and “NotI BamA rv” (Table 2.2). The purified insert was digested with restriction enzymes BamHI-HF and NotI-HF (New England Biolabs) and ligated into a similarly digested modified pET28 vector containing an N-terminal maltose-binding protein (MBP) fusion with a TEV protease site (pET28-MBP) (24). From this pET28-MBP-BamA construct, the gene encoding MBP-BamA was amplified by PCR using Pfu polymerase and primers “SalI MBP fw” and “HindIII BamA rv” (Table 2.2). The purified insert was digested with restriction enzymes SalI-HF and HindIII-HF (New England Biolabs) and ligated into similarly digested pBAD24 (gift of John Cronan, UIUC) (31) using T4 DNA ligase (New England Biolabs) to construct the pBAD24-MBP-BamA plasmid. DNA sequences encoding chimeric peptides BamA-CmsA, BamA-BlinA, and BamA-CurA were codon optimized and commercially synthesized by GenScript and PCR amplified using Pfu polymerase and “BamHI BamA (opt) fw” with the appropriate “HindIII rv” primer (Table 2.2). PCR amplicons were digested with BamHI-HF and HindIII-HF (New England Biolabs) and ligated into similarly digested pET28-MBP using T4 DNA ligase. These pET28-MBP-chimera constructs were then used as templates for PCR amplification using Pfu polymerase and primers “SalI MBP fw” and the appropriate “HindIII rv” primer (Table 2.2). PCR amplicons were digested with SalI-HF and HindIII-HF (New England Biolabs) and ligated into similarly digested pBAD24 using T4 DNA ligase (New England Biolabs).
All site-directed mutagenesis of BamA and BamA-CmsA was performed on pBAD24-MBP-BamA or the pBAD24-MBP-BamA-CmsA construct using either (i) the Quikchange™ method with Pfu Turbo polymerase (Agilent) and primer pairs listed in Table 2.3, or (ii) subcloning analogous to initial generation of the pBAD24-MBP-BamA construct using the “SalI MBP fw” primer (Table 2.2) and the “HindIII rv” primers (Table 2.3). For degenerate mutagenesis, clones in E. coli DH5α were pooled prior to isolation of plasmid DNA to generate polyclonal plasmid libraries. All DNA sequencing was performed by ACGT, Inc. (Wheeling, IL) or the UIUC Core Sequencing Facility using the primer “MBP fwd” (Table 2.3), which anneals near the 3’ end of the mbp gene, upstream of the bamA sequence.
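The degenerate primers used for library construction are named for the degenerate codon they introduce (for example NBT, NNC, or KNS in Table 2.3). As a hedged illustration of what such a codon can encode, the sketch below expands a degenerate codon using the standard IUPAC nucleotide codes and translates each member with the standard genetic code. The function names are illustrative only and are not taken from the original work.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """List every concrete codon represented by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    return ["".join(codon) for codon in product(*choices)]


def encoded_residues(degenerate_codon):
    """Return the set of one-letter residues ('*' = stop) that can be encoded."""
    return {CODON_TABLE[codon] for codon in expand(degenerate_codon)}


if __name__ == "__main__":
    for codon in ("NBT", "NNC", "KNS"):
        residues = "".join(sorted(encoded_residues(codon)))
        print(f"{codon}: {len(expand(codon))} codons -> {residues}")
```

Under these definitions, NBT covers 12 codons encoding 11 residues and no stop codon, whereas KNS includes the TAG stop codon.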


Table 2.2. DNA sequences used for cloning and fosmid screening.

Oligonucleotide Sequence (5’ to 3’)

BamHI BamA fw AAAAGGATCCATGGAGGAGGTAACAATTATGACTCAAATTAAAGTGC

NotI BamA rv AAAAGCGGCCGCTTAAAACGTAGATGAACTAGAGATGATTGTGGTACAGG

SalI MBP fw AAAAGTCGACAATGGGCAGCAG

HindIII BamA rv AAAAAAGCTTTTAAAACGTAGATGAACTAGAG

BamHI BamA (opt) fw AAAAGGATCCATGACCCAGATTAAAGTGCC

HindIII BlinA rv AATTAAGCTTTTAACCGCCAGAACAGC

HindIII CmsA rv AATTAAGCTTTTAGCCCCAGGTGC

HindIII CurA rv AATTAAGCTTTTAACCGCCGCAACAG

BamA-BlinA CCATGGCGATGACCCAGATTAAAGTGCCGACGGCACTGATCGCGAGCGTTCATGGCGAGGGTCAGCACCTGTTTGAACCGATGGCGGCACGTTGCAGCTGTACCACGCTGCCGTGCTGTTGCTGTTCTGGCGGTTAACCCGGG

BamA-CmsA CCATGGCGATGACCCAGATTAAAGTGCCGACGGCACTGATCGCGAGCGTTCATGGCGAGGGTCAGCACCTGTTTGAACCGATGGCGGCCCGTTGCACCTGTAGCACGGTGATTTCTACCACGTGCACCTGGGGCTAACCCGGG

BamA-CurA CCATGGCGATGACCCAGATTAAAGTGCCGACGGCACTGATCGCGAGCGTTCATGGCGAGGGTCAGCACCTGTTTGAACCGATGGCGGCCCGTTGTTCTTGCACCACGATTCCGTGCTGTTGCTGTTGCGGCGGTTAACCCGGG

BamF fw AAAAGGATCCATGCCAAATCAGAATTTTC

BamF rv AAAACCCGGGTTACTCATTTCCTTGCC

BamL fw AAAAGGATCCATGGAAATTGAAACAATTGTCAGAGAGT

BamL rv AAAAGCGGCCGCTCACGTATACCTTTTGTTTTTTATAATCCAAC
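Most cloning primers in Table 2.2 are named for the restriction enzyme whose site they carry. As a small, hedged sanity check (the helper name is illustrative; the recognition sequences are the standard ones for BamHI, NotI, SalI, and HindIII), the sketch below confirms that a few of the listed primers contain the site their name implies.

```python
# Standard recognition sequences for the enzymes used in this section.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "SalI": "GTCGAC",
    "HindIII": "AAGCTT",
}

# A subset of primers transcribed from Table 2.2 (5' to 3').
PRIMERS = {
    "BamHI BamA fw": "AAAAGGATCCATGGAGGAGGTAACAATTATGACTCAAATTAAAGTGC",
    "NotI BamA rv": "AAAAGCGGCCGCTTAAAACGTAGATGAACTAGAGATGATTGTGGTACAGG",
    "SalI MBP fw": "AAAAGTCGACAATGGGCAGCAG",
    "HindIII BamA rv": "AAAAAAGCTTTTAAAACGTAGATGAACTAGAG",
    "HindIII CmsA rv": "AATTAAGCTTTTAGCCCCAGGTGC",
}


def site_position(primer_name, sequence):
    """Return the 0-based index of the named enzyme's site, or -1 if absent.

    The enzyme is taken to be the first word of the primer name.
    """
    enzyme = primer_name.split()[0]
    return sequence.find(RECOGNITION_SITES[enzyme])


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: site found at index {site_position(name, seq)}")
```

All four recognition sequences are palindromic, so a plain substring search on the primer strand is sufficient here.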


Table 2.3. DNA sequences used for generating core peptide mutants.

Oligonucleotide Sequence (5’ to 3’)

C29-NBT fw GAGCCAATGGCTGCACGCNBTACCTGTACCACAATCAT

C29-NBT rv ATGATTGTGGTACAGGTAVNGCGTGCAGCCATTGGCTC

C31-NBT fw AATGGCTGCACGCTGTACCNBTACCACAATCATCTCTAGT

C31-NBT rv ACTAGAGATGATTGTGGTAVNGGTACAGCGTGCAGCCATT

T33-NBC fw GCTGCACGCTGTACCTGTACCNBCATCATCTCTAGTTCATCTACG

T33-NBC rv CGTAGATGAACTAGAGATGATGVNGGTACAGGTACAGCGTGCAGC

I34-NNC fw CACGCTGTACCTGTACCACANNCATCTCTAGTTCATCTACG

I34-NNC rv CGTAGATGAACTAGAGATGNNTGTGGTACAGGTACAGCGTG

I35-NNC fw CGCTGTACCTGTACCACAATCNNCTCTAGTTCATCTACGTTTTA

I35-NNC rv TAAAACGTAGATGAACTAGAGNNGATTGTGGTACAGGTACAGCG

S38-NBC fw GTACCTGTACCACAATCATCTCTAGTNBCTCTACGTTTTAAAAGCTTGGC

S38-NBC rv GCCAAGCTTTTAAAACGTAGAGVNACTAGAGATGATTGTGGTACAGGTAC

T40-NBC fw TACCACAATCATCTCTAGTTCATCTNBCTTTTAAAAGCTTGGCTGTTTTGGCG

T40-NBC rv CGCCAAAACAGCCAAGCTTTTAAAAGVNAGATGAACTAGAGATGATTGTGGTA

F41-KNS fw CAATCATCTCTAGTTCATCTACGKNSTAAAAGCTTGGCTGTTTTGGCGG

F41-KNS rv CCGCCAAAACAGCCAAGCTTTTASNMCGTAGATGAACTAGAGATGATTG

BamA T33 HindIII rv TTTTAAGCTTTTAAAACGTAGATGAACTAGAGATGATGHNGGTACAGGTACAGCGTG

BamA S38 HindIII rv TTTTAAGCTTTTAAAACGTAGAABNACTAGAGATGATTGTGGTACAG

BamA T40 HindIII rv TTTTAAGCTTTTAAAACRKAGATGAACTAGAGATGATTGTGGT

BamA I34V fw CTGCACGCTGTACCTGTACCACAGTCATCTCTAGTT

BamA I34V rv AACTAGAGATGACTGTGGTACAGGTACAGCGTGCAG

BamA I34A fw ACGCTGTACCTGTACCACAGCCATCTCTAGTTCATCTACG

BamA I34A rv CGTAGATGAACTAGAGATGGCTGTGGTACAGGTACAGCGT


BamA I34M fw GCTGTACCTGTACCACAATGATCTCTAGTTCATCTACG

BamA I34M rv CGTAGATGAACTAGAGATCATTGTGGTACAGGTACAGC

BamA I35* fw GCACGCTGTACCTGTACCACAATCTAGTCTAGTTCATCTACGTTTTAAAAG

BamA I35* rv CTTTTAAAACGTAGATGAACTAGACTAGATTGTGGTACAGGTACAGCGTGC

BamA S38T fw TACCTGTACCACAATCATCTCTAGTACGTCTACGTTTTAAAAGCTTGG

BamA S38T rv CCAAGCTTTTAAAACGTAGACGTACTAGAGATGATTGTGGTACAGGTA

CmsA S32T fw CGGCCCGTTGCACCTGTACCACGGTGATT

CmsA S32T rv AATCACCGTGGTACAGGTGCAACGGGCCG

CmsA G42* fw ATTTCTACCACGTGCACCTGGTAATAAAAGCTTGGCTGTTTTGGC

CmsA G42* rv GCCAAAACAGCCAAGCTTTTATTACCAGGTGCACGTGGTAGAAAT

BamA K43* fw CAATCATCTCTAGTTCATCTACGTTTTAATAGCTTGGCTGTTTTG

BamA K43* rv CAAAACAGCCAAGCTATTAAAACGTAGATGAACTAGAGATGATTG

*42-SVA fw TACCACAATCATCTCTAGTTCATCTACGTTTSVATAGCTTGGCTGTTTTG

*42-SVA rv CAAAACAGCCAAGCTATBSAAACGTAGATGAACTAGAGATGATTGTGGTA

A32 insert fw ATGGCTGCACGCTGTACCTGTGCCACCACAATCATC

A32 insert rv GATGATTGTGGTGGCACAGGTACAGCGTGCAGCCAT

A35 insert fw CTGTACCTGTACCACAATCGCGATCTCTAGTTCATCTACGT

A35 insert rv ACGTAGATGAACTAGAGATCGCGATTGTGGTACAGGTACAG

A39 insert fw GTACCTGTACCACAATCATCTCTAGTTCAGCGTCTACGTTTTAAAAGC

A39 insert rv GCTTTTAAAACGTAGACGCTGAACTAGAGATGATTGTGGTACAGGTAC

S38 del fw ACCTGTACCACAATCATCTCTAGTTCTACGTTTTAAAAGCTTG

S38 del rv CAAGCTTTTAAAACGTAGAACTAGAGATGATTGTGGTACAGGT

MBP fwd (sequencing primer) ATGAAGCCCTGAAAGACG
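The fw/rv primer pairs in Table 2.3 are expected to be reverse complements of one another, including at degenerate positions (for example, NBT on the forward strand pairs with AVN on the reverse strand). A minimal sketch of that check, using the standard IUPAC complement rules and an illustrative function name, is shown here.

```python
# IUPAC-aware complement: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W, and N are their own complements.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence with IUPAC codes."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Primer pairs transcribed from Table 2.3 (5' to 3').
    pairs = {
        "BamA I34V": (
            "CTGCACGCTGTACCTGTACCACAGTCATCTCTAGTT",
            "AACTAGAGATGACTGTGGTACAGGTACAGCGTGCAG",
        ),
        "C29-NBT": (
            "GAGCCAATGGCTGCACGCNBTACCTGTACCACAATCAT",
            "ATGATTGTGGTACAGGTAVNGCGTGCAGCCATTGGCTC",
        ),
    }
    for name, (fw, rv) in pairs.items():
        print(f"{name}: complementary = {reverse_complement(fw) == rv}")
```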


2.9.3 Heterologous PZN production

To heterologously produce PZN and analogs, E. coli cells containing a fosmid harboring the PZN biosynthetic gene cluster ΔbamA were transformed with purified pBAD24-MBP-BamA or a variant thereof. Transformants were selected using Luria-Bertani (LB) agar supplemented with ampicillin (50 μg mL–1) and chloramphenicol (12.5 μg mL–1). Individual colonies were used to inoculate 10 mL cultures of LB broth supplemented with ampicillin (50 μg mL–1) and chloramphenicol (12.5 μg mL–1) in test tubes. Cultures were grown on a roller drum at 37 °C to an OD600 of 0.6, at which point cells were harvested by centrifugation (4,000 x g) and resuspended in 1 mL of the LB supernatant. To induce production of PZN or analogs, 300 μL of cell suspension was spread on a 15 cm plate of LB agar supplemented with ampicillin (50 μg mL–1), chloramphenicol (12.5 μg mL–1), rhamnose (10 mM) to increase copy number of the fosmid (29), and arabinose (10 mM) to induce expression of MBP-BamA from the Para promoter of pBAD24. These cultures were grown at 30 °C for 40 h. Cells were harvested by resuspension in TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) and subsequent centrifugation (4,000 x g).

2.9.4 PZN extraction

Crude PZN or analogs were obtained by a non-lytic, methanolic cell surface extraction, similar to methods described previously (27). Briefly, cells were resuspended in MeOH (10 mL), agitated by vortex for 45 s, and equilibrated for 15 min at 22 °C. Cells were harvested by centrifugation (4,000 x g), and the MeOH supernatant was rotary evaporated to near dryness. The crude extract was resuspended in deionized water and lyophilized to dryness. Dried, crude extracts were resuspended in 500 μL of 80% (v/v) aqueous MeCN for purification by HPLC or in 500 μL MeOH for analysis of crude extract by MALDI-MS.

2.9.5 Screening of mutant panels

To screen BamA mutant panels for production of PZN variants, purified polyclonal plasmid libraries or individual plasmid sequences were used to transform chemically competent E. coli cells containing the entire PZN biosynthetic gene cluster (including bamA) on a fosmid, as described above. Individual colonies were used to inoculate 10 mL cultures of LB broth supplemented with ampicillin (50 μg mL–1) and chloramphenicol (12.5 μg mL–1) in test tubes. Cultures were grown on a roller drum at 37 °C to an OD600 of 0.6, at which point PZN production was induced by addition of rhamnose (10 mM) and arabinose (10 mM) to the liquid cultures. Following induction, cultures were grown with shaking at 30 °C for 24 h. Cells were harvested by centrifugation (4,000 x g), washed with TE buffer, and harvested again (8,000 x g). PZN variants were extracted analogously to the procedure described above using 1 mL MeOH. Cells were harvested by centrifugation (8,000 x g), and the MeOH supernatant was evaporated in a speed-vac concentrator with minimal heating. Dry, crude extracts were resuspended in 10 μL MeOH for analysis by MALDI-MS. Since the Quikchange™ site-directed mutagenesis method does not guarantee that all possible DNA sequences will be produced due to potential sequence biases, a sufficient number of clones were sequenced from each library to ensure 95% confidence in having detected all generated mutants, as calculated by $0.95 = 1 - (1 - f)^{n}$, where f = frequency of the least represented mutants and n = number of clones screened (54).
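Solving the confidence expression above for n gives n = ln(1 − 0.95) / ln(1 − f), rounded up to a whole clone. The sketch below evaluates this; the function name is illustrative, and the example frequencies (1/12 and 1/16) are assumptions corresponding to equally represented members of a 12- or 16-codon library, not values reported in the original work.

```python
import math


def clones_needed(f, confidence=0.95):
    """Smallest n satisfying 1 - (1 - f)**n >= confidence.

    f is the frequency of the least represented mutant (0 < f < 1).
    """
    if not 0 < f < 1:
        raise ValueError("f must lie strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - f))


if __name__ == "__main__":
    # Assumed example: every codon in the library is equally represented.
    for library_size in (12, 16):
        n = clones_needed(1 / library_size)
        print(f"{library_size}-member library: screen at least {n} clones")
```

If a library is biased, f for the rarest member falls below 1/(library size) and the required n rises accordingly.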

2.9.6 Reverse phase HPLC purification

Crude PZN or analogs [suspended in 80% (v/v) aqueous MeCN] were reverse phase purified as described previously (27) using a Thermo BETASIL C18 column (250 mm x 10 mm; pore size: 100 Å; particle size: 5 μm) at a flow rate of 4 mL min–1. A gradient of 65-95% MeOH with 0.1% (v/v) formic acid over 32 min was used. The fractions containing PZN or an analog (as monitored by A266 and later verified by MALDI-MS) were collected into 20 mL borosilicate vials and the solvent was removed in vacuo. Purified PZN analogs were quantified by A260 in DMSO solution.

2.9.7 Mass spectrometry

MALDI-MS analysis used a Bruker Daltonics UltrafleXtreme MALDI-TOF/TOF instrument operating in reflector/positive mode using sinapic acid as the matrix. LC-MS analysis used an Agilent 1200 HPLC system coupled to a G1956B quadrupole mass spectrometer with an electrospray ionization (ESI) source. LC used a 2.1 mm x 150 mm Grace Vydac Denali C18 column (120 Å, 5 μm) with a gradient of 65-95% MeOH with 0.1% (v/v) formic acid over 25 min, with the analytes eluted directly into the MS.
HPLC-purified samples for high-resolution and tandem mass spectrometry were resuspended in 12 μL of 80% (v/v) MeCN with 1% (v/v) formic acid. An Advion Nanomate 100 directly infused the sample to a ThermoFisher Scientific LTQ-FT hybrid linear ion trap, operating at 11 T (calibrated weekly). The FTMS was operated using the following parameters: minimum target signal counts, 5000; resolution, 100,000; m/z range detected, dependent on target m/z; isolation width (MS/MS), 4-5 m/z; normalized collision energy (MS/MS), 35; activation q value (MS/MS), 0.4; activation time (MS/MS), 30 ms.

2.9.8 Antibacterial activity assay

Determination of MIC values for PZN analogs was performed as described previously (27) for B. anthracis Sterne, with the only modification being that PZN analogs were added from DMSO solutions to a maximum final concentration of 32 μg mL–1 in bacterial cultures. Hydrolyzed samples were prepared by incubation of purified PZN analogs (suspended in MeOH) with 0.2% (w/v) HCl for 4 h at 23 °C, followed by removal of solvent in a speed-vac concentrator and subsequent resuspension in DMSO.

2.10 References

1. Fischbach, M. A., and Walsh, C. T. (2009) Antibiotics for emerging pathogens. Science 325, 1089-1093.
2. Newman, D. J., and Cragg, G. M. (2012) Natural products as sources of new drugs over the 30 years from 1981 to 2010. J. Nat. Prod. 75, 311-335.
3. Challis, G. L. (2008) Genome mining for novel natural product discovery. J. Med. Chem. 51, 2618-2628.
4. Kapur, S., Chen, A. Y., Cane, D. E., and Khosla, C. (2010) Molecular recognition between ketosynthase and acyl carrier protein domains of the 6-deoxyerythronolide B synthase. Proc. Natl. Acad. Sci. U S A 107, 22066-22071.
5. Kapur, S., Lowry, B., Yuzawa, S., Kenthirapalan, S., Chen, A. Y., Cane, D. E., and Khosla, C. (2012) Reprogramming a module of the 6-deoxyerythronolide B synthase for iterative chain elongation. Proc. Natl. Acad. Sci. U S A 109, 4110-4115.
6. Liu, T., Chiang, Y. M., Somoza, A. D., Oakley, B. R., and Wang, C. C. (2011) Engineering of an “unnatural” natural product by swapping polyketide synthase domains in Aspergillus nidulans. J. Am. Chem. Soc. 133, 13314-13316.
7. Hughes, A. J., and Keatinge-Clay, A. (2011) Enzymatic extender unit generation for in vitro polyketide synthase reactions: structural and functional showcasing of Streptomyces coelicolor MatB. Chem. Biol. 18, 165-176.
8. Arnison, P. G., et al. (2013) Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 30, 108-160.
9. Piper, C., Hill, C., Cotter, P. D., and Ross, R. P. (2011) Bioengineering of a Nisin A-producing Lactococcus lactis to create isogenic strains producing the natural variants Nisin F, Q and Z. Microb. Biotechnol. 4, 375-382.
10. Molloy, E. M., Field, D., O’Connor, P. M., Cotter, P. D., Hill, C., and Ross, R. P. (2013) Saturation mutagenesis of lysine 12 leads to the identification of derivatives of nisin A with enhanced antimicrobial activity. PLoS ONE 8, e58530.
11. Molloy, E. M., Ross, R. P., and Hill, C. (2012) ‘Bac’ to the future: bioengineering lantibiotics for designer purposes. Biochem. Soc. Trans. 40, 1492-1497.
12. Cotter, P. D., Deegan, L. H., Lawton, E. M., Draper, L. A., O’Connor, P. M., Hill, C., and Ross, R. P. (2006) Complete alanine scanning of the two-component lantibiotic lacticin 3147: generating a blueprint for rational drug design. Mol. Microbiol. 62, 735-747.
13. Field, D., Molloy, E. M., Iancu, C., Draper, L. A., O’Connor, P. M., Cotter, P. D., Hill, C., and Ross, R. P. (2013) Saturation mutagenesis of selected residues of the alpha-peptide of the lantibiotic lacticin 3147 yields a derivative with enhanced antimicrobial activity. Microb. Biotechnol. 6, 564-575.
14. Appleyard, A. N., Choi, S., Read, D. M., Lightfoot, A., Boakes, S., Hoffmann, A., Chopra, I., Bierbaum, G., Rudd, B. A., Dawson, M. J., and Cortes, J. (2009) Dissecting structural and functional diversity of the lantibiotic mersacidin. Chem. Biol. 16, 490-498.
15. Caetano, T., Krawczyk, J. M., Mosker, E., Sussmuth, R. D., and Mendo, S. (2011) Heterologous expression, biosynthesis, and mutagenesis of type II lantibiotics from Bacillus licheniformis in Escherichia coli. Chem. Biol. 18, 90-100.
16. Boakes, S., Ayala, T., Herman, M., Appleyard, A. N., Dawson, M. J., and Cortes, J. (2012) Generation of an actagardine A variant library through saturation mutagenesis. Appl. Microbiol. Biotechnol. 95, 1509-1517.
17. Bowers, A. A., Acker, M. G., Koglin, A., and Walsh, C. T. (2010) Manipulation of thiocillin variants by prepeptide gene replacement: structure, conformation, and activity of heterocycle substitution mutants. J. Am. Chem. Soc. 132, 7519-7527.
18. Bowers, A. A., Acker, M. G., Young, T. S., and Walsh, C. T. (2012) Generation of thiocillin ring size variants by prepeptide gene replacement and in vivo processing by Bacillus cereus. J. Am. Chem. Soc. 134, 10313-10316.
19. Li, C., Zhang, F., and Kelly, W. L. (2011) Heterologous production of thiostrepton A and biosynthetic engineering of thiostrepton analogs. Mol. Biosyst. 7, 82-90.
20. Li, C., Zhang, F., and Kelly, W. L. (2012) Mutagenesis of the thiostrepton precursor peptide at Thr7 impacts both biosynthesis and function. Chem. Commun. 48, 558-560.
21. Young, T. S., Dorrestein, P. C., and Walsh, C. T. (2012) Codon randomization for rapid exploration of chemical space in thiopeptide antibiotic variants. Chem. Biol. 19, 1600-1610.
22. Tianero, M. D., Donia, M. S., Young, T. S., Schultz, P. G., and Schmidt, E. W. (2012) Ribosomal route to small-molecule diversity. J. Am. Chem. Soc. 134, 418-425.
23. Pan, S. J., and Link, A. J. (2011) Sequence diversity in the lasso peptide framework: discovery of functional microcin J25 variants with multiple amino acid substitutions. J. Am. Chem. Soc. 133, 5016-5023.
24. Lee, S. W., Mitchell, D. A., Markley, A. L., Hensler, M. E., Gonzalez, D., Wohlrab, A., Dorrestein, P. C., Nizet, V., and Dixon, J. E. (2008) Discovery of a widely distributed toxin biosynthetic gene cluster. Proc. Natl. Acad. Sci. U S A 105, 5879-5884.
25. Dunbar, K. L., Melby, J. O., and Mitchell, D. A. (2012) YcaO domains use ATP to activate amide backbones during peptide cyclodehydrations. Nat. Chem. Biol. 8, 569-575.
26. Scholz, R., Molohon, K. J., Nachtigall, J., Vater, J., Markley, A. L., Sussmuth, R. D., Mitchell, D. A., and Borriss, R. (2011) Plantazolicin, a novel microcin B17/streptolysin S-like natural product from Bacillus amyloliquefaciens FZB42. J. Bacteriol. 193, 215-224.
27. Molohon, K. J., Melby, J. O., Lee, J., Evans, B. S., Dunbar, K. L., Bumpus, S. B., Kelleher, N. L., and Mitchell, D. A. (2011) Structure determination and interception of biosynthetic intermediates for the plantazolicin class of highly discriminating antibiotics. ACS Chem. Biol. 6, 1307-1313.
28. Idris, E. E., Iglesias, D. J., Talon, M., and Borriss, R. (2007) Tryptophan-dependent production of indole-3-acetic acid (IAA) affects level of plant growth promotion by Bacillus amyloliquefaciens FZB42. Mol. Plant Microbe Interact. 20, 619-626.
29. Eliot, A. C., Griffin, B. M., Thomas, P. M., Johannes, T. W., Kelleher, N. L., Zhao, H., and Metcalf, W. W. (2008) Cloning, expression, and biochemical characterization of Streptomyces rubellomurinus genes required for biosynthesis of antimalarial compound FR900098. Chem. Biol. 15, 765-770.
30. Bates, P. F., and Swift, R. A. (1983) Double cos site vectors: simplified cosmid cloning. Gene 26, 137-146.
31. Guzman, L. M., Belin, D., Carson, M. J., and Beckwith, J. (1995) Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J. Bacteriol. 177, 4121-4130.
32. Kuliopulos, A., and Walsh, C. T. (1994) Production, purification, and cleavage of tandem repeats of recombinant peptides. J. Am. Chem. Soc. 116, 4599-4607.
33. Kapust, R. B., Routzahn, K. M., and Waugh, D. S. (2002) Processive degradation of nascent polypeptides, triggered by tandem AGA codons, limits the accumulation of recombinant tobacco etch virus protease in Escherichia coli BL21(DE3). Protein Expr. Purif. 24, 61-70.
34. Bouallag, N., Gaillard, C., Marechal, V., and Strauss, F. (2009) Expression of Epstein-Barr virus EBNA1 protein in Escherichia coli: purification under nondenaturing conditions and use in DNA-binding studies. Protein Expr. Purif. 67, 35-40.
35. Belshaw, P. J., Roy, R. S., Kelleher, N. L., and Walsh, C. T. (1998) Kinetics and regioselectivity of peptide-to-heterocycle conversions by microcin B17 synthetase. Chem. Biol. 5, 373-384.
36. Kelleher, N. L., Hendrickson, C. L., and Walsh, C. T. (1999) Posttranslational heterocyclization of cysteine and serine residues in the antibiotic microcin B17: distributivity and directionality. Biochemistry 38, 15623-15630.
37. Melby, J. O., Dunbar, K. L., Trinh, N. Q., and Mitchell, D. A. (2012) Selectivity, directionality, and promiscuity in peptide processing from a Bacillus sp. Al Hakam cyclodehydratase. J. Am. Chem. Soc. 134, 5309-5316.
38. Mitchell, D. A., Lee, S. W., Pence, M. A., Markley, A. L., Limm, J. D., Nizet, V., and Dixon, J. E. (2009) Structural and functional dissection of the heterocyclic peptide cytotoxin streptolysin S. J. Biol. Chem. 284, 13004-13012.
39. Sinha Roy, R., Belshaw, P. J., and Walsh, C. T. (1998) Mutational analysis of posttranslational heterocycle biosynthesis in the gyrase inhibitor microcin B17: distance dependence from propeptide and tolerance for substitution in a GSCG cyclizable sequence. Biochemistry 37, 4125-4136.
40. Roy, R. S., Allen, O., and Walsh, C. T. (1999) Expressed protein ligation to probe regiospecificity of heterocyclization in the peptide antibiotic microcin B17. Chem. Biol. 6, 789-799.
41. Lee, M. V., Ihnken, L. A., You, Y. O., McClerren, A. L., van der Donk, W. A., and Kelleher, N. L. (2009) Distributive and directional behavior of lantibiotic synthetases revealed by high-resolution tandem mass spectrometry. J. Am. Chem. Soc. 131, 12258-12264.
42. Martin, R. B., and Parcell, A. (1961) Hydrolysis of 2-methyl-delta-oxazoline - an intramolecular O-N-acetyl transfer reaction. J. Am. Chem. Soc. 83, 4835-4838.
43. Martin, R. B., Hedrick, R. I., and Parcell, A. (1964) Thiazoline + oxazoline hydrolyses + sulfur-nitrogen + oxygen-nitrogen acyl transfer reactions. J. Org. Chem. 29, 3197-3206.
44. Dunbar, K. L., and Mitchell, D. A. (2013) Insights into the mechanism of peptide cyclodehydrations achieved through the chemoenzymatic generation of amide derivatives. J. Am. Chem. Soc. 135, 8692-8701.
45. Van Driessche, G., Vandenberghe, I., Jacquemotte, F., Devreese, B., and Van Beeumen, J. J. (2002) Mass spectrometric identification of in vivo carbamylation of the amino terminus of Ectothiorhodospira mobilis high-potential iron-sulfur protein, isozyme 1. J. Mass Spectrom. 37, 858-866.
46. Golemi, D., Maveyraud, L., Vakulenko, S., Samama, J. P., and Mobashery, S. (2001) Critical involvement of a carbamylated lysine in catalytic function of class D beta-lactamases. Proc. Natl. Acad. Sci. U S A 98, 14280-14285.
47. Lapko, V. N., Smith, D. L., and Smith, J. B. (2001) In vivo carbamylation and acetylation of water-soluble human lens alphaB-crystallin lysine 92. Prot. Sci. 10, 1130-1136.
48. Pearson, M. A., Schaller, R. A., Michel, L. O., Karplus, P. A., and Hausinger, R. P. (1998) Chemical rescue of Klebsiella aerogenes urease variants lacking the carbamylated-lysine nickel ligand. Biochemistry 37, 6214-6220.
49. Roy, R. S., Kim, S., Baleja, J. D., and Walsh, C. T. (1998) Role of the microcin B17 propeptide in substrate recognition: solution structure and mutational analysis of McbA1-26. Chem. Biol. 5, 217-228.
50. Madison, L. L., Vivas, E. I., Li, Y. M., Walsh, C. T., and Kolter, R. (1997) The leader peptide is essential for the post-translational modification of the DNA-gyrase inhibitor microcin B17. Mol. Microbiol. 23, 161-168.
51. Oman, T. J., and van der Donk, W. A. (2010) Follow the leader: the use of leader peptides to guide natural product biosynthesis. Nat. Chem. Biol. 6, 9-18.
52. Bentley, S. D., Corton, C., Brown, S. E., Barron, A., Clark, L., Doggett, J., Harris, B., Ormond, D., Quail, M. A., May, G., Francis, D., Knudson, D., Parkhill, J., and Ishimaru, C. A. (2008) Genome of the actinomycete plant pathogen Clavibacter michiganensis subsp. sepedonicus suggests recent niche adaptation. J. Bacteriol. 190, 2150-2160.
53. Chen, X. H., et al. (2007) Comparative analysis of the complete genome sequence of the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42. Nat. Biotechnol. 25, 1007-1014.
54. Jeltsch, A., and Lanio, T. (2002) Site-directed mutagenesis by polymerase chain reaction. Methods Mol. Biol. 182, 85-94.


CHAPTER 3: In vitro biosynthesis and substrate tolerance of the plantazolicin family of natural products1

1 Reprinted with permission from Deane, C. D., Burkhart, B. J., Blair, P. M., Tietz, J. I., Lin, A., and Mitchell, D. A. (2016) ACS Chem. Biol. DOI: 10.1021/acschembio.6b00369. Copyright 2016 American Chemical Society. B.J.B. performed size-exclusion chromatography and leader peptide binding studies; P.M.B. performed MS/MS analysis; J.I.T. performed NMR and some kinetic analysis; A.L. isolated BZN.

3.1 Introduction

Natural products are a rich source of chemically and mechanistically diverse antibiotics (1). Within this broad space, the ribosomally synthesized and post-translationally modified peptide (RiPP) natural products comprise a class garnering increased attention (2). Plantazolicin (PZN) is an anti-B. anthracis RiPP produced by Bacillus velezensis (formerly Bacillus amyloliquefaciens) FZB42 (3-6) and a member of a subclass of RiPPs termed the linear azol(in)e-containing peptides (LAPs) (7, 8). LAPs, thiopeptides, and azol(in)e-containing cyanobactins can also collectively be referred to as thiazole/oxazole-modified microcins (TOMMs) (2). During TOMM biosynthesis, a trimeric heterocycle synthetase (BCD) converts select Cys, Ser, and Thr residues in the C-terminal (core) region of the precursor peptide to thiazole, oxazole, and methyloxazol(in)e moieties (Figure 3.1A) (7). The D-protein (a member of the YcaO superfamily), in partnership with the C-protein, first installs azolines through an ATP-mediated cyclodehydration which proceeds through a phosphorylated hemiorthoamide (9, 10), and the B-protein [flavin mononucleotide (FMN)-dependent dehydrogenase] then oxidizes select azolines to azoles (11). The C-protein serves to bind the N-terminal (leader) region of the precursor peptide, enhancing the rate of processing (12-14). Following heterocyclization, the biosynthesis of PZN is completed by removal of the leader peptide, likely by the locally encoded type II CaaX protease (15, 16), and dimethylation of the new N-terminus by an S-adenosylmethionine (SAM)-dependent methyltransferase (17, 18). Though originating from a peptide, the structure of mature PZN after post-translational modification is remarkably non-peptide-like (Figure 3.1A), with two sets of five contiguous heterocycles conferring rigidity and hydrophobicity (4, 5).
In addition to the PZN biosynthetic gene cluster (BGC) found in B. velezensis, homologous gene clusters have been identified in a number of other bacterial strains (Figures 3.1B, 3.2) (8), although only Bacillus pumilus has been demonstrated to produce PZN (4). The putative precursor peptides associated with these BGCs exhibit a highly conserved core region, which would result in similar groupings of contiguous heterocycles if indeed post-translationally modified like PZN (Figures 3.1C, 3.2). The unique structure of PZN endows it with an extremely narrow spectrum of activity and a membrane-targeting mode of action unlike other known antibiotics (19). PZN exhibits bactericidal activity against B. anthracis, the causative agent of anthrax, with a minimum inhibitory concentration (MIC) of 1–2 µg/mL, while not killing other species, even closely related ones (4, 19). It remains to be seen whether the other potential natural products in the PZN family, if produced, would display similar activities.
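Each modification described above changes the peptide mass by a fixed amount: cyclodehydration removes one H2O per azoline formed, oxidation of an azoline to an azole removes two hydrogens, and each N-methylation adds a net CH2. As a hedged sketch (the function name is illustrative and the constants are standard monoisotopic masses), the expected shift relative to the unmodified, leader-free core peptide can be estimated as follows. The example assumes native PZN carries ten heterocycles, nine of them oxidized to azoles, plus two N-terminal methyl groups, as described in the text.

```python
# Standard monoisotopic mass changes (Da).
H2O = 18.010565  # lost per cyclodehydration (azoline formation)
H2 = 2.015650    # lost per azoline-to-azole oxidation
CH2 = 14.015650  # net gain per N-methylation


def mass_shift(heterocycles, azoles, methyl_groups=0):
    """Mass change (Da) relative to the unmodified, leader-free core peptide.

    heterocycles  -- total azolines formed by cyclodehydration
    azoles        -- how many of those were subsequently oxidized
    methyl_groups -- number of methyl groups added
    """
    if azoles > heterocycles:
        raise ValueError("cannot oxidize more azolines than were formed")
    return -heterocycles * H2O - azoles * H2 + methyl_groups * CH2


if __name__ == "__main__":
    # Assumed PZN modification state: 10 heterocycles, 9 azoles, 2 methyls.
    print(f"Expected shift: {mass_shift(10, 9, 2):.4f} Da")
```

Comparing such a predicted shift with an observed mass is one way to infer how many residues were cyclized or oxidized in a variant; leader peptide removal is a separate mass change not modeled here.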

Figure 3.1. (a) Structure of plantazolicin (PZN). (b) Biosynthetic gene clusters responsible for the production of PZN and an abridged list of homologous clusters. Bam, Bacillus velezensis FZB42; Bpum, B. pumilus ATCC 7061; Bbad, B. badius; Cur, Corynebacterium urealyticum DSM 7109. (c) Precursor peptide sequences for PZN and select related family members. Colors indicate post-translational modifications in the mature product: green, Nα,Nα-dimethylarginine; red, thiazole; blue, (methyl)oxazole; brown, azoline; *, leader peptide cleavage site. A full listing of gene clusters and precursor peptides for the PZN family is shown in Figure 3.2.


Figure 3.2. (a) Biosynthetic gene clusters responsible for the production of PZN and putative PZN-like natural products. Bam, Bacillus velezensis; Bpum, B. pumilus ATCC 7061; Bsaf, B. safensis; Bbad, B. badius; Cms, Clavibacter michiganensis subsp. sepedonicus; Cur, Corynebacterium urealyticum DSM 7109; Blin, Brevibacterium linens; Nhap, Nocardiopsis halophila; Nbai, Nocardiopsis baichengensis; Nxin, Nocardiopsis xinjiangensis; Alop, Allosalinactinospora lopnorensis; Derm, Dermacoccus sp. DE3. (b) Precursor peptide sequences associated with the gene clusters shown in (a). Colors, same as Figure 3.1.

The characterization of structural analogs of bioactive compounds can benefit the study of mode of action and aid in the establishment of structure-activity relationships. RiPPs are particularly attractive natural product engineering targets, as their genetically encoded precursor peptides provide an avenue for the facile introduction of diversity via mutagenesis of the coding gene. The RiPP biosynthetic enzymes can then be used to generate variant compounds from the unnatural precursor peptides, provided that the enzymes accept those peptides as substrates. Precursor peptide replacement has been employed to investigate and capitalize on the biosynthetic pathways of lanthipeptides (20-26), lasso peptides (27-29), and thiopeptides (30-35). The patellamide biosynthetic enzymes are notably permissive toward unnatural substrates, allowing for the generation of many new natural product analogs without the need for intensive total synthesis (36-40). Previously, we used an Escherichia coli expression system to examine the substrate tolerance of the PZN biosynthetic pathway and to generate analogs to explore structure-activity relationships (41). Heterologous production of PZN analogs in E. coli prevented both the interception of biosynthetic intermediates and the determination of the order of post-translational events. Nonetheless, the enzymes involved in PZN biosynthesis proved to be extremely selective for their substrates, with the cyclodehydratase in particular appearing incapable of processing any substrates with non-cyclizable residues in positions that were normally cyclized (41). In the present study, we reconstitute the PZN synthetase in vitro and use the expanded substrate scope of this cell-free system to more thoroughly probe the biosynthesis of PZN, including the identification of residues crucial for leader peptide binding. In addition, we demonstrate the production of new PZN analogs, including two novel natural products from other BGCs in the PZN family.

3.2 In vitro reconstitution of the PZN heterocycle synthetase

In order to determine the activity of the PZN heterocycle synthetase in vitro, the precursor peptide (BamA), dehydrogenase (BamB), and two-component cyclodehydratase (BamC, BamD) were each heterologously expressed in E. coli as fusions to maltose-binding protein (MBP) and affinity purified (Figure 3.3). Unfortunately, all attempts to express MBP-BamC yielded only degraded protein. A version of the gene codon-optimized for E. coli expression (BamCopt) was synthesized and heterologously expressed to yield a modest amount of enzyme (~2 mg/L, Figure 3.3), but size-exclusion chromatography indicated that BamCopt did not form an observable complex with BamB and BamD, which could impair the activity of the synthetase (Figure 3.4A). In order to obtain more significant quantities of properly functioning protein, we elected to substitute BamC with a closely related homolog, an approach that has been successful for other heterocycle synthetases (42, 43). The C-protein from the PZN BGC in Bacillus pumilus (BpumC, accession number WP_041117410; Figure 3.1B) shares 91% amino acid similarity (63% identity) with BamC (4), and when expressed as an MBP fusion protein in E. coli, yielded over 90 mg/L following affinity purification (Figure 3.3). In contrast to BamCopt, BpumC formed a complex with BamB and BamD as assessed by size-exclusion chromatography (Figure 3.4B). Gratifyingly, when these MBP-tagged proteins (BamB, BpumC, and BamD) were combined with MBP-BamA, TEV protease (to remove the MBP tags), and ATP, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) indicated a mass loss consistent with the formation of nine azoles and one azoline (-198 Da) on the precursor peptide (Figure 3.5A). Subsequent Fourier transform ion cyclotron resonance tandem mass spectrometry (FT-ICR-MS/MS) localized these modifications to the same positions as in the bona fide natural product (Figure 3.6). While BamCopt did not appear to form a stable complex with other components of the synthetase (Figure 3.4A), overnight reactions with BamCopt did form the expected -198 Da product (Figure 3.7). Ultimately, the more robust protein yield and activity of BpumC warranted its usage in all subsequent assays.
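The mass losses quoted for the modified precursor peptides follow from simple bookkeeping: each cyclodehydration removes one water (nominally 18 Da), and each subsequent dehydrogenation removes two hydrogens (nominally 2 Da). The short Python sketch below is illustrative only; the function name and the use of nominal integer masses are assumptions made here, not part of the reported analysis.

```python
# Nominal mass changes (Da) for each post-translational step.
# Integer (nominal) masses are an illustrative simplification.
H2O_LOSS = 18  # cyclodehydration: Cys/Ser/Thr -> azoline
H2_LOSS = 2    # dehydrogenation: azoline -> azole


def expected_mass_shift(n_azoles, n_azolines):
    """Return the nominal mass change (Da) relative to the unmodified peptide.

    Every ring (azole or azoline) costs one water; every azole
    additionally costs two hydrogens.
    """
    n_rings = n_azoles + n_azolines
    return -(n_rings * H2O_LOSS + n_azoles * H2_LOSS)


# BamA product: nine azoles and one azoline
print(expected_mass_shift(9, 1))   # -198
# CurA product: ten azoles
print(expected_mass_shift(10, 0))  # -200
# Dehydrogenase-inactive synthetase: up to four azolines, no azoles
print(expected_mass_shift(0, 4))   # -72
```

The same arithmetic can be used to translate any observed mass loss into a candidate combination of azoles and azolines, although MS/MS is still needed to localize them.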

Figure 3.3. SDS-PAGE of enzymes used in this study. All proteins were purified as MBP fusions. BamCopt is the E. coli codon-optimized BamC (see methods).

Figure 3.4. Size-exclusion chromatography used to assess complex formation between the components of the Bam synthetase complex. (a) BamCopt is the E. coli codon-optimized BamC, yet it still does not form a significant amount of complex with BamB and BamD. (b) BpumC is the homolog of BamC from B. pumilus and more readily forms a complex with BamB and BamD than does BamC. (c) BamB-R93A does not bind the FMN cofactor, yet still forms a complex with BpumC and BamD. BamB-R93A is used when an inactive dehydrogenase is needed or in fluorescence polarization assays to avoid the inherent fluorescence of the flavin. * denotes void, including protein aggregate. The single-protein controls are duplicated in each panel for clarity in comparison with the complex.

Figure 3.5. (a) Reaction of the PZN precursor peptide BamA with the Bam heterocycle synthetase in vitro results in the installation of nine azoles and one azoline, consistent with PZN produced from the native host. (b) Reaction of the precursor peptide CurA with the Cur heterocycle synthetase in vitro results in the installation of ten azoles.

Figure 3.6. (a) FT-ICR-MS of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA. (b) CID-based fragmentation of modified BamA, allowing localization of heterocycles. In subsequent MS/MS figures, the assignment of leader peptide fragment ions will be omitted for clarity.

Figure 3.7. BamCopt (expressed from an E. coli codon-optimized construct) is less active than BpumC, as assessed by MALDI-TOF-MS. Values above peaks indicate the number of heterocycles installed on BamA after overnight reaction at 22 °C (100 µM BamA, 10 µM each BamB, BpumC or BamC, and BamD).

Bolstered by our success with in vitro reconstitution of the Bam heterocycle synthetase, we also set about reconstituting the homologous synthetase from Corynebacterium urealyticum (Cur, Figure 3.1B) (4, 8). Size-exclusion chromatography indicated that CurB, CurC, and CurD formed a complex (Figure 3.8), and combination of the MBP-tagged synthetase proteins with ATP, TEV, and MBP-CurA resulted in a mass loss of 200 Da for CurA, consistent with the installation of ten azoles (Figure 3.5B).

Figure 3.8. Size-exclusion chromatography was used to assess complex formation between the components of the Cur heterocycle synthetase. * denotes void, including protein aggregate.

Although some TOMM cyclodehydratase enzymes are capable of activity in the absence of their cognate dehydrogenase, or even in the absence of their cognate C-protein (9, 43), this was not the case for either the Bam or Cur cyclodehydratases, as the exclusion of any one component of the synthetase from either BGC abolished activity (Figure 3.9). For these synthetases, all three components of the heterocycle synthetase were required for activity, as has been demonstrated for most other studied TOMM clusters, including microcin B17 and streptolysin S (7, 42, 44). Thus, in order to reveal the order of biosynthetic events, the enzymatic activities of the cyclodehydratase and dehydrogenase needed to be disentangled. Towards this goal, we prepared two inactive forms of the dehydrogenase, BamB-Y206A and BamB-R93A. BamB-Y206A contains an alanine substitution at a key residue (11, 45), while BamB-R93A no longer co-purified with FMN upon heterologous expression in E. coli but still associated with BpumC/BamD by size-exclusion chromatography (Figure 3.4C). When MBP-BamA was treated with BpumC, BamD, and either BamB-R93A or BamB-Y206A, the cyclodehydratase installed at most four azoline heterocycles, which were localized to the four most N-terminal positions in the core peptide (Figures 3.10, 3.11). Subsequent treatment of this species with synthetase containing an active dehydrogenase yielded a fully modified substrate (nine azoles and one azoline). While the four-azoline species proved to be a competent in vitro intermediate, it is not necessarily a natural, on-pathway biosynthetic intermediate. Nevertheless, one of these four azolines is most likely the first to be installed during the normal course of biosynthesis. All attempts to isolate additional intermediates from the Bam or Cur synthetase reactions were unsuccessful due to the apparent high processivity of the enzymes (Figure 3.12). However, the observation that the cyclodehydratase cannot install all ten azolines in the absence of an active dehydrogenase strongly suggests that during the normal reaction course, azole formation begins at the N-terminal residues prior to completion of azoline formation at the C-terminal residues.

Figure 3.9. The PZN pathways from both Bam (a) and Cur (b) require the presence of all three components of the heterocycle synthetase for precursor peptide modification.

Figure 3.10. Reaction of BamA (1) with a heterocycle synthetase containing the inactive BamB-R93A results in the installation of up to four azolines, without their oxidation to azoles (2). Subsequent treatment of this species mixture with acid hydrolyzes the azolines to their original amino acids (3), whereas treatment with a fully wild-type synthetase yields the full complement of nine azoles and one azoline (4), comparable to treatment of unmodified BamA with wild-type synthetase (5). Substitution of the catalytically inactive BamB-Y206A for BamB-R93A yielded equivalent results.

Figure 3.11. The use of apo BamB-R93A in the Bam synthetase enables the installation of up to four azolines without their oxidation to azoles. CID-based fragmentation of the 2-azoline (a), 3-azoline (b), and 4-azoline (c) species localizes the azolines to the first four heterocyclizable residues. Brown, azoline. The use of BamB-Y206A, which is catalytically inactive, in place of BamB-R93A yields comparable results.

Figure 3.12. The Bam synthetase appears to exhibit high processivity, preventing isolation of partially-cyclized intermediates, as assessed by MALDI-TOF-MS. Values above peaks indicate the number of heterocycles installed on BamA after reaction at 22 °C (100 µM BamA, 2 µM each BamB, BpumC, and BamD).

3.3 Substrate tolerance of the PZN synthetase in vitro

With the PZN heterocycle synthetase reconstituted in vitro, we next turned our attention to determining the parameters of substrate tolerance for these enzymes with variant precursor peptides. MBP-tagged precursors with core region substitutions were generated using site-directed mutagenesis and E. coli heterologous expression. After treating the panel of purified variant peptides with the heterocycle synthetase in vitro, the extent of processing was determined by MALDI-TOF-MS, with subsequent FT-ICR-MS/MS to localize the sites of post-translational modification (Figure 3.13). Overall, the ability of the synthetase to process variant substrates was increased over what was previously observed in E. coli (41), presumably due to a greater level of control over the reaction conditions (e.g. higher concentrations of reaction components are possible in vitro). Nonetheless, many of the variant peptides remained incompletely processed, reaffirming that the PZN cyclodehydratase is remarkably substrate selective.

Figure 3.13. Processing of mutant BamA peptides by the Bam synthetase in vitro. Mutated and uncyclized residues are shown in solid black circles; half-filled circles indicate a mixture of species. (a) Mutation of most residues to Pro prevents cyclization both C-terminal to the Pro and at the -1 position relative to it. Wild-type BamA (WT) is shown for comparison. (b) Mutation of a normally cyclized position to Ala, or insertion of Ala between two wild-type residues, generally prevents cyclization C-terminal to the Ala. (c) A Cys residue in the next heterocyclizable position relative to Ala enables cyclodehydration of that residue, though Cys in a position not normally cyclized remains unmodified. (d) Deletion of a residue does not affect processing by the synthetase, though one residue must remain uncyclized at the C-terminus. (e) The Bam synthetase in vitro is capable of processing substrates with the core peptides of other members of the PZN family.

Initially, we examined the limits of substrate scope for the Bam synthetase by creating a series of variant peptides wherein a Cys, Ser, or Thr residue was replaced with Pro, which has been used as a crude steric mimic for an azoline in previous studies (43, 46). While the Bam synthetase was able to install heterocycles in the expected positions N-terminal to the interrogated Pro residue, neither the residue immediately preceding Pro nor any residue following Pro was cyclized (Figure 3.13A, Figures 3.14-3.19). Presumably, the conformational restriction imparted on the peptide backbone by the presence of Pro in the +1 position prevented the Bam cyclodehydratase from installing a heterocycle at that position, perhaps due to an inability to properly fit the substrate in the active site or some other catalytic deficiency. While such an intolerance for Pro in the +1 position is consistent with previous studies using the cyclodehydratase from Bacillus sp. Al Hakam (43), the Cur cyclodehydratase, among other TOMM natural products (47, 48), is capable of accepting Pro in the –1 position (Figure 3.1C, Figure 3.5C).

Figure 3.14. CID-based fragmentation of the [M+4H]4+ species of heterocyclized and TEV-cleaved BamA-C4P. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6 for comparison. Red, thiazole; bold, mutated residue.

Figure 3.15. CID-based fragmentation of the [M+4H]4+ species of heterocyclized and TEV-cleaved BamA-T6P. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.16. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-I7P. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.17. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-I8P. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.18. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-S11P. (top) 5-azole species; (bottom) 6-azole species. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue; *, unrelated peaks.

Figure 3.19. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-T13P. (a) 8-azole species; (b) 7-azole species. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Similar N-terminal but not C-terminal processing was also observed for a number of Ala substitutions, though the cyclodehydratase was able to cyclize residues with Ala in the +1 position (Figures 3.13B, 3.20-3.29). In a few cases involving Ala near the middle of the core peptide, the cyclodehydratase was capable of installing an azoline in the next heterocyclizable position after the Ala, but this was always a minor species, perhaps due in part to hydrolysis of the azoline. Dehydrogenation of this azoline to the azole was never observed in the position following Ala, nor was cyclodehydration observed in more than one position following Ala. This abortive trend was also observed in cases where an Ala residue was inserted between two wild-type residues, including between the two central Ile residues in the core peptide (^8A, ^12A; ^ indicates insertion). Together, these results further support the hypothesis that heterocyclization during PZN biosynthesis proceeds in an N- to C-terminal direction.

Figure 3.20. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-T3A. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; bold, mutated residue.

Figure 3.21. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-C4A. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.22. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-T5A. (a) 4-ring species; (b) 3-ring species. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; brown, azoline; bold, mutated residue.

Figure 3.23. CID-based fragmentation of the [M+4H]4+ species of heterocyclized and TEV-cleaved BamA-T6A. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole; brown, azoline; bold, mutated residue.

Figure 3.24. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-S9A/S11A. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.25. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-S11A. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.26. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-S12A. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.27. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-T13A. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.28. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-^8A. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Contiguous heterocycles prevent fragmentation; thus, more core peptide fragments are observed when enzymatic processing is disrupted. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

Figure 3.29. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-^12A. Only one major fragment ion from the leader peptide is labeled for clarity. Red, thiazole; blue, (methyl)oxazole; bold, mutated residue.

The single notable exception to this trend of Ala preventing significant heterocyclization of C-terminal residues was in the case of the BamA-T3A substrate, for which the synthetase fully cyclized both Cys2 and Cys4 to thiazoles (Figures 3.13B, 3.20). Further heterocyclization C-terminal to Cys4 was not observed, consistent with the other Ala-substituted precursor peptides. It is possible that the increased nucleophilicity of the Cys side chain relative to that of Ser/Thr, plus the more electrochemically favorable oxidation to thiazole relative to oxazole, were responsible for the ability of the synthetase to overcome the introduction of Ala in the preceding (–1) position. To test this hypothesis, three double substitutions were constructed and assayed for modification in which the Ala residue was followed by Cys in the next position normally heterocyclized: T5A/T6C, T6A/S9C, and S12A/T13C. In each of these cases, the synthetase was able to convert the unnatural Cys to a thiazole, though some Cys remained unprocessed in the S12A/T13C peptide (Figures 3.13C, 3.30-3.32). Together, these results demonstrate that the increased reactivity of Cys, in essence, chemically rescues processing when the activity of the synthetase would normally be halted by Ala. However, when Cys was placed at positions not heterocyclized in the wild-type PZN (i.e. I7C and I8C), susceptibility to iodoacetamide labeling demonstrated that these Cys residues remained unmodified, whereas other positions of BamA were processed as in the wild-type peptide (Figures 3.13C, 3.33). Such exquisite site-selectivity is consistent with that previously observed in E. coli (41).

Figure 3.30. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-T5A/T6C. (a) 5-azole species; (b) 4-azole species. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residues.

Figure 3.31. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-T6A/S9C. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; bold, mutated residues.

Figure 3.32. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-S12A/T13C. (a) 9-azole species; (b) 8-azole species. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole; bold, mutated residues.

Figure 3.33. Iodoacetamide labeling of heterocyclized and TEV-cleaved BamA-I7C and BamA-I8C demonstrates that one Cys residue in each peptide remains unmodified. The peak at m/z 5203 in the labeled spectra represents a doubly-labeled species.

To further examine the effects of disrupting the natural arrangement of heterocyclizable residues, several precursors with contracted core peptides were constructed and assayed for modification by the Bam synthetase. Shortening the precursor peptide by one residue in the center of either block of five heterocyclizable residues (C4Δ, S11Δ; Δ indicates deletion) did not significantly disrupt processing, as assessed by the MALDI-TOF-MS endpoint assay, resulting in the maximum of nine heterocycles installed (Figures 3.13D, 3.34, 3.35). Likewise, truncation of the precursor peptide to remove the C-terminal half of the core peptide (I8*; * indicates a stop codon) also led to the installation of five heterocycles (Figures 3.13D, 3.36). Finally, removal of the C-terminal Phe residue (F14*) prevented heterocyclization only at the preceding residue, Thr13 (Figures 3.13D, 3.37), indicating that the Bam synthetase is unable to install a C-terminal heterocycle.
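The variant labels used throughout this section (substitution, e.g. T3A; insertion, e.g. ^8A; deletion, e.g. C4Δ; truncation, e.g. I8*) can be read mechanically. The Python sketch below is a hypothetical helper for expanding such labels into core peptide sequences. The wild-type core sequence is an assumption inferred from the residue names used in the text (Arg1, Cys2, Thr3, Cys4, Thr5, Thr6, Ile7, Ile8, Ser9, Ser11, Ser12, Thr13, Phe14), with position 10 assumed to be Ser; the insertion convention (the new residue takes the stated position number) is likewise an assumption.

```python
# Assumed wild-type BamA core peptide, inferred from residue labels in the
# text; position 10 is assumed to be Ser.
WT_CORE = "RCTCTTIISSSSTF"


def apply_mutation(core, mutation):
    """Apply one mutation label to a core peptide (1-based numbering)."""
    if mutation.startswith("^"):
        # Insertion, e.g. "^8A": assume the new residue becomes position 8.
        pos, new = int(mutation[1:-1]), mutation[-1]
        return core[:pos - 1] + new + core[pos - 1:]
    old, pos, tag = mutation[0], int(mutation[1:-1]), mutation[-1]
    if core[pos - 1] != old:
        raise ValueError(f"expected {old} at position {pos}, found {core[pos - 1]}")
    if tag == "Δ":   # deletion of the named residue
        return core[:pos - 1] + core[pos:]
    if tag == "*":   # stop codon replaces the named residue
        return core[:pos - 1]
    return core[:pos - 1] + tag + core[pos:]  # substitution


def apply_variant(core, label):
    """Apply a label such as "T5A/T6C" (multiple substitutions only)."""
    for mutation in label.split("/"):
        core = apply_mutation(core, mutation)
    return core


for label in ["T3A", "T5A/T6C", "^8A", "C4Δ", "I8*", "F14*"]:
    print(label, apply_variant(WT_CORE, label))
```

Under these assumptions, I8* leaves the seven-residue peptide RCTCTTI, which retains exactly the five N-terminal heterocyclizable residues, and F14* leaves Thr13 as the C-terminal residue.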

Figure 3.34. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-C4Δ. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole; brown, azoline.

Figure 3.35. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-S11Δ. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole; brown, azoline.

Figure 3.36. MALDI-TOF-MS of heterocyclized and TEV-cleaved BamA-I8*, demonstrating installation of all five possible azoles.

Figure 3.37. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-F14*. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole.

In addition to the precursors with one or two substituted residues, three potential substrates with more extensive alterations were also assayed. These chimeric peptides consisted of the leader peptide from BamA and the core peptides from three other BGCs in the PZN family. These peptides were previously assayed in E. coli, where no post-translational modifications were observed (41). However, the reaction conditions using purified enzymes now enabled modification of all three chimeric peptides (Figures 3.13E, 3.38-3.40). The ability of the synthetase to modify the BamA-CurA and BamA-BlinA chimeras is especially notable, as each contains an internal Pro residue. As previously demonstrated in this work, the presence of a Pro residue in the BamA core peptide prevented heterocyclization in positions C-terminal to the Pro (Figure 3.13A). Yet in the case of BamA-CurA and BamA-BlinA, Pro8 is followed immediately by several Cys residues. We hypothesize that the increased nucleophilicity of the Cys side chain again may be chemically rescuing an otherwise unprocessed substrate, as was the case for the Ala/Cys double mutants (Figure 3.13C). The processing of BamA-CurA also demonstrates site-selectivity on the part of the PZN dehydrogenase, as it did not oxidize the penultimate position to the thiazole, whereas CurB oxidized all azolines in CurA to the corresponding azoles (Figures 3.5B, 3.41).

Figure 3.38. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-CmsA chimera. (a) 10-ring species; (b) 7-ring species. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; brown, azoline.

Figure 3.39. CID-based fragmentation of the [M+5H]5+ species of heterocyclized and TEV-cleaved BamA-BlinA chimera. (a) 9-ring species; (b) 7-ring species. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; brown, azoline.

Figure 3.40. CID-based fragmentation of the [M+4H]4+ species of heterocyclized and TEV-cleaved BamA-CurA chimera. (a) 10-ring species; (b) 9-ring species. Peaks due to fragmentation of the leader peptide are not labeled for clarity; see Figure 3.6. Red, thiazole; blue, (methyl)oxazole; brown, azoline.

Figure 3.41. CID-based fragmentation of the [M+4H]4+ species of heterocyclized and TEV-cleaved CurA. Only a few of the major fragment ions from the leader peptide are labeled for clarity. Red, thiazole; blue, (methyl)oxazole.

In addition to the heterocyclizable residues, the other key position in BamA at which substitution was not tolerated by the heterocycle synthetase in E. coli was Arg1 (41). Previously, it was unclear whether the intolerance toward even the most conservative substitution (Arg to Lys) at this position was due to selectivity by the cyclodehydratase or prevention of leader peptide cleavage. With the Bam synthetase reconstituted in vitro, its tolerance for alternative residues at the N-terminus of the core peptide could now be assessed in the absence of the leader peptidase. In vitro, the R1K construct was accepted by the Bam synthetase, though not robustly, as indicated by the presence of a number of incompletely cyclized species (Figure 3.42A). In addition to the “conservative” R1K substitution, a variety of other alterations to this position were also assayed. The Bam synthetase was revealed to be more permissive to small residues (R1G, R1S) than to bulky (R1I, R1F) or negatively-charged (R1D) residues, though only in the case of R1K could the synthetase install the full complement of ten heterocycles (Figure 3.42A). The initial enzymatic rate was also determined for the synthetase in the presence of each of these Arg1 mutants, and those substrates which permitted the installation of more heterocycles also generally had greater initial velocities (Figure 3.42B). The Bam synthetase thus appeared to strongly favor Arg1.

Figure 3.42. Processing of Arg1 mutants of BamA by the Bam synthetase in vitro. (a) MALDI-TOF-MS of peptides after overnight treatment with the Bam synthetase. Numbers above peaks indicate the number of heterocycles installed. (b) Initial reaction rate for the substrates shown in (a), as determined by PNP assay for phosphate concentration (see Methods) and normalized to the wild-type rate. Error bars represent standard deviation; n=3.

In addition to determining the relative rates of processing for the Arg1 variants, several other precursor peptides were kinetically assayed using the Bam synthetase (Table 3.1). The observed Vo values for these substrate variants were relatively unperturbed regardless of the number of heterocycles installed or the substituted position in the core. Such similarity suggested that the first cyclodehydration event (Cys2) is the rate-determining step, and this result offers further support that modification proceeds in an N- to C-terminal direction. Indeed, many data reported herein would be difficult to reconcile if an alternative processing order was operating.

Meanwhile, in contrast to the similar Vo values, the apparent Km values varied greatly. Although it can be difficult to interpret kinetic parameters for a multistep reaction where substrate binding is primarily dictated at a location distal to the active site (i.e. the C-protein engages the leader peptide, vide infra) (12), lower apparent Km values correlated with an increased total number of modifications, such as with the native substrate and T13P (Figure 3.13A). Such a trend suggests that substrates receiving more modifications interact more productively with the active site, which is expected for any substrate with a lower Km. The higher apparent Km for the T3A and T6A variants also suggests that these Thr residues enhance productive interactions with the enzyme, perhaps through the β-methyl substituent on their side chains.

Table 3.1. Kinetics of select BamA substrates with the Bam synthetase.

Mutant       Vo (µM Pi min⁻¹)   Apparent Km (µM)   Heterocycles installed
WT           1.76 ± 0.09        3.22 ± 1.08        10
T3A          1.71 ± 0.13        26.8 ± 5.9         2
T6A          2.74 ± 0.25        55.5 ± 11.3        4
I8P          2.35 ± 0.12        7.61 ± 2.28        5
S12A         2.35 ± 0.19        8.36 ± 2.86        8
T13P         2.48 ± 0.09        5.73 ± 1.26        8
C4Δ          1.88 ± 0.22        9.37 ± 4.3         9
BamA-CurA    1.78 ± 0.12        12.3 ± 2.8         10
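The contrast between the narrow spread of Vo values and the wide spread of apparent Km values in Table 3.1 can be made explicit with a small calculation. The Python sketch below simply restates the mean values from the table (uncertainties omitted) and compares the fold-range of each parameter; the variable and function names are illustrative choices, not part of the original analysis.

```python
# Mean values transcribed from Table 3.1:
# substrate -> (Vo in µM Pi/min, apparent Km in µM, heterocycles installed)
KINETICS = {
    "WT":        (1.76, 3.22, 10),
    "T3A":       (1.71, 26.8, 2),
    "T6A":       (2.74, 55.5, 4),
    "I8P":       (2.35, 7.61, 5),
    "S12A":      (2.35, 8.36, 8),
    "T13P":      (2.48, 5.73, 8),
    "C4Δ":       (1.88, 9.37, 9),
    "BamA-CurA": (1.78, 12.3, 10),
}


def fold_range(values):
    """Ratio of the largest to the smallest value in a collection."""
    values = list(values)
    return max(values) / min(values)


vo_fold = fold_range(v[0] for v in KINETICS.values())
km_fold = fold_range(v[1] for v in KINETICS.values())

print(f"Vo varies {vo_fold:.1f}-fold across substrates")
print(f"Apparent Km varies {km_fold:.1f}-fold across substrates")
```

With these values, Vo spans less than a twofold range while the apparent Km spans more than an order of magnitude, which is the pattern the text interprets as a common rate-determining first cyclodehydration.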

3.4 Leader peptide recognition and binding by the PZN synthetase

In order to more fully understand the substrate scope of the heterocycle synthetases, we next investigated the contribution of the leader peptide towards substrate binding and processing. Previous work has shown that CurC, and not CurD, interacts with a CurA leader peptide, exemplifying the previously described role for the C-protein component of the heterocycle synthetase in leader peptide binding via the RiPP recognition element (RRE) domain (12). Building on these previous results, we examined the interaction between an N-terminally FITC-labeled BamA leader peptide (FITC-BamA-LP, Table 3.2) and the synthetase proteins by fluorescence polarization (FP). Unexpectedly, BpumC on its own was incapable of avid FITC-BamA-LP binding, with an extrapolated Kd of approximately 100 µM (Figure 3.43). In the presence of BamD, the Kd decreased to 20 ± 2 µM, though BamD alone did not substantially bind the leader peptide (Figure 3.43), indicating that BpumC was indeed the primary engaging protein, in line with previous results. The surprisingly weak binding by BpumC alone was not a result of the non-cognate nature of the BamA/BpumC interaction, since the binding of FITC-BpumA-LP (the cognate sequence) to BpumC gave comparable results (Kd ~100 µM, extrapolated; Figure 3.43). In light of the observation that the full, trimeric PZN heterocycle synthetase was required to modify BamA (Figure 3.9), we hypothesized that BpumC requires its associated partners for efficient leader peptide binding. To test this, BamB-R93A, which lacks the FMN cofactor, was employed so that the intrinsic fluorescence of the flavin would not interfere in the FP assay. When BamB-R93A, BpumC, and BamD were combined to form the heterocycle synthetase, leader peptide binding was considerably enhanced (Kd = 1 ± 0.1 µM, Figure 3.43). This result is in stark contrast to several previously studied TOMM synthetases, where the C-protein, which contains the ~90 residue RRE, was sufficient for effective peptide binding (12, 46). We hypothesize that some C-proteins leverage their association with other synthetase components to induce structural rearrangements that potentiate leader peptide binding (49).
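To put the three dissociation constants in perspective, the fraction of labeled leader peptide bound at a given protein concentration can be estimated with a simple one-site binding model. The Python sketch below is a minimal illustration under stated assumptions: a hyperbolic one-site model with negligible depletion of the protein by the labeled peptide, and an arbitrary protein concentration of 10 µM chosen only for comparison. The fitting model actually used for the FP data is not specified in this section.

```python
def fraction_bound(protein_um, kd_um):
    """Fraction of labeled peptide bound in a one-site model.

    Assumes the labeled peptide is present at a concentration far below
    Kd, so free protein is approximately equal to total protein.
    """
    return protein_um / (kd_um + protein_um)


# Approximate Kd values (µM) reported in the text for FITC-BamA-LP.
KD_VALUES = {
    "BpumC alone": 100.0,             # extrapolated
    "BpumC + BamD": 20.0,
    "BamB-R93A + BpumC + BamD": 1.0,
}

PROTEIN_UM = 10.0  # arbitrary illustrative concentration, not from the text

for name, kd in KD_VALUES.items():
    bound = fraction_bound(PROTEIN_UM, kd)
    print(f"{name}: {bound:.0%} bound at {PROTEIN_UM:g} µM protein")
```

In this model, half of the peptide is bound when the protein concentration equals Kd, so a hundredfold drop in Kd moves the same protein concentration from a regime of mostly free peptide to one of mostly bound peptide.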

Table 3.2. Sequences of fluorescent peptides used in this study. Fluorescein isothiocyanate (FITC)-conjugated peptides were synthesized (GenScript) with an aminohexyl linker (Ahx) and a single glycine spacer, followed by the leader peptide (LP) sequences of BamA, BpumA, CmsA, and CurA. FITC-CurA-LP represents the CurA leader peptide with an N(-26)S substitution (equivalent to the wild-type sequence for BlinA; Figure 3.2B). This peptide binds to CurC with the same affinity as the wild-type CurA peptide (12).

Peptide          Sequence
FITC-BamA-LP     FITC-Ahx-GMTQIKVPTALIASVHGEGQHLFEPMAA
FITC-BpumA-LP    FITC-Ahx-GMTKITIPTALSAKVHGEGQHLFEPMAA
FITC-CmsA-LP     FITC-Ahx-GMITTTALPRAAAVTTTVYGEGLHLFEPMA
FITC-CurA-LP     FITC-Ahx-GMSTLISKLPPAVSTDSSKIVSEVQAFEPT

Figure 3.43. Fluorescence polarization (FP) binding assay with BamB-R93A/BpumC/BamD and FITC-labeled BamA leader peptide (LP) (a) or FITC-labeled BpumA-LP (b). Kd values not shown are >150 μM. Error bars represent s.d. of at least three independent measurements. Errors on Kd values are given as the s.e.m. from regression analysis. For binding curves that do not reach saturation, Kd values were extrapolated.

To investigate the importance of various precursor peptide regions for substrate binding, we constructed a series of C-terminal truncations (Table 3.3). The results indicated that not only did the core region not detectably contribute to the overall binding of the peptide, but also that the first 24 residues of the precursor peptide were minimally necessary to maintain wild-type binding affinity. Next, Ala scanning of BamA was performed to identify individual residues of the leader peptide (designated by negative numbers) responsible for interactions with the synthetase. As measured by a competition FP assay (Table 3.3), BamA and BpumA had nearly equal IC50 values for the synthetase, which were also similar to the Kd of their fluorescent derivatives. Overall, most of the single Ala substitutions had only modest effects on the binding of the peptide, although P(-21)A and L(-18)A had a much more substantial effect, which was borne out by kinetic assays (Table 3.3).

Table 3.3. Binding and processing of leader peptide mutants.

Peptide            IC50 (μM)     in vitro (%)[b]   in vivo[c]
BamA-WT            2.8 ± 0.3     100 ± 9.8         +++
BamA-T(-26)A       5.4 ± 0.6     108 ± 6.7         +++
BamA-Q(-25)A       4.4 ± 0.6     94 ± 9.8          ++
BamA-I(-24)A       5.5 ± 0.7     117 ± 5.4         +
BamA-K(-23)A       5.9 ± 0.6     89 ± 5.1          ++
BamA-V(-22)A       4.8 ± 0.3     102 ± 1.0         +
BamA-P(-21)A       >100          54 ± 8.6          –[d]
BamA-T(-20)A       5.5 ± 0.7     95 ± 2.7          ++
BamA-L(-18)A       >100          63 ± 2.9          –
BamA-I(-17)A       2.7 ± 0.3     99 ± 7.1          ++
BamA-S(-15)A       9 ± 2         100 ± 1.3         +
BamA-V(-14)A       8.2 ± 0.9     110 ± 11.7        +
BamA-H(-13)A       13 ± 3        106 ± 1.4         +
BamA-G(-12)*       58 ± 8        n.d.              n.d.
BamA-Q(-9)*        13.5 ± 2.7    n.d.              n.d.
BamA-F(-6)*        6.8 ± 1       n.d.              n.d.
BamA-M(-3)*        2.4 ± 0.3     n.d.              n.d.
BpumA-WT           1.9 ± 0.1     105 ± 5.3         +
CurA-WT            22 ± 4        108 ± 4           n.d.[e]
CurA-N(-26)S[a]    24 ± 4        100 ± 9           n.d.
CurA-P(-23)A       >150          7.1 ± 0.7         n.d.
CurA-P(-22)T       53 ± 9        105 ± 1           n.d.
CurA-V(-20)A       >150          44.4 ± 0.3        n.d.

[a] CurA-N(-26)S represents the leader sequence of BlinA.
[b] Normalized to wild-type BamA or CurA, respectively.
[c] Defined relative to that of the plasmid-borne wild-type BamA: 100–75% (+++), 75–25% (++), 25–5% (+).
[d] Limit of detection estimated to be 5% based on requiring that the signal be 3 times greater than baseline noise.
[e] n.d., not determined.

To ensure that these observations were not in vitro artifacts, a previously described E. coli heterologous production system was used to assess processing of the BamA variants in vivo (41). In this assay, the production of the mature PZN-I7V variant (a known, well-tolerated core peptide substitution) from a plasmid-borne copy of the bamA gene containing a leader peptide substitution was compared to the production of the wild-type PZN from a fosmid-borne copy, as measured by MALDI-TOF-MS (Table 3.3; Figure 3.44). Overall, these in vivo results corroborated those obtained in vitro, indicating that two residues, Pro(-21) and Leu(-18), are primarily responsible for the peptide’s interaction with the synthetase in what appears to be a -21PXXL-18 recognition motif.
Interestingly, in other precursor peptides within the PZN family, this Pro is highly conserved, whereas the Leu is less so (Figure 3.2B). Furthermore, these results indicate that the conserved residues at the C-terminus of the leader region (FEPXAA), which are likely part of a conserved protease recognition motif, contribute only modestly to the overall peptide affinity.

Figure 3.44. Representative MALDI-TOF-MS spectra for in vivo production of BamA leader peptide mutants, demonstrating how the ratio between m/z 1322 and m/z 1336 was used to determine the acceptance of the indicated leader peptide substitution by the synthetase.
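
The scoring behind the in vivo column of Table 3.3 can be illustrated with a short sketch. It assumes that tolerance is taken as the intensity ratio of the PZN-I7V peak (m/z 1322) to the wild-type PZN peak (m/z 1336), normalized to the same ratio measured with a plasmid-borne wild-type leader. The function names, the handling of values that fall exactly on a category boundary, and the example intensities are hypothetical and are not taken from the measured spectra.

```python
def relative_production(variant_1322, wt_1336, ref_1322, ref_1336):
    """Percent production of PZN-I7V relative to the wild-type-leader control.

    variant_1322 / wt_1336: peak intensities from the leader-mutant sample.
    ref_1322 / ref_1336: peak intensities from the control carrying the
    wild-type leader on the plasmid (assumed normalization, see lead-in).
    """
    sample_ratio = variant_1322 / wt_1336
    control_ratio = ref_1322 / ref_1336
    return 100.0 * sample_ratio / control_ratio


def score(percent):
    """Map percent production onto the symbols of Table 3.3, footnote c."""
    if percent >= 75:
        return "+++"
    if percent >= 25:
        return "++"
    if percent >= 5:   # 5% is the estimated limit of detection (footnote d)
        return "+"
    return "-"


# Hypothetical intensities, for illustration only.
pct = relative_production(variant_1322=300.0, wt_1336=1000.0,
                          ref_1322=900.0, ref_1336=1000.0)
print(round(pct, 1), score(pct))
```

Because the wild-type PZN peak from the fosmid serves as an internal reference in every sample, a ratio of this kind is less sensitive to spot-to-spot variation in MALDI signal than an absolute intensity would be.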

Select CurA peptides with substitutions at the equivalent positions in the leader peptide were also assayed for binding and processing by the Cur synthetase in vitro (Table 3.3). In line with the conserved nature of the position (Figure 3.2B), Pro(-23) in CurA was crucial for binding, whereas the neighboring non-conserved Pro(-22) was not. Unsurprisingly, the binding of CurA-N(-26)S, whose leader peptide is identical to that of BlinA (12), was virtually no different from that of wild-type CurA. Taken together with the leader peptide binding experiments using the Bam synthetase, these results identify the critical residues that mediate recognition by the PZN heterocycle synthetases.

Having established that these key residues are necessary for leader peptide binding, we next sought to determine whether they were also sufficient by assaying a variety of precursor peptides from the PZN family with non-cognate synthetases. In the FP assay, the Bam synthetase exhibited weakened binding to CmsA compared to BamA or BpumA and none to BlinA, while CurC (sufficient for precursor peptide binding) exhibited binding only to the highly similar BlinA and CmsC only to CmsA (Table 3.4). These observations were borne out by a MALDI-TOF-MS processing assay, which demonstrated that the Bam synthetase was capable of installing ten heterocycles on either BamA or BpumA, but only six on the less-similar CmsA, and none on either BlinA or CurA (Figure 3.45). Likewise, the Cur synthetase installed heterocycles only on CurA or BlinA, and not on BamA, BpumA, or CmsA (Figure 3.45). As the C-protein provides the main contact for binding to the leader peptide, it was feasible that substituting the C-protein in these synthetase complexes could enable binding, and thus processing, of non-cognate precursor peptides. However, these “hybrid synthetases” were not active, likely due in part to the inability to form productive complexes between a C-protein and its non-cognate partners (Figure 3.46). Additional attempts to confer binding of non-cognate substrates through simply swapping the RRE domains of these C-proteins were unsuccessful (data not shown).

Table 3.4. Dissociation constants (Kd, µM) measured by FP assay between various leader peptides and C-proteins. Error is the standard error of the mean.

             BamA-LP[a]   BpumA-LP     CmsA-LP    CurA-LP
BpumC        ~100[b]      ~100         >150       >150
CmsC         >150         >150         14 ± 3     >150
CurC         >150         >150         >150       7 ± 1
BlinC        >150         >150         >150       13 ± 1
PznBCD[c]    1.0 ± 0.1    0.8 ± 0.1    ~75        ~150

[a] LP, fluorescein-tagged leader peptide; see Table 3.2.
[b] ~, extrapolated.
[c] PznBCD = BamB-R93A, BpumC, BamD.

Figure 3.45. Evolutionarily divergent PZN peptides are not modified by the Bam synthetase. (a) The maximum number of heterocycles installed on a given peptide correlates with the sequence identity (b) of its cognate C-protein to the C-protein employed for in vitro enzymatic reactions (BpumC or CurC). (c) MALDI-TOF-MS spectra from which the results in panel (a) were derived. Numbers above peaks indicate the number of heterocycles installed. #, unrelated peaks.
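
Heterocycle counts are read from such spectra through the mass lost on each modification: cyclodehydration to an azoline removes one water, and dehydrogenation to the azole removes two further hydrogens, so each azole lowers the peptide mass by roughly 20 Da. The following sketch computes the expected shift from monoisotopic masses; the helper name is hypothetical and the calculation ignores adducts and charge state.

```python
H2O = 18.010565   # monoisotopic mass of water, Da
H2 = 2.015650     # monoisotopic mass of two hydrogen atoms, Da


def heterocycle_mass_shift(azoles, azolines=0):
    """Mass change (Da) relative to the unmodified peptide.

    Every heterocycle loses one water on cyclodehydration; azoles lose an
    additional two hydrogens on dehydrogenation.
    """
    return -(azoles + azolines) * H2O - azoles * H2


for n in (6, 10):
    print(n, "azoles:", round(heterocycle_mass_shift(n), 2), "Da")
print("9 azoles + 1 azoline:", round(heterocycle_mass_shift(9, 1), 2), "Da")
```

The same bookkeeping explains why a hydrolyzed azoline appears 18 Da heavier than the parent compound: the water lost on cyclodehydration is added back.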

Figure 3.46. BpumC, CmsC, BlinC, and CurC were each tested with BamB/D and with CurB/D to see if an alternative C-protein would be capable of expanding the substrate scope of the synthetase. (a) Most non-cognate C-proteins could not form a complex with the other partner proteins as assessed by size exclusion chromatography. Dashed lines indicate the elution time of the complex; * indicates void, including protein aggregate. (b) Proteins which could not form a trimeric heterocycle synthetase resulted in little or no processing of alternative peptide substrates. Numbers above peaks indicate the number of heterocycles installed.

These non-cognate experiments demonstrated that relatively few conserved leader peptide residues [i.e. Pro(-21) in BamA] are necessary but not sufficient for modification. These results also suggest that synthetase recognition is encoded across the length of the peptide, even though affinity is primarily localized to a few “hot spots” (50). Even though most single substitutions had a minor effect, multiple residue differences in the leader peptide (e.g. between BamA and CurA) clearly prevented non-cognate processing, as the Bam synthetase was able to fully process non-cognate core peptides when its cognate leader was present (Figure 3.13E). Interestingly, CmsA does not contain the important Leu(-18) relative to BamA but can still be partially processed by the Bam synthetase, likely due to the overall greater similarity between leader peptides for CmsA and BamA, compared to BamA and CurA (Figure 3.45). Overall, the ability of these various synthetases to process non-cognate substrates appears to correlate with evolutionary distance.

3.5 Production of novel PZN analogs

Having investigated the substrate tolerance of the Bam synthetase, we next asked whether these variant precursor peptides could be used to generate mature PZN analogs. Following heterocyclization, the next biosynthetic step is removal of the leader peptide, which is putatively performed by the protease encoded in the BGC (bamE, Figure 3.1B) (15, 16). As a predicted type II CaaX protease, BamE is thought to be an integral membrane protein (51); thus, we treated BamA that had been heterocyclized in vitro with a crude preparation of B. velezensis membranes. Unfortunately, no proteolytic product was observed (data not shown).

To facilitate the in vitro production of PZN analogs, we introduced a cleavage site for a soluble, commercial protease or chemical cleavage reagent, an approach that has previously been successful for the production of other RiPPs (52-55). Using site-directed mutagenesis, Ala(-1) was changed to one of several other residues: Lys (trypsin), Phe (chymotrypsin), Glu (endoproteinase GluC), and Met (cyanogen bromide, CNBr). A(-1)M was the best-tolerated substitution in combination with enzymatic heterocyclization, as determined by MALDI-TOF-MS (Figure 3.47A). Subsequent treatment of the heterocycle-containing peptide with CNBr in the presence of formic acid (56, 57) yielded a species whose mass was consistent with that of hydrolyzed desmethylPZN, indicating cleavage of the leader peptide (Figure 3.47B). Unfortunately, the acidic conditions necessary for the CNBr reaction also hydrolyzed the methyloxazoline normally present in PZN (Figure 3.1A).

Figure 3.47. (a) Bam synthetase processing of Ala(-1) mutants for determining the best engineered leader peptide cleavage site. (b) CNBr cleavage of synthetase-treated BamA-A(-1)M removes the leader peptide to yield the 9-azole and 8-azole analogs of desmethylPZN (dmPZN). *, leader peptide cleavage site; red, thiazole; blue, (methyl)oxazole.

The A(-1)M substitution was introduced into select BamA variants, in addition to CurA, to enable chemical removal of the leader peptide. Following CNBr treatment, these peptides were then subjected to N-terminal dimethylation by the SAM-dependent methyltransferase involved in PZN biosynthesis (BamL), whose activity had previously been reconstituted in vitro (16-18). BamL dimethylated each of these PZN variants, including the CurA variant, as determined by MALDI-TOF-MS (Figure 3.48). Though the methyltransferase has significant substrate selectivity owing to its narrow substrate-binding channel (17), all of the PZN variants used in these reactions contained both the N-terminal Arg and at least one azole, which is minimally necessary for robust methyltransferase activity. Although enzymatic dimethylation was successful, more robust modification was achieved by reductive alkylation with formaldehyde and borane-pyridine (Figure 3.49) (58). Regardless of the method of dimethylation, this sequence of reactions enabled the production of a new collection of PZN variants beyond what was previously achievable in vivo (41).
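
Dimethylation is recognized in the mass spectra by a fixed shift between the desmethyl and dimethylated species: replacing two N-H hydrogens with two methyl groups adds two CH2 units, or about 28.03 Da. The sketch below encodes that check; the function names, the 0.5 Da tolerance, and the example peak positions are hypothetical choices made for illustration.

```python
CH2 = 14.015650   # monoisotopic mass added when a methyl group replaces H, Da


def dimethylation_shift():
    """Mass added when two N-H hydrogens are replaced by two methyl groups."""
    return 2 * CH2


def looks_dimethylated(mz_desmethyl, mz_product, tolerance=0.5):
    """True if two singly charged peaks differ by one dimethylation (Da)."""
    observed = mz_product - mz_desmethyl
    return abs(observed - dimethylation_shift()) <= tolerance


# Hypothetical peak positions, for illustration only.
print(round(dimethylation_shift(), 4))
print(looks_dimethylated(1308.4, 1336.4))
```

A shift of about 14 Da instead would indicate a monomethylated intermediate, which the same comparison rejects.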

Figure 3.48. Enzymatic dimethylation by BamL of heterocyclized BamA after leader peptide removal, as assessed by MALDI-TOF-MS. (a) BamA; (b) BamA-S12A; (c) BamA-T13P; (d) BamA-F14*; (e) CurA; (f) BamA-I8*. Red, desmethyl species; blue, dimethylated species. Note different X-axis scale in (f).

Figure 3.49. Chemical dimethylation of heterocyclized BamA after leader peptide removal, as assessed by MALDI-TOF-MS. (a) BamA; (b) BamA-C4Δ; (c) BamA-S11Δ; (d) BamA-S12A; (e) BamA-T13A; (f) BamA-T13P; (g) BamA-S12A/T13C; (h) BamA-F14*; (i) CurA; (j) BamA-I8*. Red, desmethyl species; blue, dimethylated species. Note different X-axis scale in (j).

By treating CurA-A(-1)M with the Cur synthetase, removing the leader peptide with CNBr, and reductively alkylating the N-terminus, we obtained sufficient quantities of the PZN-like natural product from C. urealyticum to enable FT-ICR-MS/MS analysis of the purified compound, which corroborated the proposed structure containing ten azoles and N-terminal dimethylation (Figures 3.50A, 3.51). 1H NMR also established the presence of sharp singlets consistent with azole formation (Figure 3.52). By analogy to the name “plantazolicin,” we have named this anticipated natural product “coryneazolicin” (CZN). Our production of CZN demonstrates the use of in vitro biosynthesis to generate a predicted natural compound whose production has not been observed from the native producer. Establishing whether CZN is naturally produced will require further investigation.

Figure 3.50. Proposed structures for the novel natural products coryneazolicin (CZN) (a) and badiazolicin (BZN) (b), as supported by FT-ICR-MS/MS (see also Figures 3.51 and 3.54).

Figure 3.51. CID-based fragmentation of the [M+H]+ species of CZN.

Figure 3.52. 1H NMR spectral expansion indicating presence of azole moieties in CZN. The singlets correspond to the presence of thiazole (seven) and oxazole (one) C-H groups.

After obtaining CZN, we next turned our attention to recently identified PZN-like BGCs from other bacteria (Figure 3.2). Several of these BGCs encode precursor peptides with substitutions in the core region compared to BamA, the products of which could aid in further determination of the structure-activity relationships of the PZN class of natural products. Although the unannotated precursor peptide from Bacillus badius, BadA, lies in a different location relative to other genes in the cluster (Figure 3.1B), it only differs from BamA by two residues in the core region (Figure 3.1C). Methanolic extracts of B. badius contained an m/z 1300 species consistent with the expected mass of the precursor peptide with PZN-like modifications. Furthermore, a less intense species was found with m/z 1318, consistent with a PZN-like compound where the methyloxazoline moiety was hydrolyzed (Figure 3.53) (3). Purification of “badiazolicin” (BZN) enabled FT-ICR-MS/MS analysis, which supported a PZN-like structure containing nine azoles, one azoline, and N-terminal dimethylation (Figure 3.50B, 3.54). After quantification by 1H NMR, BZN exhibited a minimum inhibitory concentration (MIC) of 2 µg/mL against B. anthracis by microbroth dilution assay, but no growth inhibitory activity towards B. subtilis (MIC > 16 µg/mL) or methicillin-resistant S. aureus (MIC > 32 µg/mL).

Figure 3.53. MALDI-TOF-MS of methanolic extract from Bacillus badius ATCC 14574, indicating the presence of BZN. The core peptide from the PZN gene cluster in this strain is shown with the same color coding as in Figure 3.1C to indicate post-translational modifications.

Figure 3.54. CID-based fragmentation of the [M+H]+ species (a) and [M+2H]2+ species (b) of BZN. (c) Fragmentation of the [M+H]+ species in (a). (d) Structure of BZN based on fragmentation in (c).

In conclusion, we have demonstrated the in vitro reconstitution of two heterocycle synthetases involved in the biosynthesis of PZN by B. velezensis and CZN by C. urealyticum. Overall, the synthetases in vitro afforded a greater degree of biosynthetic insight than previous work with PZN biosynthesis in cells, enabling characterization of synthetase complex assembly and substrate binding properties. Using the Bam heterocycle synthetase, we have explored substrate tolerance in the PZN precursor peptide and proposed that cyclodehydration occurs in an N-terminal to C-terminal direction. These synthetases were then used to achieve the total in vitro biosynthesis of a number of PZN variants, culminating in the isolation of two novel natural products, CZN and BZN.

3.6 Methods

3.6.1 General methods

All general molecular biology reagents were purchased from Fisher Scientific or Gold Biotechnology. Chemicals were purchased from Sigma-Aldrich. DNA sequencing was performed by the Roy J. Carver Biotechnology Center (UIUC). Restriction enzymes and dNTPs were purchased from New England Biolabs (NEB), and PfuTurbo DNA polymerase was purchased from Agilent. Oligonucleotide primers were synthesized by Integrated DNA Technologies (IDT). Synthetic genes and fluorescent peptides were synthesized by GenScript.

3.6.2 Cloning, protein expression, and protein purification

The coding sequences for BamA, BamB, BamD, BpumA, BpumC, CurB, and CmsA were PCR amplified from genomic DNA of their respective producing organisms (Bacillus methylotrophicus FZB42, Bacillus pumilus ATCC 7061, Corynebacterium urealyticum DSM 7109, or Clavibacter michiganensis subsp. sepedonicus) with Taq or Pfu polymerase and the primers listed in Table 3.5. PCR amplicons were then digested and ligated using standard methods into a modified pET28b vector containing the sequence of maltose-binding protein (MBP) and the cleavage site for tobacco etch virus (TEV) protease (41, 43). The coding sequence for BamCopt (vide infra) was synthesized by GenScript and likewise subcloned into the modified pET28b vector. Equivalent expression vectors for CurA, CurC, CurD, BamA-CmsA, BamA-BlinA, and BamA-CurA have been previously reported (12, 41). All site-directed mutagenesis was performed on the pertinent construct using the QuikChange™ method with PfuTurbo polymerase (Agilent) and the primer pairs listed in Tables 3.6 and 3.7. DNA sequencing used a custom MBP forward primer (5’-GAGGAAGAGTTGGCGAAAGATCCACGTA-3’) and/or T7 reverse primer. Expression and purification of MBP-tagged proteins from the pET28 construct were performed as previously described using amylose affinity resin (43). Select BamA variant precursor peptides were expressed and purified from previously described pBAD24 constructs (41) using the same protocol as for pET28 constructs with the exception that IPTG induction was performed for 2 h at 30 °C.

Table 3.5. Oligonucleotide primers used for cloning in this study.

Primer                  Sequence
BamA BamHI fw           AAAAGGATCCATGGAGGAGGTAACAATTATGACTCAAATTAAAGTGC
BamA NotI rv            AAAAGCGGCCGCTTAAAACGTAGATGAACTAGAGATGATTGTGGTACAGG
BamB BamHI fw           AAAAGGATCCATGAAAGAGATTGAAAGGCATGCCACTAATTTGG
BamB NotI rv            AAAAGCGGCCGCTCATGTTCCTGCAACTAAAGTATATATAGGGGCATC
BamB R93A fw            GACAAGTGTATCATTAGATGAAGTTATTCAAAATGCGAGAAGTATCGAACAGTT
BamB R93A rv            AACTGTTCGATACTTCTCGCATTTTGAATAACTTCATCTAATGATACACTTGTC
BamB Y206A fw           GCTAGCTTTTGGAGGTCAAGGTTTAAAGCTGGCCATCGTTCAT
BamB Y206A rv           ATGAACGATGGCCAGCTTTAAACCTTGACCTCCAAAAGCTAGC
BamC(opt) BamHI fw      AAAGGATCCATGAGCCAACAAACGCATAG
BamC(opt) NotI rv       AAAGCGGCCGCTTATTTAGCCACTTCGCCG
BamD BamHI fw           AAAAGGATCCATGGTGAGTAAATGGACGACATTAGAAGATTCATGG
BamD NotI rv            AAAAGCGGCCGCTTATGGGAATGGATGAGGTACTGGATTAAGTGTTTC
BpumA BamHI fw          AAAAGGATCCATGACTAAAATTACAATTCCAACTGCTTTG
BpumA NotI rv           AATTGCGGCCGCTTAGAAAGTTGAAGAACTTGAGATGATTG
BpumC BamHI fw          AAAAGGATCCATGTTTGTTAAAGAACAAGAAAGCACG
BpumC NotI rv           AAAAGCGGCCGCTTAATTTTTCACCTCACCATGAAACC
CurB BamHI fw           AAAGGATCCATGTCATCCGCAGCCGAAGC
CurB NotI rv            AAAGCGGCCGCTCATGTCTCAGATCCTAAGAGCACCAC
CmsA BamHI fw           AAAAGGATCCATGATCACAACCACCGCTC
CmsA NotI rv            AAAAGCGGCCGCTCAACCCCAGGTGCAC
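
A quick consistency check on a mutagenic primer pair is to confirm that the reverse primer is the reverse complement of the forward primer, as is the case for fully overlapping pairs. The sketch below applies this to the BamB R93A pair from Table 3.5; the function name is hypothetical. Many pairs in Tables 3.6 and 3.7 overlap only partially, so a strict equality test like this one is not expected to hold for them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# BamB R93A primer pair, copied from Table 3.5.
fw = "GACAAGTGTATCATTAGATGAAGTTATTCAAAATGCGAGAAGTATCGAACAGTT"
rv = "AACTGTTCGATACTTCTCGCATTTTGAATAACTTCATCTAATGATACACTTGTC"

print(reverse_complement(fw) == rv)
```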

Table 3.6. Oligonucleotide primers used for mutagenesis of leader peptide residues in this study.

Primer                  Sequence
BamA T(-26)A fw         ATGGAGGAGGTAACAATTATGGCTCAAATTAAAGTGCCAACTG
BamA T(-26)A rv         CAGTTGGCACTTTAATTTGAGCCATAATTGTTACCTCCTCCAT
BamA Q(-25)A fw         CAATTATGACTGCAATTAAAGTGCCAACTGCTCTTATTGCA
BamA Q(-25)A rv         GGCACTTTAATTGCAGTCATAATTGTTACCTCCTCCATGGA
BamA I(-24)A fw         GAGGAGGTAACAATTATGACTCAAGCTAAAGTGCCAACTGCTCTTATTGC
BamA I(-24)A rv         GCAATAAGAGCAGTTGGCACTTTAGCTTGAGTCATAATTGTTACCTCCTC
BamA K(-23)A fw         TGACTCAAATTGCAGTGCCAACTGCTCTTATTGCAAGTGTA
BamA K(-23)A rv         GCAGTTGGCACTGCAATTTGAGTCATAATTGTTACCTCCTC
BamA V(-22)A fw         CTCAAATTAAAGCGCCAACTGCTCTTATTGCAAGTGTACAC
BamA V(-22)A rv         AGAGCAGTTGGCGCTTTAATTTGAGTCATAATTGTTACCTC
BamA P(-21)A fw         CAATTATGACTCAAATTAAAGTGGCAACTGCTCTTATTGCAAGTGTA
BamA P(-21)A rv         TACACTTGCAATAAGAGCAGTTGCCACTTTAATTTGAGTCATAATTG
BamA L(-18)A fw         TGCCAACTGCTGCTATTGCAAGTGTACACGGAGAAGGTCAA
BamA L(-18)A rv         ACACTTGCAATAGCAGCAGTTGGCACTTTAATTTGAGTCAT
BamA I(-17)A fw         CAACTGCTCTTGCTGCAAGTGTACACGGAGAAGGTCAACAT
BamA I(-17)A rv         TGTACACTTGCAGCAAGAGCAGTTGGCACTTTAATTTGAGT
BamA S(-15)A fw         CTCTTATTGCAGCTGTACACGGAGAAGGTCAACATTTATTC
BamA S(-15)A rv         TCTCCGTGTACAGCTGCAATAAGAGCAGTTGGCACTTTAAT
BamA V(-14)A fw         TTATTGCAAGTGCACACGGAGAAGGTCAACATTTATTCGAG
BamA V(-14)A rv         CCTTCTCCGTGTGCACTTGCAATAAGAGCAGTTGGCACTTT
BamA H(-13)A fw         TTGCAAGTGTAGCCGGAGAAGGTCAACATTTATTCGAGCCA
BamA H(-13)A rv         TGACCTTCTCCGGCTACACTTGCAATAAGAGCAGTTGGCAC
BamA G(-12)A fw         TCTTATTGCAAGTGTACACGCAGAAGGTCAACATTTATTCG
BamA G(-12)A rv         CGAATAAATGTTGACCTTCTGCGTGTACACTTGCAATAAGA
BamA E(-11)A fw         ATTGCAAGTGTACACGGAGCAGGTCAACATTTATTCGAG
BamA E(-11)A rv         CTCGAATAAATGTTGACCTGCTCCGTGTACACTTGCAAT
BamA H(-8)A fw          TGTACACGGAGAAGGTCAAGCTTTATTCGAGCCAATGGCT
BamA H(-8)A rv          AGCCATTGGCTCGAATAAAGCTTGACCTTCTCCGTGTACA
BamA F(-6)A fw          TGTACACGGAGAAGGTCAACATTTAGCCGAGCCAATGGCT
BamA F(-6)A rv          AGCCATTGGCTCGGCTAAATGTTGACCTTCTCCGTGTACA
BamA E(-5)A fw          AAGGTCAACATTTATTCGCGCCAATGGCTGCACGC
BamA E(-5)A rv          GCGTGCAGCCATTGGCGCGAATAAATGTTGACCTT
BamA P(-4)A fw          GGTCAACATTTATTCGAGGCAATGGCTGCACGCTG
BamA P(-4)A rv          CAGCGTGCAGCCATTGCCTCGAATAAATGTTGACC
BamA G(-12)stop fw      AAGTGTACACTGAGAAGGTCAACATTTATTCGAGCCA
BamA G(-12)stop rv      GACCTTCTCAGTGTACACTTGCAATAAGAGCAGTTG
BamA Q(-9)stop fw       CGGAGAAGGTTAACATTTATTCGAGCCAATGGCTG
BamA Q(-9)stop rv       CTCGAATAAATGTTAACCTTCTCCGTGTACACTTGCAATAAG
BamA F(-6)stop fw       CAACATTTATAGGAGCCAATGGCTGCACGCTGTAC
BamA F(-6)stop rv       CCATTGGCTCCTATAAATGTTGACCTTCTCCGTGTAC
BamA M(-3)stop fw       ATTCGAGCCATAGGCTGCACGCTGTACCTGTAC
BamA M(-3)stop rv       GCGTGCAGCCTATGGCTCGAATAAATGTTGACC
BamA A(-1)E fw          AGCCAATGGCTGAACGCTGTACCTGTACCACAATCATCTCTAG
BamA A(-1)E rv          CAGGTACAGCGTTCAGCCATTGGCTCGAATAAATGTTGACC
BamA A(-1)F fw          AGCCAATGGCTTTCCGCTGTACCTGTACCACAATCATCTCTAG
BamA A(-1)F rv          CAGGTACAGCGGAAAGCCATTGGCTCGAATAAATGTTGACC
BamA A(-1)K fw          GCCAATGGCTAAACGCTGTACCTGTACCACAATCATCTCTAG
BamA A(-1)K rv          CAGGTACAGCGTTTAGCCATTGGCTCGAATAAATGTTGACC
BamA A(-1)M fw          AGCCAATGGCTATGCGCTGTACCTGTACCACAATCATCTCTAG
BamA A(-1)M rv          CAGGTACAGCGCATAGCCATTGGCTCGAATAAATGTTGACC
BamA A(-1)M/C4Δ fw      GCCAATGGCTATGCGCTGTACCACCACAATCATCTCTAGT
BamA A(-1)M/C4Δ rv      GTGGTACAGCGCATAGCCATTGGCTCGAATAAATGTTGACC
BamA A(-1)M/I8* fw      GTACCACAATCTAGTCTAGTTCATCTACGTTTTAAGCGGCC
BamA A(-1)M/I8* rv      GATGAACTAGACTAGATTGTGGTACAGGTACAGCGCATAGC
CurA N(-26)S fw         CCACACTAATCTCCAAGCTGCCTCCCGCAGTTTCAACTGAT
CurA N(-26)S rv         GGAGGCAGCTTGGAGATTAGTGTGGACATGGATCCGGATTG
CurA P(-23)A fw         TCAACAAGCTGGCTCCCGCAGTTTCAACTGATTCTTCAAAG
CurA P(-23)A rv         GAAACTGCGGGAGCCAGCTTGTTGATTAGTGTGGACATGGA
CurA P(-22)T fw         ACAAGCTGCCTACCGCAGTTTCAACTGATTCTTCAAAGATT
CurA P(-22)T rv         GTTGAAACTGCGGTAGGCAGCTTGTTGATTAGTGTGGACAT
CurA V(-20)A fw         TGCCTCCCGCAGCTTCAACTGATTCTTCAAAGATTGTTTCC
CurA V(-20)A rv         GAATCAGTTGAAGCTGCGGGAGGCAGCTTGTTGATTAGTGT
CurA A(-1)M fw          AGCCTACCGCTATGCGTTGCTCCTGCACCACAATCCCGTGC
CurA A(-1)M rv          CAGGAGCAACGCATAGCGGTAGGCTCGAATGCCTGAACTTC

Table 3.7. Oligonucleotide primers used for mutagenesis of core peptide residues in this study.

Primer                  Sequence
BamA R1D fw             CAATGGCTGCAGACTGTACCTGTACCACAATCATCTCTAGT
BamA R1D rv             GTACAGGTACAGTCTGCAGCCATTGGCTCGAATAAATGTTG
BamA R1F fw             CAATGGCTGCATTCTGTACCTGTACCACAATCATCTCTAGT
BamA R1F rv             GTACAGGTACAGAATGCAGCCATTGGCTCGAATAAATGTTG
BamA R1G fw             CAATGGCTGCAGGCTGTACCTGTACCACAATCATCTCTAGT
BamA R1G rv             GTACAGGTACAGCCTGCAGCCATTGGCTCGAATAAATGTTG
BamA R1I fw             CAATGGCTGCAATCTGTACCTGTACCACAATCATCTCTAGT
BamA R1I rv             GTACAGGTACAGATTGCAGCCATTGGCTCGAATAAATGTTG
BamA R1K fw             CAATGGCTGCAAAATGTACCTGTACCACAATCATCTCTAGT
BamA R1K rv             GTACAGGTACATTTTGCAGCCATTGGCTCGAATAAATGTTG
BamA R1S fw             CAATGGCTGCAAGCTGTACCTGTACCACAATCATCTCTAGT
BamA R1S rv             GTACAGGTACAGCTTGCAGCCATTGGCTCGAATAAATGTTG
BamA T3A fw             CTGCACGCTGTGCCTGTACCACAATCATCTCTAGTTCATCT
BamA T3A rv             ATTGTGGTACAGGCACAGCGTGCAGCCATTGGCTCGAATAA
BamA T5A fw             GCTGTACCTGTGCCACAATCATCTCTAGTTCATCTACGTTT
BamA T5A rv             GAGATGATTGTGGCACAGGTACAGCGTGCAGCCATTGGCTC
BamA T5A/T6C fw         GTACCTGTGCCTGTATCATCTCTAGTTCATCTACGTTTTAA
BamA T5A/T6C rv         CTAGAGATGATACAGGCACAGGTACAGCGTGCAGCCATTGG
BamA T6A/S9C fw         CCGCAATCATCTGTAGTTCATCTACGTTTTAAGCGGCCGCA
BamA T6A/S9C rv         GTAGATGAACTACAGATGATTGCGGTACAGGTACAGCGTGC
BamA I8P fw             GTACCACAATCCCGTCTAGTTCATCTACGTTTTAAGCGGCC
BamA I8P rv             GATGAACTAGACGGGATTGTGTACAGGTACAGCGTGCAGC
BamA S9A/S11A fw        GCTGTACCTGTACCACAATCATCGCTAGTGCATCTACGTTTTAA
BamA S9A/S11A rv        TGCGGCCGCTTAAAACGTAGATGCACTAGCGATGATTGTGGTACAGGTACAGC
BamA S11A fw            TCATCTCTAGTGCATCTACGTTTTAAGCGGCCGCACTCGAG
BamA S11A rv            TAAAACGTAGATGCACTAGAGATGATTGTGGTACAGGTACA
BamA S12A fw            TCTCTAGTTCAGCTACGTTTTAAGCGGCCGCACTCGAGCAC
BamA S12A rv            GCTTAAAACGTAGCTGAACTAGAGATGATTGTGGTACAGG
BamA S12A/T13C fw       CTAGTTCAGCTTGCTTTTAAGCGGCCGCACTCGAGCACCAC
BamA S12A/T13C rv       GCCGCTTAAAAGCAAGCTGAACTAGAGATGATTGTGGTACA
BamA T13A fw            CTAGTTCATCTGCGTTTTAAGCGGCCGCACTCGAGCACCAC
BamA T13A rv            GCCGCTTAAAACGCAGATGAACTAGAGATGATTGTGGTACA
BamA T13P fw            CTAGTTCATCTCCGTTTTAAGCGGCCGCACTCGAGCACCAC
BamA T13P rv            GCCGCTTAAAACGGAGATGAACTAGAGATGATTGTGGTACA
BamA ^8A fw             CTGTACCTGTACCACAATCGCGATCTCTAGTTCATCTACGT
BamA ^8A rv             ACGTAGATGAACTAGAGATCGCGATTGTGGTACAGGTACAG
BamA ^12A fw            GTACCTGTACCACAATCATCTCTAGTTCAGCGTCTACGTTTTAAAAGC
BamA ^12A rv            GCTTTTAAAACGTAGACGCTGAACTAGAGATGATTGTGGTACAGGTAC
BamA C4Δ fw             GGCTGCACGCTGTACCACCACAATCATCTCTA
BamA C4Δ rv             TAGAGATGATTGTGGTGGTACAGCGTGCAGCC
BamA S11Δ fw            CAATCATCTCTAGTTCTACGTTTTAAGCGGCCGCACTCGAG
BamA S11Δ rv            GCTTAAAACGTAGAACTAGAGATGATTGTGGTACAGGTACA
BamA F14* fw            GTTCATCTACGTAATAAGCGGCCGCACTCGAGCACCACCAC
BamA F14* rv            GCGGCCGCTTATTACGTAGATGAACTAGAGATGATTGTGGT

3.6.3 Size exclusion chromatography

Formation of protein complexes between the various PZN biosynthetic proteins was assessed using size exclusion chromatography. 25 µM of each MBP-tagged protein was prepared individually or in combinations as indicated with storage buffer and allowed to incubate at room temperature for at least 15 minutes. Following incubation, 5 µL of each sample was injected onto an analytical Yarra SEC-3000 column (300 × 4.6 mm, Phenomenex) pre-equilibrated with storage buffer and was monitored at 280 nm using a 1200 HPLC (Agilent) with a flow rate of 0.6 mL/min. Traces were normalized to the highest absorbance value in each sample.

3.6.4 In vitro synthetase assays

Heterocycle synthetase reactions to assess substrate tolerance or necessity of individual components contained 100 µM of MBP-tagged precursor peptide, 10 µM of each of the pertinent MBP-tagged synthetase enzymes, and 40 µg/mL TEV protease in synthetase buffer [50 mM Tris (pH 7.5), 125 mM NaCl, 20 mM MgCl2, 10 mM dithiothreitol, and 3 mM ATP] (9). Reactions were carried out in a 100 µL volume for 18 h at 22 °C, at which point they were quenched by addition of 100 µL MeCN to precipitate the large proteins. After centrifugation for 5 min at 16,000 × g, the supernatant was either dried by speedvac or diluted to 12.5% MeCN by addition of water to facilitate desalting before analysis by mass spectrometry. For synthetase reactions where isolation of intermediates was attempted, synthetase enzymes were pre-treated with TEV to remove MBP and were present at 2 µM in the reaction. Other components were present at the concentrations listed above, and the reaction was quenched by addition of 100 µL MeCN at 10 and 30 minutes.
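
The dilution step in this protocol follows from C1·V1 = C2·V2. As a worked sketch, quenching a 100 µL reaction with an equal volume of MeCN gives roughly 50% MeCN (v/v), and the function below returns the water needed to reach 12.5%. The function name is hypothetical, volumes are assumed to be additive, and the calculation assumes the full quenched volume is recovered after centrifugation.

```python
def water_to_add(volume_ul, current_percent, target_percent):
    """Volume of water (µL) needed to dilute a solution to target_percent (v/v).

    Uses C1*V1 = C2*V2 and assumes volumes are additive.
    """
    if not 0 < target_percent <= current_percent:
        raise ValueError("target must be positive and no higher than current")
    final_volume = volume_ul * current_percent / target_percent
    return final_volume - volume_ul


quenched_volume = 100 + 100                    # µL: reaction plus MeCN
mecn_percent = 100 * 100 / quenched_volume     # 50% (v/v) after quenching
print(water_to_add(quenched_volume, mecn_percent, 12.5))
```

Under these assumptions a fourfold dilution is required, so a fully recovered 200 µL supernatant would be brought to 800 µL.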

3.6.5 Mass spectrometry

For MALDI-TOF-MS, peptide samples were desalted by C18 ZipTip according to manufacturer instructions and analyzed using a Bruker Daltonics UltrafleXtreme MALDI-TOF/TOF instrument operating in reflector/positive mode with α-cyano-4-hydroxycinnamic acid as the matrix. For high-resolution mass spectrometry and MS/MS, desalted peptides were resuspended in 80% (v/v) MeCN, 1% AcOH and centrifuged at 11,000 × g for 5 min prior to analysis by direct infusion Fourier transform mass spectrometry (FT-MS). An Advion NanoMate 100 was used to directly infuse samples to an LTQ-FTMS/MS (ThermoFisher) operating at 11 T. The MS was calibrated weekly, following the manufacturer’s instructions, and tuned daily using Pierce LTQ Velos ESI Positive Ion Calibration Solution (ThermoFisher). Spectra were collected with a resolution of 100,000. Ions were selected for ion trap fragmentation or FT-MS/MS fragmentation based on signal intensity, and spectra were collected using the following parameters: isolation width of 5 m/z, normalized collision energy of 35, activation q value 0.4, activation time 30 ms. Data analysis was performed using the Qualbrowser application within ThermoFisher Xcalibur v 2.2.

3.6.6 In vitro kinetics assay

Peptide processing kinetics were measured using a previously described purine nucleoside phosphorylase (PNP)-coupled assay to detect ADP production from ATP hydrolysis (9, 59). Because ATP consumption is 1:1 with heterocycle formation, this assay provides a measure of the rate of heterocycle formation. In general, MBP-BamB, MBP-BpumC, and MBP-BamD were pretreated with TEV protease at room temperature to cleave the tag, but for the Cur synthetase proteins, TEV treatment was performed at 4 °C because extended incubation at room temperature caused protein precipitation. After adding the enzyme cleavage mixture to a cuvette for a final concentration of 5 µM BCD, enzymatic reactions were initiated by the addition of a peptide substrate mixed with synthetase buffer, 200 µM 2-amino-6-mercapto-7-methylpurine riboside (Berry and Associates), and 0.2 U of PNP at 22 °C. Maximal initial rates were determined by averaging the slope over the first two minutes of the reaction. The rate of processing for leader peptide mutants was determined with 30 µM BamA/BpumA or 25 µM CurA and their respective synthetase complexes, whereas the BamA-R1X mutants were tested at 100 µM. Kinetic parameters for the Bam/Bpum complex were determined using variable substrate concentrations and were calculated using the nonlinear Michaelis-Menten fit with OriginPro 9.1 (OriginLab). All experiments were performed in triplicate.
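
One way to extract an initial rate from such a time course is sketched below. It uses an ordinary least-squares slope over the first two minutes, which is a stand-in for the slope averaging described above rather than a reproduction of it. The function name and the absorbance readings are hypothetical, and converting the slope into a molar rate would require the extinction coefficient of the assay, which is not given here.

```python
def initial_rate(times_s, signal, window_s=120.0):
    """Least-squares slope of signal versus time over the first window_s seconds."""
    pts = [(t, y) for t, y in zip(times_s, signal) if t <= window_s]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return sxy / sxx


# Hypothetical absorbance readings taken every 30 s.
times = [0, 30, 60, 90, 120, 150, 180]
absorbance = [0.100, 0.130, 0.160, 0.190, 0.220, 0.240, 0.255]
print(initial_rate(times, absorbance))   # absorbance units per second
```

Restricting the fit to the early, linear portion keeps the later curvature in the trace from lowering the estimated rate.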

3.6.7 Fluorescence polarization binding assay

The interaction between FITC-conjugated precursor peptides and different synthetase proteins was measured with a previously employed fluorescence polarization (FP) assay (12). In brief, the assay was performed in nonbinding-surface, black 384-well polystyrene microplates (Corning) with serial dilutions of the indicated protein(s) and 25 nM of the indicated fluorescent peptide in storage buffer. After 30 minutes of incubation at room temperature, the plates were read using a FilterMax F5 multimode microplate reader (Molecular Devices) with λex = 485 nm and λem = 538 nm. All assays were performed in triplicate, and Kd values were determined from a nonlinear dose-response curve in OriginPro 9.1 (OriginLab). Error bars represent standard error of the mean. To assess the interaction of non-fluorescent peptides with the PZN synthetase, a competition FP binding assay was used as previously described (49). MBP-tagged precursor peptides were serially diluted and mixed with 25 nM of a fluorescent peptide and synthetase component(s). The BamA/BpumA peptides competed with FITC-BamA-LP for binding to 3 µM BamB-R93A, BpumC, and BamD, whereas the CurA mutants competed with FITC-CurA-LP for binding to 25 µM CurC. IC50 values were determined from the 50% inhibition point calculated using a dose-response curve in OriginPro 9.1 (OriginLab).
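As an illustration of the "50% inhibition point" used for the competition assay, the sketch below estimates an IC50 by interpolating on log concentration between the two data points that bracket the half-maximal signal. This is a simplified stand-in and an assumption on our part; it is not the dose-response fit performed in OriginPro, and the function name and example values are hypothetical.

```python
import math

def ic50_by_interpolation(conc, signal):
    """Estimate the competitor concentration giving a half-maximal FP signal.

    conc: competitor concentrations in ascending order (all > 0).
    signal: FP readings at those concentrations (falling as competitor rises).
    """
    top, bottom = max(signal), min(signal)
    half = (top + bottom) / 2.0
    points = list(zip(conc, signal))
    for (c0, y0), (c1, y1) in zip(points, points[1:]):
        # Look for the first pair of neighbours that straddles the half level.
        if (y0 - half) * (y1 - half) <= 0 and y0 != y1:
            frac = (y0 - half) / (y0 - y1)
            log_ic50 = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_ic50
    raise ValueError("signal never crosses the half-maximal level")
```

Interpolation of this kind is sensitive to noise in the two bracketing points, which is one reason a full curve fit over all dilutions, as used here, is generally preferred.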

3.6.8 In vivo PZN production from alanine-substituted precursor peptides

A previously designed screen for the production of PZN analogs (41) was repurposed to examine how certain positions of the leader peptide contributed to processing of BamA by the native PZN biosynthetic machinery in vivo. In the original system, chemically competent E. coli cells containing the entire PZN biosynthetic gene cluster (including bamA) on a fosmid were transformed with a plasmid bearing a BamA variant. Production of the plasmid-borne variant could be compared to production of wild-type PZN as an indication of how well tolerated the mutant was. To assess the effect of leader mutants, a well-tolerated substitution (I7V) was incorporated into the plasmid-borne BamA, which was subsequently subjected to alanine-scanning mutagenesis in the leader peptide. The ability of the biosynthetic machinery to tolerate each leader mutant could then be assessed by the production of PZN-I7V relative to the wild-type PZN originating from the fosmid, as determined by MALDI-TOF-MS analysis of methanolic extracts from the transformed E. coli cells after culturing in 10 mL of Luria-Bertani (LB) medium, as previously described (41).

3.6.9 Leader peptide cleavage

Dried precursor peptide (after synthetase reaction and MeCN precipitation) was resuspended in 60% formic acid, and 1-2 small crystals of cyanogen bromide (CNBr) were added. The reaction was allowed to proceed for 4 h at 22 °C, at which point it was quenched by addition of 5 volumes of water, desalted by ZipTip, and dried by speedvac.

3.6.10 Methylation

Enzymatic dimethylation using BamL was performed using <50 µM peptide (exact concentration not known), 10 µM MBP-BamL, 10 µM Pfs, and 3 mM S-adenosylmethionine in 50 mM Tris (pH 8) for 18 h at 37 °C, as described previously (17). Chemical dimethylation was performed using <1 mM peptide (exact concentration not known), 200 mM formaldehyde, and 300 mM borane-pyridine in 10 mM NH4HCO3 with 50% MeOH for 18 h at 22 °C (58).

3.6.11 Production of CZN

Synthetase reactions using MBP-tagged CurA, CurB, CurC, and CurD were performed as described above, with the exceptions that 250 µM MBP-CurA was used, the buffer contained 6 mM ATP, and reactions were carried out in replicates of 200 µL each. After precipitation with MeCN and centrifugation, the supernatants were dried by speedvac, then resuspended and pooled in 60% formic acid (v/v) for CNBr digestion as described above. After the CNBr reaction was quenched with water, the solvent was removed under a stream of air. The dried reaction mixture was then resuspended in water to remove water-soluble components and centrifuged for 10 min at 16,000 × g, after which the supernatant was removed and the remaining solids were resuspended in MeOH to extract CZN. After centrifugation for 10 min at 16,000 × g, the MeOH supernatant (containing crude CZN) was removed and used in the chemical methylation reaction as described above. This crude CZN solution in 50% MeOH (v/v) was reverse-phase purified using a Thermo BETASIL C18 column (250 mm × 10 mm; pore size: 100 Å; particle size: 5 µm) at a flow rate of 4 mL min⁻¹. A gradient of 40-95% MeOH with 10 mM NH4HCO3 in the aqueous phase over 41 min was used. The fractions containing CZN (as monitored by A254 and later verified by MALDI-TOF-MS) were collected into 20 mL borosilicate vials, and the solvent was removed in vacuo.
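For planning a run, the solvent composition at any time point of the gradient above, and the MeOH it consumes, follow from simple arithmetic. The sketch below assumes the gradient is linear (the text does not state the gradient shape) and uses hypothetical function names; the default values are the CZN conditions given above.

```python
def percent_organic(t_min, start_pct=40.0, end_pct=95.0, duration_min=41.0):
    """Percent MeOH at time t_min for a linear gradient, clamped at both ends."""
    if t_min <= 0:
        return start_pct
    if t_min >= duration_min:
        return end_pct
    return start_pct + (end_pct - start_pct) * t_min / duration_min

def organic_volume_ml(flow_ml_min=4.0, start_pct=40.0, end_pct=95.0, duration_min=41.0):
    """MeOH consumed over the gradient: total volume times the mean MeOH fraction."""
    mean_fraction = (start_pct + end_pct) / 200.0
    return flow_ml_min * duration_min * mean_fraction
```

With the default values, the gradient passes 67.5% MeOH at its midpoint (20.5 min) and uses about 111 mL of MeOH out of 164 mL of total mobile phase.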

3.6.12 Production of BZN

Bacillus badius ATCC 14574 was grown in Luria-Bertani (LB) broth (10 mL per 18 mm glass culture tube) at 37 °C overnight. Sterilized aluminum trays (16-7/16" × 11-5/8" cake pans) containing nutrient agar (1 L/tray) were inoculated with 3 mL of overnight culture and incubated for 48 h at 37 °C. Cells were harvested with a razor blade and Tris-buffered saline (TBS) (160 mL/tray) and pelleted by centrifugation (4,000 × g, 20 min, 10 °C). The supernatant was decanted and the pellets were stored at -20 °C until extraction. Crude BZN was extracted by resuspending the pellets in MeOH (160 mL/tray) through vortex agitation, and the resuspended cells were equilibrated for 4 h at 22 °C on a shaking platform. The supernatant was retained after centrifugation (4,000 × g, 20 min, 10 °C), vacuum filtered through Whatman filter paper, and concentrated by rotary evaporation. Lyophilization of the extract yielded a yellow solid (200 mg/tray). The crude material was dissolved in 50% MeCN, where the sample separated into two layers. Both layers were added to Celite, which was dried by rotary evaporation. The dried Celite was packed into a cartridge for solid loading of BZN onto a RediSep Rf 130 g C18 column (Teledyne Isco) for purification by MPLC using a Combiflash system (30-95% MeCN/10 mM aqueous NH4HCO3 over 16 column volumes). The fractions containing BZN were pooled, concentrated by rotary evaporation, and lyophilized to dryness. BZN was then reverse-phase purified using the same method as for CZN, except that a gradient of 85-95% MeOH with 10 mM aqueous NH4HCO3 over 20 min was used. The fractions containing BZN were collected in 20 mL borosilicate vials, and the solvent was removed in vacuo.

3.6.13 NMR

Samples were dissolved in 600 µL of CD3OD (99.95 atom % D, Cambridge Isotope Labs). NMR spectra were obtained with an Agilent VNMRS 750 MHz narrow-bore magnet spectrometer equipped with a 5 mm triple-resonance (1H-13C-15N) triaxial gradient probe and pulse-shaping capabilities. Samples were held at 298 K during acquisition. Standard Varian pulse sequences were used; a relaxation delay (d1) of 28 s was used during acquisition for quantitative NMR (qNMR), and 90° pulse widths were calibrated and used (8.50 µs for CZN acquisition; 8.75 µs for BZN acquisition). A total of 320 transients were recorded for BZN and 2048 transients for CZN. Apodization (0.4 Hz line-broadening), phase correction, integration, integral normalization, and automated baseline correction were applied prior to qNMR processing. Spectra were recorded with VNMRJ 4.2 software, and data were processed using MestReNova 8.1.1. Chemical shifts (δ, ppm) were referenced internally to the solvent peak (methanol). For quantitative NMR, the probe was calibrated using the qEstimate tool in VNMRJ on the standard 31P sample (48.5 mM triphenyl phosphate in CDCl3, Varian part #: 00-968120-97).

3.6.14 Antibacterial activity assays

The concentration of BZN was calculated by qNMR, which has been used for accurate quantification of microgram quantities of natural products (60). The native qNMR functionality within VNMRJ 4.2 was used to quantify BZN based on the average integration of its azole peaks, which occur as distinct singlets in the aromatic region. Determination of MIC values for BZN was performed as described previously (19).
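The quantification described above rests on the general qNMR relationship that integrated signal per nucleus is proportional to concentration. The sketch below shows that relationship for the case of averaging several single-proton analyte peaks against a reference of known concentration. It is a simplified illustration under stated assumptions: the reference signal is taken to be acquired under identical conditions, the function name and example numbers are hypothetical, and it does not reproduce the internal calculation of the VNMRJ qEstimate tool.

```python
def qnmr_concentration(analyte_integrals, nuclei_per_peak,
                       ref_integral, ref_nuclei, ref_conc_mM):
    """Analyte concentration (mM) from the mean per-nucleus integral of its peaks.

    analyte_integrals: integrals of the chosen analyte peaks (e.g. azole singlets).
    nuclei_per_peak: number of nuclei contributing to each of those peaks.
    ref_integral, ref_nuclei, ref_conc_mM: the same quantities for the reference.
    """
    per_nucleus = [i / n for i, n in zip(analyte_integrals, nuclei_per_peak)]
    analyte_mean = sum(per_nucleus) / len(per_nucleus)
    ref_per_nucleus = ref_integral / ref_nuclei
    return ref_conc_mM * analyte_mean / ref_per_nucleus
```

Averaging over several well-resolved singlets, as done here for the azole protons, reduces the influence of integration error in any single peak.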

3.7 Sequences

3.7.1 Coding sequence of BamCopt ATGAGCCAACAAACGCATAGCATTCAAACGGTCGTCCGCTTCAAACTGTCCCCGCACATCACCCTGGGCGAACTGGAA GGTAGCCTGCTGTCATTTAAAGCACAGTCGATTACCAAAGTGGATGGCGAAGGTATTGTTGAATTTTTCGAAGCTATC AGTGAATATCTGGTGTGCGAAAATGGCGTTTCCATCGAAGAACTGCATCAGCGCTTCAACCTGGAAAAAGATCAACTG CGTAATGTCTTTGACTTCCTGGTGACCAACGGTCTGTGTATTCAGAGCACCTCTAATTTTGATATGACGGCGTTCCTG GCATACGAACGTGCGGGCGGTCAGGTCGAAGTGAACCAAATTCTGCATCGTCTGAAAACGAGCATCGTGGAAGTGGTT GCACACGGCGAAAACAATCTGGCTATTAACCTGACCAATTCTCTGCAGGAAACGGGCCTGAATATCGATTGGAAAGAA AAACCGGACAACGGTAAACCGAGCATTCGCATCGCCGTTGCGGCCTCTCATCTGGACCCGCTGCTGGAACAGGTCAAC GATCAAAGCCTGGAAGACCACGTTCCGTGGCTGTGCGTCATTCCGTATGACGGCCAGACCGCGTGGGTGGGTCCGTTT TTCATCCCGCATAAATCAGCGTGCCTGCACTGTTACAACCTGCGCAAAAGTGCCAATTTTTCCGATGAAGTTTTCCGT TCCGAACTGATGAAAGTGAAACCGGTTAATGCAGCTAAACAGCCGGTGTATAACCAACCGGTCCATCTGGTGCAGGCA

GGCATTGCTAACAATCTGGTTACCGAATGGATCACGCTGCGTGATTACGCACCGTCAGCAGTCCCGGGCGGTTTTATC ACGGTGAATCTGGATGACCGTGGCGTGAGCGTGGATAACCACCGTGTTTATCGCGTCCCGCGTTGCCAAAAATGTTCT CCGGCTGCTGACACGGGTTTTCCGCAAGTTTGGTATCACGGCGAAGTGGCTAAATAA

3.7.2 Translation of BamCopt 330 amino acids, identical to WP_012117036.1 MSQQTHSIQTVVRFKLSPHITLGELEGSLLSFKAQSITKVDGEGIVEFFEAISEYLVCENGVSIEELHQRFNLEKDQL RNVFDFLVTNGLCIQSTSNFDMTAFLAYERAGGQVEVNQILHRLKTSIVEVVAHGENNLAINLTNSLQETGLNIDWKE KPDNGKPSIRIAVAASHLDPLLEQVNDQSLEDHVPWLCVIPYDGQTAWVGPFFIPHKSACLHCYNLRKSANFSDEVFR SELMKVKPVNAAKQPVYNQPVHLVQAGIANNLVTEWITLRDYAPSAVPGGFITVNLDDRGVSVDNHRVYRVPRCQKCS PAADTGFPQVWYHGEVAK
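The correspondence between the coding sequence in 3.7.1 and the translation above can be checked with a short script. The following is a minimal sketch using the standard genetic code; the function name is hypothetical, and pasting the full sequence from 3.7.1 into `translate` should return the 330-residue protein listed here.

```python
BASES = "TCAG"
# Standard genetic code, ordered with bases T, C, A, G at each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding sequence (whitespace ignored) up to the first stop codon."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

For example, the first seven codons of BamCopt, `ATGAGCCAACAAACGCATAGC`, translate to `MSQQTHS`, matching the start of the protein sequence.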

3.8 References

1. Newman, D. J., and Cragg, G. M. (2016) Natural products as sources of new drugs from 1981 to 2014. J. Nat. Prod. 79, 629-661.
2. Arnison, P. G., et al. (2013) Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 30, 108-160.
3. Scholz, R., Molohon, K. J., Nachtigall, J., Vater, J., Markley, A. L., Sussmuth, R. D., Mitchell, D. A., and Borriss, R. (2011) Plantazolicin, a novel microcin B17/streptolysin S-like natural product from Bacillus amyloliquefaciens FZB42. J. Bacteriol. 193, 215-224.
4. Molohon, K. J., Melby, J. O., Lee, J., Evans, B. S., Dunbar, K. L., Bumpus, S. B., Kelleher, N. L., and Mitchell, D. A. (2011) Structure determination and interception of biosynthetic intermediates for the plantazolicin class of highly discriminating antibiotics. ACS Chem. Biol. 6, 1307-1313.
5. Kalyon, B., Helaly, S. E., Scholz, R., Nachtigall, J., Vater, J., Borriss, R., and Sussmuth, R. D. (2011) Plantazolicin A and B: structure elucidation of ribosomally synthesized thiazole/oxazole peptides from Bacillus amyloliquefaciens FZB42. Org. Lett. 13, 2996-2999.
6. Dunlap, C. A., Kim, S. J., Kwon, S. W., and Rooney, A. P. (2015) Bacillus velezensis is not a later heterotypic synonym of Bacillus amyloliquefaciens; Bacillus methylotrophicus, Bacillus amyloliquefaciens subsp plantarum and ‘Bacillus oryzicola’ are later heterotypic synonyms of Bacillus velezensis based on phylogenomics. Int. J. Syst. Evol. Microbiol. doi: 10.1099/ijsem.0.000858.
7. Lee, S. W., Mitchell, D. A., Markley, A. L., Hensler, M. E., Gonzalez, D., Wohlrab, A., Dorrestein, P. C., Nizet, V., and Dixon, J. E. (2008) Discovery of a widely distributed toxin biosynthetic gene cluster. Proc. Natl. Acad. Sci. USA 105, 5879-5884.
8. Cox, C. L., Doroghazi, J. R., and Mitchell, D. A. (2015) The genomic landscape of ribosomal peptides containing thiazole and oxazole heterocycles. BMC Genomics 16, 778.
9. Dunbar, K. L., Melby, J. O., and Mitchell, D. A. (2012) YcaO domains use ATP to activate amide backbones during peptide cyclodehydrations. Nat. Chem. Biol. 8, 569-575.
10. Dunbar, K. L., Chekan, J. R., Cox, C. L., Burkhart, B. J., Nair, S. K., and Mitchell, D. A. (2014) Discovery of a new ATP-binding motif involved in peptidic azoline biosynthesis. Nat. Chem. Biol. 10, 823-829.
11. Melby, J. O., Li, X., and Mitchell, D. A. (2014) Orchestration of enzymatic processing by thiazole/oxazole-modified microcin dehydrogenases. Biochemistry 53, 413-422.
12. Burkhart, B. J., Hudson, G. A., Dunbar, K. L., and Mitchell, D. A. (2015) A prevalent peptide-binding domain guides ribosomal natural product biosynthesis. Nat. Chem. Biol. 11, 564-570.
13. Koehnke, J., Mann, G., Bent, A. F., Ludewig, H., Shirran, S., Botting, C., Lebl, T., Houssen, W. E., Jaspars, M., and Naismith, J. H. (2015) Structural analysis of leader peptide binding enables leader-free cyanobactin processing. Nat. Chem. Biol. 11, 558-563.
14. Oman, T. J., and van der Donk, W. A. (2010) Follow the leader: the use of leader peptides to guide natural product biosynthesis. Nat. Chem. Biol. 6, 9-18.
15. Maxson, T., Deane, C. D., Molloy, E. M., Cox, C. L., Markley, A. L., Lee, S. W., and Mitchell, D. A. (2015) HIV protease inhibitors block streptolysin S production. ACS Chem. Biol. 10, 1217-1226.
16. Pei, J., and Grishin, N. V. (2001) Type II CAAX prenyl endopeptidases belong to a novel superfamily of putative membrane-bound metalloproteases. Trends Biochem. Sci. 26, 275-277.
17. Lee, J., Hao, Y., Blair, P. M., Melby, J. O., Agarwal, V., Burkhart, B. J., Nair, S. K., and Mitchell, D. A. (2013) Structural and functional insight into an unexpectedly selective N-methyltransferase involved in plantazolicin biosynthesis. Proc. Natl. Acad. Sci. USA 110, 12954-12959.
18. Piwowarska, N. A., Banala, S., Overkleeft, H. S., and Sussmuth, R. D. (2013) Arg-Thz is a minimal substrate for the Nα,Nα-arginyl methyltransferase involved in the biosynthesis of plantazolicin. Chem. Comm. 49, 10703-10705.
19. Molohon, K. J., Blair, P. M., Park, S., Doroghazi, J. R., Maxson, T., Hershfield, J. R., Flatt, K. M., Schroeder, N. E., Ha, T., and Mitchell, D. A. (2016) Plantazolicin is an ultranarrow-spectrum antibiotic that targets the Bacillus anthracis membrane. ACS Infect. Dis. 2, 207-220.
20. Appleyard, A. N., Choi, S., Read, D. M., Lightfoot, A., Boakes, S., Hoffmann, A., Chopra, I., Bierbaum, G., Rudd, B. A., Dawson, M. J., and Cortes, J. (2009) Dissecting structural and functional diversity of the lantibiotic mersacidin. Chem. Biol. 16, 490-498.
21. Caetano, T., Krawczyk, J. M., Mosker, E., Sussmuth, R. D., and Mendo, S. (2011) Heterologous expression, biosynthesis, and mutagenesis of type II lantibiotics from Bacillus licheniformis in Escherichia coli. Chem. Biol. 18, 90-100.
22. Molloy, E. M., Ross, R. P., and Hill, C. (2012) ‘Bac’ to the future: bioengineering lantibiotics for designer purposes. Biochem. Soc. Trans. 40, 1492-1497.
23. Boakes, S., Ayala, T., Herman, M., Appleyard, A. N., Dawson, M. J., and Cortes, J. (2012) Generation of an actagardine A variant library through saturation mutagenesis. Appl. Microbiol. Biotechnol. 95, 1509-1517.
24. van Heel, A. J., Mu, D., Montalban-Lopez, M., Hendriks, D., and Kuipers, O. P. (2013) Designing and producing modified, new-to-nature peptides with antimicrobial activity by use of a combination of various lantibiotic modification enzymes. ACS Synth. Biol. 2, 397-404.
25. Field, D., Molloy, E. M., Iancu, C., Draper, L. A., O'Connor, P. M., Cotter, P. D., Hill, C., and Ross, R. P. (2013) Saturation mutagenesis of selected residues of the alpha-peptide of the lantibiotic lacticin 3147 yields a derivative with enhanced antimicrobial activity. Microb. Biotechnol. 6, 564-575.
26. Molloy, E. M., Field, D., O'Connor, P. M., Cotter, P. D., Hill, C., and Ross, R. P. (2013) Saturation mutagenesis of lysine 12 leads to the identification of derivatives of nisin A with enhanced antimicrobial activity. PLoS ONE 8, e58530.
27. Pan, S. J., and Link, A. J. (2011) Sequence diversity in the lasso peptide framework: discovery of functional microcin J25 variants with multiple amino acid substitutions. J. Am. Chem. Soc. 133, 5016-5023.
28. Piscotta, F. J., Tharp, J. M., Liu, W. R., and Link, A. J. (2015) Expanding the chemical diversity of lasso peptide MccJ25 with genetically encoded noncanonical amino acids. Chem. Comm. 51, 409-412.
29. Maksimov, M. O., Koos, J. D., Zong, C., Lisko, B., and Link, A. J. (2015) Elucidating the specificity determinants of the AtxE2 lasso peptide isopeptidase. J. Biol. Chem. 290, 30806-30812.
30. Bowers, A. A., Acker, M. G., Koglin, A., and Walsh, C. T. (2010) Manipulation of thiocillin variants by prepeptide gene replacement: structure, conformation, and activity of heterocycle substitution mutants. J. Am. Chem. Soc. 132, 7519-7527.
31. Li, C., Zhang, F., and Kelly, W. L. (2011) Heterologous production of thiostrepton A and biosynthetic engineering of thiostrepton analogs. Mol. Biosyst. 7, 82-90.
32. Li, C., Zhang, F., and Kelly, W. L. (2012) Mutagenesis of the thiostrepton precursor peptide at Thr7 impacts both biosynthesis and function. Chem. Comm. 48, 558-560.
33. Bowers, A. A., Acker, M. G., Young, T. S., and Walsh, C. T. (2012) Generation of thiocillin ring size variants by prepeptide gene replacement and in vivo processing by Bacillus cereus. J. Am. Chem. Soc. 134, 10313-10316.
34. Zhang, F., and Kelly, W. L. (2015) Saturation mutagenesis of TsrA Ala4 unveils a highly mutable residue of thiostrepton A. ACS Chem. Biol. 10, 998-1009.
35. Zhang, F., Li, C., and Kelly, W. L. (2016) Thiostrepton variants containing a contracted quinaldic acid macrocycle result from mutagenesis of the second residue. ACS Chem. Biol. 11, 415-424.
36. Tianero, M. D., Donia, M. S., Young, T. S., Schultz, P. G., and Schmidt, E. W. (2012) Ribosomal route to small-molecule diversity. J. Am. Chem. Soc. 134, 418-425.
37. Goto, Y., Ito, Y., Kato, Y., Tsunoda, S., and Suga, H. (2014) One-pot synthesis of azoline-containing peptides in a cell-free translation system integrated with a posttranslational cyclodehydratase. Chem. Biol. 21, 766-774.
38. Houssen, W. E., Bent, A. F., McEwan, A. R., Pieiller, N., Tabudravu, J., Koehnke, J., Mann, G., Adaba, R. I., Thomas, L., Hawas, U. W., Liu, H., Schwarz-Linek, U., Smith, M. C., Naismith, J. H., and Jaspars, M. (2014) An efficient method for the in vitro production of azol(in)e-based cyclic peptides. Angew. Chem. Int. Ed. 53, 14171-14174.
39. Ruffner, D. E., Schmidt, E. W., and Heemstra, J. R. (2014) Assessing the combinatorial potential of the RiPP cyanobactin tru pathway. ACS Synth. Biol. 4, 482-492.
40. Sardar, D., Lin, Z., and Schmidt, E. W. (2015) Modularity of RiPP enzymes enables designed synthesis of decorated peptides. Chem. Biol. 22, 907-916.
41. Deane, C. D., Melby, J. O., Molohon, K. J., Susarrey, A. R., and Mitchell, D. A. (2013) Engineering unnatural variants of plantazolicin through codon reprogramming. ACS Chem. Biol. 8, 1998-2008.
42. Gonzalez, D. J., Lee, S. W., Hensler, M. E., Markley, A. L., Dahesh, S., Mitchell, D. A., Bandeira, N., Nizet, V., Dixon, J. E., and Dorrestein, P. C. (2010) Clostridiolysin S, a post-translationally modified biotoxin from Clostridium botulinum. J. Biol. Chem. 285, 28220-28228.
43. Melby, J. O., Dunbar, K. L., Trinh, N. Q., and Mitchell, D. A. (2012) Selectivity, directionality, and promiscuity in peptide processing from a Bacillus sp. Al Hakam cyclodehydratase. J. Am. Chem. Soc. 134, 5309-5316.
44. Milne, J. C., Roy, R. S., Eliot, A. C., Kelleher, N. L., Wokhlu, A., Nickels, B., and Walsh, C. T. (1999) Cofactor requirements and reconstitution of microcin B17 synthetase: a multienzyme complex that catalyzes the formation of oxazoles and thiazoles in the antibiotic microcin B17. Biochemistry 38, 4768-4781.
45. Dunbar, K. L., and Mitchell, D. A. (2013) Insights into the mechanism of peptide cyclodehydrations achieved through the chemoenzymatic generation of amide derivatives. J. Am. Chem. Soc. 135, 8692-8701.
46. Mitchell, D. A., Lee, S. W., Pence, M. A., Markley, A. L., Limm, J. D., Nizet, V., and Dixon, J. E. (2009) Structural and functional dissection of the heterocyclic peptide cytotoxin streptolysin S. J. Biol. Chem. 284, 13004-13012.
47. Donia, M. S., Ravel, J., and Schmidt, E. W. (2008) A global assembly line for cyanobactins. Nat. Chem. Biol. 4, 341-343.
48. Morris, R. P., Leeds, J. A., Naegeli, H. U., Oberer, L., Memmert, K., Weber, E., LaMarche, M. J., Parker, C. N., Burrer, N., Esterow, S., Hein, A. E., Schmitt, E. K., and Krastel, P. (2009) Ribosomally synthesized thiopeptide antibiotics targeting elongation factor Tu. J. Am. Chem. Soc. 131, 5946-5955.
49. Dunbar, K. L., Tietz, J. I., Cox, C. L., Burkhart, B. J., and Mitchell, D. A. (2015) Identification of an auxiliary leader peptide-binding protein required for azoline formation in ribosomal natural products. J. Am. Chem. Soc. 137, 7672-7677.
50. Clackson, T., and Wells, J. A. (1995) A hot spot of binding energy in a hormone-receptor interface. Science 267, 383-386.
51. Pryor, E. E., Jr., Horanyi, P. S., Clark, K. M., Fedoriw, N., Connelly, S. M., Koszelak-Rosenblum, M., Zhu, G., Malkowski, M. G., Wiener, M. C., and Dumont, M. E. (2013) Structure of the integral membrane protein CAAX protease Ste24p. Science 339, 1600-1604.
52. Goto, Y., Li, B., Claesen, J., Shi, Y., Bibb, M. J., and van der Donk, W. A. (2010) Discovery of unique lanthionine synthetases reveals new mechanistic and evolutionary insights. PLoS Biol. 8, e1000339.
53. Shi, Y., Yang, X., Garg, N., and van der Donk, W. A. (2011) Production of lantipeptides in Escherichia coli. J. Am. Chem. Soc. 133, 2338-2341.
54. Plat, A., Kluskens, L. D., Kuipers, A., Rink, R., and Moll, G. N. (2011) Requirements of the engineered leader peptide of nisin for inducing modification, export, and cleavage. Appl. Environ. Microbiol. 77, 604-611.
55. Okesli, A., Cooper, L. E., Fogle, E. J., and van der Donk, W. A. (2011) Nine post-translational modifications during the biosynthesis of cinnamycin. J. Am. Chem. Soc. 133, 13753-13760.
56. Schreiber, J., and Witkop, B. (1964) Reaction of cyanogen bromide with mono- + diamino acids. J. Am. Chem. Soc. 86, 2441-2445.
57. Slootweg, J. C., Liskamp, R. M., and Rijkers, D. T. (2013) Scalable purification of the lantibiotic nisin and isolation of chemical/enzymatic cleavage fragments suitable for semi-synthesis. J. Pept. Sci. 19, 692-699.
58. Krusemark, C. J., Frey, B. L., Smith, L. M., and Belshaw, P. J. (2011) Complete chemical modification of amine and acid functional groups of peptides and small proteins. Meth. Mol. Biol. 753, 77-91.
59. Webb, M. R. (1992) A continuous spectrophotometric assay for inorganic phosphate and for measuring phosphate release kinetics in biological systems. Proc. Natl. Acad. Sci. USA 89, 4884-4887.
60. Krunic, A., and Orjala, J. (2015) Application of high-field NMR spectroscopy for characterization and quantitation of submilligram quantities of isolated natural products. Magn. Reson. Chem. 53, 1043-1050.

APPENDIX A: HIV protease inhibitors block streptolysin S production1

A.1 Introduction

The ribosomally synthesized and post-translationally modified peptides (RiPPs) comprise a rapidly expanding class of natural products that includes a wide variety of structural modifications.2 These modifications impart RiPPs with diverse activities, giving rise to a range of products from antibacterials3-5 to anticancer agents.6 The installation of azole and/or azoline heterocycles is one such modification common to many RiPPs, forming a sub-class of natural products called the thiazole/oxazole-modified microcins (TOMMs).7 The azoles are biosynthesized by the cyclodehydration and subsequent dehydrogenation of cysteine, serine, and threonine residues to form thiazole and (methyl)oxazole rings on the C-terminal portion, or “core”, of a ribosomally produced precursor peptide.7 The azole-containing peptides will often undergo further processing, including the proteolytic removal of the N-terminal “leader” portion of the peptide and export of the mature product.8 Although recent discoveries have shed light on the mechanism of azole formation,9-11 the proteolytic processing step of most TOMMs has yet to be investigated.

Streptolysin S (SLS), a key virulence factor of Streptococcus pyogenes, is one such TOMM whose biosynthesis is incompletely understood.12 S. pyogenes is the causative agent of diseases ranging in severity from pharyngitis to necrotizing fasciitis13 and is a major global health burden, causing over 600 million infections and 500,000 deaths annually.14 SLS is the cytolytic toxin responsible for the classic β-hemolytic phenotype when S. pyogenes is grown on blood agar15 and has been shown to be critical to pathogenesis in mammalian infection models.16-18 Although a few strains of non-β-hemolytic, pathogenic S. pyogenes have been described, such as the Lowry strain,19 the vast majority of S. pyogenes isolates produce SLS.20 The toxin is biosynthesized by a 9-gene biosynthetic operon that encodes the precursor peptide (sagA), cyclodehydratase and dehydrogenase enzymes (sagBCD), a putative leader peptidase (sagE), a multicomponent ABC transporter (sagGHI), and a protein of unknown function (sagF) (Figure A.1A).15 The SagBCD heterocycle synthetase is known to install azol(in)e rings on the core region of the precursor peptide,21, 22 which is followed by proteolytic removal of the leader peptide prior to export of the mature, bioactive natural product (Figure A.1B,C). The final molecular weight of SLS has been inferred from classic gel filtration studies to be 2.8 kDa,23 which is consistent with the bioinformatic prediction of the scissile bond being C-terminal to a Gly-Gly motif based on the similarity to bacteriocins from other Gram-positive bacteria.24 Additionally, proteolysis following small residues is common in RiPPs with known cleavage sites.8 Although SLS was first defined nearly 80 years ago,25 with the β-hemolytic phenotype being known since the late 1800s,26 a detailed mechanism of SLS biosynthesis and the structure of the mature toxin remain elusive. Thus, effective SLS biosynthetic inhibitors could serve as powerful chemical tools to shed more light on the biochemistry of SLS and the infection biology of S. pyogenes.

1 Reproduced with permission from Maxson, T., Deane, C. D., Molloy, E. M., Cox, C. L., Markley, A. L., Lee, S. W., and Mitchell, D. A. (2015) ACS Chem. Biol. 10, 1217-1226. Copyright 2015 American Chemical Society. I performed plantazolicin production experiments.

Figure A.1. TOMM gene clusters and the biosynthetic pathway. (A) Open reading frame diagram showing the organization of several TOMM gene clusters, grouped by function of the mature TOMM. The letters over each gene correspond to the name of the sag genes in S. pyogenes. The function of each gene is color-coded in the legend. (B) Sequences of the precursor peptides from the clusters shown in panel A. The putative (*) or known (-) cleavage sites are shown. The residues substituted in SagA to generate SagA-VLPLL are shown in red. SagA, from S. pyogenes; ClosA, from C. botulinum; StaphA, from S. aureus; LlsA, from L. monocytogenes; BamA, from B. amyloliquefaciens. (C) Generalized mechanism of TOMM maturation with SLS as the example. The proteins putatively responsible for each step are shown above each arrow.

Gaining a better understanding of how pathogens employ their various virulence factors also aids in the development of selective treatment strategies that could help to increase the lifespan of clinically important antibiotics. Unlike traditional broad-spectrum antibiotic treatment, specifically targeting virulence would retain the human microbiota, helping to eradicate secondary infections and, in some cases, could theoretically reduce the evolutionary pressure for the development of resistance.27 Previous approaches to targeting virulence have included disruption of quorum sensing, which often regulates virulence factor expression, blocking toxin delivery or function, and inhibition of bacterial adhesion.27 Several compounds designed around these approaches have been efficacious in vivo and have prompted further study,27, 28 but the selective nature of targeting virulence requires a tailored therapy for each pathogen, which, when developed, would stimulate vast improvements in clinical diagnostics. SLS is an interesting anti-virulence target, as it plays a major role in paracellular invasion, immune evasion, and host-metabolism manipulation.12, 29 Furthermore, additional roles of SLS in iron acquisition have been suggested but require further confirmation.12 Due to the importance of SLS in a multitude of pathogenic processes, chemical inhibitors developed to probe the biosynthesis of SLS may find future roles in virulence attenuation strategies that protect the vital microbiota and limit the spread of antibiotic resistance in other pathogens that employ SLS-like toxins (i.e., S. pyogenes and specific strains of Listeria monocytogenes, Clostridium botulinum, and Staphylococcus aureus).

In this study, we identified inhibitors of SLS biosynthesis in S. pyogenes by searching for compounds that block an essential proteolytic maturation step. This proteolysis event has been proposed to be performed by SagE, a putative peptidase with homology to a large family of proteins referred to as the CaaX proteases and bacteriocin-processing enzymes (CPBP; often confusingly annotated as abortive infection proteins, Abi), which includes the eukaryotic type II CaaX proteases as well as prokaryotic proteins with putative bacteriocin-related functions.30, 31 The type II CaaX proteases are involved in the processing of a number of C-terminally prenylated proteins in eukaryotes and have been much more thoroughly studied than their prokaryotic counterparts.32-34 Type II CaaX proteases were initially believed to be cysteine proteases,35 but the presumed catalytic cysteine was later proven to be unnecessary for activity.34 In contrast, conserved glutamate and histidine residues were shown to be important for activity, leading to the hypothesis that type II CaaX proteases were metalloproteases.34 Many of the prokaryotic members of the CPBP family, including SagE, have been annotated as immunity proteins due to the role in bacteriocin self-immunity of the family members in Lactobacillus plantarum and L. sakei.36 However, the production of viable allelic exchange mutants of sagE and the homolog in Listeria monocytogenes (llsP) indicates that SagE may not be serving an immunity role or may be redundant with other uncharacterized immunity mechanisms.15, 37 Thus, it is plausible that the prokaryotic CPBPs actually act as proteases and that this function is exploited to provide self-immunity in certain cases. The results presented herein support a role for SagE as a protease through the discovery and characterization of a family of small molecule inhibitors of SLS biosynthesis. Appealingly, these inhibitors were identified through the repurposing of existing drugs by an examination of known off-target effects. This approach facilitated the rapid identification of lead compounds without the need to perform expensive and laborious high-throughput screens, and it aided the synthesis of analogs by leveraging previous work on the compounds.38-40

A.2 Evidence for the role of SagE as a protease

Reconstitution of SagE in vitro would be the most direct means of testing for the predicted protease activity and for the screening of inhibitors. Unfortunately, numerous attempts to heterologously express SagE proved unsuccessful, and alternative assays were developed to address this issue. We attribute much of the difficulty in expressing SagE to the predicted transmembrane nature of the protein; topology modeling of SagE with SPOCTOPUS41 predicted five transmembrane helices and an N-terminal signal peptide (Figure A.2). With this in mind, crude membranes from S. pyogenes were prepared from total cellular lysates by ultracentrifugation. The resultant samples were then assessed for proteolytic activity towards 35S-labeled SagA, generated through in vitro transcription/translation. Robust proteolysis of SagA to the predicted molecular weight23 was observed after treatment with the membrane fraction (Figure A.3A). SagA processing in the whole cell lysate and supernatant fractions was much less extensive and often not observable (Figure A.4). To test the substrate specificity of proteolysis, a mutant version of SagA with residues Ala20, Gly22, and Gly23 mutated to leucines (SagA-VLPLL) was generated (Figure A.1B). These residues are directly N-terminal to the predicted leader peptide cleavage site and were expected to be important for protease recognition. Pro21 was left intact to avoid inducing a drastic structural change in the peptide. When treated with isolated S. pyogenes membranes, SagA-VLPLL was processed at a slower rate than wild-type SagA (Figure A.3A), suggesting that the cleavage was performed by a membrane protease specifically recognizing the SagA cleavage site (other membrane-bound proteases may also be minor contributors to the observed proteolysis).

Figure A.2. Hydropathy plot of SagE. The membrane topology of SagE was modeled using the SPOCTOPUS program;41 SagE is predicted to be an integral membrane protease with five transmembrane helices. The Rce1 homologue from the archaeon Methanococcus maripaludis is likewise known to be an integral membrane protein, with eight transmembrane α-helices (PDB ID 4CAD).42

Figure A.3. The role of SagE in the processing of pre-SLS. (A) Purified S. pyogenes membranes display proteolytic activity towards SagA. Both wild type (WT) SagA, and to a reduced extent, the predicted cut site mutant SagA-VLPLL, function as cleavage substrates. (B) Lytic activity of E. coli expressing the SLS biosynthetic machinery after a 2 h induction. The hemolytic activity of extracted SLS on erythrocytes is measured by a colorimetric readout of hemoglobin release.

Figure A.4. Cleavage of Flag-tagged SagA with the S. pyogenes membrane fraction. Proteins present within S. pyogenes membranes harbor proteolytic activity towards MBP-Flag-SagA (Flag tag is on the N-terminus of SagA). A cleavage band is not seen for MBP-SagA-Flag as the Flag tag here is appended to the much smaller C-terminal core peptide, which migrates with the dye front. WCL, whole cell lysate; Sol, soluble fraction; Ins, insoluble (crude membrane) fraction.

To provide additional evidence for the role of SagE, a previously reported multi-plasmid-based expression system for the generation of SLS in Escherichia coli was used. In that study, a lytic entity was generated by the expression of a maltose-binding protein (MBP)-tagged SagA in conjunction with SagB-D from pETDuet vectors after extended induction times.43 SLS produced in this system could be extracted with bovine serum albumin (BSA) and applied to blood using procedures long established for S. pyogenes.44 The heterologously produced SLS was presumably exported by generalized transporters after non-specific proteolysis of the MBP tag or was released via E. coli cell death following buildup of the toxin. Using this system, no lytic activity was observable after a considerably shorter induction time (2 h) from a strain expressing only SagA-D; however, E. coli expressing SagA-F was highly lytic after 2 h (Figure A.3B), indicating that SagEF expedite SLS maturation. The additional inclusion of SagF was required for this effect, although its functional role is unknown. SagF shares no homology to any known proteins but has previously been demonstrated to be necessary for SLS biosynthesis15 and may aid in the folding or localization of the other modification machinery. To support the role of SagE as a protease involved in SLS maturation, two highly conserved glutamate residues (Glu131 and Glu132) that are critical for activity in mammalian CPBP family members31, 34 were mutated to alanine, resulting in the complete loss of the observed lytic activity.

A.3 Aspartyl protease inhibitors block SagA proteolysis A small panel of general mechanism-based protease inhibitors was screened for inhibition of SagA leader proteolysis using the membrane cleavage assay with S. pyogenes membranes. This

assay was preferred over the E. coli multi-plasmid system to avoid potential issues with membrane penetration or general toxicity of the inhibitors. From the panel, only the aspartyl protease inhibitor pepstatin displayed inhibitory activity (Figure A.5), which was unexpected given that SagE bears no similarity to known aspartyl proteases and that members of the type II CaaX protease family were previously hypothesized to function as zinc-dependent metalloproteases.34 However, inhibition of a metalloprotease by aspartyl protease inhibitors has literature precedent, as several inhibitors of HIV protease were found to also inhibit the type I CaaX protease ZMPSTE24, which is a known zinc metalloprotease.45-47 Lipodystrophy is a possible side effect of treatment with certain HIV protease inhibitors and also occurs from genetic deficiencies in ZMPSTE24, leading to the discovery of this off-target effect for these drugs.45 Although type I and II CaaX proteases do not share sequence similarity, they redundantly process some of the same substrates and share similar substrate- architectures.42, 48 Given this, we reasoned that HIV protease inhibitors might be repurposed as inhibitors of SLS production.

Figure A.5. Inhibition of SagA proteolysis by general mechanism-based inhibitors. Proteolysis of WT [35S]-MBP-SagA by S. pyogenes membranes treated with general mechanism-based protease inhibitors. Bestatin, metalloproteases; pepstatin, aspartyl proteases; pefabloc, serine proteases; E-64, cysteine proteases.

A.4 HIV protease inhibitors block SLS production A whole-cell assay in S. pyogenes based on the extraction of SLS with BSA was used to screen a panel of nine FDA-approved HIV protease inhibitors (Figure A.6). Nelfinavir, ritonavir, saquinavir, and lopinavir were found to inhibit the production of SLS when tested at 50 µM, while indinavir, amprenavir, atazanavir, and darunavir did not inhibit SLS production (Figure A.7A). Interestingly, tipranavir caused significant growth suppression in S. pyogenes, and treated cultures never reached late exponential phase (Figure A.8), which is when SLS becomes detectable in vitro.49 The efficacies of the HIV protease inhibitors shown to inhibit SLS production in the initial screen were evaluated by determining 50% inhibition concentration (IC50) values. The IC50 of nelfinavir (6 µM) was lower than those of ritonavir (35 µM), saquinavir (25 µM), and lopinavir (25 µM). Owing to its greater potency, nelfinavir (Figure A.7B) was selected for further study.
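To make the potency comparison concrete, the sketch below ranks the four active inhibitors by their reported IC50 values and estimates the fraction of SLS production inhibited at a given concentration. The IC50 values are those stated above; the simple Hill model, the Hill slope of 1, and the function names are assumptions made for illustration, since the fitting procedure used to obtain the IC50 values is not described in this section.

```python
# IC50 values (µM) for inhibition of SLS production, as reported in the text.
IC50_UM = {
    "nelfinavir": 6.0,
    "ritonavir": 35.0,
    "saquinavir": 25.0,
    "lopinavir": 25.0,
}


def fraction_inhibited(conc_um, ic50_um, hill=1.0):
    """Fraction of activity inhibited under a simple Hill model.

    The Hill slope (default 1.0) is an assumption, not a fitted value.
    """
    if conc_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    if conc_um == 0:
        return 0.0
    return 1.0 / (1.0 + (ic50_um / conc_um) ** hill)


def rank_by_potency(ic50s):
    """Return compound names ordered from lowest to highest IC50."""
    return sorted(ic50s, key=ic50s.get)
```

Under these assumptions, a compound applied at exactly its IC50 is predicted to inhibit half of the activity, and `rank_by_potency(IC50_UM)` places nelfinavir first, consistent with its selection for further study.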

Figure A.6. Structures of the FDA-approved HIV protease inhibitors. The structures of all FDA-approved HIV protease inhibitors are shown (U.S. brand name in parentheses) with the exception of fosamprenavir calcium (Lexiva), a pro-drug of amprenavir. Any previously reported activity for each drug on the CaaX protease ZMPSTE24 is listed,46, 47 along with the activity for SLS production inhibition.

Figure A.7. Inhibition of SLS production. (A) Hemolytic activity of SLS extracts from S. pyogenes treated with HIV protease inhibitors. Tipranavir is shown separated by a dashed line due to significant growth effects during treatment (Figure A.8). Asterisks indicate a P-value < 0.01 relative to the DMSO control. (B) The structure of nelfinavir (1). (C) The proteolytic effect of S. pyogenes membranes treated with nelfinavir on MBP-SagA, relative to a DMSO-treated control.

Figure A.8. Growth effects of HIV protease inhibitors on S. pyogenes. S. pyogenes M1 5448 was grown in the presence of the indicated HIV protease inhibitor at 50 µM and the optical density at 600 nm was recorded. Tipranavir was found to significantly inhibit growth.

Nelfinavir was evaluated in the membrane proteolysis assay to determine if the observed loss of β-hemolysis was due to inhibition of the proteolytic processing step of SLS maturation. As expected, treatment with nelfinavir greatly reduced the proteolytic activity toward SagA contained within S. pyogenes membranes (Figure A.7C, the membrane fraction used in this experiment was prepared on a different day than the fraction used in Figure A.3 and displayed much higher activity). Evidence that nelfinavir was not drastically perturbing normal cellular function came from the observation that the growth rates of S. pyogenes treated with nelfinavir were identical to the DMSO control (Figure A.8). Additionally, minimum inhibitory concentration (MIC) testing revealed no growth inhibition up to the highest concentration tested (64 µM) for a range of bacterial species (Table A.1). Transmission electron microscopy (TEM) was used to examine the morphology of S. pyogenes treated with nelfinavir, with no apparent changes compared to the control sample (Figure A.9). The transcription levels of a panel of virulence factor genes, as assessed by qRT-PCR, were also not significantly impacted (Table A.2). Notably, the levels of sagA and sagB expression were unchanged. These data indicate that nelfinavir inhibited peptide

processing directly, rather than through transcriptional regulation or by significantly perturbing other cellular processes.

Table A.1. The minimum inhibitory concentrations of nelfinavir against a range of bacterial species, including Gram-positive and Gram-negative organisms, are shown. No growth inhibition was seen for any species up to the solubility limit of nelfinavir after 16 h of growth.

Bacterial strain MIC (µM)

Streptococcus pyogenes M1 5448 >64

Listeria monocytogenes 4b F2365 >64

Clostridium sporogenes ATCC 19404 >64

Staphylococcus aureus USA300 >64

Enterococcus faecium U503 >64

Bacillus subtilis 168 >64

Klebsiella pneumoniae ATCC 27736 >64

Pseudomonas aeruginosa PAO1 >64

Escherichia coli MC4100 >64

Figure A.9. Transmission electron microscopy (TEM) of nelfinavir-treated S. pyogenes. TEM images of S. pyogenes M1 5448 treated with (A) DMSO or (B) 25 µM nelfinavir. Scale bars are 100 nm. These images are representative of the entire sample, where more than 100 cells were closely examined.

Table A.2. Changes in virulence factor expression as measured by qRT-PCR with S. pyogenes M49 NZ131 during treatment with nelfinavir versus a DMSO control. Values are from three biological replicates averaged from three technical replicates each. The 16S rRNA gene was used as an internal control gene and relative gene expression was calculated by a comparative CT method.50 Positive and negative values represent up- and down-regulation upon nelfinavir treatment, respectively. The primers used are given in the supplemental methods. A transcription change was not considered significant if the log2 ratio was < 1.5 fold or if the P value was > 0.01. Based on these criteria, none of the genes were significantly altered upon nelfinavir treatment. Similar results were obtained for an additional strain, S. pyogenes M1 5448.

Gene Log2 ratio (treated/untreated) Virulence factor

emm49 -1.98 ± 0.77 M protein

slo 0.02 ± 1.22 streptolysin O (SLO)

sagA 0.32 ± 0.20 streptolysin S (precursor peptide)

sagB 0.10 ± 0.70 streptolysin S (dehydrogenase)

scpA -0.86 ± 0.26 C5A peptidase

ska -1.23 ± 0.83 streptokinase

speB 1.38 ± 0.92 streptococcal pyrogenic exotoxin B (SpeB)

nga -1.30 ± 0.90 NAD glycohydrolase

spd3 -0.83 ± 0.28 streptodornase
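The log2 ratios in Table A.2 come from a comparative CT calculation. The sketch below shows the commonly used form of that calculation, in which the target gene is normalized to a reference gene (here, 16S rRNA) in each condition and the two conditions are then compared. It assumes perfect doubling per PCR cycle; the function name and the CT values in the usage note are hypothetical and are not data from this work.

```python
from statistics import mean


def log2_ratio_ddct(ct_target_treated, ct_ref_treated,
                    ct_target_control, ct_ref_control):
    """Return log2(treated/untreated) expression by the comparative CT method.

    Each argument is a list of technical-replicate CT values. A lower CT means
    more template, so the log2 ratio is the negative of delta-delta-CT.
    Assumes 100% amplification efficiency (template doubles every cycle).
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    return -(delta_treated - delta_control)


def fold_change(log2_ratio):
    """Convert a log2 ratio into a linear fold change."""
    return 2.0 ** log2_ratio
```

For example, with hypothetical target CT values averaging 24 in treated cells and 25 in control cells, and a reference CT of 12 in both, the function returns a log2 ratio of 1, meaning a two-fold increase upon treatment.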

A.5 Structure-activity relationships A series of nelfinavir analogs was synthesized to gain a better understanding of the structure-activity relationships (SAR) for inhibition of SLS maturation. If nelfinavir were to interact with the SLS leader peptidase in a manner analogous to HIV protease, the secondary hydroxyl group would function as a tetrahedral intermediate mimic, as is the case for many aspartyl and metalloprotease inhibitors (Figure A.10). Synthetic routes to nelfinavir have been thoroughly explored and a route that allowed facile derivatization was adapted for this study (Scheme A.1).38 The efficacy of each analog at 25 µM was evaluated using the hemolysis assay (Table A.3 and Table A.4).

Figure A.10. Aspartyl and metalloproteases share a common intermediate. Both types of protease activate water, which performs a nucleophilic attack on the scissile amide bond. This attack results in a gem-diol intermediate shared between the two mechanisms, unlike the covalent acyl-enzyme intermediate formed by serine, threonine, and cysteine proteases. Nelfinavir and other aspartyl and metalloprotease inhibitors usually mimic the intermediate with the non-hydrolyzable secondary hydroxyl group. However, our data suggest that nelfinavir does not inhibit the SLS leader peptidase in this fashion.

Scheme A.1. Synthesis of nelfinavir. a, i-BuOCOCl, Et3N, THF; CH2N2, Et2O (62%); b, HCl, Et2O; c, NaBH4, THF (62% over two steps); d, KOH, EtOH (91%); e, 6, KOH, IPA, 80 °C (84%); f, 3-hydroxy-2-methylbenzoic acid, EDC, HOBt, THF (66%).

Nelfinavir has peptidomimetic features, including an S-phenyl-ethyl group intended to replace the side chain of phenylalanine commonly found in the P1 position of the HIV protease substrate. Since the putative P1 position of SagA is a much smaller glycine residue (Figure A.1B, Gly23), we surmised that smaller substituents at this location on nelfinavir might improve the observed anti-SLS activity. Instead, removal of the side chain (8) abolished detectable activity. It is possible that the cleavage site is closer to the N-terminus, however, and that Pro21 or an aliphatic residue preceding it resides in the P1 position. Accordingly, analogs mimicking these amino acids (9, 10) were synthesized, but these compounds did not exhibit detectable inhibitory activity. Even retention of the phenyl ring in a phenylalanine mimic (11) was insufficient to maintain activity. Conversely, a tryptophan mimic (12) was inhibitory, albeit weakly, suggesting that a relatively large group in that position may be necessary for activity. The lack of detectable activity with the series of P1 position analogs prevented the rigorous establishment of SAR. Modifications to other portions of the molecule were prepared in order to enhance the inhibitory activity to address this pitfall. As installation of the benzamide is the final step of the synthesis (Scheme A.1), preparation of analogs at this location was convenient. Initial analogs revealed that neither the hydroxyl nor the methyl groups (13–15) were important for

activity but that the ring itself was necessary (16). Replacement of the ring with bulkier naphthyl and cyclohexyl groups (17, 18) provided >5-fold increases in potency, indicating that this part of the pharmacophore may reside in a hydrophobic pocket not fully occupied by the single planar ring. However, increasing the size of this moiety to an anthracene (19) greatly reduced activity. In general, electron-deficient rings had improved activity relative to electron-rich rings (Table A.4), although hydrophobicity appeared to be a more significant contributing factor towards potency.

Table A.3. Relative activity of nelfinavir analogs for SLS inhibition. The activity of each analog is reported qualitatively relative to nelfinavir due to variability in commercial blood lots and extraction effectiveness. Inhibitory activity that is >3-fold that of nelfinavir is designated as (+++); activity that is equal to nelfinavir is (++); detectable activity that is <3-fold that of nelfinavir is (+); non-detectable activity is denoted (-).

Table A.4. Relative activity of additional nelfinavir analogs for SLS inhibition. The activity of each analog is reported qualitatively relative to nelfinavir due to lot variation in the commercial sources of blood and the BSA extraction efficiency. Activity >3-fold that of nelfinavir is designated as (+++); activity that is equal to nelfinavir is (++); detectable activity that is <3-fold that of nelfinavir is (+); non-detectable activity is denoted (-).

Given the enhanced activity of the naphthylamide-bearing compound (17), this group was incorporated into the collection of P1 position analogs (Table A.3; Table A.4), increasing the

activity of these analogs to detectable levels. The tryptophan mimic with the naphthylamide (20) displayed considerably increased potency relative to the 3-hydroxy-2-methylbenzamide analog (12). Within the naphthylamide series, the phenylalanine and leucine mimics (21, 22) were equivalently active to nelfinavir, while the glycine mimic (23) displayed weak SLS-inhibitory activity. The proline mimic (24) remained devoid of activity, possibly due to structural perturbations enforced by the ring. These data support a trend of larger substituents imparting higher activity that was foreshadowed by the initial SAR. To further probe the pseudo-P1 position of nelfinavir, the stereochemical configurations of both the side chain and the secondary hydroxyl group were varied. Surprisingly, inverting stereocenters with this series of analogs (25–27) did not result in large changes in activity. Many of the derivatives were still highly potent, suggesting that nelfinavir may not inhibit SagE through mimicking the proteolytic tetrahedral intermediate (Figure A.10). This is further supported by the retention of activity in acetylated derivatives (28, 29). Overall, the sum of the SAR analysis resulted in the development of an analog with significantly improved potency (17, IC50 = 1 µM). Additionally, the activity trends led us to believe that nelfinavir might serve as an inhibitor for the protease in other TOMM natural product biosynthetic clusters in which the precursor peptides do not always contain a predicted Gly-Gly cleavage motif (Figure A.1B).

A.6 Biosynthesis inhibition of other TOMMs SLS is the best-studied member of a group of related cytolysins produced by a number of bacteria, including pathogens such as L. monocytogenes and C. botulinum.12, 21 The SLS-like biosynthetic gene clusters in these strains are highly similar to that of S. pyogenes and include orthologs of sagE (Figure A.1A, Figure A.11). The SLS-like toxin from L. monocytogenes, listeriolysin S (LLS), is known to be expressed during oxidative stress; therefore, a strain with LLS under the control of a constitutive promoter was used in the blood lysis assay to determine if nelfinavir could also inhibit LLS production.51 The strain was deficient in production of an unrelated cytolysin, listeriolysin O (LLO), to ensure any hemolysis observed derived from LLS production. When treated with nelfinavir, this strain produced significantly less LLS (Figure A.12A). The LLO-/LLS- strain of L. monocytogenes was included as a negative control to demonstrate the observed hemolysis was indeed LLS-dependent (Figure A.12A).

Figure A.11. Amino acid sequence alignment of select bacterial CaaX proteases and bacteriocin-processing enzymes (CPBPs). A Clustal Omega alignment of SagE and related proteases in other natural product gene clusters is shown. Conserved residues shown to be important for catalysis in eukaryotic homologues are highlighted in yellow and their proposed role in catalysis is shown in Figure A.18.34 The C-terminal 35 residues were manually aligned. Gene names and producing organisms: SagE, S. pyogenes; LlsP, L. monocytogenes; StaphE, S. aureus; ClosE, C. sporogenes/botulinum; BamE, B. amyloliquefaciens.

Figure A.12. Inhibition of the biosynthesis of other TOMMs by nelfinavir. (A) Lytic activity of extracts from an LLS-producing strain (LLS+) of L. monocytogenes treated with nelfinavir relative to a DMSO control (n = 3). Extracts from a separate strain with llsA deleted (LLS-) were also used as a control. (B) Lytic activity of extracts from a CLS-producing strain of C. sporogenes treated with nelfinavir relative to a DMSO control (n = 3). (C) Production of PZN by B. amyloliquefaciens treated with nelfinavir relative to a DMSO control (n = 4). Nelfinavir was used at 50 µM in all cases. Excluding the LLS- negative control, P-values < 0.05 were obtained for all nelfinavir-treated samples relative to the corresponding DMSO positive controls.

Similar to SLS and LLS, clostridiolysin S (CLS) is a hemolysin from C. botulinum as well as from certain strains of C. sporogenes, which are nearly identical to C. botulinum but do not produce botulinum toxin.22 The presence of the CLS cluster in an unsequenced strain of C. sporogenes (ATCC 19404) known to be hemolytic on blood agar was confirmed by PCR amplification of closC and closD (the cyclodehydratase genes). When C. sporogenes was grown in the presence of nelfinavir, the production of CLS was significantly reduced (Figure A.12B). The inhibition of not only SLS but also LLS and CLS production suggests that nelfinavir would likely inhibit the production of additional TOMM cytolysins. TOMM biosynthetic gene clusters are widespread among bacteria and archaea, and the products of the vast majority of these clusters have not been structurally or functionally characterized.7 A bioinformatic analysis of TOMM clusters revealed that 22% (328 out of 1520 as of October 2014) contained a CPBP family member within the cluster, many of which are predicted or known to have non-cytolytic products. The presence of a CPBP member in these clusters led us

to postulate that nelfinavir would also inhibit TOMM production in these cases. Plantazolicin (PZN) is a TOMM produced by Bacillus amyloliquefaciens FZB42 with highly selective antibacterial activity against Bacillus anthracis and has a sagE-like gene (bamE) in its biosynthetic gene cluster (Figure A.1A, Figure A.11).52 The structure of PZN has been determined; thus the precise cleavage site is known (after Ala27, Figure A.1B), although no biochemical studies have directly linked BamE to leader peptide cleavage. Methanolic surface extracts from B. amyloliquefaciens were analyzed by liquid chromatography-mass spectrometry and the nelfinavir-treated cultures were found to produce significantly less PZN than a DMSO control (Figure A.12C). Unlike the cytolysins, PZN can be readily observed and quantified by mass spectrometry, which permitted confirmation of the nelfinavir dose-dependent inhibition (Figure A.13). The nelfinavir-dependent inhibition in divergent organisms of additional TOMM cytolysins, as well as a functionally distinct antimicrobial TOMM, not only supports the assignment of the CPBP protein being responsible for leader peptide removal during maturation, but also suggests that nelfinavir, and analogs thereof, could be generally useful for inhibition of TOMM production in a large number of hitherto uncharacterized clusters.

Figure A.13. Dose-dependent inhibition of PZN by nelfinavir. The production of plantazolicin by B. amyloliquefaciens is reduced in a dose-dependent manner when treated with nelfinavir, relative to a DMSO control. *P<0.05, **P<0.005.

Additional CPBP family members are also present in bacteria beyond TOMM biosynthetic gene clusters. One such family member is PrsW from Bacillus subtilis, which is involved in cleaving the peptide RsiW (anti-σ factor) during a signaling cascade following cell envelope stress.53 It is believed that PrsW is responsible for sensing antimicrobial peptides and other cell envelope stressing agents.53 Since PrsW is a more divergent CPBP family member and is not expected to be involved in TOMM biosynthesis, we were interested in determining if it would also be inhibited by nelfinavir. Previously reported strains of B. subtilis containing an IPTG-inducible, FLAG-tagged rsiW with and without prsW genetically deleted53 were grown in the presence or absence of nelfinavir. After induction of rsiW, the cells were treated with small amounts of sodium hydroxide as a model cell envelope stressing agent. Rather than observing nelfinavir-dependent inhibition of PrsW, however, nelfinavir appeared to activate the protease even in the absence of sodium hydroxide (Figure A.14). While this could indicate that nelfinavir is binding to the protease but resulting in activation instead of inhibition, it is also possible that nelfinavir is simply acting as a mild cell envelope stressing agent. This would result in non-specific activation of PrsW, complicating any attempts to determine the interaction between nelfinavir and the protease. Nevertheless, it appears that the strong inhibitory activity of nelfinavir towards CPBP family members likely does not extend significantly beyond members in TOMM biosynthetic gene clusters.

Figure A.14. Cleavage of RsiW by PrsW. The CPBP family member PrsW cleaves RsiW under conditions of cell envelope stress, created here by sodium hydroxide. The WT strain is CDE743, containing a FLAG-tagged rsiW under IPTG-controlled induction. The ΔprsW strain is CDE744, a modified version of CDE743 with prsW genetically deleted.

A.7 Target validation studies Although the previous data strongly support SagE as the target of nelfinavir in SLS biosynthesis inhibition, we sought to further validate this hypothesis through affinity pulldown with a photoactivatable crosslinker probe. The previously described synthesis (Scheme A.1) was utilized to install photoactivatable moieties as well as terminal alkynes for subsequent azide-alkyne

cycloaddition click chemistry. As the most facile position to modify, replacement of the benzamide was initially investigated. The installation of a photoactivatable benzophenone moiety at this position unfortunately yielded a barely active compound (140), but installation of the smaller aryl alkyne did not significantly affect activity (141). Thus, as leucine-mimic analogs of nelfinavir had been determined to be active (22, 89, 114), an analog derived from D-photo-leucine containing a photoactivatable diazirine moiety was synthesized with the aryl alkyne in the benzamide position. The D-form of photo-leucine was chosen rather than the L-form as head-to-head comparisons of 22 and 114 consistently indicated that the R stereochemistry at the side chain of the P1 position was more active. The probe (150, Figure A.15A) was determined to retain inhibitory activity via the hemolysis assay, albeit weakly, and was thus used for subsequent affinity pulldown studies. Cultures of S. pyogenes were treated with 150 before irradiation with UV light, cell lysis, and conjugation to biotin via copper-mediated azide-alkyne cycloaddition. The lysates were then passed over streptavidin resin for affinity pulldown or were visualized directly by western blotting. Unfortunately, it quickly became apparent that copious non-specific crosslinking was occurring due to the presence of numerous bands by western blot (Figure A.15B), none of which were diminished in competition experiments with nelfinavir. As SagE was shown to be active in E. coli, a strain overexpressing SagE was utilized for labeling and also displayed non-specific crosslinking (Figure A.15C). A proteomics analysis of the labeled proteins revealed that several of the most abundant cellular proteins were crosslinking to the probe (150), including EF-Tu and protein S9. These abundant housekeeping proteins overwhelmed the analysis and no proteins that would reasonably be expected to be involved in SLS biosynthesis were detected. 
It is probable that 150, as well as nelfinavir, non-specifically stick to many different cellular components, likely contributing to the range of known off-target activities of nelfinavir.

Figure A.15. Cell extracts photocrosslinked with 150. (A) Structure of probe 150. (B) Western blot of the insoluble fraction of lysates from S. pyogenes M1 WT cells labeled with probe 150. (C) Western blot of total lysate (lane 1) and the insoluble fraction (lane 2) from E. coli labeled with probe 150.

Although initial attempts to heterologously express and reconstitute SagE were unsuccessful, we believed it might be possible to generate properly folded SagE in the presence of lipid nanodiscs as the protein is expected to normally be membrane bound (Figure A.2). Lipid nanodiscs are portions of lipid bilayer solubilized by the action of membrane scaffolding proteins that wrap around the hydrophobic tails of the lipids.54 The nanodiscs have been shown to be effective platforms for stably reconstituting otherwise intransigent membrane proteins in vitro. SagE was generated through a coupled in vitro transcription/translation system in the presence of lipid nanodiscs. The reactions were then tested for their ability to cleave an MBP-SagA substrate. Although intact SagE could be visualized by western blotting, no cleavage activity could be detected (Figure A.16).

Figure A.16. Representative example of the lack of MBP-SagA cleavage by SagE expressed in vitro with lipid nanodiscs. Each lane of the western blot represents a different reaction condition for the in vitro transcription/translation (TNT) of SagE or for the cleavage of MBP-SagA. (1) no DNA, 30 °C TNT, 37 °C cleavage; (2) pET28-Mistic-SagE, 30 °C TNT, 37 °C cleavage; (3) pET28-Mistic-SagE, 18 °C TNT, 37 °C cleavage; (4) pCOLA-SagE-SagF-Stag, 30 °C TNT, 37 °C cleavage; (5) pCOLA-SagE-SagF-Stag, 18 °C TNT, 37 °C cleavage; (6) pET28-Mistic-SagE, 30 °C TNT, 18 °C cleavage; (7) pET28-Mistic-SagE, 18 °C TNT, 18 °C cleavage; (8) pCOLA-SagE-SagF-Stag, 30 °C TNT, 18 °C cleavage; (9) pCOLA-SagE-SagF-Stag, 18 °C TNT, 18 °C cleavage. No cleavage is seen in any condition.

In a final effort to validate SagE as the target of nelfinavir, sagE (with and without sagF) was cloned into the S. pyogenes overexpression vector pIB184 under constitutive control.55 If inhibition of SagE by nelfinavir causes a bottleneck in SLS biosynthesis, overexpression of SagE may abrogate the inhibition. The amount of SLS produced by strains containing the sagE vectors was compared to an empty vector control in the presence or absence of nelfinavir via the hemolysis assay. No difference in the amount of SLS produced between sagE-containing and empty vectors could be detected in any instance (Figure A.17). Taking the opposite approach, overexpression vectors containing antisense sequences to sagE were created, transformed into S. pyogenes, and tested in the hemolysis assay. Again, however, no synergistic inhibition of SLS biosynthesis was observed (Figure A.17). qRT-PCR analysis of strains containing the overexpression vectors against empty vector controls indicated that the presence of the vectors was resulting in a several-fold increase in sagE RNA quantities (Table A.5). Therefore, it is likely that the levels of SagE were being effectively modulated but that this was having no effect on SLS biosynthesis in the presence of nelfinavir. It is possible that SagE requires other cellular components to function that are more limiting than the quantity of SagE. If nelfinavir only binds to SagE in the context of this interaction, modulating the expression of SagE would not be expected to have a significant effect on nelfinavir inhibition. However, it is also possible that nelfinavir does not directly inhibit SagE or that the inhibition of SLS biosynthesis is achieved through the interaction of nelfinavir and a different target. Even in this latter scenario, it is still possible that nelfinavir inhibits proteolysis via SagE, but that this simply is not a limiting factor in the biosynthesis of the mature cytolysin.

Figure A.17. Over- and under-expression of SagE does not affect SLS biosynthesis inhibition by nelfinavir. Lytic activity of extracts from S. pyogenes containing pIB184 vectors with sagE/F or antisense sagE treated with nelfinavir relative to DMSO controls (n = 2). This subset of data is representative of the overall trend.

Table A.5. Changes in SagE/F expression as measured by qRT-PCR with S. pyogenes M1 5448 transformed with pIB184 vectors. Values are from three biological replicates averaged from three technical replicates each, except for the pIB184-sagEF vector which was only one biological replicate (indicated by a *). The sagB gene was used as an internal control gene and relative gene expression was calculated by a comparative CT method.50 Positive values represent up-regulation of the genes. sagE appears to be upregulated with the antisense vector as the primers used cannot differentiate between normal sagE RNA and antisense sagE RNA. The primers used are given in the supplemental methods.

Vector Gene Log2 ratio (treated/untreated)

pIB184-sagE sagE 4.54 ± 1.32

pIB184-sagEF sagE 3.11*

pIB184-sagEF sagF 2.55*

pIB184-sagF sagF 4.63 ± 0.72

pIB184-AntiEfull sagE 6.07 ± 0.41

A.8 Discussion In this work, the FDA-approved HIV protease inhibitor nelfinavir was repurposed as the first small molecule inhibitor of SLS production in S. pyogenes, displaying low micromolar

activity. Nelfinavir was identified as a lead compound by leveraging the extensive basic and clinical research data accumulated on the effects of the drug. Lipodystrophy, a known side effect of nelfinavir and several other HIV protease inhibitors, had been previously linked to the off-target inhibition of the CaaX protease ZMPSTE24. We surmised that the HIV protease inhibitors would also inhibit SagE due to its homology with CaaX proteases, allowing us to rapidly identify a lead compound for the inhibition of SLS production. This strategy for lead identification negated the need for high-throughput screening or for a crystal structure of the target for in silico and structure-based design. Utilizing a drug with synthetic routes that have been thoroughly explored also greatly accelerated the creation of analogs for SAR efforts that yielded compound 17, with an improved IC50 value of 1 µM. Nelfinavir and related compounds are inhibitors of SLS biosynthesis, most likely through inhibition of proteolytic cleavage of the protoxin by the CPBP family member SagE. Although in vitro reconstitution was unsuccessful, the necessity for SagE during SLS production in the multi-plasmid expression system in E. coli provides considerable evidence for its role in proteolytic processing. Like many CPBP members, SagE is commonly referred to as an immunity protein in the literature,15, 16 but inhibition by nelfinavir did not have any effect on the growth of S. pyogenes, providing evidence that SagE is not involved in self-immunity. Alternatively, compensatory mutations that abolish SLS production may arise when SagE is inactivated, as has been previously suggested.16 The original annotation of SagE as an immunity protein stems from its similarity to PlnP from Lactobacillus plantarum. PlnP and several related proteins are found downstream of bacteriocin structural genes in L. plantarum and have been shown to provide immunity to the antibacterial effect of the respective bacteriocins.36 Yet, unlike these bacteriocins, SLS has not been demonstrated to possess any antibacterial activity against intact S. pyogenes cells. A large buildup of intracellular SLS might result in toxicity, but this would be expected to affect any S. pyogenes strains in which the transport machinery is inactivated as well, which has not been observed.15 Furthermore, treatment of SagA with the cyclization machinery (SagBCD) in vitro results in a lytic entity without cleavage of the leader peptide, indicating that while proteolysis is required for cellular export, it is unnecessary for lytic activity.21 These observations lead us to conclude that the principal function of SagE is to proteolytically mature SLS. In addition to experimentally supporting a biochemical role for SagE, we have also addressed the mechanism of proteolysis and the probable inhibition by nelfinavir. 
CPBP family members have been postulated to function through a zinc metalloprotease mechanism based on the presence of two glutamates, two histidines, and an asparagine residue that are conserved across the family (Figure A.11).34 Mutation of these residues in the eukaryotic type II CaaX protease Ras-converting enzyme (Rce1) has also demonstrated that these residues are critical for activity.34 We found that Glu131 and Glu132 were critical for the activity of SagE. Thus, it was initially unexpected that proteolysis was inhibited by pepstatin, a general aspartyl protease inhibitor, but not by bestatin, a metalloprotease inhibitor (Figure A.5). However, a recent report detailing the crystal structure of Rce1 from Methanococcus maripaludis provides compelling evidence that this family of proteases actually functions through a novel glutamate-dependent mechanism (Figure A.18).42 The authors hypothesized that a glutamate residue extending into the active site is responsible for deprotonating a water molecule, activating it for nucleophilic attack. Given the similarities between this proposed mechanism and the mechanism of aspartyl proteases, it is perhaps unsurprising that a CaaX protease homolog would be inhibited by aspartyl protease inhibitors, including nelfinavir.

Figure A.18. Proposed mechanism for glutamate-dependent proteases. A potential mechanism for glutamate-dependent proteases based on the crystal structure of Rce1 is shown.42 Glutamate and histidine activate water for nucleophilic attack while a second histidine residue and an asparagine residue labeled “Oxy” comprise the oxyanion hole. A second conserved glutamate near the active site is thought to be structurally important. A general mechanism for aspartyl proteases is shown for comparison.

In addition to increasing the potency of inhibition with compound 17, our synthetic effort also yielded information on how nelfinavir may interact with SagE. The SAR analysis revealed that a rather large side chain in the pseudo P1 position of the structure was required for potent activity, with the original S-phenyl-ethyl group displaying the greatest inhibition. This result was

not anticipated, given that the P1 residue in the SagA substrate is putatively glycine. One possible explanation for this discrepancy is that the catalytic site architecture of SagE is highly conserved with the type II CaaX proteases, which normally cleave substrates with a prenylated cysteine in the P1 position. In this case, nelfinavir would be an ideal fit for the active site, as the core of the molecule closely resembles a cysteine with a hydrophobic group appended. An alternative explanation is that nelfinavir does not bind in the active site in the expected fashion (with the secondary hydroxyl interacting with the catalytic residues, Figure A.10) or does not bind in the active site at all. These possibilities are supported by the potent activity of several analogs with different stereochemical configurations, as these molecules would likely be forced into different conformations that do not allow the same favorable interactions with active site residues. In this scenario, binding of nelfinavir to the membrane protease may be driven by hydrophobic interactions. This explanation would also account for the attenuated activity of saquinavir, which is nearly identical to nelfinavir except for the presence of a more hydrophilic group at the benzamide position (Figure A.6, XLogP3 values from PubChem for nelfinavir and saquinavir are 5.7 and 4.2, respectively). Comparisons to the other HIV protease inhibitors do not yield much additional information due to their low similarity to nelfinavir, as reflected in the Tanimoto similarity coefficients (Table A.6).

Table A.6. The Tanimoto similarity coefficients for the FDA-approved HIV protease inhibitors. Higher Tanimoto coefficients indicate higher similarity. Compounds that inhibited the production of SLS are highlighted in green. Abbreviations: ampr, amprenavir; ataz, atazanavir; daru, darunavir; indi, indinavir; lopi, lopinavir; nelf, nelfinavir; rito, ritonavir; saqu, saquinavir; tipr, tipranavir.

      ampr  ataz  daru  indi  lopi  nelf  rito  saqu  tipr
ampr  1.00  0.24  0.67  0.14  0.21  0.14  0.21  0.20  0.12
ataz  0.24  1.00  0.22  0.15  0.20  0.13  0.21  0.20  0.11
daru  0.67  0.22  1.00  0.13  0.20  0.13  0.20  0.19  0.11
indi  0.14  0.15  0.13  1.00  0.15  0.18  0.14  0.22  0.10
lopi  0.21  0.20  0.20  0.15  1.00  0.14  0.28  0.19  0.11
nelf  0.14  0.13  0.13  0.18  0.14  1.00  0.10  0.40  0.09
rito  0.21  0.21  0.20  0.14  0.28  0.10  1.00  0.17  0.10
saqu  0.20  0.20  0.19  0.22  0.19  0.40  0.17  1.00  0.10
tipr  0.12  0.11  0.11  0.10  0.11  0.09  0.10  0.10  1.00
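
For readers unfamiliar with the metric in Table A.6, a Tanimoto coefficient is the number of features two molecules share divided by the number of features present in either. The sketch below shows only that ratio. It assumes each molecule has already been reduced to a set of fingerprint feature indices; the fingerprint type used for Table A.6 is not specified here, and the two feature sets are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' features."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical feature sets (not real fingerprints of any HIV protease inhibitor)
fp_x = {1, 4, 7, 9, 12}
fp_y = {1, 4, 8, 9, 15, 20}

score = tanimoto(fp_x, fp_y)  # 3 shared features out of 8 distinct features
print(f"Tanimoto = {score:.3f}")
```

The coefficient is symmetric and equals 1.00 for identical fingerprints, which is why the table is mirrored about a diagonal of 1.00 entries.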

Unequivocal confirmation of the bacterial target of nelfinavir (and analogs) was unfortunately not possible in the present study, which can be attributed to the technical challenges inherent to the study of integral membrane proteins. Nelfinavir also likely interacts with multiple targets in S. pyogenes, as the drug is known to have multiple off-target effects in humans.45 A significant body of work in mammalian cell lines has demonstrated that nelfinavir displays promiscuous activity, such as interruption of Akt signaling and inhibition of the proteasome.56 Target promiscuity likely also exists in bacteria, so possible interactions of nelfinavir with additional targets cannot be ruled out. Thus, nelfinavir may be inhibiting additional participants that indirectly result in the inhibition of SLS production in a mechanism unrelated to proteolysis. Further targets of nelfinavir may also exist that do not result in observable phenotypes. However, nelfinavir also inhibited the biosynthesis of other natural products that include a CPBP family

member in the gene cluster (i.e. LLS, CLS, and PZN). The fact that this inhibition occurred in a range of disparate bacterial species decreases the probability that another protease is responsible and provides substantial, albeit indirect, support that SagE is the primary target of nelfinavir in S. pyogenes. Finally, the HIV protease inhibitors found to inhibit SLS production parallel those capable of inhibiting the human CaaX protease ZMPSTE24 (Figure A.6),46, 47 providing additional evidence that nelfinavir and its analogs inhibit SLS production by blocking the action of SagE.

A.9 Summary and Outlook

Despite their prevalence, prokaryotic members of the CPBP family have not yet been thoroughly investigated. Many of the family members are incorrectly annotated or have a predicted function based solely on distant homology. The discovery of nelfinavir as an inhibitor of CPBPs has provided evidence that SagE functions as a protease and will aid in the assignment of functions to other family members, including those from other human pathogens such as S. aureus. Additionally, nelfinavir is the first reported inhibitor of the production of SLS and related toxins. Nelfinavir and improved analogs will provide a new tool to investigate toxin function without the need to create genetic deletions while also allowing for temporal control over toxin production. Reversible control of SLS production with nelfinavir analogs will also help to clarify the precise contribution of SLS to virulence in in vivo models of infection and may open the door to the development of virulence-targeting strategies for the control of S. pyogenes infections. Finally, the chemical knockdown effect of nelfinavir can be utilized for the discovery of natural products from the 22% of TOMM gene clusters that contain a CPBP family member, potentially accelerating the structural and functional characterization of these compounds.

A.10 Methods

A.10.1 Materials

All chemicals were purchased from Sigma-Aldrich, VWR, Fisher Scientific, or Oakwood Products and used without further purification unless otherwise specified. HIV protease inhibitors were obtained from the NIH AIDS Research and Reference Program, Division of AIDS, NIAID, NIH. Nelfinavir, ritonavir, saquinavir, indinavir sulfate, amprenavir, lopinavir, atazanavir sulfate, tipranavir, and darunavir were received through distribution by Thermo Fisher.

A.10.2 Plasmid construction

The genes encoding Flag-SagA and SagA-Flag were generated in pDCerm via PCR utilizing primers specific to sagA and containing the Flag tag sequence and XbaI (Flag-sagA) or BamHI (sagA-Flag) restriction sites. For heterologous expression of these peptides in Escherichia coli, the genes of interest were subcloned into a pET28 vector containing an N-terminal maltose binding protein (MBP) fusion as previously described.21 The construction of the Duet vectors pETDuet-1-sagB-sagC and pACYCDuet-1-MBP-sagA-sagD has been previously described.43 The Duet vector containing sagE and sagF was constructed using the pCOLADuet-1 vector (EMD Millipore) in an analogous manner.43 Point mutations of SagA and SagE were introduced with the QuikChange site-directed mutagenesis kit (Agilent), following the manufacturer’s instructions. The genes encoding SagE, SagEF, and SagF were inserted into pIB184 using Gibson cloning following the manufacturer’s instructions (New England Biolabs) with the EcoRI cut site to generate pIB184-sagE, pIB184-sagEF, and pIB184-sagF, respectively. Antisense RNA constructs comprising the entire sequence of sagE or the first 150 bp of sagE were inserted into pIB184 in a similar manner to generate pIB184-AntiEfull and pIB184-AntiE150, respectively.

A.10.3 Generation of SagA substrates

35S-Methionine-labeled MBP-SagA was generated as previously described (from rabbit reticulocyte, T7-coupled in vitro transcription/translation, Promega).57 MBP-Flag-SagA and MBP-SagA-Flag were generated by heterologous expression and purification as previously described.21

A.10.4 Preparation of S. pyogenes membranes

An overnight culture (5 mL) of S. pyogenes M1 5448 ΔsagA16, 21 was grown in Todd-Hewitt Broth (THB) at 37 °C and used to inoculate a 50 mL culture of THB. This culture was grown to an OD600 of 0.6 and harvested at 4,000 x g (5 min, 4 °C), and the cell pellet was transferred to a 1.7 mL microfuge tube using 1 mL of cell lysis buffer [50 mM MOPS (pH 7.6), 50 mM NaCl, 5% (v/v) glycerol, 1500 U mutanolysin, and 5 mg lysozyme]. The cell wall was digested for 1 h at 37 °C with agitation before cooling to 4 °C, directly prior to cell disruption by sonication (3 rounds for 30 s each, 30% power using a microprobe). The sample was then subjected to 3 rounds of freeze/thaw cycling using liquid nitrogen with cap venting upon warming to room temperature. A final round of sonication was then applied and any remaining intact cells were removed by a low-speed centrifugation step (500 x g, 3 min, 4 °C). Omission of either the sonication or freeze/thaw cycling steps in this procedure gave poor membrane fraction preparations. The supernatant of the low-speed spin is referred to as the whole-cell lysate fraction. The majority of this fraction was transferred to a thick-walled microfuge tube prior to a subsequent high-speed spin (50,000 x g, 45 min, 4 °C), in which the membrane fraction was separated from the whole-cell lysate. The supernatant of this spin, referred to as the soluble fraction, was transferred to a clean microfuge tube. Proteins from the harvested membrane fraction were resolubilized in 300 µL of membrane extraction buffer [50 mM HEPES (pH 7.4), 125 mM NaCl, 5% (v/v) glycerol, and 1% (w/w) dodecyl maltoside] with light sonication (2 rounds for 1 s each, 10% power using a microprobe). The extraction was allowed to proceed for 1 h at 4 °C with rocking. Protein concentrations in each sample (whole-cell lysate, soluble, and membrane fractions) were quantified using the Bradford assay and adjusted to the same concentration (3.8 mg/mL); the procedure typically produced a total of 1-2 mg of protein for each fraction.

A.10.5 Localization of SagA/SLS proteolytic activity

To S. pyogenes fractions (whole-cell lysate, soluble, and membrane; 13.2 µL, 50 µg protein each) were added TCEP (1 µL of 50 mM), purified 35S-methionine-labeled MBP-SagA, MBP-Flag-SagA, or MBP-SagA-Flag protein pre-diluted in membrane extraction buffer (1 µL of 90 µg/mL), and 14.8 µL of membrane extraction buffer for a total reaction volume of 30 µL. Aliquots were removed at specific times of reaction at 25 °C and quenched with SDS-PAGE loading buffer. Proteins were separated on a 10% SDS-PAGE gel with 25 ng of MBP-SagA protein loaded per lane. For samples utilizing 35S-methionine-labeled MBP-SagA, the gels were dried and visualized by autoradiography (typically requiring ~10 h of exposure at -80 °C using a Kodak BioMax low-energy isotope intensifying screen). For samples utilizing MBP-Flag-SagA or MBP-SagA-Flag, protein was transferred to nitrocellulose membranes (Pierce) by electroblot. Membranes were blocked for 30 min at 25 °C in 4% non-fat milk in TBST [25 mM Tris (pH 7.6), 150 mM NaCl, 0.1% (v/v) Tween 20] before anti-Flag antibodies (M2, mouse monoclonal, Sigma-Aldrich) were added at a 1:5,000 dilution directly to the buffer and allowed to bind for 1 h at 25 °C. The buffer was discarded and the membranes were washed 3 times for 5 min each with TBST. The membranes were then treated with anti-mouse horseradish peroxidase (HRP, GE Healthcare) at a 1:5,000 dilution for 30 min at 25 °C followed by 3 washes for 5 min each with TBST. The membranes were then treated with the SuperSignal West Pico Chemiluminescent substrate (Pierce) prior to film imaging.

A.10.6 Expression of SLS in E. coli

The three vectors comprising SagABCDEF were co-transformed into BL21-DE3 cells (Invitrogen) and plated on LB agar plates with 100 µg/mL ampicillin, 35 µg/mL chloramphenicol, and 30 µg/mL kanamycin. The resulting colonies were then chosen and tested for their ability to produce hemolytic activity. To perform hemolytic tests, 10 mL LB cultures with 100 µg/mL ampicillin, 35 µg/mL chloramphenicol, and 30 µg/mL kanamycin were grown overnight at 37 °C with shaking. Separate 10 mL LB cultures were inoculated 1:50 with each of these starter cultures and grown to OD600 0.7. Cultures were induced with 0.1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) and bovine serum albumin (BSA, 10 mg/mL) was added to the induced cultures. An aliquot was removed at 2 and 8 h for hemolytic assay testing. Aliquots were centrifuged at 4,000 x g for 15 min and supernatants were collected and directly added to washed, defibrinated sheep blood.

A.10.7 Erythrocyte lysis assay

Defibrinated sheep blood (Hemostat Laboratories) was rinsed by resuspension in phosphate buffered saline (PBS; 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4·2H2O, 2 mM KH2PO4, pH 7.4) followed by centrifugation at 300 x g for 10 min at 4 °C. The supernatant was discarded and the pellet was rinsed again, repeating until the supernatant was nearly colorless. The blood was resuspended in fresh PBS to a concentration within the linear absorbance range at 405 or 450 nm when treated with 0.1% Triton X-100. SLS extracts were applied to the blood in a 3:4 ratio and incubated at 37 °C for 30 to 120 min. Lysed cells were removed by centrifugation at 300 x g for 10 min at 4 °C and hemoglobin release was measured by A405 (FilterMax F5 plate reader) or A450 (Cary 4000 UV-Vis).

A.10.8 Membrane proteolysis inhibition

Purified membranes from S. pyogenes M1 5448 were used to digest [35S]-MBP-SagA in the presence of nelfinavir or mechanism-based protease inhibitors. A typical reaction (25 µL total volume) consisted of 18 µL of S. pyogenes membranes (5.0 mg/mL protein concentration), [35S]-MBP-SagA, protease inhibitor, and TCEP (2 mM). The inhibitors tested were pepstatin A (for aspartyl proteases, 60 µM), bestatin (for metalloproteases, 200 µM), E-64 (for cysteine proteases, 200 µM), and Pefabloc (for serine proteases, 400 µM). Nelfinavir was also tested in subsequent assays. Aliquots were removed at specific times of reaction at 25 °C, quenched with SDS-PAGE loading buffer, and analyzed by SDS-PAGE as described above.

A.10.9 Effect of FDA-approved HIV protease inhibitors on S. pyogenes

An overnight culture (5 mL) of S. pyogenes M1 5448 was grown in THB without shaking at 37 °C and used to inoculate multiple, identical 50 mL cultures of THB. Upon reaching an OD600 of 0.2, the cultures were diluted two-fold using an equal volume of THB containing the desired concentration of protease inhibitor (initial screening was at 50 µM; nelfinavir analog screening was at 25 µM) from 10 mM stock solutions in DMSO. An equal volume of DMSO was used as a negative control for these assays. The OD600 of these cultures was measured in triplicate every 30 min using a Cary 4000 UV-Vis spectrophotometer. For the cultures that reached an OD600 of 0.6, any SLS present on the cell surface was isolated using a BSA extraction procedure. SLS activity was then quantified by erythrocyte lysis assay.

A.10.10 BSA extraction of SLS

S. pyogenes M1 5448 was grown in THB at 37 °C as described above to an OD600 of 0.6. Cultures were harvested by centrifugation at 4000 x g for 10 min at 4 °C and the supernatant was discarded. The pellets were resuspended in fresh THB containing 10 mg/mL BSA and incubated at 37 °C for 1 h. The cultures were harvested by centrifugation at 4000 x g for 10 min at 4 °C and the supernatant was subsequently used in the erythrocyte lysis assay either undiluted or at a 1:3 dilution in fresh THB containing 10 mg/mL BSA.

A.10.11 Determination of IC50 values

S. pyogenes M1 5448 was grown in the presence of inhibitors with a concentration range of 0.2 to 64 µM. SLS extracts were applied to the erythrocyte lysis assay alongside a DMSO-treated control. Samples were centrifuged at 300 x g for 10 min at 4 °C and the supernatants were assayed for hemoglobin release when blood treated with the DMSO control extract had reached roughly 90% lysis. IC50 values were defined as the concentration of inhibitor needed to prevent 50% of the hemolysis relative to the DMSO control over the given time course.
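
The IC50 definition above can be illustrated with a short calculation. This is a minimal sketch only: it assumes hemolysis has already been expressed as a fraction of the DMSO control and estimates the 50% point by log-linear interpolation between the two bracketing concentrations. The interpolation scheme, the function name, and the dose-response values are illustrative assumptions, not the fitting procedure or data used in this work.

```python
import math

def ic50_by_interpolation(concs_uM, hemolysis_fraction):
    """Estimate the concentration giving 50% of control hemolysis.

    concs_uM must be ascending; hemolysis_fraction holds hemolysis relative
    to the DMSO control (1.0 = no inhibition). Returns None if the response
    never crosses 0.5 within the tested range.
    """
    points = list(zip(concs_uM, hemolysis_fraction))
    for (c_lo, h_lo), (c_hi, h_hi) in zip(points, points[1:]):
        if h_lo >= 0.5 >= h_hi and h_lo != h_hi:
            # Fraction of the way from c_lo to c_hi (on a log scale)
            frac = (h_lo - 0.5) / (h_lo - h_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical dose-response values (placeholders only)
concs = [1, 2, 4, 8, 16]
relative_hemolysis = [0.95, 0.80, 0.60, 0.40, 0.10]

ic50 = ic50_by_interpolation(concs, relative_hemolysis)
print(f"Estimated IC50 = {ic50:.2f} µM")
```

Interpolating on a log concentration scale suits a two-fold dilution series; a full sigmoidal fit would be the more rigorous choice when enough points are available.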

A.10.12 Minimum inhibitory concentration (MIC) testing

Cultures of Bacillus subtilis 168 and E. coli MC4100 were grown in Luria-Bertani (LB) broth; cultures of Listeria monocytogenes 4b F2365, Staphylococcus aureus USA300 (MRSA), and Enterococcus faecium U503 (VRE) were grown in brain heart infusion (BHI) broth; cultures of S. pyogenes M1 5448 were grown in THB; cultures of C. sporogenes ATCC 19404 were grown in anaerobically prepared THB supplemented with 5 mM L-cysteine; cultures of Bacillus amyloliquefaciens RS6, Klebsiella pneumoniae ATCC 27736, and Pseudomonas aeruginosa PAO1 were grown in Mueller Hinton broth. All strains were started as overnight cultures, were incubated at 37 °C, and were used at 1:100 dilutions to inoculate fresh cultures in the same media. The cultures were grown at 37 °C to mid-exponential phase prior to MIC testing. Nelfinavir was added to the cultures at 1 to 64 µM from a 10 mM DMSO stock. The treated cultures were incubated for 16 h at 37 °C and growth was visually examined.
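
The dosing arithmetic for the MIC test can be sketched as follows. This is an illustration under stated assumptions: the 1 to 64 µM range is taken to be a two-fold series, the 1 mL culture volume is a placeholder, and the volume added from the stock is treated as negligible. Only the concentration range and the 10 mM stock come from the text.

```python
def twofold_series(low_uM, high_uM):
    """Return a two-fold concentration series from low_uM up to high_uM."""
    concs = []
    c = low_uM
    while c <= high_uM:
        concs.append(c)
        c *= 2
    return concs

def stock_volume_uL(final_uM, culture_mL, stock_mM=10.0):
    """Volume of stock (µL) needed for final_uM in culture_mL of culture.

    From C1*V1 = C2*V2, with the added volume treated as negligible.
    """
    return final_uM * culture_mL / stock_mM

series = twofold_series(1, 64)
for conc in series:
    vol = stock_volume_uL(conc, culture_mL=1.0)  # hypothetical 1 mL culture
    dmso_percent = vol / 1000 * 100              # % (v/v) DMSO in 1 mL
    print(f"{conc:>3} µM: {vol:.2f} µL stock ({dmso_percent:.2f}% DMSO)")
```

Under these assumptions the highest dose adds 6.4 µL of stock per mL, so the DMSO content stays below 1% (v/v) across the series.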

A.10.13 Electron microscopy

An overnight culture of S. pyogenes M1 5448 was grown in THB at 37 °C and used to inoculate 5 mL cultures of THB containing 25 µM nelfinavir (from a 10 mM stock) or an equivalent amount of DMSO at a 1:20 dilution. The cultures were grown to an OD600 of 0.6, harvested by centrifugation (5,000 x g, 10 min, 25 °C), and fixed with a Karnovsky’s fixative in phosphate buffered 2% glutaraldehyde and 2.5% paraformaldehyde. Microwave fixation was used with this primary fixative and the tissue was then washed in cacodylate buffer with no further additives. Microwave fixation was also used with the secondary 2% osmium tetroxide fixative, followed by the addition of 3% potassium ferricyanide for 30 min. After washing with water, saturated uranyl acetate was added for en bloc staining. The sample was dehydrated in a series of increasing concentrations of ethanol. Acetonitrile was used as the transition fluid between ethanol and the epoxy. Infiltration series were done with an epoxy mixture using the epon substitute Lx112. The resulting blocks were polymerized at 90 °C overnight, trimmed, and ultrathin sectioned with diamond knives. Sections were stained with uranyl acetate and lead citrate and photographed with a Hitachi H600 Transmission Electron Microscope.

A.10.14 qRT-PCR of virulence gene expression

An overnight culture of S. pyogenes M49 NZ131 was grown in THB at 37 °C and used to inoculate 2 mL cultures of THB containing 25 µM nelfinavir (from a 10 mM stock) or an equivalent amount of DMSO at a 1:20 dilution. The cultures were grown to an OD600 of 0.8 before RNA was isolated using an RNeasy Protect Bacteria Mini Kit (Qiagen) following the manufacturer’s instructions, except for the lysis step, which was extended to 1 h with the addition of 250 U mutanolysin. The absence of DNA contamination was confirmed by PCR with primers to the 16S rRNA and sagA genes. Isolated RNA was used to generate cDNA using the ImProm II kit (Promega) following the manufacturer’s instructions, running reactions both with and without reverse transcriptase (RT). RT negative reactions again confirmed the absence of DNA contamination. qRT-PCR reactions were initiated with 2 µL of cDNA (~2.5 ng/µL), 2 µL of 10 µM forward primer, 2 µL of 10 µM reverse primer, 10 µL of iTaq Universal SYBR Green Supermix (Bio-Rad), and 4 µL of water and were run on a Lightcycler 480 (Roche). The fold-change in gene expression was calculated using a comparative CT method.50 We defined a significant change in gene expression as up- or down-regulation by more than 1.5-fold (log2) with P-values <0.01.
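
The significance criterion above combines an effect-size cutoff with a P-value cutoff, and both must be met. A minimal sketch of that filter follows; the gene labels and numbers are hypothetical placeholders, not results from this study.

```python
def is_significant(log2_fold_change, p_value, min_abs_log2=1.5, max_p=0.01):
    """True if a gene passes both the fold-change and the P-value cutoffs."""
    return abs(log2_fold_change) > min_abs_log2 and p_value < max_p

# Hypothetical results: gene label -> (log2 fold change, P-value)
results = {
    "geneA": (2.10, 0.003),    # up-regulated, passes both cutoffs
    "geneB": (-1.80, 0.008),   # down-regulated, passes both cutoffs
    "geneC": (1.20, 0.001),    # change too small
    "geneD": (2.50, 0.040),    # P-value too large
}

significant = sorted(g for g, (lfc, p) in results.items() if is_significant(lfc, p))
print(significant)
```

Taking the absolute value of the log2 fold change treats up- and down-regulation symmetrically.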

A.10.15 Inhibition of LLS production

Overnight cultures (5 mL) of L. monocytogenes were grown in BHI media at 37 °C and used to inoculate a 50 mL culture of BHI at a 1:20 dilution. Upon reaching an OD600 of 0.2, the culture was used to create multiple, identical cultures diluted two-fold using an equal volume of BHI containing 50 µM nelfinavir from a stock solution of 10 mM in DMSO. An equal volume of DMSO was used as a negative control for these assays. L. monocytogenes F2365LLSC Δhly (LLO genetically deleted, LLS constitutively expressed) was used as the “+LLS” strain and L. monocytogenes F2365LLSC ΔhlyΔllsB (LLO and LLS genetically deleted) was used as the “-LLS” strain.51 LLS was extracted in a manner analogous to SLS and applied to the erythrocyte lysis assay.

A.10.16 Inhibition of CLS production

Overnight cultures (5 mL) of Clostridium sporogenes ATCC 19404 were grown in anaerobically prepared THB containing 5 mM L-cysteine at 37 °C in an anaerobic chamber (Coy Lab Products) and used to inoculate 5 mL cultures at a 1:40 dilution of anaerobically prepared THB containing 5 mM L-cysteine. The cultures were incubated at 37 °C until an OD600 of 0.15, at which point 50 µM nelfinavir from a stock solution of 10 mM in DMSO was added. An equal volume of DMSO was used as a negative control. CLS was extracted in a manner analogous to SLS and applied to the erythrocyte lysis assay.

A.10.17 Inhibition of plantazolicin production

Cultures (10 mL) of B. amyloliquefaciens RS6 (sfp::ermAM bac::cmR, deficient in lipopeptides, polyketides, and bacilysin)58 were grown in LB broth at 37 °C to an OD600 of 0.2 or 0.6 (early- and mid-log phase, respectively), at which point nelfinavir or an equivalent volume of DMSO was added to the liquid cultures. Following inhibitor addition, cultures were grown with shaking at 37 °C for 24 h. Cells were harvested by centrifugation (4,000 x g, 10 min, 4 °C), washed with Tris-buffered saline (TBS, pH 8), and harvested again (8,000 x g, 10 min, 4 °C). PZN was obtained by a non-lytic, methanolic cell surface extraction, as described previously.52, 59 Briefly, cells were resuspended in MeOH (1 mL), agitated by vortex for 30 s, and equilibrated for 15 min at 22 °C. Cells were harvested by centrifugation (8,000 x g, 10 min, 4 °C), and the MeOH supernatant was evaporated in a speed-vac concentrator with minimal heating. Dried, crude extracts were resuspended in 100 µL of MeOH and filtered through a 0.22 µm filter in preparation for analysis by liquid chromatography-mass spectrometry (LC-MS) using an Agilent 1200 HPLC system coupled to a G1956B quadrupole mass spectrometer with an electrospray ionization (ESI) source. LC used a 4.6 mm x 250 mm Thermo Scientific BetaSil C18 column (100 Å, 5 µm) with a gradient of 40-90% MeCN (with 10 mM NH4HCO3 in the aqueous phase) over 25 min, with the analytes eluted directly into the MS. PZN was quantified by integrating the total area under the curve for the single-ion monitoring (SIM) trace of m/z 668 ([M+2H]2+), normalized to a negative control sample included in each set.
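
The SIM channel monitors a doubly charged ion, so the observed m/z is roughly half the mass of the neutral molecule. The sketch below shows the general relationship between neutral mass, charge, and m/z for protonated ions. It treats the nominal value of 668 from the text as if it were exact, purely for illustration, so the back-calculated masses are approximate and are not reported values for PZN.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da

def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    return (neutral_mass + charge * PROTON_MASS) / charge

def neutral_mass_from_mz(mz_value, charge):
    """Neutral mass implied by an observed m/z at a given protonation state."""
    return charge * (mz_value - PROTON_MASS)

# Nominal SIM value from the text, treated as exact for illustration only
approx_mass = neutral_mass_from_mz(668, charge=2)
singly_charged = mz(approx_mass, charge=1)

print(f"Approximate neutral mass: {approx_mass:.1f} Da")
print(f"Corresponding singly charged m/z: {singly_charged:.1f}")
```

Monitoring the doubly charged species keeps the target within the range of a single-quadrupole instrument while remaining specific to a compound of this size.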

A.10.18 Bioinformatics analysis of CaaX-like proteins in divergent TOMM clusters

All YcaO superfamily proteins (InterPro IPR003776, SagD-like) were obtained from UniProt on October 28, 2014. Genome clusters were defined by using regions of 1000 bp on either side of the YcaO-like protein. Gene cluster families were created using MultiGeneBlast11 with a cutoff of total score above 10. TOMM clusters were defined as containing both a YcaO-like protein and a SagC- or SagA-like ortholog. All gene clusters were manually examined with special attention toward stand-alone YcaO clusters such as bottromycin (i.e., lacking the C-protein). CaaX-like proteins were defined as being orthologous to the SagE protein and identified using a Hidden Markov Model (HMM) created with SagE orthologs from Clostridium and Bacillus.
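
The flanking-region definition of a gene cluster can be sketched in a few lines. This is a minimal illustration, assuming genes are represented by start and end coordinates on a single contig and that any gene overlapping the window counts as a cluster member. The overlap rule, the `Gene` class, and the coordinates are hypothetical; the flank size is the one parameter taken from the text.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int  # first base of the gene (bp)
    end: int    # last base of the gene (bp)

def genes_in_window(genes, anchor, flank_bp=1000):
    """Return genes overlapping the anchor gene extended by flank_bp per side."""
    lo = anchor.start - flank_bp
    hi = anchor.end + flank_bp
    return [g for g in genes if g.end >= lo and g.start <= hi]

# Hypothetical contig with a YcaO-like anchor gene
ycao = Gene("ycaO", 5000, 6200)
contig = [
    Gene("geneA", 3000, 3900),  # ends before the window begins (4000)
    Gene("geneB", 4100, 4900),  # inside the upstream flank
    ycao,
    Gene("geneC", 6300, 7500),  # overlaps the downstream flank
    Gene("geneD", 7300, 8000),  # starts after the window ends (7200)
]

cluster = [g.name for g in genes_in_window(contig, ycao)]
print(cluster)
```

Candidate clusters produced this way would then be screened for the additional components named above (a SagC- or SagA-like ortholog, and a SagE-like protein found by the HMM).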

A.10.19 PrsW cleavage of RsiW

B. subtilis strains CDE743 (B. subtilis PY79 amyE::Phyperspank-3xflag-rsiW) and CDE744 (B. subtilis PY79 amyE::Phyperspank-3xflag-rsiW prsW::erm) were obtained from Craig Ellermeier (University of Iowa, Iowa City). Overnight cultures of CDE743 and CDE744 were grown in LB containing 5 µg/mL kanamycin (for maintenance of the rsiW-containing vector) at 37 °C with shaking. An additional 25 µg/mL erythromycin was added to all cultures of CDE744. The overnight cultures were used to inoculate fresh 2 mL cultures in LB containing 5 µg/mL kanamycin at a 1:100 dilution. The cultures were grown at 37 °C with shaking to an OD600 of 0.1 before the addition of 1 mM IPTG and either 50 µM nelfinavir or an equivalent amount of DMSO. The cultures were then grown to an OD600 of 0.7 before 24 mM NaOH from a 100x stock was added to half of the cultures. The cultures were incubated at 37 °C with shaking for an additional 1 h. The cultures were pelleted and the supernatant was discarded. The pellets were resuspended in 375 µL of TAE buffer (40 mM Tris HCl, 20 mM acetic acid, 1 mM EDTA, pH 8.0) with 10 mg/mL lysozyme. After a 10 min incubation at room temperature, the samples were mixed with 4x reducing loading dye and heated at 65 °C for 10 min. The samples were run on 12% SDS-PAGE gels and protein was transferred to PVDF membranes (Pierce) by electroblot. Membranes were blocked overnight at 4 °C in 5% non-fat milk in TBST before horseradish peroxidase (HRP)-conjugated anti-Flag antibodies (M2, mouse monoclonal, Sigma-Aldrich) were added at a 1:5,000 dilution directly to the buffer and allowed to bind for 1 h at 4 °C. The buffer was discarded and the membranes were washed 5 times for 5 min each with TBST. The membranes were then treated with the SuperSignal West Pico Chemiluminescent substrate (Pierce) prior to film imaging.

A.10.20 Affinity labeling with crosslinking probe

S. pyogenes M1 WT and E. coli BL21-DE3 were grown in THB and LB, respectively, as above. Overnight cultures of each species were used to inoculate fresh 50 mL cultures in the same media. The cultures were grown to an OD600 of 0.9 at 37 °C and then pelleted at 4,000 x g for 10 min. The supernatants were discarded and the pellets were washed with 2 mL water before being resuspended in 1 mL of water. Probe 150 was added at 50 µM to each sample and the suspensions were incubated at 37 °C with shaking for 30 min. The samples were transferred to a 6-well plate and exposed to 365 nm light from a 4 W UV wand placed directly on the well for 20 min. The samples were transferred back to an Eppendorf tube and pelleted at 2,000 x g for 10 min. The pellets were resuspended in MOPS lysis buffer (50 mM MOPS, 50 mM NaCl, 5% glycerol, pH 7.0) containing 15 µL/mL protease inhibitor cocktail (2 µM leupeptin, 200 µM PMSF, 2 mM benzamidine, 2 µM E64) and either 200 U/mL mutanolysin (for S. pyogenes) or 15 mg/mL lysozyme (for E. coli). For S. pyogenes samples, cell lysis was carried out by a 1 h incubation at room temperature followed by freezing, thawing, and sonicating the cells for 30 s, with the freeze/thaw/sonicate cycle being repeated 4 times total. For E. coli samples, cell lysis was carried out by a 30 min incubation at room temperature followed by four rounds of sonication for 30 s each. Biotin-PEG3-azide (Jena Bioscience, Jena, Germany; 100 µM), copper sulfate (10 mM), and sodium ascorbate (20 mM) were added to each sample. The samples were incubated for 1 h and were then used as total lysate or were centrifuged at 17,000 x g for 10 min. The supernatant was used as the soluble portion of the lysate while the pellet was used as the insoluble portion. The insoluble portion was resuspended in 100 µL MOPS lysis buffer with 1% CHAPS with a reducing loading dye before application to 12% SDS-PAGE gels. Protein was transferred to PVDF membranes (Pierce) by electroblot. Membranes were blocked overnight at 4 °C in 5% BSA in TBST before NeutrAvidin-HRP (Pierce) was added at a 1:30,000 dilution directly to the buffer and allowed to bind for 1 h at 4 °C. The buffer was discarded and the membranes were washed 5 times for 5 min each with TBST. The membranes were then treated with the SuperSignal West Pico Chemiluminescent substrate (Pierce) prior to film imaging.

A.10.21 Expression of SagE in lipid nanodiscs

POPC lipid nanodiscs (disc concentration at 3 µM, in 20 mM Tris, 0.1% NaCl, 15% glycerol) were obtained from Steven Sligar (University of Illinois, Urbana, IL). Coupled transcription/translation reactions were carried out with the Promega TNT T7 Coupled Rabbit Reticulocyte Lysate System (Promega product number L4610) following the manufacturer’s guidelines. Reactions contained 12.5 µL rabbit reticulocyte lysate, 1 µL TNT reaction buffer (supplied with kit), 0.5 µL T7 polymerase, 0.25 µL amino acid mix minus Met, 0.25 µL amino acid mix minus Leu, 0.5 µL RNasin, 6.5 µL POPC nanodiscs, and 1 µg DNA. The DNA used for each reaction was either pET28-Mistic-SagE or pCOLA-SagE-SagF-Stag. Generation of the pCOLA vector is described above while the pET28 vector was generated in the Dixon lab (University of California, San Diego (UCSD), La Jolla, CA) by cloning from S. pyogenes M1 5448. The reactions were incubated at either 18 or 30 °C for 90 min before 2 µL of each reaction was mixed with 3 µL of 33.3 ng/µL MBP-SagA in various cleavage buffers (50 mM Tris, 50-125 mM NaCl, 0-20 mM MgCl2, 0-50 µM ZnCl2, 2.5 mM TCEP or 10 mM DTT, 0-2.5% glycerol, 0-3 mM ATP, pH 7.5). The cleavage reactions were incubated at 18 °C or 37 °C for 2 h before being mixed with a reducing loading dye and applied to 12% SDS-PAGE gels. Protein was transferred to PVDF membranes (Pierce) by electroblot. Membranes were blocked overnight at 4 °C in 5% non-fat milk in TBST before anti-MBP primary antibodies (rabbit polyclonal, generated in the Dixon lab, UCSD) were added at a 1:2,000 dilution directly to the buffer and allowed to bind for 2 h at 4 °C. The buffer was discarded and the membranes were washed twice with TBST for 30 s each. The membranes were placed in 10 mL fresh TBST and 1.5 µL of anti-rabbit-HRP goat secondary antibodies were added to the buffer. The membranes were incubated for 45 min at 4 °C. The buffer was discarded and the membranes were washed 5 times for 5 min each with TBST. The membranes were then treated with the SuperSignal West Pico Chemiluminescent substrate (Pierce) prior to film imaging.

A.10.22 qRT-PCR of strains containing pIB184-sagE/F vectors
Overnight cultures of S. pyogenes M1 5448 transformed with pIB184-sagE, pIB184-sagEF, pIB184-sagF, or pIB184-AntiEfull were grown in THB at 37 °C with 50 µg/mL kanamycin and used to inoculate 2 mL cultures of THB. The cultures were grown to an OD600 of 0.65 before RNA was isolated using an RNeasy Protect Bacteria Mini Kit (Qiagen) following the manufacturer’s instructions, except for the lysis step, which was extended to 1 h with the addition of 250 U mutanolysin. The absence of DNA contamination was confirmed by PCR with primers to the 16S rRNA and sagE genes. Isolated RNA was used to generate cDNA using the ImProm II kit (Promega) following the manufacturer’s instructions, running reactions both with and without reverse transcriptase (RT). RT-negative reactions again confirmed the absence of DNA contamination. qRT-PCR reactions were initiated with 2 µL of cDNA (~2.5 ng/µL), 2 µL of 10 µM forward primer, 2 µL of 10 µM reverse primer, 10 µL of iTaq Universal SYBR Green Supermix (Bio-Rad), and 4 µL of water and were run on a LightCycler 480 (Roche). The fold-change in gene expression was calculated using a comparative CT method (ref. 50).

A.10.23 Tanimoto similarity analysis
The ChemDraw (PerkinElmer) structure of each HIV protease inhibitor was converted to .sdf format. Tanimoto similarity coefficients were calculated in Discovery Studio Client 2.5 (Accelrys) using the .sdf files as input reference ligands for the Library Analysis protocol “Find Similar Molecules by Fingerprints”. The ECFP_6 fingerprint was used and the minimum similarity was set at zero. The compounds were all compared pair-wise.
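The pair-wise comparison in the Tanimoto similarity analysis can be illustrated with a minimal sketch. It does not reproduce the Discovery Studio protocol or compute ECFP_6 fingerprints. It only shows the Tanimoto coefficient itself, T = |A ∩ B| / |A ∪ B|, applied to every pair of fingerprints, where each fingerprint is represented as the set of its "on" bits. The compound names and bit sets below are hypothetical toy values, not data from this study.

```python
from itertools import combinations


def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of 'on' fingerprint bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(fp_a & fp_b) / union


def pairwise_tanimoto(fingerprints: dict) -> dict:
    """Return {(name_a, name_b): coefficient} for every unordered pair of compounds."""
    names = sorted(fingerprints)
    return {
        (a, b): tanimoto(fingerprints[a], fingerprints[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical toy fingerprints (sets of bit indices), for illustration only.
toy_fingerprints = {
    "compound_A": frozenset({1, 2, 3, 4}),
    "compound_B": frozenset({3, 4, 5, 6}),
    "compound_C": frozenset({1, 2, 3, 4, 7}),
}

scores = pairwise_tanimoto(toy_fingerprints)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Setting the minimum similarity to zero, as described above, corresponds to keeping every pair in `scores` rather than filtering by a threshold.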
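For the qRT-PCR analysis in A.10.22, the comparative CT calculation can be sketched as follows. This is a minimal illustration of the commonly used 2^(-ΔΔCT) form of the method, which assumes roughly 100% amplification efficiency for both primer pairs. The choice of reference gene and control strain, the function name, and the CT values shown are assumptions made for illustration and are not taken from this study.

```python
def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^(-ddCT) comparative CT calculation.

    dCT  = CT(target gene) - CT(reference gene), computed per strain
    ddCT = dCT(sample strain) - dCT(control strain)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: the target crosses threshold 2 cycles earlier in the
# sample strain than in the control, with an unchanged reference gene.
example = fold_change(ct_target_sample=22.0, ct_ref_sample=12.0,
                      ct_target_control=24.0, ct_ref_control=12.0)
print(example)
```

A value above 1 indicates higher expression of the target gene in the sample strain than in the control strain, and a value below 1 indicates lower expression.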

A.10.24 List of primers

Table A.7. Primer sequences used in this study. Restriction endonuclease recognition sites are underlined. Lower-case letters indicate bases targeted for mutagenesis. The function of each gene analyzed by qRT-PCR is given in Table A.2.

Primer | Sequence (5’ to 3’) | Description

Construct Generation
pDC-Flag-SagA-fwd | GAGTCTAGAATGGATTACAAGGATGACGACGATAAGTTCAAATTTACTTCAAATATTTTA | Forward primer to append an N-terminal Flag tag to sagA
pDC-SagA-rev | ACAGGATCCTTATTTACCTGGCGTATAACTTCC | Reverse primer to append an N-terminal Flag tag to sagA
pDC-SagA-fwd | GAGTCTAGAATGTTAAAATTTACTTCAAATATTTTAG | Forward primer to append a C-terminal Flag tag to sagA
pDC-SagA-Flag-rev | ACAGGATCCTTACTTGTCGTCGTCATCCTTGTAGTCTTTACCTGGCGTATAACTTCC | Reverse primer to append a C-terminal Flag tag to sagA
Flag-SagA-fwd | AAAAGGATCCATGGACTACAAGGATGACGAC | Forward primer to subclone Flag-sagA from pDCerm construct
Flag-SagA-rev | AAAAGCGGCCGCTTATTTACCTGGCGTATAACTTCCG | Reverse primer to subclone Flag-sagA from pDCerm construct
SagA-Flag-fwd | AAAAGGATCCATGTTAAAATTTACTTCAAATATTTTAGCTACTAGTGT | Forward primer to subclone sagA-Flag from pDCerm construct
SagA-Flag-rev | AAAAGCGGCCGCTTACTTGTCGTCGTCATCCTTGTA | Reverse primer to subclone sagA-Flag from pDCerm construct
MBP-fwd | ATGAAGCCCTGAAAGACG | Sequencing primer for pET28-MBP constructs
SagE-fwd | AAAACCATGGCGCCTTTGTCCATCCAATGC | Forward primer to clone sagE into pCOLADuet-1
SagE-rev | AAAGCGGCCGCTCATGTCACCTCCTTCTTCTTTTTTG | Reverse primer to clone sagE into pCOLADuet-1
SagF-fwd | AAAAAAAACATATGATGCTATTGGTTTTGCTGTCG | Forward primer to clone sagF into pCOLADuet-1
SagF-rev | AAAACTCGAGCTAATACTCTTTGCAACTAATCATCAAATAAGTC | Reverse primer to clone sagF into pCOLADuet-1
SagE-E131A-fwd | GGCTATCCTTTATTACTAGCTTTATTTgcaGAGACGATTTATCG | Forward primer for incorporation of alanine at position E131 in sagE
SagE-E131A-rev | CGATAAATCGTCTCtgcAAATAAAGCTAGTAATAAAGGATAGCC | Reverse primer for incorporation of alanine at position E131 in sagE
SagE-E132A-fwd | CTAGCTTTATTTGAAgcgACGATTTATCGTTTTTTGTGG | Forward primer for incorporation of alanine at position E132 in sagE
SagE-E132A-rev | CCACAAAAAACGATAAATCGTcgcTTCAAATAAAGCTAG | Reverse primer for incorporation of alanine at position E132 in sagE
SagA VAPGG-VLPLL-fwd | GCTACTAGTGTAGCTGAAACAACTCAAGTTcttCCTctactcTGCTGTTGCTGCTGTAC | Forward primer for incorporation of leucine at positions A20, G22, and G23 in sagA
SagA VAPGG-VLPLL-rev | GTACAGCAGCAACAGCAgagtagAGGaagAACTTGAGTTGTTTCAGCTACACTAGTAGC | Reverse primer for incorporation of leucine at positions A20, G22, and G23 in sagA
G184-SagE-fwd | CCCGCGGTACCCGGGATGCCTTTGTCCATCCAATG | Forward primer to clone sagE (and sagEF) into pIB184
G184-SagE-rev | AGATCTCGAGCTCTAGTCATGTCACCTCCTTCTTC | Reverse primer to clone sagE into pIB184
G184-SagF-fwd | CCCGCGGTACCCGGGATGATGCTATTGGTTTTGC | Forward primer to clone sagF into pIB184
G184-SagF-rev | AGATCTCGAGCTCTAGCTAATACTCTTTGCAACTAATCATC | Reverse primer to clone sagF (and sagEF) into pIB184
G184-AntiE-fwd | CCCGCGGTACCCGGGTCATGTCACCTCCTTCTTC | Forward primer to clone antisense sagE into pIB184
G184-AntiE-rev | agatctcgagctctagATGCCTTTGTCCATCCAATG | Reverse primer to clone antisense sagE-full into pIB184
G184-AntiE150-rev | CCCGCGGTACCCGGGAATGTCATAAATCAGAGTTGCTC | Reverse primer to clone antisense sagE-150 into pIB184

qRT-PCR
q16S-fwd | GAGAGTTTGATCCTGGC | Forward primer for qRT-PCR on the 16S rRNA gene
q16S-rev | TTGCCGAAGATTCCCTA | Reverse primer for qRT-PCR on the 16S rRNA gene
qM49-fwd | TTAGTTTTCTTCTTTGCGTTTTAGA | Forward primer for qRT-PCR on emm49
qM49-rev | CGAAGCTAAGAAAAAAGTAGAAG | Reverse primer for qRT-PCR on emm49
qSLO2-fwd | GATGTGTTTGATAAATCAGTGAC | Forward primer for qRT-PCR on slo
qSLO2-rev | TCAGTTCTGTTATTGACACC | Reverse primer for qRT-PCR on slo
qsagA-fwd | ATGTTAAAATTTACTTCAAATATTTTAGCT | Forward primer for qRT-PCR on sagA
qsagA-rev | TATTTACCTGGCGTATAACTTC | Reverse primer for qRT-PCR on sagA
qsagB-fwd | ATGTCATTTTTTACAAAGGAACAA | Forward primer for qRT-PCR on sagB
qsagB-rev | ATTGACGATGACTTCTTCG | Reverse primer for qRT-PCR on sagB
qscpA-fwd | TTAGAAGATCGTTTCTCTAGAGTA | Forward primer for qRT-PCR on scpA
qscpA-rev | TACTGTTCCATTGAAAATGTCA | Reverse primer for qRT-PCR on scpA
qska-fwd | ATGAAAAATTACTTATCTTTTGGGAT | Forward primer for qRT-PCR on ska
qska-rev | CTTGCAAAATCAATGACCTC | Reverse primer for qRT-PCR on ska
qspeB-fwd | CTAAGGTTTGATGCCTACAA | Forward primer for qRT-PCR on speB
qspeB-rev | TGTTGGTATTTCAGTAGACATG | Reverse primer for qRT-PCR on speB
qnga-fwd | AAACAAAAAAGTAACATTAGCTCAT | Forward primer for qRT-PCR on nga
qnga-rev | GTATTTAACATCAGCCTTTGC | Reverse primer for qRT-PCR on nga
qspd3-fwd | ATGTCTAAATCAAATCGTCGT | Forward primer for qRT-PCR on spd3
qspd3-rev | GTCATTTAAGCCGCTAAATTG | Reverse primer for qRT-PCR on spd3
qsagE-fwd | GCCTTTGTCCATCCAAT | Forward primer for qRT-PCR on sagE
qsagE-rev | AATCGTCTCTTCAAATAAAGCT | Reverse primer for qRT-PCR on sagE
qsagF-fwd | TGCTATTGGTTTTGCTGT | Forward primer for qRT-PCR on sagF
qsagF-rev | ATACTTAAGACTGGTTTATCTTGT | Reverse primer for qRT-PCR on sagF

Gene Verification
ClosC-fwd | ATGGAAAATAATACTATATACAAATTAAGCAATAATTTAA | Forward primer for confirming the presence of closC
ClosC-rev | TTAAGAATTAATATCTTCTAAAATTTCATCCACAA | Reverse primer for confirming the presence of closC
ClosD-fwd | ATGATAAAATTTTACCCAAGTTTTAATAATATATTAGAG | Forward primer for confirming the presence of closD
ClosD-rev | TTAAGGCATTGGGTGCG | Reverse primer for confirming the presence of closD
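As a consistency check on Table A.7, the following minimal sketch tests whether the forward and reverse primers of the SagE-E131A and SagE-E132A mutagenesis pairs are exact reverse complements of one another, with the lower-case mutagenic bases preserved. The sequences are copied from the table. The helper and variable names are hypothetical and chosen only for this illustration.

```python
# Translation table that complements DNA bases while preserving letter case,
# so lower-case (mutagenic) bases stay lower-case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


# Primer pairs copied from Table A.7 (5' to 3').
PRIMER_PAIRS = {
    "SagE-E131A": (
        "GGCTATCCTTTATTACTAGCTTTATTTgcaGAGACGATTTATCG",
        "CGATAAATCGTCTCtgcAAATAAAGCTAGTAATAAAGGATAGCC",
    ),
    "SagE-E132A": (
        "CTAGCTTTATTTGAAgcgACGATTTATCGTTTTTTGTGG",
        "CCACAAAAAACGATAAATCGTcgcTTCAAATAAAGCTAG",
    ),
}

for name, (fwd, rev) in PRIMER_PAIRS.items():
    is_pair = reverse_complement(fwd) == rev
    print(f"{name}: reverse-complementary = {is_pair}")
```

The same check can be applied to any other pair in the table by adding its two sequences to `PRIMER_PAIRS`.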

A.10.25 Compound Syntheses
1H, 13C, and 19F NMR spectra were collected on Varian Inova 400 and 500 MHz spectrometers. All 1H and 13C spectra collected in CDCl3 were referenced to an internal tetramethylsilane standard. All 1H and 13C spectra collected in (CD3)2SO were referenced to the solvent peak. All 19F spectra were referenced to an external CFCl3 standard. High-resolution mass spectrometry (HRMS) data were obtained on a Micromass Q-TOF Ultima tandem quadrupole mass spectrometer at the University of Illinois at Urbana-Champaign Mass Spectrometry Laboratory.

A.11 References
1. Maxson, T., Deane, C. D., Molloy, E. M., Cox, C. L., Markley, A. L., Lee, S. W., and Mitchell, D. A. (2015) HIV protease inhibitors block streptolysin S production, ACS Chem. Biol. 10, 1217-1226.
2. Arnison, P. G., et al. (2013) Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature, Nat. Prod. Rep. 30, 108-160.
3. Bagley, M. C., Dale, J. W., Merritt, E. A., and Xiong, X. (2005) Thiopeptide antibiotics, Chem. Rev. 105, 685-714.
4. Chatterjee, C., Paul, M., Xie, L., and van der Donk, W. A. (2005) Biosynthesis and mode of action of lantibiotics, Chem. Rev. 105, 633-684.
5. Maksimov, M. O., Pan, S. J., and Link, A. J. (2012) Lasso peptides: structure, function, biosynthesis, and engineering, Nat. Prod. Rep. 29, 996-1006.
6. Sivonen, K., Leikoski, N., Fewer, D. P., and Jokela, J. (2010) Cyanobactins-ribosomal cyclic peptides produced by cyanobacteria, Appl. Microbiol. Biot. 86, 1213-1225.
7. Melby, J. O., Nard, N. J., and Mitchell, D. A. (2011) Thiazole/oxazole-modified microcins: complex natural products from ribosomal templates, Curr. Opin. Chem. Biol. 15, 369-378.
8. Oman, T. J., and van der Donk, W. A. (2010) Follow the leader: the use of leader peptides to guide natural product biosynthesis, Nat. Chem. Biol. 6, 9-18.
9. Dunbar, K. L., Melby, J. O., and Mitchell, D. A. (2012) YcaO domains use ATP to activate amide backbones during peptide cyclodehydrations, Nat. Chem. Biol. 8, 569-575.
10. Dunbar, K. L., and Mitchell, D. A. (2013) Insights into the mechanism of peptide cyclodehydrations achieved through the chemoenzymatic generation of amide derivatives, J. Am. Chem. Soc. 135, 8692-8701.
11. Melby, J. O., Li, X., and Mitchell, D. A. (2014) Orchestration of enzymatic processing by thiazole/oxazole-modified microcin dehydrogenases, Biochemistry 53, 413-422.
12. Molloy, E. M., Cotter, P. D., Hill, C., Mitchell, D. A., and Ross, R. P. (2011) Streptolysin S-like virulence factors: the continuing sagA, Nat. Rev. Microbiol. 9, 670-681.
13. Cunningham, M. W. (2000) Pathogenesis of group A streptococcal infections, Clin. Microbiol. Rev. 13, 470-511.
14. Carapetis, J. R., Steer, A. C., Mulholland, E. K., and Weber, M. (2005) The global burden of group A streptococcal diseases, Lancet Infect. Dis. 5, 685-694.
15. Nizet, V., Beall, B., Bast, D. J., Datta, V., Kilburn, L., Low, D. E., and De Azavedo, J. C. (2000) Genetic locus for streptolysin S production by group A streptococcus, Infect. Immun. 68, 4245-4254.
16. Datta, V., Myskowski, S. M., Kwinn, L. A., Chiem, D. N., Varki, N., Kansal, R. G., Kotb, M., and Nizet, V. (2005) Mutational analysis of the group A streptococcal operon encoding streptolysin S and its virulence role in invasive infection, Mol. Microbiol. 56, 681-695.
17. Betschel, S. D., Borgia, S. M., Barg, N. L., Low, D. E., and De Azavedo, J. C. (1998) Reduced virulence of group A streptococcal Tn916 mutants that do not produce streptolysin S, Infect. Immun. 66, 1671-1679.
18. Fontaine, M. C., Lee, J. J., and Kehoe, M. A. (2003) Combined contributions of streptolysin O and streptolysin S to virulence of serotype M5 Streptococcus pyogenes strain Manfredo, Infect. Immun. 71, 3857-3865.
19. James, L., and McFarland, R. B. (1971) An epidemic of pharyngitis due to a nonhemolytic group A streptococcus at Lowry Air Force Base, N. Engl. J. Med. 284, 750-752.
20. Yoshino, M., Murayama, S. Y., Sunaoshi, K., Wajima, T., Takahashi, M., Masaki, J., Kurokawa, I., and Ubukata, K. (2010) Nonhemolytic Streptococcus pyogenes isolates that lack large regions of the sag operon mediating streptolysin S production, J. Clin. Microbiol. 48, 635-638.


21. Lee, S. W., Mitchell, D. A., Markley, A. L., Hensler, M. E., Gonzalez, D., Wohlrab, A., Dorrestein, P. C., Nizet, V., and Dixon, J. E. (2008) Discovery of a widely distributed toxin biosynthetic gene cluster, Proc. Natl. Acad. Sci. U.S.A. 105, 5879-5884.
22. Gonzalez, D. J., Lee, S. W., Hensler, M. E., Markley, A. L., Dahesh, S., Mitchell, D. A., Bandeira, N., Nizet, V., Dixon, J. E., and Dorrestein, P. C. (2010) Clostridiolysin S, a post-translationally modified biotoxin from Clostridium botulinum, J. Biol. Chem. 285, 28220-28228.
23. Bernheimer, A. W. (1967) Physical Behavior of Streptolysin S, J. Bacteriol. 93, 2024-2025.
24. Jack, R. W., Tagg, J. R., and Ray, B. (1995) Bacteriocins of gram-positive bacteria, Microbiol. Rev. 59, 171-200.
25. Todd, E. W. (1938) The differentiation of two distinct serological varieties of streptolysin, streptolysin O and streptolysin S, J. Pathol. Bacteriol. 47, 423-445.
26. Marmorek, A. (1895) Le streptocoque et le sérum antistreptococcique, Ann. Inst. Pasteur 9, 593-620.
27. Rasko, D. A., and Sperandio, V. (2010) Anti-virulence strategies to combat bacteria-mediated disease, Nat. Rev. Drug Discov. 9, 117-128.
28. Cegelski, L., Marshall, G. R., Eldridge, G. R., and Hultgren, S. J. (2008) The biology and future prospects of antivirulence therapies, Nat. Rev. Microbiol. 6, 17-27.
29. Baruch, M., Belotserkovsky, I., Hertzog, B. B., Ravins, M., Dov, E., McIver, K. S., Le Breton, Y. S., Zhou, Y., Cheng, C. Y., and Hanski, E. (2014) An extracellular bacterial pathogen modulates host metabolism to regulate its own sensing and proliferation, Cell 156, 97-108.
30. Pei, J., and Grishin, N. V. (2001) Type II CAAX prenyl endopeptidases belong to a novel superfamily of putative membrane-bound metalloproteases, Trends Biochem. Sci. 26, 275-277.
31. Pei, J., Mitchell, D. A., Dixon, J. E., and Grishin, N. V. (2011) Expansion of type II CAAX proteases reveals evolutionary origin of gamma-secretase subunit APH-1, J. Mol. Biol. 410, 18-26.
32. Bergo, M. O., Ambroziak, P., Gregory, C., George, A., Otto, J. C., Kim, E., Nagase, H., Casey, P. J., Balmain, A., and Young, S. G. (2002) Absence of the CAAX endoprotease Rce1: effects on cell growth and transformation, Mol. Cell. Biol. 22, 171-181.
33. Bergo, M. O., Wahlstrom, A. M., Fong, L. G., and Young, S. G. (2008) Genetic analyses of the role of RCE1 in RAS membrane association and transformation, Methods Enzymol. 438, 367-389.
34. Plummer, L. J., Hildebrandt, E. R., Porter, S. B., Rogers, V. A., McCracken, J., and Schmidt, W. K. (2006) Mutational analysis of the ras converting enzyme reveals a requirement for glutamate and histidine residues, J. Biol. Chem. 281, 4596-4605.
35. Dolence, J. M., Steward, L. E., Dolence, E. K., Wong, D. H., and Poulter, C. D. (2000) Studies with recombinant Saccharomyces cerevisiae CaaX prenyl protease Rce1p, Biochemistry 39, 4096-4104.
36. Kjos, M., Snipen, L., Salehian, Z., Nes, I. F., and Diep, D. B. (2010) The Abi Proteins and Their Involvement in Bacteriocin Self-Immunity, J. Bacteriol. 192, 2068-2076.
37. Clayton, E. M., Hill, C., Cotter, P. D., and Ross, R. P. (2011) Real-time PCR assay to differentiate Listeriolysin S-positive and -negative strains of Listeria monocytogenes, Appl. Environ. Microb. 77, 163-171.


38. Kaldor, S. W., Kalish, V. J., Davies, J. F., 2nd, Shetty, B. V., Fritz, J. E., Appelt, K., Burgess, J. A., Campanale, K. M., Chirgadze, N. Y., Clawson, D. K., Dressman, B. A., Hatch, S. D., Khalil, D. A., Kosa, M. B., Lubbehusen, P. P., Muesing, M. A., Patick, A. K., Reich, S. H., Su, K. S., and Tatlock, J. H. (1997) Viracept (nelfinavir mesylate, AG1343): a potent, orally bioavailable inhibitor of HIV-1 protease, J. Med. Chem. 40, 3979-3985.
39. Albizati, K. F., Babu, S., Birchler, A., Busse, J. K., Fugett, M., Grubbs, A., Haddach, A., Pagan, M., Potts, B., Remarchuk, T., Rieger, D., Rodriguez, R., Shanley, J., Szendroi, R., Tibbetts, T., Whitten, K., and Borer, B. C. (2001) A synthesis of the HIV-protease inhibitor nelfinavir from D-tartaric acid, Tetrahedron Lett. 42, 6481-6485.
40. Ma, D., Zou, B., Zhu, W., and Xu, H. D. (2002) A short synthesis of the HIV-protease inhibitor nelfinavir via a diastereoselective addition of ammonia to the alpha,beta-unsaturated sulfoxide derived from (R)-glyceraldehyde acetonide, Tetrahedron Lett. 43, 8511-8513.
41. Viklund, H., Bernsel, A., Skwark, M., and Elofsson, A. (2008) SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology, Bioinformatics 24, 2928-2929.
42. Manolaridis, I., Kulkarni, K., Dodd, R. B., Ogasawara, S., Zhang, Z., Bineva, G., O'Reilly, N., Hanrahan, S. J., Thompson, A. J., Cronin, N., Iwata, S., and Barford, D. (2013) Mechanism of farnesylated CAAX protein processing by the intramembrane protease Rce1, Nature 504, 301-305.
43. Markley, A. L., Jensen, E. R., and Lee, S. W. (2012) An Escherichia coli-based bioengineering strategy to study streptolysin S biosynthesis, Anal. Biochem. 420, 191-193.
44. Alouf, J. E. (1980) Streptococcal toxins (streptolysin O, streptolysin S, erythrogenic toxin), Pharmacol. Ther. 11, 661-717.
45. Caron, M., Auclair, M., Sterlingot, H., Kornprobst, M., and Capeau, J. (2003) Some HIV protease inhibitors alter lamin A/C maturation and stability, SREBP-1 nuclear localization and adipocyte differentiation, AIDS 17, 2437-2444.
46. Coffinier, C., Hudon, S. E., Farber, E. A., Chang, S. Y., Hrycyna, C. A., Young, S. G., and Fong, L. G. (2007) HIV protease inhibitors block the zinc metalloproteinase ZMPSTE24 and lead to an accumulation of prelamin A in cells, Proc. Natl. Acad. Sci. U.S.A. 104, 13432-13437.
47. Coffinier, C., Hudon, S. E., Lee, R., Farber, E. A., Nobumori, C., Miner, J. H., Andres, D. A., Spielmann, H. P., Hrycyna, C. A., Fong, L. G., and Young, S. G. (2008) A potent HIV protease inhibitor, darunavir, does not inhibit ZMPSTE24 or lead to an accumulation of farnesyl-prelamin A in cells, J. Biol. Chem. 283, 9797-9804.
48. Quigley, A., Dong, Y. Y., Pike, A. C., Dong, L., Shrestha, L., Berridge, G., Stansfeld, P. J., Sansom, M. S., Edwards, A. M., Bountra, C., von Delft, F., Bullock, A. N., Burgess-Brown, N. A., and Carpenter, E. P. (2013) The structural basis of ZMPSTE24-dependent laminopathies, Science 339, 1604-1607.
49. Bernheimer, A. W. (1949) Formation of a bacterial toxin (streptolysin S) by resting cells, J. Exp. Med. 90, 373-392.
50. Schmittgen, T. D., and Livak, K. J. (2008) Analyzing real-time PCR data by the comparative C(T) method, Nat. Protoc. 3, 1101-1108.


51. Cotter, P. D., Draper, L. A., Lawton, E. M., Daly, K. M., Groeger, D. S., Casey, P. G., Ross, R. P., and Hill, C. (2008) Listeriolysin S, a novel peptide haemolysin associated with a subset of lineage I Listeria monocytogenes, PLoS Pathog. 4, e1000144.
52. Molohon, K. J., Melby, J. O., Lee, J., Evans, B. S., Dunbar, K. L., Bumpus, S. B., Kelleher, N. L., and Mitchell, D. A. (2011) Structure determination and interception of biosynthetic intermediates for the plantazolicin class of highly discriminating antibiotics, ACS Chem. Biol. 6, 1307-1313.
53. Ellermeier, C. D., and Losick, R. (2006) Evidence for a novel protease governing regulated intramembrane proteolysis and resistance to antimicrobial peptides in Bacillus subtilis, Genes Dev. 20, 1911-1922.
54. Ritchie, T. K., Grinkova, Y. V., Bayburt, T. H., Denisov, I. G., Zolnerciks, J. K., Atkins, W. M., and Sligar, S. G. (2009) Reconstitution of Membrane Proteins in Phospholipid Bilayer Nanodiscs, Methods Enzymol. 464, 211-231.
55. Biswas, I., Jha, J. K., and Fromm, N. (2008) Shuttle expression plasmids for genetic studies in Streptococcus mutans, Microbiology 154, 2275-2282.
56. Gantt, S., Casper, C., and Ambinder, R. F. (2013) Insights into the broad cellular effects of nelfinavir and the HIV protease inhibitors supporting their role in cancer treatment and prevention, Curr. Opin. Oncol. 25, 495-502.
57. Mitchell, D. A., Lee, S. W., Pence, M. A., Markley, A. L., Limm, J. D., Nizet, V., and Dixon, J. E. (2009) Structural and functional dissection of the heterocyclic peptide cytotoxin streptolysin S, J. Biol. Chem. 284, 13004-13012.
58. Chen, X. H., Scholz, R., Borriss, M., Junge, H., Mogel, G., Kunz, S., and Borriss, R. (2009) Difficidin and bacilysin produced by plant-associated Bacillus amyloliquefaciens are efficient in controlling fire blight disease, J. Biotechnol. 140, 38-44.
59. Deane, C. D., Melby, J. O., Molohon, K. J., Susarrey, A. R., and Mitchell, D. A. (2013) Engineering unnatural variants of plantazolicin through codon reprogramming, ACS Chem. Biol. 8, 1998-2008.
60. Hu, D. X., Grice, P., and Ley, S. V. (2012) Rotamers or diastereomers? An overlooked NMR solution, J. Org. Chem. 77, 5198-5202.
