University of Groningen

Nonribosomal peptide synthetases Zwahlen, Reto Daniel

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record

Publication date: 2018

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA): Zwahlen, R. D. (2018). Nonribosomal peptide synthetases: Engineering, characterization and biotechnological potential. University of Groningen.

Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

The publication may also be distributed here under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license. More information can be found on the University of Groningen website: https://www.rug.nl/library/open-access/self-archiving-pure/taverne-amendment.


Nonribosomal peptide synthetases: Engineering, characterization and biotechnological potential

Academic Thesis, University of Groningen, the Netherlands

ISBN: 978-94-034-0674-9
ISBN: 978-94-034-0673-2 (e-book)
Printing: Eikon +
Cover: Reto D. Zwahlen & Lovebird design.
Layout: Lovebird design. www.lovebird-design.com

© R. D. Zwahlen, Groningen, the Netherlands, 2018. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, without written permission of the author.

Nonribosomal peptide synthetases: Engineering, characterization and biotechnological potential

PhD thesis

to obtain the degree of PhD at the University of Groningen on the authority of the Rector Magnificus Prof. E. Sterken and in accordance with the decision by the College of Deans.

This thesis will be defended in public on

The 18th of May 2018 at 14:30 hours

by

Reto Daniel Zwahlen

born on 24 July 1988 in Bern, Switzerland

Supervisors
Prof. A.J.M. Driessen
Prof. R.A.L. Bovenberg

Assessment committee
Prof. D.B. Janssen
Prof. J. Raaijmakers
Prof. L. Dijkhuizen

For you, my love, my other half, my Tonia.

Table of contents

Chapter I Introduction

Chapter II Identification and characterization of nonribosomal peptide synthetase modules that activate 4-hydroxyphenylglycine

Chapter III A golden gate based system for convenient assembly of chimeric Nonribosomal peptide synthetases

Chapter IV Biochemical and structural characterization of the Nocardia lactamdurans L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase

Chapter V An engineered two component nonribosomal peptide synthetase (NRPS) producing a novel peptide-like compound in Penicillium chrysogenum

Chapter VI Prokaryotic MbtH like proteins stimulate secondary metabolism in the filamentous fungus Penicillium chrysogenum

Chapter VII Summary and outlook

Deutsche Zusammenfassung

Nederlandse samenvatting

Appendices

Acknowledgements

List of publications and patents

CHAPTER I

Introduction

Reto D. Zwahlen, Roel A.L. Bovenberg2,3 and Arnold J.M. Driessen1,4

1 Molecular Microbiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
2 Synthetic Biology and Cell Engineering, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
3 DSM Biotechnology Centre, Delft, The Netherlands
4 Kluyver Centre for Genomics of Industrial Fermentations, Julianalaan 67, 2628BC Delft, The Netherlands

In part to be published as a review in Frontiers in Microbiology, abstract accepted

Abstract

Secondary metabolites are a class of bioactive natural products with a long history of human application, and they are a widely recognized cornerstone of medicine and chemistry. Modern medicine has, among other factors, been enabled through the discovery and therapeutic implementation of secondary metabolites such as antibiotics and anti-tumor agents, which remain a vital component of contemporary medicine to this day. The origin of these products can be traced to plants, fungi and bacteria alike. The underlying biosynthetic clusters frequently contain a multitude of distinct genes encoding a series of proteins that interact in a specific manner to ultimately produce a particular compound. At the very core of such a cluster is predominantly a complex, multi-modular molecular machinery, or more precisely a nonribosomal peptide synthetase (NRPS). NRPS, along with related enzymes, represent an outstanding class of secondary metabolite producers, which have been a focal point in the discovery and engineering of novel bioactive compounds for decades. This chapter gives an introduction to natural products, the associated enzymes and the evolution of accompanying tools for their discovery and engineering, with a focus on NRPS.

1. Nonribosomal peptide synthetases (NRPS) and nonribosomal peptides (NRP)

Nonribosomal peptide synthetases (NRPS) are large, highly structured and complex enzymatic machineries, closely related to other modular enzymes such as polyketide synthases (PKS), NRPS-PKS hybrid synthetases and fatty acid synthases (FAS). They have certain distinct properties in common, the most striking one being their structural division into domains and modules, which reflects their shared evolutionary history (Smith and Sherman 2008). Every NRPS minimally consists of one module, a functionally distinct unit that allows for the recruitment and subsequent incorporation of a precursor into a growing product.
Every NRPS module, whether initiation (1), elongation (n) or termination (1), requires a minimal set of domains [2]. The two domains essential to every module are the adenylation domain (A) and the non-catalytic thiolation domain (T). This tandem di-domain enables the specific selection and activation of a given substrate. However, the T-domain must first undergo 4'-phosphopantetheinyl transferase (PPtase) and coenzyme A (CoA) dependent activation after expression, in which a phosphopantetheine moiety is transferred to a conserved serine residue, in order to enter an active state. A-domains also have accompanying factors, called MbtH-like proteins (MLP) [3–4]. In contrast to PPtases, MLPs merely interact with the A-domain; they have no intrinsic enzymatic activity but rather a chaperoning function upon binding a distinct part of the A-domain [5–7]. In addition to these domains, any elongation module requires a condensation domain (C), which connects two modules and links the up- and downstream activated substrates via a peptide bond. C-domains are stereospecific for both up- and downstream activated substrates and transfer the resulting intermediate to the downstream T-domain. Lastly, the C-terminal termination module requires a thioesterase domain (Te) to catalytically release the covalently bound compound from the NRPS, returning the NRPS complex to the ground state for another reaction cycle. In addition to these essential domains, a series of additional domains can be distinguished, performing epimerization, halogenation, cyclization, macrocyclization, multimerization or methylation [8–10]. Domains as well as modules are clearly defined and evolutionarily exchangeable structures among multi-modular enzymes. In the case of PKS and NRPS, this has led to the occurrence of a variety of NRPS-PKS hybrids [11–14].
A NRPS can be as simple as a single modular unit containing three domains, although the most complex and largest structure known contains 15 modules with 46 domains [15–16], yielding a 1.8 MDa protein complex (type I NRPS). Although the size of a NRPS, as well as the modular sequence, limits the size of an NRPS product, it is common that NRPS enzymes cluster and interact with tailoring enzymes in order to produce products of higher complexity [17]. To enable such specific interactions, NRPS can contain small stretches of up to 30 amino acids at the C- or N-terminus, which form a specific recognition point, thus enabling communication (COM-domain) between multiple NRPS of one cluster (type II NRPS) [18–19]. As a result of this multitude of functional dimensions within the NRPS architecture, and the increasing number of NRPS-containing biosynthetic gene clusters, there is a large and growing reservoir of natural products with potentially new properties. NRPS are a crucial part of secondary metabolism, which is restricted to various prokaryotic, fungal and certain higher eukaryotic species [20]. In comparison to most ribosomally derived peptides, nonribosomal peptides (NRPs) are low molecular weight products. The structural diversity of NRPs is tremendous, mostly due to their chemical complexity. Significantly contributing to this diversity is the fact that NRPS do not rely on proteinogenic amino acids only; to date, more than 500 substrates that serve as NRPS building blocks have been identified [21]. These molecules are predominantly, but not exclusively, amino acids, since fatty acids, carboxylic acids and other substrates have been reported in NRPs [20]. NRPs thus represent a diverse group of natural compounds and occur as linear, branched, circular or macrocyclic structures [22–23]. The natural functions of NRPs are as diverse as their structures. Signaling, communication, metal ion chelation and host protection are important functions fulfilled by NRPs, though many compounds are not fully characterized in this respect. Nevertheless, the characterization of natural products for applied purposes is well advanced and has led to a vast collection of groundbreaking pharmaceuticals, including antibiotics, anti-fungal agents, immunosuppressants and cytostatic drugs [23–25].

The importance of secondary metabolites, and thereby also of NRPS, has led to a long history of research aimed at the discovery and understanding of novel NRP compounds and their underlying mechanisms. Natural products and their extracts, including NRPs, have been used ever since humanity discovered their beneficial effects. However, only after the discovery and widespread use of penicillins in the 1940s did a rapid expansion of this research field occur, predominantly focusing on the discovery of novel compounds. After almost four decades, interest in the biochemical and mechanistic understanding of secondary metabolite pathways rose and a ribosome-independent way of synthesizing peptides was demonstrated [26]. From then on, interest in multi-modular enzymes, and NRPS in particular, increased exponentially with the availability of novel analytical techniques.

The steady and progressive understanding of NRPS led to the development of ideas regarding their engineering, potentially allowing for the production of novel compounds with new application spectra. Due to the complexity of the modular assembly line, random mutagenesis approaches proved to be unfeasible and their limitations have been shown. However, along with the growing understanding of NRPS and enzymes in general, new ways of directed engineering became available. From a systemic point of view, there are three levels on which engineering may take place: modules, domains, and domain sub-structures or active sites. Extensive efforts have targeted the specificity of A-domains, which has long been considered the sole bottleneck in NRPS engineering. Multiple studies confirmed that the substrate specificity of a NRPS A-domain can be successfully altered, however, at the cost of substantially lowered catalytic velocity [27–28]. Similar successes and limitations were observed when natural domains were swapped or replaced by synthetic versions [29]. The most challenging way of obtaining novel NRPS, however, is the swapping or combining of entire modules [30]. Even though this approach seems logical, as the module as a functional unit remains untouched, it does not, much like the other approaches, take inter-modular interactions across a multi-modular NRPS into account. Directed NRPS engineering therefore remains mostly enigmatic, but in the light of global challenges such as combating drug resistance and sustainable manufacturing, it remains a potentially important technology for the discovery and biosynthesis of novel drugs.
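The minimal domain requirements described in this section (A and T in every module, an additional C in elongation modules, and a Te in the termination module) can be sketched as a small consistency check. This is only an illustrative simplification for linear assembly lines: the function names and the list-of-strings representation are assumptions made for this example, and real NRPS frequently deviate from this pattern.

```python
from typing import List

# Required domains per module type, in N- to C-terminal order, using the
# abbreviations from the text: A (adenylation), T (thiolation),
# C (condensation), Te (thioesterase). Simplified, linear NRPS only.
REQUIRED = {
    "initiation": ["A", "T"],
    "elongation": ["C", "A", "T"],
    "termination": ["C", "A", "T", "Te"],
}


def has_required_domains(module: List[str], kind: str) -> bool:
    """Return True if the required domains for `kind` occur in order in `module`.

    Additional domains (for example E for epimerization) are tolerated.
    """
    remaining = iter(module)
    # `d in remaining` consumes the iterator up to the match, which enforces order.
    return all(d in remaining for d in REQUIRED[kind])


def check_assembly_line(modules: List[List[str]]) -> bool:
    """Check one initiation, n elongation and one termination module."""
    if not modules:
        return False
    if len(modules) == 1:
        # Assumption: a single-module NRPS only needs the A-T di-domain here.
        return has_required_domains(modules[0], "initiation")
    kinds = ["initiation"] + ["elongation"] * (len(modules) - 2) + ["termination"]
    return all(has_required_domains(m, k) for m, k in zip(modules, kinds))


# Tri-modular layout A-T / C-A-T / C-A-T-Te
tripeptide_nrps = [["A", "T"], ["C", "A", "T"], ["C", "A", "T", "Te"]]
print(check_assembly_line(tripeptide_nrps))
```

Such a check only captures domain order; it says nothing about inter-modular compatibility, which is the main limitation of module swapping discussed above.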

2. Evolutionary and functional relationships of NRPS

In spite of the evident relations between the multi-modular enzyme classes of NRPS, PKS and FAS, there are functional and structural differences among these mega-enzymes, including their different places within cellular metabolism. While NRPSs and PKSs are exclusively present in secondary metabolism, FASs are present almost exclusively in primary metabolism. The differences are not reflected in their respective enzymatic topology, but rather in the products they synthesize. NRPSs and PKSs can produce compounds containing proteinogenic and nonproteinogenic amino acids, fatty acids and ketides, among other substrates. The spectrum of FASs, however, is restricted to fatty acids [31]. This is further underlined by the classes of products formed: for example, a majority of antimicrobials are produced either by NRPSs (e.g. penicillins, cephalosporins, ramoplanin, enterobactin) or by PKSs (e.g. tetracyclines, erythromycin) [32–33], but none by FAS.

Besides the differences shown above, there is a series of evident similarities between these enzymes, which is best represented in the light of their shared structural features. All of these enzymes can be subdivided into three different types of modules. Overall, product synthesis is universally initiated by activating, starting or loading an initiation module, followed by linear, non-linear or iterative product extension carried out by elongation modules, which may also apply modifications in cis. Ultimately, product finalization and release is catalyzed by a termination module, which contains a release or thioesterase domain. FAS enzymes can further be divided into two fundamental classes: FAS I, a multi-modular enzyme present in mammals and fungi, and FAS II, which is abundant in eubacteria and archaea. An analogous classification can be applied to PKSs, which can be divided into at least three distinct classes [33]. All three classes have a similar working mechanism: they all form a product through condensation of building blocks (mostly ketides for PKSs), much like the NRPSs and the FASs. While type I PKSs represent multi-modular enzymes, comparable to NRPSs or type I FASs, type II PKSs are better described as monofunctional multienzyme complexes, much like type II FASs. Furthermore, type III PKSs can be distinguished from type I and II PKSs as they lack an acyl carrier protein (ACP) to activate the acyl-CoA substrates, thus relying on standalone enzymes for this purpose [33]. Another notable difference in type III PKSs is their mostly homodimeric setup, which is rather uncommon among these multi-modular enzymes. The many structural and functional similarities between these complex enzymes suggest that there is a tight evolutionary relationship between them.
For example, the phylogenetic relation of the highly conserved ketoacyl synthase domains indicates that FAS I in animals evolved from PKS I of fungi. These relations are also visible in the overall modular structure, which indicates that FASs and PKSs are, at least structurally, more closely related to each other than to NRPSs [14]. Additionally, fungal FAS I presumably evolved through the combination of bacterial type II FAS [34], but all of these enzymes likely evolved through a long process of gene duplication and fusion as well as gene function specification [35]. This universal process of gene fusion and duplication has additionally led to the formation of a diverse class of hybrid synthetases, referred to as NRPS-PKS hybrids [12–13]. The existence of such hybrids further outlines the evolutionary picture of the close NRPS-PKS relationship, leading to functional enzymes which are highly active in bacterial and fungal secondary metabolism. Further details on the structural and functional features of NRPS are described in the respective subsections of the following section.

3. NRPS domain function and biochemistry

3.1. Adenylation and thiolation domains

Any NRPS module minimally consists of an A- and a T-domain, enabling single module functionality, and multi-modular functionality upon addition of C-domains [8;36–37]. A-domains are roughly 500–550 amino acids in size and in general determine the substrate specificity of a module through their specific substrate recruitment [38–41]. They are often referred to as “gatekeeper” domains, as there is no subsequent product formation without prior adenylation and thioesterification of a substrate [42] (Figure 1). Bioinformatic prediction of the precise specificity of an A-domain is a delicate and still error-prone process. However, a series of 10 core motifs (A1–A10), essential to all NRPS A-domains, have been elucidated and in part functionally assigned [43–44]. Based on these sequence motifs, a core or “Stachelhaus” code has been determined, which covers 10 positions mainly across the A4 and A5 motifs and describes the essential residues that determine the adenylation specificity [39]. Most of the identified core motifs, as well as the Stachelhaus code positions, cluster predominantly, but not exclusively, in and around the active site of the A-domain [45–46]. Furthermore, any A-domain can be subdivided into two distinct parts, the Acore and the Asub part [47]. This division runs mainly across catalytically inactive parts, yet owing to its dynamics it still strongly affects active site specificity [48]. In addition to the 10 core motifs, an 11th has been proposed, located at the junction between the Acore and Asub parts and seemingly fulfilling an essential hinge function, manifested in the conserved LPxP motif [49]. Structural flexibility is crucial in the sequential tandem reaction of the A-domain.
The two core functions of the A-domain are, first, adenylation through the hydrolysis of ATP, which allows an AMP-substrate conjugate to be formed, and second, the subsequent transfer of the activated substrate to the free thiol group of the 4'-phosphopantetheinyl moiety (ppant), which is anchored to a conserved serine residue in the downstream T-domain [50–52]. This thioesterification is guided by a structural rearrangement of the Asub-domain, which itself undergoes a rotation of about 140° [53–54] relative to the Acore-domain. This process is dynamically well conserved and can also be observed in other enzymes of the adenylate-forming enzyme superfamily (ANL), such as acyl-CoA synthetases and luciferases [55], where this mechanism has primarily been characterized [53;56]. However, not only the dynamics but also the structure appears to be conserved in a relatively strict manner among this family of enzymes. Especially the small Asub domain seems to bear an extremely important function, not only

Figure 1. Adenylation and thiolation domain functionality. (A) Sfp chaperones the transfer of the phosphopantetheine moiety from CoA onto a conserved serine residue in the thiolation domain (GGxS). The NRPS in the activated holo-state subsequently recruits free amino acids through the adenylation domain (B), which forms an amino acid-AMP adenylate intermediate, ultimately leading to the transfer of the adenylated amino acid onto the ppant-arm in the thioesterification conformation.

in chaperoning the transition between the adenylation and thioesterification sub-steps within the A domain, but also with respect to the displacement of the T domain [48;53;57]. The T domain, which contains four α-helices, needs to partially penetrate the A domain's active site in order to link the activated substrate to the carrier ppant arm and subsequently donate it to the downstream domain. This process is inextricably linked to the precise movement and action of the upstream Asub domain.
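The idea behind a 10-position specificity code can be illustrated with a minimal sketch that compares the code extracted from a query A-domain with a table of reference codes and ranks candidate substrates by the number of identical positions. The reference codes and substrate names below are hypothetical placeholders invented for this example, not real Stachelhaus codes, and simple identity counting is an assumption rather than an established prediction method.

```python
from typing import Dict, List, Tuple

CODE_LENGTH = 10  # the specificity code covers 10 positions


def count_matches(code_a: str, code_b: str) -> int:
    """Count identical residues at corresponding positions of two codes."""
    if len(code_a) != CODE_LENGTH or len(code_b) != CODE_LENGTH:
        raise ValueError("specificity codes must have exactly 10 residues")
    return sum(a == b for a, b in zip(code_a.upper(), code_b.upper()))


def rank_substrates(query: str, reference: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank reference substrates by the number of matching code positions."""
    scores = [(name, count_matches(query, code)) for name, code in reference.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical placeholder codes, NOT taken from any real A-domain.
HYPOTHETICAL_REFERENCE = {
    "substrate_X": "ACDEFGHIKL",
    "substrate_Y": "MNPQRSTVWY",
}

print(rank_substrates("ACDEFGHIKY", HYPOTHETICAL_REFERENCE))
```

As noted in the text, predictions of this kind remain error prone, since residues outside the code positions and the dynamics of the Asub part also influence specificity.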

3.2. Condensation and epimerization domains

Condensation (C) domains are approximately 450-residue NRPS domains and represent a highly versatile class of NRPS domains. Any NRPS composed of more than one module must consequently contain at least one C-domain. However, single modular NRPS may also contain C-domains, especially if they cooperate with other NRPS. Essentially, the primary function of a C-domain is the condensation of the up- and downstream activated substrates through a nucleophilic attack, mainly leading to the formation of a di-peptide linked via a peptide bond (Figure 2). Nonetheless, several residues of the C-domain may have the intrinsic potential to fulfill multiple functions. The biochemical spectrum of these pluripotent domains includes dual functions of condensation and epimerization, N-methylation, cyclization or serine dehydration [58–60]. C domains are classified among the chloramphenicol acetyltransferases (CAT), sharing highly conserved CAT-like folds in their structure. Nonetheless, as opposed to the commonly observed trimer formation of CAT enzymes, C- and C-like domains are monomeric and resemble a pseudo-dimeric V-like structure harboring the active site at its center [61]. In order to gain access to the up- and downstream activated substrates, the active site is accessible through two tunnels, which ultimately connect at the active site in the vicinity of the essential C-domain motif HHxxxDG [20]. The second histidine of this motif is essential for peptide bond formation, as it appears to be involved in the orientation of metabolic intermediates, thus preventing the dissociation of the substrates from the active site. This function can also be seen as a proofreading step, which eliminates incorrect intermediates from the assembly line. Epimerization or E domains are among the most abundant modification domains intrinsic to NRPS.
Interestingly, they are structurally closely related to C domains, which is reflected in their pseudo-dimeric structure as well as their active site position and composition [61–62]. In contrast to C domains, however, they are responsible for the site-specific epimerization of a

Figure 2. Linear assembly of a tripeptide on a NRPS consisting of the modules M1 (A-T), M2 (C-A-T) and M3 (C-A-T-Te). (A) Following the successful transfer of all required amino acids onto their respective thiolation domains, they are handed to the up- or downstream condensation domain, starting with the N-terminal module, in linear acting NRPS. (B) The resulting di-peptide remains covalently attached to the downstream T-domain, allowing for the poly-peptide intermediates to be extended with additional amino acids. (C) The final product is ultimately released through a thioesterase domain, located at the C-terminus of the NRPS.

substrate, predominantly carrying out this function after peptide bond formation has occurred. Furthermore, the active site, which contains a central histidine among other bulky residues, allows E domains to fulfill a proofreading function in metabolite production similar to that seen in C domains, as it may aid in eliminating incorrect intermediates [8]. This process is crucial in the production of active compounds, since even a minor alteration to the stereochemistry of a compound can render it completely inactive. Lastly, despite only few available structures, biochemical studies [36;61–65] have clearly shown that these domains exhibit considerable specificity and are in a fine balance with the adjacent domains, which is further underlined by dual-function C-E domains.
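The conserved C-domain motif HHxxxDG mentioned in this section can be located in a protein sequence with a simple pattern search. The following is a minimal sketch; the function name and the toy sequence are assumptions made for illustration, and a motif hit alone does not prove the presence of a functional C-domain.

```python
import re
from typing import List, Tuple

# HHxxxDG: two histidines, any three residues, then aspartate and glycine.
# The lookahead allows overlapping candidate hits to be reported as well.
C_DOMAIN_MOTIF = re.compile(r"(?=(HH[A-Z]{3}DG))")


def find_c_domain_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for each HHxxxDG hit."""
    sequence = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in C_DOMAIN_MOTIF.finditer(sequence)]


# Toy sequence invented for this example, not a real NRPS fragment.
print(find_c_domain_motifs("MKTAHHAAADGLLS"))
```

The same approach applies to other short motifs named in this chapter, such as GGxS in the T-domain, by exchanging the pattern.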

3.3. Thioesterase and other domains

In addition to A-, T- and C-domains, which are essential to any multi-modular assembly line, there are additional domains enabling on-line product modification and, most importantly, product release, thus allowing for another catalytic cycle. Ultimately, alterations applied by intrinsic NRPS domains, as well as structural modifications related to product release, lead to either partial or full functionalization of NRPs or allow for product recruitment by tailoring enzymes acting in trans.

3.3.1. Product release and functionalization

Thiotemplate-based enzymatic systems rely on a catalytic activity in order to remove a product or product scaffold from the primary enzyme. Therefore, most NRPS contain a domain at their C-terminus responsible for precisely this purpose, the thioesterase domain (Te). Te-domains are common in single and multi-modular NRPS, although in multi-NRPS systems only the terminal NRPS carries this domain. For some assembly lines, type II thioesterases fulfill this function, representing a standalone alternative to intrinsic domains. Fundamentally, three different release mechanisms can be distinguished: linear product hydrolysis, intra- or inter-molecular cyclization, and substrate multimerization. All reactions follow a series of sequential steps in order to allow for proper compound release, sharing an intermediate step in which the product is covalently linked to a conserved serine residue in the Te-domain Ser-His-Asp motif [9]. This mechanism is similar to T-domain linkage, though in contrast to the T-domains, Te-domains bear an intrinsic

catalytic activity. This represents a special case among NRPS domains and is highly related to serine hydrolase mechanisms. Structural data suggest that product transfer from the T-domain onto the serine of the Te-domain leaves a significant amount of free space within the active site tunnel, which in turn allows for complex compound rearrangements. Basic linear hydrolysis appears to be the most unspecific form of product release and is also prone to unspecific reactions, such as the release of unfinished product intermediates or single substrates from the upstream T-domain. Nonetheless, it represents an efficient way of product release and can be observed in the production of the penicillin precursor ACV [66] or feglymycin [67–68]. Cyclization reactions, on the other hand, seem to be carried out in a more directed and specific way. The higher reliability of these reactions can mainly be attributed to the increased need for coordination in order to accommodate the cyclization of a long, charged and/or bulky substrate. An evident case in which this proofreading or gate-keeping is performed by the Te-domain is the β-lactam antibiotic nocardicin. The product of the two NRPS NocA and NocB undergoes monocyclic β-lactam ring closure, where the Te-domain serves as a gatekeeper, holding the product until ring closure has been finalized [69–71]. Similarly stringent control mechanisms can be seen in macrocyclization as well as in multimerization reactions, for example in the production of gramicidin S. Overall, the perspective on Te functionality has gradually developed, revealing a both dynamic and specific domain, which serves as a catalyst for a wide range of highly time-dependent product functionalization and release reactions [9;70].

3.3.2. Intrinsic product modifying domains

Domain identification and characterization has long focused on the previously established and described domains, with detailed studies on their associated mechanisms. However, recent studies have not only resolved complex sequential interactions and catalytic cascades, but have also deepened our understanding of the mechanistic roles of the underlying domains as well as their characteristics. In addition to the C-domain related epimerization domain, discussed previously, there are cyclization (Cy), oxygenation (Oxy) and methyl-transferase (MT) domains. These domains have been characterized to the extent of classifying their functions, although Cy- and Oxy-domains in particular may occur either as a single bi-functional unit or in a serial manner [72]. The aforementioned domains are integral parts of a NRPS assembly line, in contrast to recruited domains and

enzymes, which are discussed at a later point. Their intrinsic nature indicates not only that the reactions performed are of utmost importance with respect to product functionality, but also points towards a very archaic and conserved function of the domains. Cy- and Oxy-domains in fact specifically replace the classic function of C-domains, omitting amino acid condensation through peptide bond formation and resulting in thiazoline, oxazoline or methyloxazoline structures [72–73]. These reactions predominantly occur in siderophore-producing NRPS and rely on the presence of serine, threonine and cysteine residues [74–75]. Recently, a more detailed, structure-based comparison of a Cy-domain with C-domains delivered further support for the common ancestral history of the two domain types, shown in the overall conserved scaffold features and in accentuated differences clustered across the active site of the domains [76]. MT-domains also follow the common di-sub-domain structural patterning, which is also seen in A-, C-, E- and Cy-domains. Fundamentally, however, MT-domains are more restricted in their functional spectrum, which covers the transfer of methyl groups from S-adenosylmethionine to N (N-MT), C (C-MT), O (O-MT) or, for certain residues, S (S-MT) atoms located around the amino acid's Cα carbon [77] or, in the case of S-MT, the Cβ carbon [78]. In NRPS the most abundant MT specificity observed and described is N-methylation, although an increasing number of domains have been proposed which carry out C-methylations. No conclusive O-methylation activity has been shown to date [10].

4. NRPS multi-modular structures and dynamics

In contrast to the biochemical characterization of NRPS domains and entire assembly lines, comparatively little structural insight is available. Even though all types of domains have been structurally characterized by means of X-ray crystallography or NMR, the elucidation of distinct conformational states of a given domain remains a challenging task. This is especially limiting with respect to the few multi-domain structures which have become available, as it restricts the implications that can be derived from such novel structures. Furthermore, due to the countless conformational and dynamic changes that NRPS modules and domains undergo, it has so far been impossible to obtain any structure of an entire multi-modular NRPS assembly line. Nonetheless, on the basis of the spatial knowledge obtained in the process of module and domain characterization, efforts have been made to describe the catalytic sequence, timing and dedicated dynamics.

[Figure 3, panels A and B: schematic of a tri-modular NRPS with modules M1 to M3; domain labels shown are D, C, A (Acore and Asub), T, E and Te.]

Figure 3 — Potential structural arrangement of a tri-modular NRPS. Shown in A is the potential side view of a three modular NRPS on top and an EM image on the bottom. B displays the top view of the same NRPS along with the view of the particle through EM on the bottom. Both schemes are adapted from [81].

4.1. NRP synthesis and the associated conformational implications

Any given NRPS-linked product synthesis begins with the adenylation of incoming amino acids, aided by an A-domain in its initial adenylate-forming conformation. The subsequent rotation of the C-terminal Asub-domain, which allows the thioester-forming conformation to be adopted, appears not only to be the limiting step in product formation [53;55], but furthermore guides the ppant arm into the active site tunnel of the downstream C-domain [79]. These finely timed and interlinked conformational changes are further manifested in the domain architecture, as there are dedicated non-conserved areas in the T-domain which appear to bear dedicated functional relationships on a structural level with the A- and C-domains up- and downstream within the assembly line, respectively [62]. The clockwork-like, intertwined domain-domain interactions continue beyond the borders of a given module, as they appear to extend into the downstream modules and their C-domains. This is proposed because C-domains, representing the first or N-terminal domain of a module, appear to depend on the movement of the upstream T-domain in order to rearrange into the conformation and proximity that ultimately allow for the formation of a peptide bond [48;80]. The fundamental dynamics of how a polypeptide is synthesized within the framework of an NRPS seem to follow a common, although highly case-dependent, mechanism, but ultimately should lead to a concept transferable to all NRPS. However, when product release and functionalization (as discussed in part III) as well as the multitude of additional, internal and external modifying domains are taken into this greater dynamic scheme, the situation becomes far more complex as a result of the great number of different NRPS architectures and involved factors. The overall interactions can only be conclusively determined with novel, multi-modular structures of NRPS, which are non-existent at this moment. Nonetheless, there have been attempts at creating multi-modular models, based upon the crystal structures of single NRPS domains and the interfaces known through the solution of whole-module structures. Most notably, the model proposed by Marahiel [81] has shed light on this issue and underlines the high level of organization within multi-modular NRPS, leading to the standing theory of a helix-like domain setup revolving around centrally aligned T domains (Figure 3).

5. NRP or NRPS associated factors

In order to create the diversity and complexity of compounds observed in NRPS, a series of accompanying factors is required in addition to the intrinsic NRPS domains. Two main classes can be distinguished within these factors: NRPS chaperoning or activating elements, and compound targeting elements. The first category includes enzymes which are either strictly essential, such as the phosphopantetheinyl transferase sfp, or may enable or improve catalytic properties, as is the case for the MbtH-like proteins (MLP). The second group of factors ultimately targets the growing NRP and may act during catalysis, altering a compound in its NRPS-associated state, or once the compound is released from the assembly line. These are the tailoring enzymes, which contribute vastly to the very diverse structures observed in NRP.

5.1. Tailoring enzymes

Secondary metabolites exhibit diverse scaffolds, shown as linear, side-branched, circular or macro-circular structures, which are determined by the ability of the NRPS to generate these compounds in a linear or iterative process. There is a series of site-directed alterations which have been evolutionarily fine-tuned to enable more potent and specific properties. Certain residue alterations and rearrangements can be attributed to intrinsic NRPS domains (see section III), though many functions are absent from the NRPS assembly lines and require external tailoring enzymes. Genes encoding tailoring enzymes may be clustered among the NRPS gene clusters, though many are derived from other pathways [82]. Two main product alteration mechanisms can be distinguished, NRPS-associated and NRPS-independent, or on-line and off-line, respectively. On-line modifications target either substrates or compound intermediates which are covalently linked to the host NRPS assembly line, and can be essential for product formation [73]. This type of modification is carried out either by an intrinsic domain or, more relevant in this section, by a standalone tailoring enzyme or domain.

Recently, the recruitment of cytochrome P450 enzymes has gained significant attention [60;83–85]. Genes of oxygenases, and especially of P450 enzymes, frequently cluster among NRPS genes, and the resultant enzymes have been shown to carry out substrate modifications, for example in skyllamycin [86], where a single oxygenase performed three hydroxylation reactions [87]. More interestingly, NRPS association and substrate modification appear to be highly selective, and it has been shown that the T-domains play a crucial role in P450 recruitment [83;88]. Another series of experiments targeting teicoplanin, a glycopeptide antibiotic, revealed that not only T-domain targeting is used for P450 selection, but also a novel type of domain, the X-domain, which appears to be closely related to C-domains, although with an active site rendered unusable for peptide bond formation. The example is even more outstanding as this reaction requires four dedicated oxygenases [60].

Another modification essential for product activity is the halogenation of residues. Halogenation may be carried out on-line or off-line [89], but always requires a standalone halogenase. The most abundant form of halogenation appears to be the chlorination of benzoic ring structures, as seen in teicoplanin, enduracidin or ramoplanin [17;90–91].
Moreover, there is a series of modifications, namely glycosylation, acylation and sulfation, which are generally restricted to off-line mechanisms. Glycosylations are highly abundant in many NRPS products, especially glycopeptides. They are often not essential to the biological activity of the molecule, though they may improve the solubility as well as the stability of a compound [92]. In contrast, neither acylation nor sulfation appears to be a widespread phenomenon. Acylation is attributed to a new class of N-acyltransferases, which were only recently classified and structurally analyzed [93]. Sulfation reactions are equally rare and the associated sulfotransferases are rather uncommon; however, the records have been expanded steadily in the past few years [82].

5.2. MbtH-like proteins (MLP)

Another potentially essential factor in NRPS compound production is the MbtH-like protein (MLP). MLPs are currently the only catalytically inactive interaction partners known for NRPS, and they have been analyzed rather systematically. MLPs are ~70 amino acid proteins arranged in three β-sheets and, in most cases, one N-terminal α-helix [94–95]. Through association with the A-domain of an NRPS, the MLP appears to improve the kinetic properties of the interacting A-domain. In addition, MLPs may even improve NRPS expression levels and translation rates through an uncharacterized mechanism [5]. Potential interaction sites have been characterized [6;97]. It appears that the MLP associates dynamically with the rigid Acore-domain and, to a lesser extent, via weak intermolecular interactions with the Asub-domain. Available structures clearly show that three conserved tryptophan residues of the MLP align with two alanine residues on the Acore-domain. Based on a series of studies investigating the transferability of MLP homologues between different prokaryotic organisms and strains, a minimal MLP functionality motif was created [95;98]. Although structures have been determined and important residues have been identified, there appears to be no report of MLP presence in eukaryotic organisms, despite the widespread occurrence of NRPS across most sequenced prokaryotic organisms [3].

5.3. Phosphopantetheinyl transferases (PPTases)

PPTases form a distinct enzymatic family and are essential to the functionalization of NRPS, FAS and PKS. Fundamentally, they covalently link the CoA-derived phosphopantetheine moiety onto carrier proteins of the aforementioned megasynthetases. They target NRPS thiolation domains or FAS and PKS acyl carrier proteins (ACP) with various specificities. Overall, PPTases can be divided into three distinct sub-types: (I) ACP-specific holo-ACP synthase (AcpS) targeting FAS and PKS, (II) surfactin phosphopantetheinyl transferase (sfp)-type NRPS-specific PPTases and (III) the FAS-fused, domain-like megasynthase PPTases [99]. All three classes appear structurally diverse. Type I PPTases form a homotrimer of about 360 aa containing three active sites at the interfaces of the trimer. In contrast, type II structures are monomeric 240 aa enzymes, although they contain two equally sized domains which resemble a pseudo-dimer containing a single active site. Based on the available crystal structures as well as primary sequence analysis, it is likely that type II PPTases arose through gene duplication and fusion from the archetypal type I PPTases. Type III PPTases, however, appear to have very unique structural and multimeric states, forming significantly more complex homo-hexamers as well as hetero-dodecamers in a size range of 0.54–2.6 MDa [100–101]. For NRPS-related work, however, only type II PPTases are of importance, as they are the only ones which exhibit a distinct specificity towards T domains while retaining sufficient promiscuity to target different NRPS T domains as well as CoA analogues. Type II PPTases are ubiquitously found across prokaryotic as well as eukaryotic species and often cluster among NRPS genes, although one copy is predominantly sufficient for one organism. Secondary metabolism is heavily impaired upon deletion of type II PPTases, and purification experiments under PPTase non-overexpressing conditions showed extremely low PPTase and NRPS expression levels, further hinting at the essential function, high affinity and activity of this PPTase type. Overall, PPTases require a significant amount of experimental attention, as neither conformational dynamics, specificity prediction nor the multimeric state are clarified at this point, though the utilization of broad-spectrum type II PPTases such as sfp is sufficient to bypass major bottlenecks in the design and engineering of NRPS-based pathways [102].

6. Analytical tools and methods for NRPS and NRP identification and characterization

6.1. Metabolite and product discovery

The most direct route to discover NRP is functional screening, as it has been applied ever since the early 1930s for the discovery of first-generation antibiotics and the subsequent screening of improved producer strains [103–105]. Exploiting the intrinsic properties of a product is equally elegant, although it is limited to targets which have a considerably high potency. This has been established for antimicrobials, where sensitive indicator strains are used in combination with, for example, halo-based plate or broth assays [106–109]. A similar concept can be applied to the discovery of pigments such as indigoidine, as changes in the color of a medium can be readily determined with spectrophotometric devices [110–114]. In addition to mere discovery, pigment-producing NRPS, such as IndC of the indigoidine pathway, have been successfully employed as biosensors for combinatorial engineering efforts [29]. Furthermore, associated strategies relying on the ability of the host to produce scavenger molecules such as siderophores enable the utilization of growth properties as a phenotype to evaluate suitable targets. In addition to the intrinsic properties, a series of combined reactions using indicator substrates or secondary catalytic reactions contributes significantly to the broad range of bioassay applications. This type of assay may also make use of binding proteins, as shown for the penicillin binding protein or analogous mechanisms [115–117]. However, despite the potential for high-throughput and quantitative screening, this method is applicable to only a minority of desired compounds, often because of their low abundance or the lack of a directly or indirectly observable effect. This presents an even more pressing problem with anti-cancer drugs, as a tremendous selection of indicator strains must be evaluated in order to determine their potential near conclusively.

Another very direct and versatile technique to determine the presence of secondary metabolites is the analysis of supernatant, or other types of samples containing excreted metabolites, by mass spectrometry coupled with liquid chromatography (LC/MS). Even though the technique per se is not novel, recent developments have aimed at methods which allow for a more sensitive and overall inclusive detection.
An additional advantage is that not only fully functionalized products, but also side-products and metabolic intermediates can be analyzed and characterized. The increased depth of analysis, however, also increases the time investment, limiting the method to low- to medium-throughput screening. Nevertheless, with ever decreasing costs for analytics and the parallel development of more reliable and sensitive techniques, this approach poses a serious alternative, especially for large-scale users.

Nonetheless, all these methods may be useless for metabolites of very low abundance as well as for any silent biosynthetic gene cluster, even though these metabolites potentially account for a significant share of all natural products and should not be neglected. Fortunately, there are several ways to improve or induce the production of low-abundance or cryptic metabolites, respectively. Based on the identification of promising biosynthetic gene clusters, discussed in the following section, approaches targeting promoter sequences or accompanying factors such as PPTases or MLPs may aid in increasing or triggering the production of novel metabolites. Lastly, in the case of partially understood pathways, simply feeding low-abundance substrates can improve the yield of a given pathway product.
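To make the LC/MS-based detection discussed in this section more concrete, the expected m/z of a candidate linear peptide can be estimated from its building blocks. The following Python sketch is illustrative only: the function names and the small table of building blocks are assumptions made here, and the ACV tripeptide (aminoadipyl-cysteinyl-valine) serves merely as an example input. The sketch sums the elemental compositions of the free amino acids, removes one water per peptide bond and adds one proton per charge.

```python
# Monoisotopic masses of the elements and of the proton (Da).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707}
PROTON_MASS = 1.00727647

# Elemental composition of the free amino acids used in this example.
# The three-letter keys are labels chosen for this sketch.
AMINO_ACIDS = {
    "Aad": {"C": 6, "H": 11, "N": 1, "O": 4},          # alpha-aminoadipic acid
    "Cys": {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1},   # cysteine
    "Val": {"C": 5, "H": 11, "N": 1, "O": 2},          # valine
    "Hpg": {"C": 8, "H": 9, "N": 1, "O": 3},           # 4-hydroxyphenylglycine
}


def linear_peptide_formula(residues):
    """Sum the building blocks and remove one H2O per peptide bond."""
    if not residues:
        raise ValueError("at least one building block is required")
    total = {}
    for name in residues:
        for element, count in AMINO_ACIDS[name].items():
            total[element] = total.get(element, 0) + count
    bonds = len(residues) - 1
    total["H"] -= 2 * bonds
    total["O"] -= bonds
    return total


def monoisotopic_mass(formula):
    """Monoisotopic mass (Da) of an elemental composition."""
    return sum(ELEMENT_MASS[element] * n for element, n in formula.items())


def expected_mz(residues, charge=1):
    """m/z of the protonated linear peptide, [M + zH]z+."""
    mass = monoisotopic_mass(linear_peptide_formula(residues))
    return (mass + charge * PROTON_MASS) / charge


acv = ["Aad", "Cys", "Val"]
print(linear_peptide_formula(acv))
print(round(expected_mz(acv), 4))
```

Cyclic products, tailoring modifications and adducts other than protons are not covered by this sketch and would require their own mass corrections.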

6.2. Bioinformatic and site-directed tools

The discovery of conserved and highly specific motifs in many functionally specific NRPS and NRPS domains has laid the foundation for the development of specific discovery tools. NRPS are frequently encoded in large genes with very dedicated genomic features. In combination with continuously more affordable genome sequencing techniques, this allows for the relatively simple identification of putative biosynthetic gene clusters based on their DNA sequence [118–123]. Due to their modularity and dedicated functional domains, NRPS can be identified through homology-based searches by means of domain sequence and, more precisely, using the conserved active site motifs of A (A1–A10 core motifs), T (GGxS) and C (HHxxDG) domains [124–126]. Furthermore, product-modifying domains may be identified in an analogous manner, allowing the full functionality of an NRPS to be predicted from the primary DNA sequence. This prediction can be further taken into account for the prediction of NRPS specificities and thus also allows for the determination of hypothetical products and product intermediates. Other NRPS-associated hallmarks, such as MLP, PPTases or tailoring enzymes, contribute to a more global cluster identification. In fact, the latter factors, especially type II sfp-like PPTases, are a strong global indicator of functional secondary metabolite clusters and thus of secondary metabolite production by an organism.

A step further, towards the discovery of specific parts and their partial isolation, is the development of domain-specific DNA/peptide signature sequences, which allow NRPS parts to be identified at the proteomic level [127–128]. Such probes generally consist of two parts: one which associates with a specific domain's active site and a second which allows for the detection of the domain-probe conjugate [129–130]. The domain-specific part may target an amino acid or an adenylated substrate for A and T domains, respectively, including non-natural substrates. The indicator entity of the probe, on the other hand, may be an affinity tag (e.g. biotin or 6×His), a fluorescent probe or a radiolabeled group [129;131]. A combinatorial approach using these probes along with electrophoretic methods, such as capillary electrophoresis or 2-D gels, represents a powerful tool to obtain direct proof of an organism's potential to express a specific NRPS. Lastly, this approach additionally allows for the extraction of specific NRPS domains, which may subsequently be used for dedicated engineering efforts.

7. Engineering of NRPS: targets, strategies and challenges

NRPS are highly structured and multi-faceted enzymes, bearing a tremendous potential in the exploitation of their scaffold for the generation of novel, bioactive compounds. However, due to the complexity of all interactions within these mega-enzymes, the elucidation and implementation of engineering strategies is an extremely challenging task. Several strategies have been developed and applied with different degrees of success, and the overall approaches can be grouped as module, domain, sub-domain or site directed. All of these strategies have their inherent difficulties, advantages and disadvantages with respect to the complexity and success rate of NRPS engineering efforts, and will next be highlighted with selected examples.

7.1. Subunit, module and domain swapping

Due to the strict arrangement of NRPS in domains and modules, the exchange of a subunit appears to be the most straightforward approach for altering the intrinsic properties of an NRPS. A series of studies targeting the enzymes linked to the production of daptomycin [132–133] elucidated the possibilities and limits of a combinatorial swapping strategy with respect to novel compound production. The daptomycin biosynthetic cluster comprises three NRPS containing a total of 13 modules for the incorporation of an equal number of substrates. Different levels of domain and module swap approaches were followed, starting with the exchange of modules 8 and 11 (C-A-T), representing an internal module exchange. The resulting NRPS produced novel daptomycin compounds with an inverted amino acid composition at the predicted sites at a near-native rate. However, if the less homologous epimerization domain (C-A-T-E) is included in this swapping approach, the production levels drop below 1 % of the original activity, further illustrating the importance of precise inter-modular interactions [132]. Subsequent experiments targeting the third daptomycin NRPS, DptD, focused on the replacement of the entire protein with highly homologous units of the calcium-dependent antibiotic (CDA) and A54145 pathways, respectively. Despite significant production levels of novel compounds, neither a broadened anti-microbial spectrum nor increased activity was observed. It appears that all hypothetical lipopeptides which may result from the given NRPS have been through extensive evolutionary selection, thus leading to daptomycin as the most suitable compound for the host organism. A similar combinatorial approach has been chosen for the investigation of pyoverdin and potential analogues, resulting in more moderate production rates of analogue compounds, likely due to a more disruptive engineering strategy with respect to the conserved inter-modular interactions and interfaces [134–135]. A different approach to the improvement of an NRPS activity was taken with the indigoidine synthetase IndC. In contrast to the majority of studies targeting single domains for exchange purposes, not the A- or C-domain was chosen, but rather the significantly smaller T domain [29]. In a remarkable effort, not only were functionally comparable NRPS variants with homologous T-domains from other organisms created, but synthetic T-domains were additionally used successfully to create novel NRPS which surpass the native kinetic potential of IndC. Overall, the choice of a particular approach must be tailored to the availability of homologous parts and, if applicable, a series of domains or an entire module should remain intact in order to avoid altering crucial inter-domain and inter-module interactions. Nonetheless, combined computational and swapping approaches may bear a significant potential for the improvement of NRPS, but are still heavily limited by the lack of structural and experimental data.
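The module exchange described above can be pictured as an operation on an ordered list of modules. The sketch below is a hypothetical data model, not a tool used in the cited studies: the `Module` class, the placeholder substrate names (`aa1` to `aa13`) and the assumption that every module has a C-A-T organization are illustrative simplifications. The check for identical domain order is a conservative choice of this sketch, motivated by the lower production reported when an E-domain was included in the swap.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    position: int              # 1-based position in the assembly line
    domains: Tuple[str, ...]   # domain order, e.g. ("C", "A", "T")
    substrate: str             # placeholder name of the activated substrate


def swap_modules(line: List[Module], i: int, j: int) -> List[Module]:
    """Return a new assembly line with the modules at positions i and j exchanged."""
    first, second = line[i - 1], line[j - 1]
    if first.domains != second.domains:
        # Conservative rule of this sketch: only swap like for like.
        raise ValueError(f"modules {i} and {j} differ in domain organization")
    swapped = list(line)
    swapped[i - 1] = replace(second, position=i)
    swapped[j - 1] = replace(first, position=j)
    return swapped


def predicted_sequence(line: List[Module]) -> List[str]:
    """Substrates in the order in which the modules would incorporate them."""
    return [module.substrate for module in line]


# Hypothetical 13-module line with placeholder substrates aa1..aa13.
assembly_line = [Module(k, ("C", "A", "T"), f"aa{k}") for k in range(1, 14)]
variant = swap_modules(assembly_line, 8, 11)
print(predicted_sequence(variant)[7:11])  # substrates now at positions 8 to 11
```

The model only predicts the order of incorporated substrates; it says nothing about production levels, which in the studies above depended on inter-modular interactions that such a list cannot capture.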

7.2. Sub-domain exchange and directed evolution

Next to engineering efforts at the domain and module borders, there is a series of approaches which direct attention to a range of sub-domain parts, including active sites, binding sites as well as even more minimal changes. While even partially directed approaches still require the generation of enormous mutant libraries, a recent study has drawn significant attention towards a more elaborate exchange strategy targeting binding pocket related regions [30;136]. In essence, this approach is based on an event which has occurred at least once in the evolutionary history of NRPS [137]. Based on this observation, a region was targeted in the gramicidin S synthetase A, GrsA, which spans most relevant binding pocket residues. The exchanged parts, from either the NRPS GrsB, whose gene is located downstream of grsA, or other microbial sources, resulted in a series of active chimeric NRPS. Moreover, neither specificity nor catalytic activity was dramatically reduced, thus delivering a potent scaffold for further improvement using computational and directed evolution strategies.

7.3. Engineering challenges, application potential and outlook

An increasing level of detail about the structure and functions of the modules and domains, as well as about the biosynthetic capabilities of NRPS, is being unraveled. Thereby, both the diversity of NRPS and their potential for biotechnological applications become apparent, resulting in an increasing number of products from the secondary metabolism of bacteria and fungi being considered or already used for societal purposes. Due to the massive genome sequencing efforts, the overall number of secondary metabolites, biosynthetic clusters and potential producing organisms is also increasing continuously. Nonetheless, facing global health challenges, such as anti-microbial resistance and the need for anti-cancer and other drugs, requires not only the discovery of new NRPS-derived products and the characterization of organisms, clusters and products, but also the development of enabling techniques for the exploitation of the full biochemical potential of their dedicated NRPS. Even though there is a general idea concerning the structure of NRPS modules and domains and their biochemical functions and specificities, NRPS engineering efforts have been limited, as discussed previously. Nonetheless, the potential biochemical diversity provided by this enzyme class is tremendous, and ideas on how to unlock this potential have been proposed and will be discussed further.

To provide a foundation for an interlinked and systematic platform to explore the full potential of NRPS, two bottlenecks must be considered. The first concerns the ever-growing reservoir of potential targets, gene clusters and standalone NRPS which have been fully or at least partially identified and characterized. The second concerns the partial and incomplete structures of NRPS domains and modules, which have shed some light on essential sub-structures without revealing the overall structures and dynamics of multi-modular NRPS systems.
Due to the complexity of this class of mega-enzymes, a merely random engineering approach seems disproportionately laborious and will eventually require a target and site preselection within a given NRPS assembly line, based on a combinatorial approach. Such an approach should allow for a preliminary target selection, based on the proposed function of the projected novel enzyme, and a sequential selection, allowing for the proposition of sub-structural targets for a more directed engineering approach.

A platform as described above should furthermore be accessible through various inputs, covering the entire range from primary genetic code discovered in metagenomic samples to metabolomics data and bioinformatic predictions of novel hypothetical structures of a proposed compound found in fermentation samples. The antiSMASH software and database [118] illustrates how the foundation of such a platform may be envisioned. Many features, including the predicted architecture of the NRPS domain topology, the assignment of tailoring enzymes and a crude proposition of the associated metabolic product, are indeed already included. However, promoter, terminator and RBS properties or domain specificities, for example, still have to be determined manually. While the latter expression features may be implemented in such a platform with relative ease, by the addition of external links or the embedding of additional software tools and databases, an entire second software layer focusing on structural features or predictions of the associated enzymes is missing. Naturally, due to the limited amount of structural data available, this feature would rely heavily on homology-based structural models. Nonetheless, the systematic nature of the platform has the potential to increase the speed and precision of the given predictions while simultaneously expanding the internal database vastly. In an ideal setup, this process may even allow for the discovery of more universally transferable NRPS domains or, more generally, structural elements, which could ultimately be used in a transferable parts library, similar to the BioBricks concept [138–139]. The potential success of such a platform, however, will eventually rely on the overall ease of structural data acquisition and, more specifically, on the possibility to solve any multi-modular NRPS structure.
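The first layer of such a platform, the prediction of a domain order from primary sequence, can be illustrated with the simplified T- and C-domain motifs quoted in section 6.2. The following Python sketch is a toy example under stated assumptions: the regular expressions, the function names and the short test sequence are invented here, the C-domain pattern accepts two or three variable positions because the motif is also commonly written as HHxxxDG, and A-domains are omitted because their A1–A10 core motifs are not spelled out in the text. Established tools rely on more sensitive profile-based searches; short patterns like these would produce many false positives on real proteins.

```python
import re

# Simplified core motifs ('.' stands for any residue); assumptions of this sketch.
MOTIFS = {
    "C": re.compile(r"HH.{2,3}DG"),  # condensation domain, HHxxDG / HHxxxDG
    "T": re.compile(r"GG.S"),        # thiolation domain, GGxS
}


def find_domain_motifs(protein):
    """Return (start, domain, matched_text) tuples sorted by position."""
    hits = []
    for domain, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((match.start(), domain, match.group()))
    return sorted(hits)


def domain_order(protein):
    """Join the domain labels of all motif hits from N- to C-terminus."""
    return "-".join(domain for _, domain, _ in find_domain_motifs(protein))


# Invented toy sequence containing one C-like and one T-like motif.
toy_protein = "MKK" + "HHAAADG" + "LLLL" + "GGHSL" + "AAA"
print(find_domain_motifs(toy_protein))
print(domain_order(toy_protein))  # prints C-T
```

A structural layer of the kind envisioned above would then take such a predicted domain order as input for homology modelling, which is outside the scope of this sketch.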

8. Scope of the thesis

The overall aim of this work is the fundamental characterization of the NRPS ACVS and the application of these insights in engineering a hybrid NRPS to synthesize hpgCV. This hybrid hpgCV NRPS would enable the development of a new biosynthetic pathway for the production of the antibiotic D-amoxicillin in a fully fermentative manner. In order to establish this novel NRPS activity, an A-domain capable of recognizing hydroxyphenylglycine (hpg) is required. Chapter II describes the identification and characterization of a multitude of hpg-activating domains and demonstrates a system which allows for the screening of such domains in a medium-throughput fashion, bypassing the use of radiolabeled substrates. Using the previously characterized domains, a series of chimeric NRPS was constructed. To this end, a versatile cloning and expression system was established, allowing for the dedicated and convenient exchange of NRPS domains in a larger enzyme (Chapter III). The ACV synthetase of Nocardia lactamdurans served as a template for this system due to its natively beneficial specificity and compatibility with heterologous expression in E. coli. For a more complete understanding of the ACVS, a series of biochemical methods was developed and applied. In addition, structural features of this three-modular ACVS NRPS were assessed, providing novel data on the overall structural dynamics (Chapter IV). A selection of chimeric hpgCV hybrids was subsequently transferred to the production host Penicillium chrysogenum and, together with the NRPS-associated protein MLP, examined for the production of D-amoxicillin and the tripeptidic precursor hpgCV (Chapter V). Finally, the effects of different MLP from prokaryotic sources were evaluated with respect to a series of P. chrysogenum native NRP products (Chapter VI). In the outlook, new NRPS engineering strategies are discussed.

9. References

1. Smith JL, Sherman DH. Biochemis- a nonribosomal peptide synthetase. try. An enzyme assembly line. Science. Biochemistry. United States; 2017;56: 2008;321: 1304–5. 5380–5390. doi:10.1126/science.1163785 doi:10.1021/acs.biochem.7b00517 2. Stachelhaus T, Marahiel MA. Modular 6. Miller BR, Drake EJ, Shi C, Aldrich structure of peptide synthetases re- CC, Gulick AM. Structures of a nonri- vealed by dissection of the multifunc- bosomal peptide synthetase module tional enzyme GrsA. J Biol Chem. United bound to MbtH-like proteins support a States; 1995;270: 6163–6169. highly dynamic domain architecture. J 3. Baltz RH. Function of MbtH homologs Biol Chem . 2016; in nonribosomal peptide biosynthesis doi:10.1074/jbc.M116.746297 and applications in secondary metabo- 7. Felnagle EA, Barkei JJ, Park H, Podev- lite discovery. J Ind Microbiol Biotechnol. els AM, McMahon MD, Drott DW, et al. 2011;38: 1747–1760. MbtH-like proteins as integral compo- doi:10.1007/s10295-011-1022-8 nents of bacterial nonribosomal pep- 4. Quadri LE, Sello J, Keating TA, Weinreb tide synthetases. Biochemistry. Ameri- PH, Walsh CT. Identification of a My- can Chemical Society; 2010;49: 8815–7. cobacterium tuberculosis gene cluster doi:10.1021/bi1012854 encoding the biosynthetic enzymes for 8. Bloudoff K, Schmeing TM. Structural and assembly of the virulence-conferring sid- functional aspects of the nonribosomal erophore mycobactin. Chem Biol. United peptide synthetase condensation domain States; 1998;5: 631–645. superfamily: discovery, dissection and 5. Schomer RA, Thomas MG. Character- diversity. Biochim Biophys Acta. Nether- ization of the functional variance in lands; 2017;1865: 1587–1604. MbtH-like protein interactions with doi:10.1016/j.bbapap.2017.05.010

33 9. Horsman ME, Hari TPA, Boddy CN. of nonmodular enzymes. Proc Natl Acad Polyketide synthase and non-ribosomal Sci. 2014;111: 9259–9264. peptide synthetase thioesterase selec- doi:10.1073/pnas.1401734111 tivity: logic gate or a victim of fate? Nat 16. Bode HB, Brachmann AO, Jadhav KB, Prod Rep. England; 2016;33: 183–202. Seyfarth L, Dauth C, Fuchs SW, et al. doi:10.1039/c4np00148f Structure Elucidation and Activity of

9. References 9. 10. Ansari MZ, Sharma J, Gokhale RS, Mo- Kolossin A, the D-/L-pentadecapeptide hanty D. In silico analysis of methyl- product of a giant nonribosomal pep- transferase domains involved in biosyn- tide synthetase. Angew Chem Int Ed thesis of secondary metabolites. BMC Engl. Germany; 2015;54: 10352–10355. Bioinformatics. England; 2008;9: 454. doi:10.1002/anie.201502835 doi:10.1186/1471-2105-9-454 17. Yin X, Zabriskie TM. The enduracidin bio- 11. Nielsen ML, Isbrandt T, Petersen LM, synthetic gene cluster from Streptomy- Mortensen UH, Andersen MR, Hoof JB, ces fungicidicus. Microbiology. 2006;152: et al. Linker flexibility facilitates module 2969–2983. exchange in fungal hybrid PKS-NRPS doi:10.1099/mic.0.29043-0 engineering. PLoS One. United States; 18. Dowling DP, Kung Y, Croft AK, Taghiza- 2016;11: e0161199. deh K, Kelly WL, Walsh CT, et al. Struc- doi:10.1371/journal.pone.0161199 tural elements of an NRPS cyclization 12. Li Y, Weissman KJ, Muller R. Insights domain and its intermodule docking do- into multienzyme docking in hybrid main. Proc Natl Acad Sci U S A. United PKS-NRPS megasynthetases revealed States; 2016;113: 12432–12437. by heterologous expression and genetic doi:10.1073/pnas.1608615113 engineering. Chembiochem. Germany; 19. Hahn M, Stachelhaus T. Harnessing the 2010;11: 1069–1075. potential of communication-mediating doi:10.1002/cbic.201000103 domains for the biocombinatorial syn- 13. Shen B, Chen M, Cheng Y, Du L, Edwards thesis of nonribosomal peptides. Proc DJ, George NP, et al. Prerequisites for Natl Acad Sci U S A. National Academy combinatorial biosynthesis: evolution of of Sciences; 2006; 103: 275–80. hybrid NRPS/PKS gene clusters. Ernst doi:10.1073/pnas.0508409103 Schering Res Found Workshop. Ger- 20. Marahiel M a, Stachelhaus T, Mootz HD. many; 2005; 107–126. Modular peptide synthetases involved 14. Du L, Shen B. Biosynthesis of hybrid in nonribosomal peptide synthesis. peptide-polyketide natural products. Chem Rev. 
American Chemical Society; Curr Opin Drug Discov Devel. England; 1997;97: 2651–2674. 2001;4: 215–228. http://www.ncbi.nlm.nih.gov/pubmed/ 15. Wang H, Fewer DP, Holm L, Rouhiainen 11851476 L, Sivonen K. Atlas of nonribosomal 21. Caboche S, Pupin M, Leclère V, Fon- peptide and polyketide biosynthetic taine A, Jacques P, Kucherov G. NORINE: pathways reveals common occurrence a database of nonribosomal peptides.

1 34 Nucleic Acids Res. 2008;36: D326–31. 29. Beer R, Herbst K, Ignatiadis N, Kats I, doi:10.1093/nar/gkm792 Adlung L, Meyer H, et al. Creating func- 22. Sussmuth RD, Mainz A. Nonribosomal tional engineered variants of the sin- 1 peptide synthesis-principles and pros- gle-module non-ribosomal peptide syn- pects. Angew Chem Int Ed Engl. Ger- thetase IndC by T domain exchange. Mol many; 2017;56: 3770–3821. Biosyst. 2014;10: 1709–18.

doi:10.1002/anie.201609079 doi:10.1039/c3mb70594c Introduction 23. Dang T, Sussmuth RD. Bioactive peptide 30. Kries H. Biosynthetic engineering of natural products as lead structures for me- nonribosomal peptide synthetases. J dicinal use. Acc Chem Res. United States; Pept Sci. England; 2016;22: 564–570. 2017; doi:10.1021/acs.accounts.7b00159 doi:10.1002/psc.2907 24. Watanabe K, Oguri H, Oikawa H. Diversi- 31. Smith S, Witkowski A, Joshi AK. Struc- fication of echinomycin molecular struc- tural and functional organization of the ture by way of chemoenzymatic synthesis animal fatty acid synthase. Prog Lipid and heterologous expression of the engi- Res. England; 2003;42: 289–317. neered echinomycin biosynthetic pathway. 32. Du L, Cheng Y-Q, Ingenhorst G, Tang Curr Opin Chem Biol. 2009;13: 189–196. G-L, Huang Y, Shen B. Hybrid peptide-­ doi:10.1016/j.cbpa.2009.02.012 polyketide natural products: biosynthe- 25. Frisvad JC, Smedsgaard J, Larsen TO, sis and prospects towards engineering Samson RA. Mycotoxins, drugs and novel molecules. Genet Eng (N Y). United other extrolites produced by species in States; 2003;25: 227–267. Penicillium subgenus Penicillium. Stud 33. Shen B. Polyketide biosynthesis beyond Mycol. 2004;49: 201–241. the type I, II and III polyketide synthase 26. Gevers W, Kleinkauf H, Lipmann F. The paradigms. Curr Opin Chem Biol. En- activation of amino acids for biosynthe- gland; 2003;7: 285–295. sis of gramicidin S. Proc Natl Acad Sci 34. Jenke-Kodama H, Sandmann A, Muller U S A. United States; 1968;60: 269–276. R, Dittmann E. Evolutionary implica- 27. Zhang K, Nelson KM, Bhuripanyo K, Grimes tions of bacterial polyketide synthases. KD, Zhao B, Aldrich CC, et al. Engineering Mol Biol Evol. United States; 2005;22: the substrate specificity of the dhbe ade- 2027–2039. doi:10.1093/molbev/msi193 nylation domain by yeast cell surface dis- 35. Bushley KE, Turgeon BG. Phylogenom- play. Chem Biol. 
Elsevier Ltd; 2013;20: 92– ics reveals subfamilies of fungal nonri- 101. doi:10.1016/j.chembiol.2012.10.020 bosomal peptide synthetases and their 28. Thirlway J, Lewis R, Nunns L, Al Nakeeb evolutionary relationships. BMC Evol Biol. M, Styles M, Struck AW, et al. Introduc- England; 2010;10: 26. tion of a non-natural amino acid into a doi:10.1186/1471-2148-10-26 nonribosomal peptide antibiotic by mod- 36. Bergendahl V, Linne U, Marahiel MA. ification of adenylation domain speci- Mutational analysis of the C-domain in ficity. Angew Chemie — Int Ed. 2012;51: nonribosomal peptide synthesis. Eur J 7181–7184. doi:10.1002/anie.201202043 Biochem. England; 2002;269: 620–629.

35 37. Linne U, Marahiel MA. Control of direc- Chem Rev. American Chemical Society; tionality in nonribosomal peptide syn- 1997;97: 2651–2674. thesis: role of the condensation domain doi:10.1021/cr960029e in preventing misinitiation and timing 45. Knudsen M, Sondergaard D, Toft- of epimerization. Biochemistry. United ing-Olesen C, Hansen FT, Brodersen DE, States; 2000;39: 10439–10447. Pedersen CNS. Computational discovery

9. References 9. 38. Payne JAE, Schoppet M, Hansen MH, of specificity-conferring sites in non-ri- Cryle MJ. Diversity of nature’s assembly bosomal peptide synthetases. Bioin- lines — recent discoveries in non-ribo- formatics. England; 2016;32: 325–329. somal peptide synthesis. Mol Biosyst. doi:10.1093/bioinformatics/btv600 The Royal Society of Chemistry; 2017; 46. Eppelmann K, Stachelhaus T, Marahiel MA. doi:10.1039/C6MB00675B Exploitation of the selectivity-conferring 39. Stachelhaus T, Mootz HD, Marahiel MA. code of nonribosomal peptide synthetases The specificity-conferring code of ade- for the rational design of novel peptide an- nylation domains in nonribosomal peptide tibiotics. Biochemistry. 2002;41: 9718–26. synthetases. Chem Biol. 1999;6: 493–505. 47. Von Döhren H, Dieckmann R, Pave- doi:10.1016/S1074-5521(99)80082-9 la-Vrancic M. The nonribosomal code. 40. Marahiel MA. Protein templates for the Chem Biol. 1999;6: 273–279. biosynthesis of peptide antibiotics. Chem 48. Gulick AM. Structural insight into the nec- Biol. United States; 1997;4: 561–567. essary conformational changes of modular 41. Conti E, Franks NP, Brick P. Crystal struc- nonribosomal peptide synthetases. Curr ture of firefly luciferase throws light on Opin Chem Biol. England; 2016;35: 89–96. a superfamily of adenylate-forming en- doi:10.1016/j.cbpa.2016.09.005 zymes. Structure. United States; 1996;4: 49. Miller BR, Sundlov J a, Drake EJ, Makin 287–298. T a, Gulick AM. Analysis of the linker re- 42. Sun X, Li H, Alfermann J, Mootz HD, Yang gion joining the adenylation and carrier H. Kinetics profiling of gramicidin S syn- protein domains of the modular nonri- thetase A, a member of nonribosomal bosomal peptide synthetases. Proteins. peptide synthetases. Biochemistry. 2014;82: 1–12. doi:10.1002/prot.24635 2014;53: 7983–9. 50. Neville C, Murphy A, Kavanagh K, Doyle doi:10.1021/bi501156m S. A 4’-phosphopantetheinyl transferase 43. 
Bučević-Popović V, Šprung M, Soldo B, mediates non-ribosomal peptide syn- Pavela-Vrančič M. The A9 core sequence thetase activation in Aspergillus fumi- from nrps adenylation domain is rele- gatus. Chembiochem. 2005;6: 679–85. vant for thioester formation. ChemBio- doi:10.1002/cbic.200400147 Chem. 2012;13: 1913–1920. doi:10.1002/ 51. Weber T, Marahiel MA. Exploring the cbic.201200309 domain structure of modular nonribo- 44. Marahiel MA, Stachelhaus T, Mootz HD. somal peptide synthetases. Structure. Modular Peptide Synthetases Involved 2001;9: R3–R9. in Nonribosomal Peptide Synthesis. doi:10.1016/S0969-2126(00)00560-8

1 36 52. Ku J, Mirmira RG, Liu L, Santi D V. Ex- 58. Balibar CJ, Vaillancourt FH, Walsh pression of a functional non-ribo- CT. Generation of D amino acid res- somal peptide synthetase module in idues in assembly of arthrofactin by 1 Escherichia coli by coexpression with dual condensation/epimerization do- a phosphopantetheinyl transferase. mains. Chem Biol. 2005;12: 1189–200. Chem Biol. 1997;4: 203–207. doi:10.1016/j.chembiol.2005.08.010

doi:10.1016/S1074-5521(97)90289-1 59. Teruya K, Tanaka T, Kawakami T, Akaji K, Introduction 53. Reger AS, Wu R, Dunaway-Mariano D, Aimoto S. Epimerization in peptide thio- Gulick AM. Structural characterization ester condensation. J Pept Sci. 2012;18: of a 140 degrees domain movement 669–77. doi:10.1002/psc.2452 in the two-step reaction catalyzed by 60. Haslinger K, Peschke M, Brieke C, Maxi- 4-chlorobenzoate:CoA . Biochem- mowitsch E, Cryle MJ. X-domain of pep- istry. 2008;47: 8016–25. tide synthetases recruits oxygenases cru- doi:10.1021/bi800696y cial for glycopeptide biosynthesis. Nature. 54. Yonus H, Neumann P, Zimmermann S, 2015;521: 105–9. doi:10.1038/nature14141 May JJ, Marahiel MA, Stubbs MT. Crystal 61. Keating TA, Marshall CG, Walsh CT, Keat- structure of DltA. Implications for the ing AE. The structure of VibH represents reaction mechanism of non-ribosomal nonribosomal peptide synthetase con- peptide synthetase adenylation domains. densation, cyclization and epimerization J Biol Chem. 2008;283: 32484–91. domains. Nat Struct Biol. United States; doi:10.1074/jbc.M800557200 2002;9: 522–526. doi:10.1038/nsb810 55. Gulick AM. Conformational dynamics 62. Bloudoff K, Rodionov D, Schmeing TM. in the acyl-CoA synthetases, adenyla- Crystal structures of the first conden- tion domains of non-ribosomal peptide sation domain of CDA synthetase sug- synthetases, and firefly luciferase. ACS gest conformational changes during the Chem Biol. 2009;4: 811–827. synthetic cycle of nonribosomal peptide 56. Branchini BR, Southworth TL, Mur- synthetases. J Mol Biol. Elsevier B.V.; tiashaw MH, Wilkinson SR, Khattak NF, 2013;425: 3137–3150. Rosenberg JC, et al. Mutagenesis evi- doi:10.1016/j.jmb.2013.06.003 dence that the partial reactions of firefly 63. Perez CE, Park HB, Crawford JM. Func- bioluminescence are catalyzed by dif- tional characterization of a condensation ferent conformations of the luciferase domain that links nonribosomal peptide C-terminal domain. Biochemistry. 
United and pteridine biosynthetic machineries States; 2005;44: 1385–1393. in Photorhabdus luminescens. Biochem- doi:10.1021/bi047903f istry. United States; 2017; 57. Strieker M, Tanović A, Marahiel M a. doi:10.1021/acs.biochem.7b00863 Nonribosomal peptide synthetases: 64. Zane HK, Naka H, Rosconi F, Sandy M, Structures and dynamics. Curr Opin Haygood MG, Butler A. Biosynthesis of Struct Biol. 2010;20: 234–240. amphi-enterobactin siderophores by doi:10.1016/j.sbi.2010.01.009 Vibrio harveyi BAA-1116: identification

37 of a bifunctional nonribosomal peptide in beta-lactam antibiotic biosynthesis. synthetase condensation domain. J Am Nat Chem Biol. United States; 2014;10: Chem Soc. United States; 2014;136: 251–258. doi:10.1038/nchembio.1456 5615–5618. doi:10.1021/ja5019942 71. Tahlan K, Jensen SE. Origins of the be- 65. Cheng Y, Liu X, An S, Chang C, Zou Y, ta-lactam rings in natural products. J An- Huang L, et al. A nonribosomal peptide tibiot (Tokyo). Japan; 2013;66: 401–410.

9. References 9. synthase containing a stand-alone con- doi:10.1038/ja.2013.24 densation domain is essential for phyto- 72. Walsh CT. Insights into the chemical toxin zeamine biosynthesis. Mol Plant logic and enzymatic machinery of NRPS Microbe Interact. 2013;26: 1294–301. assembly lines. Nat Prod Rep. The Royal Available: http://www.ncbi.nlm.nih.gov/ Society of Chemistry; 2016;33: 127–135. pubmed/23883359 doi:10.1039/C5NP00035A 66. Wu X, Garcia-Estrada C, Vaca I, Martin 73. Sundaram S, Hertweck C. On-line enzy- J-F. Motifs in the C-terminal region of the matic tailoring of polyketides and pep- Penicillium chrysogenum ACV synthetase tides in thiotemplate systems. Curr Opin are essential for valine epimerization and Chem Biol. England; 2016;31: 82–94. processivity of tripeptide formation. Bio- doi:10.1016/j.cbpa.2016.01.012 chimie. France; 2012;94: 354–364. 74. Kelly WL, Hillson NJ, Walsh CT. Excision doi:10.1016/j.biochi.2011.08.002 of the epothilone synthetase B cyclization 67. Gonsior M, Muhlenweg A, Tietzmann domain and demonstration of in trans M, Rausch S, Poch A, Sussmuth RD. condensation/cyclodehydration activity. Biosynthesis of the peptide antibiotic Biochemistry. United States; 2005;44: feglymycin by a linear nonribosomal 13385–13393. doi:10.1021/bi051124x peptide synthetase mechanism. Chem- 75. Patel HM, Tao J, Walsh CT. Epimeriza- biochem. Germany; 2015;16: 2610–2614. tion of an L-cysteinyl to a D-cysteinyl doi:10.1002/cbic.201500432 residue during thiazoline ring forma- 68. Dettner F, Hanchen A, Schols D, Toti tion in siderophore chain elongation by L, Nusser A, Sussmuth RD. Total syn- pyochelin synthetase from Pseudomo- thesis of the antiviral peptide antibi- nas aeruginosa. Biochemistry. United otic feglymycin. Angew Chem Int Ed States; 2003;42: 10514–10527. Engl. Germany; 2009;48: 1856–1861. doi:10.1021/bi034840c doi:10.1002/anie.200804130 76. Bloudoff K, Fage CD, Marahiel MA, 69. Gaudelli NM, Long DH, Townsend CA. Schmeing TM. 
Structural and mutational beta-Lactam formation by a non-ribo- analysis of the nonribosomal peptide somal peptide synthetase during an- synthetase heterocyclization domain tibiotic biosynthesis. Nature. England; provides insight into catalysis. Proc Natl 2015;520: 383–387. Acad Sci U S A. United States; 2016; doi:10.1038/nature14100 doi:10.1073/pnas.1614191114 70. Gaudelli NM, Townsend CA. Epimeriza- 77. Miller DJ, Ouellette N, Evdokimova tion and substrate gating by a TE domain E, Savchenko A, Edwards A, Anderson

1 38 WF. Crystal complexes of a predicted ACS Chem Biol. United States; 2017;12: S-adenosylmethionine-­dependent methyl­ 1316–1326. transferase reveal a typical AdoMet bind- doi:10.1021/acschembio.7b00081 1 ing domain and a substrate recognition do- 84. Ulrich V, Peschke M, Brieke C, Cryle MJ. main. Protein Sci. United States; 2003;12: More than just recruitment: the X-do- 1432–1442. doi:10.1110/ps.0302403 main influences catalysis of the first phe-

78. Al-Mestarihi AH, Villamizar G, Fernandez nolic coupling reaction in A47934 biosyn- Introduction J, Zolova OE, Lombo F, Garneau-Tsodikova thesis by cytochrome P450 StaH. Mol S. Adenylation and S-methylation of cys- Biosyst. England; 2016;12: 2992–3004. teine by the bifunctional enzyme TioN in doi:10.1039/c6mb00373g thiocoraline biosynthesis. J Am Chem Soc. 85. Li Y-H, Han W-J, Gui X-W, Wei T, Tang S-Y, United States; 2014;136: 17350–17354. Jin J-M. Putative nonribosomal peptide doi:10.1021/ja510489j synthetase and cytochrome P450 genes 79. Reimer JM, Aloise MN, Harrison PM, Mar- responsible for tentoxin biosynthesis in tin Schmeing T. Synthetic cycle of the Alternaria alternata ZJ33. Toxins (Basel). initiation module of a formylating non- Switzerland; 2016;8. ribosomal peptide synthetase. Nature. doi:10.3390/toxins8080234 2016;529: 239–242. 86. Pohle S, Appelt C, Roux M, Fiedler H-P, http://dx.doi.org/10.1038/nature16503 Sussmuth RD. Biosynthetic gene clus- 80. Samel SA, Schoenafinger G, Knappe ter of the non-ribosomally synthesized TA, Marahiel MA, Essen L-O. Structural cyclodepsipeptide skyllamycin: deci- and functional insights into a peptide phering unprecedented ways of unusual bond-forming bidomain from a non- hydroxylation reactions. J Am Chem Soc. ribosomal peptide synthetase. Struc- United States; 2011;133: 6194–6205. ture. United States; 2007;15: 781–792. doi:10.1021/ja108971p doi:10.1016/j.str.2007.05.008 87. Uhlmann S, Sussmuth RD, Cryle MJ. Cyto- 81. Marahiel MA. A structural model for chrome p450sky interacts directly with multimodular NRPS assembly lines. Nat the nonribosomal peptide synthetase to Prod Rep. England; 2016;33: 136–140. generate three amino acid precursors in doi:10.1039/c5np00082c skyllamycin biosynthesis. ACS Chem Biol. 82. Winn M, Fyans JK, Zhuo Y, Micklefield United States; 2013;8: 2586–2596. J. Recent advances in engineering non- doi:10.1021/cb400555e ribosomal peptide assembly lines. Nat 88. 
Haslinger K, Brieke C, Uhlmann S, Siev- Prod Rep. The Royal Society of Chemis- erling L, Sussmuth RD, Cryle MJ. The try; 2016;33: 317–347. structure of a transient complex of a doi:10.1039/C5NP00099H nonribosomal peptide synthetase and 83. Wise CE, Makris TM. Recruitment and a cytochrome P450 monooxygenase. regulation of the non-ribosomal peptide Angew Chem Int Ed Engl. Germany; synthetase modifying cytochrome P450 2014;53: 8518–8522. involved in nikkomycin biosynthesis. doi:10.1002/anie.201404977

39 89. Yin X, Chen Y, Zhang L, Wang Y, Zabriskie 95. Stegmann E, Rausch C, Stockert S, Burk- TM. Enduracidin analogues with altered ert D, Wohlleben W. The small MbtH- halogenation patterns produced by ge- like protein encoded by an internal gene netically engineered strains of Strepto- of the balhimycin biosynthetic gene myces fungicidicus. J Nat Prod. United cluster is not required for glycopep- States; 2010;73: 583–589. tide production. FEMS Microbiol Lett;

9. References 9. doi:10.1021/np900710q 2006;262: 85–92. 90. Pan HX, Li JA, Shao L, Zhu CB, Chen JS, Tang doi:10.1111/j.1574-6968.2006.00368.x GL, et al. Genetic manipulation revealing an 96. Drake EJ, Miller BR, Shi C, Tarrasch JT, unusual N-terminal region in a stand-alone Sundlov JA, Leigh Allen C, et al. Struc- non-ribosomal peptide synthetase in- tures of two distinct conformations volved in the biosynthesis of ramoplanins. of holo-non-ribosomal peptide syn- Biotechnol Lett. 2013;35: 107–114. thetases. Nature; 2016;529: 235–238. doi:10.1007/s10529-012-1056-7 Available: 91. Hoertz AJ, Hamburger JB, Gooden DM, http://dx.doi.org/10.1038/nature16163 Bednar MM, McCafferty DG. Studies on 97. Herbst D a., Boll B, Zocher G, Stehle T, the biosynthesis of the lipodepsipep- Heide L. Structural basis of the interac- tide antibiotic ramoplanin A2. Bioor- tion of mbth-like proteins, putative reg- ganic Med Chem. Elsevier Ltd; 2012;20: ulators of nonribosomal peptide biosyn- 859–865. doi:10.1016/j.bmc.2011.11.062 thesis, with adenylating enzymes. J Biol 92. Wilkinson B, Micklefield J. Chapter 14. Chem. 2013;288: 1991–2003. Biosynthesis of nonribosomal peptide doi:10.1074/jbc.M112.420182 precursors. Methods Enzymol. United 98. Baltz RH. MbtH homology codes to States; 2009;458: 353–378. identify gifted microbes for genome doi:10.1016/S0076-6879(09)04814-9 mining. J Ind Microbiol Biotechnol. Ger- 93. Lyu SY, Liu YC, Chang CY, Huang CJ, Chiu many; 2014;41: 357–369. YH, Huang CM, et al. Multiple complexes doi:10.1007/s10295-013-1360-9 of long aliphatic N-acyltransferases lead 99. Copp JN, Neilan BA. The phosphopanteth- to synthesis of 2,6-diacylated/2-acyl-sub- einyl transferase superfamily: phyloge- stituted glycopeptide antibiotics, effec- netic analysis and functional implications tively killing vancomycin-resistant entero- in cyanobacteria. Appl Environ Microbiol. coccus. J Am Chem Soc. United States; United States; 2006;72: 2298–2305. 2014;136: 10989–10995. 
doi:10.1128/AEM.72.4.2298-2305.2006 doi:10.1021/ja504125v 100. Leibundgut M, Maier T, Jenni S, Ban N. 94. Buchko GW, Kim CY, Terwilliger The multienzyme architecture of eu- TC, Myler PJ. Solution structure of karyotic fatty acid synthases. Curr Opin Rv2377c-founding member of the MbtH- Struct Biol. England; 2008;18: 714–725. like . Tuberculosis. Else- doi:10.1016/j.sbi.2008.09.008 vier Ltd; 2010;90: 245–251. 101. Beld J, Sonnenschein EC, ­Vickery CR, Noel doi:10.1016/j.tube.2010.04.002 JP, Burkart MD. The phosphopantetheinyl

1 40 transferases: catalysis of a posttransla- 108. du Toit EA, Rautenbach M. A sensitive tional modification crucial for life. Nat standardised micro-gel well diffusion Prod Rep. 2014;31: 61–108. doi:10.1039/ assay for the determination of antimi- 1 c3np70054b crobial activity. J Microbiol Methods. 102. Mofid MR, Finking R, Essen LO, Marahiel Netherlands; 2000;42: 159–165. MA. Structure-based mutational analy- 109. Dingle J, Reid WW, Solomons GL. The

sis of the 4’-phosphopantetheinyl trans- enzymic degradation of pectin and other Introduction ferases Sfp from Bacillus subtilis: carrier polysaccharides. II—Application of the protein recognition and reaction mech- “cup-plate” assay to the estimation of anism. Biochemistry. 2004;43: 4128–36. enzymes. J Sci Food Agric. John Wiley & doi:10.1021/bi036013h Sons, Ltd; 1953;4: 149–155. 103. Salo O V, Ries M, Medema MH, Lank- doi:10.1002/jsfa.2740040305 horst PP, Vreeken RJ, Bovenberg RAL, et 110. Duyen TTM, Matsuura H, Ujiie K, Mura- al. Genomic mutational analysis of the oka M, Harada K, Hirata K. Paper-based impact of the classical strain improve- colorimetric biosensor for antibiotics ment program on beta-lactam produc- inhibiting bacterial protein synthesis. J ing Penicillium chrysogenum. BMC Ge- Biosci Bioeng. Japan; 2017;123: 96–100. nomics. England; 2015;16: 937. doi:10.1016/j.jbiosc.2016.07.015 doi:10.1186/s12864-015-2154-4 111. Kim DH, Hur J, Park HG, Il Kim M. Re- 104. Backus MP, Stauffer JF. The production agentless colorimetric biosensing plat- and selection of a family of strains in form based on nanoceria within an Penicillium chrysogenum. Mycologia. agarose gel matrix. Biosens Bioelectron. JSTOR; 1955;47: 429–463. England; 2017;93: 226–233. 105. Raper KB, Alexander DF, Coghill RD. doi:10.1016/j.bios.2016.08.113 Penicillin: II. Natural variation and peni- 112. Liu H, Ma C, Wang J, Chen H, Wang K. cillin production in Penicillium notatum Label-free colorimetric assay for T4 and allied species. J Bacteriol. United polynucleotide kinase/phosphatase ac- States; 1944;48: 639–659. tivity and its inhibitors based on G-qua- 106. Bray W, Lokey RS. A high-throughput druplex/hemin DNAzyme. Anal Bio- yeast halo assay for bioactive com- chem. United States; 2017;517: 18–21. pounds. Cold Spring Harb Protoc. United doi:10.1016/j.ab.2016.10.022 States; 2016;2016: pdb.prot088047. 113. 
Mahmoudvand H, Sepahvand A, Ja- doi:10.1101/pdb.prot088047 hanbakhsh S, Ezatpour B, Ayatollahi 107. Ferrini AM, Mannoni V, Aureli P. Com- Mousavi SA. Evaluation of antifungal ac- bined plate microbial assay (CPMA): a tivities of the essential oil and various 6-plate-method for simultaneous first extracts of Nigella sativa and its main and second level screening of antibac- component, thymoquinone against terial residues in meat. Food Addit Con- pathogenic dermatophyte strains. J My- tam. England; 2006;23: 16–24. col Med. France; 2014;24: e155–61. doi:10.1080/02652030500307131 doi:10.1016/j.mycmed.2014.06.048

41 114. Sotoud H, Gribbon P, Ellinger B, Reinsha- modular natural products. Nat Commun. gen J, Boknik P, Kattner L, et al. Develop- England; 2015;6: 8421. ment of a colorimetric and a fluorescence doi:10.1038/ncomms9421 phosphatase-inhibitor assay suitable for 121. Umemura M, Koike H, Machida M. Mo- drug discovery approaches. J Biomol tif-independent de novo detection of Screen. United States; 2013;18: 899–909. secondary metabolite gene clusters-to-

9. References 9. doi:10.1177/1087057113486000 ward identification from filamentous 115. Zhang J, Wang Z, Wen K, Liang X, Shen fungi. Front Microbiol. Switzerland; J. Penicillin-binding protein 3 of Strepto- 2015;6: 371. coccus pneumoniae and its application doi:10.3389/fmicb.2015.00371 in screening of beta-lactams in milk. 122. Finn RD, Clements J, Eddy SR. HMMER Anal Biochem. United States; 2013;442: web server: interactive sequence similar- 158–165. doi:10.1016/j.ab.2013.07.042 ity searching. Nucleic Acids Res. 2011;39: 116. Babington R, Matas S, Marco M-P, Galve W29–37. doi:10.1093/nar/gkr367 R. Current bioanalytical methods for 123. Khaldi N, Seifuddin FT, Turner G, Haft D, detection of penicillins. Anal Bioanal Nierman WC, Wolfe KH, et al. SMURF: Chem. Germany; 2012;403: 1549–1566. Genomic mapping of fungal secondary doi:10.1007/s00216-012-5960-4 metabolite clusters. Fungal Genet Biol. 117. Sternesjo A, Gustavsson E. Biosensor United States; 2010;47: 736–741. analysis of beta-lactams in milk using doi:10.1016/j.fgb.2010.06.003 the carboxypeptidase activity of a bac- 124. Finn RD, Coggill P, Eberhardt RY, Eddy terial penicillin binding protein. J AOAC SR, Mistry J, Mitchell AL, et al. The Pfam Int. United States; 2006;89: 832–837. protein families database: towards a 118. Blin K, Medema MH, Kottmann R, Lee more sustainable future. Nucleic Acids SY, Weber T. The antiSMASH database, Res. England; 2016;44: D279–85. a comprehensive database of microbial doi:10.1093/nar/gkv1344 secondary metabolite biosynthetic gene 125. Prieto C. Characterization of nonri- clusters. Nucleic Acids Res. England; bosomal peptide synthetases with 2017;45: D555–D559. NRPSsp. Methods Mol Biol. United doi:10.1093/nar/gkw960 States; 2016;1401: 273–278. 119. Weber T, Kim HU. The secondary metab- doi:10.1007/978-1-4939-3375-4_17 olite bioinformatics portal: Computa- 126. Röttig M, Medema MH, Blin K, Weber T, tional tools to facilitate synthetic biol- Rausch C, Kohlbacher O. 
NRPSpredic- ogy of secondary metabolite production. tor2--a web server for predicting NRPS Synth Syst Biotechnol. China; 2016;1: adenylation domain specificity. Nucleic 69–79. doi:10.1016/j.synbio.2015.12.002 Acids Res. 2011;39: W362–7. 120. Johnston CW, Skinnider MA, Wyatt MA, doi:10.1093/nar/gkr323 Li X, Ranieri MRM, Yang L, et al. An au- 127. Chen Y, McClure RA, Kelleher NL. Screen- tomated genomes-to-natural products ing for expressed nonribosomal peptide platform (GNP) for the discovery of synthetases and polyketide synthas-

1 42 es using LC-MS/MS-based proteomics. United States; 2014;3: 748–758. Methods Mol Biol. United States; 2016; doi:10.1021/sb3000673 1401: 135–147. 134. Ackerley DF, Lamont IL. Characteriza- 1 doi:10.1007/978-1-4939-3375-4_9 tion and genetic manipulation of pep- 128. Bumpus SB, Evans BS, Thomas PM, Ntai tide synthetases in Pseudomonas aeru- I, Kelleher NL. A proteomics approach to ginosa PAO1 in order to generate novel

discovering natural products and their pyoverdines. Chem Biol. United States; Introduction biosynthetic pathways. Nat Biotechnol. 2004;11: 971–980. United States; 2009;27: 951–956. doi:10.1016/j.chembiol.2004.04.014 doi:10.1038/nbt.1565 135. Calcott MJ, Owen JG, Lamont IL, ­Ackerley 129. Konno S, Ishikawa F, Suzuki T, Dohmae DF. Biosynthesis of novel pyoverdines N, Burkart MD, Kakeya H. Active site-di- by domain substitution in a nonribo- rected proteomic probes for adenyla- somal peptide synthetase of Pseudomo- tion domains in nonribosomal peptide nas aeruginosa. Appl Environ Microbiol. synthetases. Chem Commun (Camb). United States; 2014;80: 5723–5731. England; 2015;51: 2262–2265. doi:10.1128/AEM.01453-14 doi:10.1039/c4cc09412c 136. Kries H, Niquille DL, Hilvert D. A sub- 130. Ishikawa F, Kakeya H. Specific enrich- domain swap strategy for reengineer- ment of nonribosomal peptide syn- ing nonribosomal peptides. Chem Biol. thetase module by an affinity probe 2015;22: 640–8. for adenylation domains. Bioorg Med doi:10.1016/j.chembiol.2015.04.015 Chem Lett. England; 2014;24: 865–869. 137. Höfer I, Crüsemann M, Radzom M, Geers doi:10.1016/j.bmcl.2013.12.082 B, Flachshaar D, Cai X, et al. Insights into 131. Ishikawa F, Kakeya H. A competitive the biosynthesis of hormaomycin, an ex- enzyme-linked immunosorbent assay ceptionally complex bacterial signaling system for adenylation domains in non- metabolite. Chem Biol. 2011;18: 381–391. ribosomal peptide synthetases. Chembi- doi:10.1016/j.chembiol.2010.12.018 ochem. Germany; 2016;17: 474–478. 138. Shetty RP, Endy D, Knight TF. Engineer- doi:10.1002/cbic.201500553 ing BioBrick vectors from BioBrick parts. 132. Nguyen KT, Ritz D, Gu J-Q, Alexander J Biol Eng. BioMed Central; 2008;2: 5. D, Chu M, Miao V, et al. Combinatorial doi:10.1186/1754-1611-2-5 biosynthesis of novel antibiotics related 139. Yang J, Yu S, Gong B, An N, Alterovitz to daptomycin. Proc Natl Acad Sci U S A. G. 
Biobrick chain recommendations for United States; 2006;103: 17462–17467. genetic circuit design. Comput Biol Med. doi:10.1073/pnas.0608589103 United States; 2017;86: 31–39. 133. Baltz RH. Combinatorial biosynthesis of doi:10.1016/j.compbiomed.2017.04.019 cyclic lipopeptide antibiotics: a model for synthetic biology to accelerate the evolution of secondary metabolite bio- synthetic pathways. ACS Synth Biol.

43

CHAPTER II

Identification and characterization of nonribosomal peptide synthetase modules that activate 4-hydroxyphenylglycine

Reto D. Zwahlen,1 Remon Boer,3 Ulrike M. Müller,3 Roel A.L. Bovenberg,2,3 and Arnold J.M. Driessen1,4

1 Molecular Microbiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
2 Synthetic Biology and Cell Engineering, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
3 DSM Biotechnology Centre, Delft, The Netherlands
4 Kluyver Centre for Genomics of Industrial Fermentations, Julianalaan 67, 2628BC Delft, The Netherlands

Submitted to PLOS One

Abstract

4-Hydroxyphenylglycine (hpg) is a small non-proteinogenic amino acid present in secondary metabolites such as teicoplanin, ramoplanin and enduracidin. Owing to its physico-chemical properties, hpg plays a crucial role in determining compound structure and thereby in establishing therapeutically relevant properties. Compounds containing hpg are predominantly synthesized by modular nonribosomal peptide synthetases (NRPS), whose adenylation domain-bearing modules can specifically integrate hpg into a growing peptide. NRPS may require the specific binding of an integral group of helper proteins, the MbtH homologues. Using Pfam as a basis for the identification of potential hpg-activating adenylation domains, 12 modules and their MbtH helper proteins, in a total of 17 domain setups, were selected, expressed in E. coli and characterized in vitro with respect to expression and substrate specificity. Heterologous expression in E. coli was observed for 9 targets, and for all 17 when they were co-expressed with MbtH or carried an N-terminal maltose-binding protein (MBP) tag. Pyrophosphate exchange assays revealed that all domains exhibited activity towards L- or D-4-hpg and underlined the essential role of MbtH homologues in the function of hpg-activating domains. In addition, by comparing adenylation velocities, we demonstrated the crucial role of NRPS starter modules, which showed up to 5-fold higher activity than any elongation module in this study. Overall, we demonstrate here a medium-throughput system that allows for the identification and characterization of specific NRPS modules. In the context of future compound and NRPS engineering efforts, this may serve as a foundation for the selection of domains and modules with promising specificity and activity, eventually allowing for the creation of novel bioactive compounds.

Introduction

Multi-modular enzymes represent a highly evolved and complex family of large, polycatalytic biochemical machineries. Among the most versatile representatives of this superfamily are the nonribosomal peptide synthetases (NRPS). The production of antibiotics, immunosuppressants, cytostatics, but also pigments can all be attributed to NRPS. Members of this group of modular enzymes can consist of up to 15 [1] functionally distinct parts or modules. Modules are characterized by their distinct functions in recognition, activation and condensation of a dedicated compound into a growing nonribosomal peptide (NRP). To carry out these functions, a module contains a set of functionally distinct parts, or domains, each carrying out a distinct catalytic sub-step. Thus, a minimal module must contain an adenylation (A), thiolation (T) and condensation (C) domain. The adenylation domain, consisting of an Acore and an Asub part [2], is responsible for the recognition, adenylation and thioesterification of a substrate. Through the hydrolysis of ATP, an AMP-substrate conjugate is formed, which is subsequently transferred to the free thiol group of the 4'-phosphopantetheinyl moiety (ppant), which is anchored to the downstream T domain [3–4]. This thioesterification is guided by a structural rearrangement of the Asub domain, which itself undergoes a rotation by about 140° [5–6] relative to the Acore domain. The ppant arm, linked to the activated substrate, is itself covalently attached to a highly conserved serine residue which is part of the GGXS motif of the T domain [7–8]. To finalize peptide bond formation, the activated T-ppant-substrate moiety associates with the C domain, where peptide bond formation is guided. Upon a nucleophilic attack, a dipeptide is created which remains attached to the downstream ppant arm [9]. An exception to those minimal conditions are standalone modules and initiation or N-terminal modules of multi-modular NRPS enzymes. These NRPS modules lack C domains due to the absence of an upstream module. A series of additional domains might be present in NRPS enzymes, but none of them is essential for peptide chain formation except for the C-terminal thioesterase (Te) domain, which is crucial for product release. In the absence of such a domain, standalone thioesterase enzymes may fulfill this function.

Multiple highly selective domains for epimerization (E), halogenation and oxidation, and a multitude of pluripotent C domains, have been identified [10–12]. Despite the abundance of functionally distinct domains, some reactions still require direct or indirect involvement of other factors. These may involve essential steps, such as the attachment of the phosphopantetheinyl moiety to the thiolation domain, which is a CoA dependent process carried out by a phosphopantetheinyl transferase (sfp) [13]. An example of potentially essential chaperones is the highly conserved MbtH protein [14–15]. The precise role of these relatively small proteins, i.e., ~70 amino acids, has remained elusive. However, MbtH proteins have been shown to increase the adenylation activity [16] as well as the levels of NRPS expression [17]. Moreover, high affinity association with A domains seems to play a crucial role in the MbtH activity [18–19]. MbtH homologs are essential for some bacterial NRPS enzymes, but seem entirely absent from eukaryotic organisms such as fungi. Also, the number of MbtH copies and variants varies strongly, even among strains of the same organism [20], while the impact on functionality is very diverse. It ranges from a complete loss of functionality [21] to insignificant variations in product levels [15], as gene deletion studies suggest [22]. Even the replacement of standalone MbtH proteins with a covalently linked, MbtH-like domain has been demonstrated, using RubC [23]. Recently, the first A domain-MbtH structure has become available, suggesting a set of conserved residues which are tightly involved in the MbtH-NRPS association [19].

Many different NRPS products, enzymes, domains and catalytic functions have been identified and assigned. A domains have long been considered to be a bottleneck in NRPS systems, fulfilling an essential gatekeeper function. Therefore, they were the focus of characterization, prediction and engineering attempts for an extended period of time [24–26]. This has led to the identification of important motifs for adenylation [27] and substrate specificity, as well as a better understanding of the domain arrangement through structural analysis [2]. Despite this knowledge and the use of an ever increasing reservoir of structures and predictive software, successful site directed engineering efforts are still exceptional [25;28–30]. Experiments targeting the elucidation of catalytic sub-steps support the importance of A domains, though novel targets in substrate specificity have also emerged, especially through C domain characterization. T domains, the smallest and catalytically inactive domains, rely mostly on a core GGXS motif revolving around the highly conserved serine residue necessary for sfp dependent ppant attachment [7]. Due to their very high degree of structural conservation and the availability of crystal structures, the overall structure is well understood. Not only single domains have been structurally characterized, but also multi-domain structures as well as whole modules [31–33]. More recent studies attribute an engineering potential to T domains, since it has been shown that synthetic T domains can improve product output in vivo and in vitro [34]. Nevertheless, conformational changes of the NRPS enzymes during their catalytic cycle are not well understood. Thus, the potential for accurate predictions in the context of engineering efforts is limited.

Despite many studies, C domains remain the most elusive. Early experiments indicate that NRPS enzymes, including C domains, show considerable promiscuity in substrate incorporation [35]. However, the C domain never played an important role in engineering attempts. A turnaround in the perception of C domains was marked by the discovery of the essential HHxxDG motif [36] involved in peptide bond formation. Further experiments revealed additional conserved sequences related to stereochemical selection [37], indicating important potential bottlenecks for engineering. In terms of sequence conservation and domain size, C domains are exceptionally diverse [38] and are linked to multifunctionality [10]. This was underlined by the discovery of condensation-epimerization domains [10]. Recently, further novel functions, most notably oxygenase recruitment, have been described [12]. In biochemical studies, NRPS catalytic sub-steps have been investigated and likely bottlenecks were identified [39]. However, crucial information about overarching complexities remains largely unknown, as no complete NRPS crystal structure is available. Even with the high number of predictive tools and genomic data, a systematic analysis of domains remains a delicate task. Given the reservoir of novel amino acids that can be incorporated, the exploitation of A domains for novel NRP production will remain an important challenge.

4-hydroxyphenylglycine (hpg) is a non-proteinogenic amino acid that is present in various NRPs. The 4-hydroxy group on the phenol ring makes hpg especially valuable for covalent cross-linking, side branching and circularization of large NRPs [40–41]. The inclusion of higher order structures can vastly improve relevant properties and spectra of bioactive substances. Thus, the introduction of hpg into scaffold molecules such as antibiotics, fungicides and other relevant compounds has the potential to create novel classes of pharmaceuticals, bypassing contemporary problems such as widespread antibiotic resistance. Using fundamental techniques for compound [42], gene and domain determination [43–44], paired with web based software for substrate specificity prediction [45], a selection of putative hpg activating modules was identified. These were produced in E. coli in the absence and presence of MbtH proteins, purified and assayed for activity and specificity. These data provide a reference for future exploration of hpg units and the construction of novel chimeric NRPS enzymes that incorporate hpg in the NRP product.

Materials and methods

Selection and determination of hpg activating domains

Putative hpg activating modules were identified using the protein databases UniRef100 and NCBI environmental, among others, as a resource. In a first step, putative NRPS protein sequences were extracted based on the concurrent presence of Pfam [46] motifs for AMP-binding, ppant-binding, and condensation domains. The hypothetical NRPS domain structures were determined and potential adenylation domains classified as part of a starter or elongation module. To predict the preferred amino acid bound by these adenylation domain sequences, they were analyzed using the NRPSpredictor platform [47], which includes the identification of the "specificity conferring" code by Stachelhaus [27]. The NRPS modules containing adenylation domains with at least 70 % identity to the Stachelhaus code for putative hpg binding were used for final selection.
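The identity threshold described above can be illustrated with a minimal Python sketch. It assumes that each adenylation domain has already been reduced to its 10-residue specificity code by a prediction tool, and that identity is counted position by position. The reference code and the candidate codes below are synthetic placeholders chosen for illustration, not the hpg code or any sequence from this study, and the function names are hypothetical.

```python
def code_identity(candidate, reference):
    """Fraction of positions at which two equal-length codes agree."""
    if len(candidate) != len(reference):
        raise ValueError("codes must have the same length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return matches / len(reference)


def select_modules(codes, reference, threshold=0.70):
    """Names of modules whose code identity reaches the threshold."""
    return [name for name, code in codes.items()
            if code_identity(code, reference) >= threshold]


# Synthetic placeholder codes (assumption): NOT real Stachelhaus codes.
REFERENCE_CODE = "ACDEFGHIKL"
candidates = {
    "module_a": "ACDEFGHIKL",  # 10 of 10 positions identical
    "module_b": "ACDEFGHIKM",  # 9 of 10
    "module_c": "ACDEFGHWWW",  # 7 of 10, exactly at the threshold
    "module_d": "ACDEFGWWWW",  # 6 of 10, below the threshold
}

selected = select_modules(candidates, REFERENCE_CODE)
print(selected)
```

With a 10-residue code, a 70 % cut-off corresponds to at least 7 matching positions, so a module differing at 4 positions is excluded.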

Cloning, plasmids and culture conditions

Cloning was performed using E. coli DH5α. Selection was carried out with 50 µg/ml neomycin for pSCI242/243 plasmids (pBR322ori; pBAD, araC; FdT), 100 µg/ml ampicillin for pMAL-c5x plasmids (pMB1ori; bla; pTac; lacIq; rrnb T2) and 15 µg/ml chloramphenicol for pACYCtac-MbtH plasmids (p15Aori; cat; pTac; lacIq), respectively. The E. coli BL21 (DE3) strain was used for expression of the adenylation domains. All cultures were grown in 2xPY (15 g/l bacto-tryptone, 10 g/l yeast extract, 10 g/l sodium chloride, pH 7.0) at 37 °C and 200 rpm. Synthetic DNA fragments of the domains were ordered from DNA 2.0 with codons optimized for expression in E. coli. Fragments were subsequently subcloned using the available NdeI × SbfI sites (pMAL-c5x, N8108S, New England Biolabs Inc., Ipswich, MA).

Expression and purification of hybrid NRPS and hpg modules

Cultures were grown to an OD600 of 0.6 in 2xPY medium, transferred to 18 °C and 200 rpm for 1 h, and subsequently induced using 0.5 mM IPTG and 0.2 % L-arabinose. Cells were harvested 18 hours after induction by centrifugation at 3500 g for 15 minutes. After resuspension in lysis buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 2 mM DTT, complete EDTA free protease inhibitor (Roche, Basel, CH, REF 04693159001)), cells were disrupted by sonication (6 s/15 s on/off, 60 cycles, 10 µm amplitude) and cell-free lysate was obtained by centrifugation at 4 °C and 13000 g for 15 minutes. Enzymes were purified by his-tag affinity purification on Ni-NTA beads (Qiagen, Hilden, Ger, No. 30210) using gravity flow. Wash steps were performed using wash buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 20 mM imidazole), followed by a one-step elution using elution buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 250 mM imidazole). Imidazole was removed while simultaneously concentrating the sample with Amicon®-50 spin filters (Amicon® Ultra Ultracel-50K, Merck Millipore, Billerica, MA, USA, UFC 805024). Protein concentrations were determined using the BioRad DC assay kit. All fractions were subsequently analyzed by 10 % SDS-PAGE. Insoluble expression was determined using cell debris solubilized in 8 M urea. Gels were stained using a 0.025 % coomassie stain and images were taken using an imaging cabinet (FUJIFILM, Tokyo, Jp, LAS-4000). The A domain:MbtH stoichiometry was quantified by means of 2-D densitometric analysis using AIDA Image Analyzer v.4.22 software.
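How a densitometric readout translates into a molar stoichiometry can be sketched as follows. This is a simplified illustration, not the procedure implemented in the image analysis software: it assumes that the integrated coomassie signal of a band is proportional to the protein mass in that band, so that dividing by the molecular weight gives a relative molar amount. The band intensities and molecular weights are hypothetical example values.

```python
def molar_ratio(intensity_a, mw_a_kda, intensity_mbth, mw_mbth_kda):
    """MbtH molecules per A domain molecule, estimated from band intensities.

    Assumption: band intensity scales linearly with protein mass, so
    intensity / molecular weight is proportional to the molar amount.
    """
    rel_moles_a = intensity_a / mw_a_kda
    rel_moles_mbth = intensity_mbth / mw_mbth_kda
    return rel_moles_mbth / rel_moles_a


# Hypothetical example: a 60 kDa A domain band and an 8 kDa MbtH band.
ratio = molar_ratio(intensity_a=6000.0, mw_a_kda=60.0,
                    intensity_mbth=1600.0, mw_mbth_kda=8.0)
print(f"MbtH per A domain: {ratio:.2f}")
```

Because small proteins bind proportionally less dye per molecule, a faint MbtH band can still correspond to a molar excess over the much larger A domain.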

Substrate profiling of putative hpg activating domains

The substrate profile and promiscuity of putative hpg activating domains were evaluated using the in vitro pyrophosphate exchange kit EnzChek (EnzChek, Thermo Fisher Scientific, Waltham, MA, USA, E6645). Isolated enzyme was tested at a concentration of 1 µM in a 96-well setup with a total reaction volume of 100 µl. L- or D-4-phenylglycine (pg), L- or D-4-hydroxyphenylglycine (hpg) and L-phenylalanine, the latter serving as negative control, were tested. All substrates were screened at concentrations of 0.1 and 1 mM. The absorption at 360 nm of the coupled indicator substrate MESG (2-amino-6-mercapto-7-methylpurine ribonucleoside) was measured at 5 minute intervals over a period of 4 h. Activity was compared as µM PPi released per minute per µM enzyme (µM PPi min⁻¹ (µM enzyme)⁻¹).
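The conversion from an absorbance time course to the reported unit can be sketched in Python. This is a minimal sketch under stated assumptions: the linear part of the A360 trace is fitted by ordinary least squares, and a PPi standard curve measured in the same plate format provides the absorbance change per µM PPi. The calibration factor and the readings below are hypothetical values, and the function names are illustrative.

```python
def linear_slope(times_min, values):
    """Ordinary least-squares slope of values against time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_min, values))
    den = sum((t - mean_t) ** 2 for t in times_min)
    return num / den


def specific_rate(times_min, a360, abs_per_um_ppi, enzyme_um=1.0):
    """Rate in µM PPi per minute per µM enzyme.

    abs_per_um_ppi is the slope of a PPi standard curve (absorbance units
    per µM PPi); here it is an assumed, user-supplied calibration value.
    """
    slope = linear_slope(times_min, a360)  # absorbance units per minute
    return slope / abs_per_um_ppi / enzyme_um


# Hypothetical readings taken every 5 minutes with 1 µM enzyme.
times = [0, 5, 10, 15, 20]
readings = [0.10, 0.15, 0.20, 0.25, 0.30]
rate = specific_rate(times, readings, abs_per_um_ppi=0.005, enzyme_um=1.0)
print(f"{rate:.2f} µM PPi per min per µM enzyme")
```

In practice only the initial linear phase of the 4 h trace would be passed to the fit, and the substrate-free control would be subtracted first.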

Results

Selection and determination of putative hpg activating domains

Putative hpg activating domains were identified using various databases. NRPS sequences were identified, based on Pfam [46] motifs and obtained

Figure 1. Stachelhaus code of selected adenylation domains. The 10 amino acid motif involved in adenylation specificity is indicated (red). Motifs were predicted using NRPSpredictor [47] and the alignment was made using ClustalX. The motif is almost conserved throughout all domains.

Table 1. Final selection of target domain constructs.

Construct   Organism                                  NRPS setup          Initiation   Reference
Tcp9        Actinoplanes teichomyceticus ATCC53649    AT CATE             Y            [51]
Hpg1        Streptomyces toyocaensis NRRL 15009       AT CATE             Y            [52]
Hpg3/4      Nonomuraea sp. ATCC 39727                 AT CATE             Y            [53]
Hpg11/12    Amycolatopsis mediterranei                CATE CAT            N            [54]
Hpg13/14    Amycolatopsis mediterranei                CATE CAT            N            [54]
Hpg19       Streptomyces coelicolor A3(2)             CAT CATE CAT CATE   N            [55]
Hpg20/26    Streptomyces lavendulae                   CATE                N            [56]
Hpg21       Streptomyces toyocaensis NRRL 15009       CATE CAT            N            [52]
Hpg22       Actinoplanes teichomyceticus ATCC53649    CATE CAT            N            [51]
Hpg23       uncultured                                CATE CAT            N            [57]
Hpg24       Amycolatopsis orientalis                  CATE CAT            N            [58]
Hpg25/27    uncultured                                CATE CAT            N            [57]

Construct code, domain setup of the target NRPS and the corresponding module are indicated. Furthermore, module placement (initiation or other) is shown. Constructs derived from the same module in different domain setups are indicated with double numbers and adjusted colors, e.g. Hpg XX/XY.

adenylation domain sequences were subsequently analyzed using the NRPSpredictor platform [47]. Hpg affinity was finally evaluated using NRPSpredictor data, leading to a list of 58 potential candidate modules. Sequences of 43 modules had at least 80 % identity with the predicted Stachelhaus [27] hpg core motif (Supplementary figure 1). Upon further analysis of the putative targets, these 43 modules could be subdivided into 7 initiation and 36 elongation modules. Additionally, we found that those domains can be assigned to 28 multi-modular NRPS, arranged in 14 secondary metabolite gene clusters. 13 out of 14 can be assigned to a bacterial host, though the origin of two remains elusive, as they were obtained from metagenomic data. After consideration of the adenylation domain placement within the module as well as the availability and feasibility of source material, 12 module adenylation domains were selected (Figure 1). The adenylation domains were subsequently ordered in different setups, either as a single A domain or including their adjacent T, Te or C domains. All genes were codon optimized for E. coli by the manufacturer (DNA 2.0). This led to a total of 18 constructs as indicated in Table 1. In addition to the NRPS, their putative corresponding MbtH homologs were identified. MbtH proteins have a chaperoning function, which is potentially essential for adenylation activity and expression. Identification was based on the conserved MbtH core motif [17], leading to a selection of 7 MbtH homologs associated with dedicated NRPS gene clusters. Their corresponding sequences were furthermore analyzed for hypothetical NRPS binding motifs (Figure 2A/B).

Heterologous expression profiling

In order to determine the expression properties of the selected putative hpg activating domains, they were subjected to small-scale expression experiments. Constructs in the pSCI242 vector were expressed without the addition of MbtH proteins. Subsequently, the soluble fraction as well as the cell debris or insoluble fraction were analyzed. 9 out of 17 constructs showed soluble expression of their respective domain(s). However, the expression levels varied strongly. Hpg14, 16 and 20 showed very low amounts, hardly traceable using coomassie staining, together with the occurrence of insoluble protein (data not shown). One variant, Hpg18, allowed for the detection of insoluble expression only. The remaining 6 Hpg variants (Hpg11, 12, 21, 22, 23, 24) did not yield any heterologously expressed protein.

In order to improve soluble protein expression, a maltose binding protein (MBP) was fused to the N-terminus of the constructs. Expression trials were repeated using the MBP-Hpg A domain constructs with and without co-expression of MbtH proteins. This led to significantly improved expression behavior for all tested constructs. We could observe expression of all constructs and eliminate insoluble protein almost entirely, in favor of the highly soluble MBP-Hpg variant. The qualitative expression results were independent of MbtH utilization, although co-expression increased the total amount of soluble MBP-Hpg. The positive effect of MBP-tag introduction is best illustrated by Hpg11, 21, 22 and 23, which were previously not expressed at all (data not shown). Another notable observation is the high affinity binding of MbtH variant Tcp13. It not only increased the protein amount, but also co-eluted with all tested domains. To further elucidate the promiscuity and effect of the different MbtH variants, 4 A domain constructs and their 3 respective MbtH variants were chosen for co-expression trials. The selection includes two domains and two MbtH homologs of the teicoplanin biosynthetic cluster, starter module A domain Tcp9 (AT) and elongation module A domain Hpg22 (A) plus the MbtH proteins Tcp13 and Tcp17, and the constructs Hpg25 (A) and Hpg27 (ATE), comprising the same elongation A domain, and their respective MbtH VEG8, derived from a metagenomic sample. Comparing the performance of the different MbtH variants, Tcp13 showed the overall best performance with all tested constructs (Table 2). Focusing on teicoplanin

Figure 2A. Alignment of MbtH variants used in the various experiments. The alignments were made in ClustalX [59]; variants analyzed in depth are in bold. Highly conserved tryptophan (W) residues are highlighted in red. Additionally, in green, a central serine residue (S) revolving around the hypothetical A-MbtH interaction site (red box) is indicated. The position of the interaction site was determined according to structural studies by [19].

Figure 2B. Hypothetical binding site on A domains. The core binding motif revolves around A414 and A419 (green). Other residues potentially involved in binding are indicated (yellow) [19]. The core motif is widely unaltered among Tcp9, Hpg22 and Hpg25/27.

Table 2. Pyrophosphate release rates of hpg modules co-expressed with selected MbtH variants. Values are expressed as µM PPi min⁻¹ (µM enzyme)⁻¹. The color code is set at red = low (0) / green = high (3). The high activity of Tcp9 is the most striking feature of this data set. N=2; error of mean overall = 0.056.

                   D-hpg             L-hpg             D-pg    L-pg    L-phe
Module   MbtH      0.1 mM   1 mM     0.1 mM   1 mM     1 mM    1 mM    1 mM
Tcp9     Tcp13     2.07     5.28     1.18     1.16     0.05    1.32    0.00
Tcp9     Tcp17     1.71     4.63     1.20     1.34     0.05    1.44    0.00
Tcp9     VEG8      3.72     8.44     1.23     1.40     0.07    2.32    0.00
Hpg22    Tcp13     0.08     0.66     0.80     1.03     0.00    0.17    0.00
Hpg22    Tcp17     0.11     0.63     0.95     1.04     0.04    0.23    0.00
Hpg22    VEG8      0.00     0.86     1.38     1.54     0.00    0.09    0.00
Hpg25    Tcp13     0.14     0.92     0.56     0.59     0.01    0.17    0.00
Hpg25    Tcp17     0.17     1.03     0.70     0.64     0.02    0.14    0.00
Hpg25    VEG8      0.18     1.39     0.61     0.61     0.02    0.20    0.00
Hpg27    Tcp13     0.12     0.72     0.54     0.57     0.01    0.15    0.00
Hpg27    Tcp17     0.11     0.62     0.48     0.52     0.01    0.12    0.00
Hpg27    VEG8      0.27     1.42     0.84     0.88     0.02    0.27    0.00

Figure 3. Hpg22, 25, 27 and Tcp9 co-expression with the MbtH Tcp11, 13 and VEG8. The constructs Hpg22, 25, 27 and Tcp9 were co-expressed with three MbtH variants. The protein input was adjusted according to culture OD. Differences can be observed between target domain setups (A, AT, ATE) and the corresponding MbtH variants used. [Gel panels: Hpg25 (A), Hpg27 (ATE), Hpg22 (A) and Tcp9 (AT), each with lanes Tcp13, Tcp17 and VEG8; the A(TE) domain and MbtH bands are marked.]

associated domains, VEG8 seems to surpass Tcp17 in terms of its beneficial effects on expression. Interestingly, Hpg25 expression increased equally upon co-expression of VEG8, intrinsic to the domain, and of Tcp13. The impact of a non-native MbtH is even more drastic for Hpg27 in combination with Tcp17, which exhibits the weakest effects for the other enzymes. Quantitation of the expression shows that, despite effects on the levels of soluble protein, no alteration in the 1:2 stoichiometry of A domain to MbtH was evident.

Substrate profiling of hpg domains

To confirm the previously predicted specificity towards hpg, the enzymes were subjected to a pyrophosphate exchange reaction to assess their substrate dependent adenylation potential. To this end, domains were overexpressed and purified using affinity chromatography targeting the C-terminal his tag. Isolated domains and modules were adjusted to a final concentration of 1 µM and screened for L- or D-hpg and L- or D-pg at 0.1 and 1 mM, respectively. Phenylalanine (1 mM) and substrate free reactions served as negative controls. An initial screen showed activity for 15 out of 16 A domains towards L-hpg and/or D-hpg (Supplementary table 3). To compare the impact of domain setup and MbtH variants, four A domain constructs were selected for an in depth analysis: Tcp9 (AT) and Hpg22 (A) of the teicoplanin cluster, as well as Hpg25 (A) and Hpg27 (ATE) of the VEG cluster, representing the same selection as previously used for the in depth expression analysis (Figure 3). The enzymes were co-expressed with one of their intrinsic MbtH variants Tcp13, Tcp17 or VEG8 and subjected to in vitro analysis. Comparing the specificities of the selected domains, an overall higher adenylation velocity can be observed in reactions containing L-hpg and D-hpg. Furthermore, all enzymes show activity towards L-pg; however, only Tcp9 (AT) + VEG8 shows significant activity towards D-pg. None of the reactions containing L-phenylalanine showed any product formation (Table 2). Moreover, there was no clear overall effect on the L- and D-hpg stereoselectivity, though a slight D-hpg preference was observable with Hpg22, which is even more pronounced for Tcp9. A 5-fold increased velocity towards D-hpg was measured compared to the 3 other constructs in the selection. Tcp9 showed overall higher velocities, in a range of 0.07 to 8.44 µM min⁻¹ µM⁻¹ for D-pg and D-hpg, respectively. The variance within Hpg22, Hpg25 and Hpg27 was generally lower. Rates of Hpg22 of 0.04 and 1.54 µM min⁻¹ µM⁻¹ for D-pg and D-hpg, respectively, were observed. The latter activity represents the highest velocity among the remaining three constructs.
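The roughly 5-fold difference between the starter module and the elongation modules can be retraced from the 1 mM D-hpg column of Table 2. The following Python sketch is only an illustration of that arithmetic; the dictionary layout and the function name are choices made here, and the comparison against the fastest elongation construct for each MbtH variant is one possible way to define the fold change.

```python
# Rates at 1 mM D-hpg from Table 2 (µM PPi per min per µM enzyme).
d_hpg_1mm = {
    "Tcp9":  {"Tcp13": 5.28, "Tcp17": 4.63, "VEG8": 8.44},
    "Hpg22": {"Tcp13": 0.66, "Tcp17": 0.63, "VEG8": 0.86},
    "Hpg25": {"Tcp13": 0.92, "Tcp17": 1.03, "VEG8": 1.39},
    "Hpg27": {"Tcp13": 0.72, "Tcp17": 0.62, "VEG8": 1.42},
}


def fold_over_best_other(rates, starter, mbth):
    """Starter module rate divided by the fastest other construct."""
    best_other = max(values[mbth] for name, values in rates.items()
                     if name != starter)
    return rates[starter][mbth] / best_other


folds = {mbth: fold_over_best_other(d_hpg_1mm, "Tcp9", mbth)
         for mbth in ("Tcp13", "Tcp17", "VEG8")}
for mbth, fold in folds.items():
    print(f"{mbth}: {fold:.1f}-fold")
```

Depending on the MbtH variant, the ratio lies between roughly 4.5 and 6, which brackets the 5-fold figure quoted in the text.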

Figure 4. Effect of MbtH co-expression and co-purification on the adenylation activity of domains Hpg20 and Hpg25. A slight turnover of L-hpg by Hpg25 is detected without an MbtH; the activity increases dramatically in the presence of its intrinsic MbtH variant VEG8. The impact on Hpg20 is even more drastic: activity could only be determined after co-expression with the native MbtH COM. [Panels: Hpg25 without and with VEG8, Hpg20 without and with COM; y-axis A360, x-axis time (h).]

In addition to the substrate specificity, the performance of the different MbtH homologs was evaluated. VEG8 appears to be the most promiscuous homologue, as it improves the catalytic properties of the A domains most significantly. Tcp13 and Tcp17 exhibit comparable properties, although at slightly lower levels. To assess the effects of omitting MbtH, Hpg20 and Hpg25 were compared with and without COM and Tcp13, respectively (Figure 4). Even though Hpg25 showed notable activity towards L-hpg, the velocity increased dramatically upon co-expression of VEG8. An even more drastic effect can be observed with Hpg20. Without COM, no substrate activation was observed. However, upon COM addition, adenylation of the substrates L-hpg and, to a lower degree, D-hpg was measured.

Discussion

Adenylation domains are essential core components of NRPS enzymes. Their ability to recruit a substrate with high specificity makes them key elements in the synthesis of complex nonribosomal peptides. The diversity of dedicated substrate recognition is tremendous, as over 500 substrates have been identified [42]. However, due to a significant degree of structural conservation among A domains, specificity prediction software is well developed. Several programs allow for considerably accurate substrate predictions, including the NRPSpredictor [47]. This allowed the selection of domains in this study which have the potential to adenylate the predicted 4-hydroxyphenylglycine substrate. Moreover, linker regions, domains and modules were accurately pinpointed, leading to a set of fully functional constructs.

In addition to protein domain prediction, corresponding adenylation helper proteins, or MbtH homologs, were successfully identified, even from unannotated sources, as for VEG8 and TEG. Because of the high degree of conservation, 7 MbtH sequences were identified, selected and subjected to co-expression and purification studies. Even though the precise stoichiometry of A domain:MbtH is still poorly defined [16], we consistently observed a 1:2 ratio, confirming other studies [16–17]. Furthermore, a typical co-elution pattern emerged. Every tested domain allowed for binding of native as well as non-native MbtH variants. Due to the high degree of structural conservation among NRPS modules and MbtH proteins, the question of whether non-native MbtH proteins influence A domain behavior was addressed. It has been shown previously [48] that upon deletion or silencing of MbtH homologs, another genome encoded variant can compensate for the loss. This is in line with the behavior of the tested Tcp13, Tcp17 and VEG8 variants. All three homologs allowed for improved expression and significantly increased amino acid activation rates when tested in vitro with the domains Tcp9, Hpg22, 25 and 27. However, the degree of their influence seems to depend on the homolog used, and importantly, native copies did not always result in the strongest effects. Furthermore, the domain setup of the A domain also seems to play a key role, as illustrated by the comparison of Hpg25 and Hpg27, which consist of A only and A + Te, respectively. Even though this is not representative for all tested domains, it points to an overarching structural and conformational mechanism, contributing to the elucidation of the still poorly defined function of MbtH proteins. The differences in performance may be attributed to altered association properties. Focusing on the hypothetical interaction site as described by [19], two positions within the binding region of VEG8 are altered compared to Tcp13 and Tcp17. Residues A28T and D34A represent the only changes in a stretch of 13 amino acids. Especially the change from aspartic acid to alanine at position 34 seems rather drastic. Taking into account that the A domain interaction sites are unaffected (Figure 2B), this mutation might play a key role in the observed differences in MbtH performance. However, only a thorough analysis of both complementing docking sites, paired with extensive mutational studies, could lead to a conclusive answer.

Finally, the narrow range substrate profiles of the selected enzymes were determined. Overall, every module, independent of domain setup or module placement, has significant activity towards L- and D-hpg, and in most cases also exhibits L-pg adenylation activity. However, only Hpg25 shows traceable activity without MbtH addition, underlining the essential role of MbtH proteins in the substrate activation process. Depending upon the MbtH homolog used, an effect on activity, non-linear with co-expression performance, can be observed. This suggests that there is at least a dual role of MbtH proteins, i.e., chaperoning the expression/folding and promoting the activity. Besides MbtH homologs, module placement within the NRPS also seems to influence the kinetic properties in vitro, best illustrated by the initiation module Tcp9 (Table 2). The reason for this may lie in the independence from adjacent, upstream domains and modules, allowing for a higher degree of flexibility and thus more rapid conformational changes of the Asub domain. It could further suggest that, due to the elevated adenylation rates, initiation modules could serve as gatekeepers for the NRPS. Product formation cycles may not be initiated without activation of the first module, which is also indicated by the kinetic intermediate step analysis of GrsA [39]. Foremost, in contrast to previous findings [49], we demonstrate the essential role of MbtH homologues in the activation of hpg specific adenylation domains.

Conclusions and Perspectives

We have successfully demonstrated how NRPS modules activating a non-proteinogenic compound can be identified, overexpressed and characterized. Through bioinformatics tools, alongside a non-radiolabel, multi-well pyrophosphate exchange assay, a medium to high throughput experimental setup was established. Due to a high degree of conservation among enzymes of the ANL superfamily, including NRPS adenylation domains, this setup allows for the identification and characterization of novel domains activating any of the over 500 known substrates. With respect to the rapid improvements in engineering techniques [50] for multi-modular enzymes, this method provides a fast and straightforward way of increasing the target pool of domains. Using this pool could give rise to novel NRPS systems which are able to efficiently synthesize customized compounds with new application spectra and contribute to the ever growing demand for novel bioactive compounds, including the thriving search for new, alternative antibiotics.

Acknowledgements

The authors would like to express their gratitude to Christian Rausch for his work on the development of adenylation domain prediction tools and the identification of potential targets for this study. This work was supported by the BE-Basic Foundation, an international public–private partnership.

References

1. Bode HB, Brachmann AO, Jadhav KB, Seyfarth L, Dauth C, Fuchs SW, et al. Structure elucidation and activity of kolossin A, the D-/L-pentadecapeptide product of a giant nonribosomal peptide synthetase. Angew Chem Int Ed Engl. Germany; 2015;54: 10352–10355. doi:10.1002/anie.201502835
2. Von Döhren H, Dieckmann R, Pavela-Vrancic M. The nonribosomal code. Chem Biol. 1999;6: 273–279.
3. Ku J, Mirmira RG, Liu L, Santi DV. Expression of a functional non-ribosomal peptide synthetase module in Escherichia coli by coexpression with a phosphopantetheinyl transferase. Chem Biol. 1997;4: 203–207. doi:10.1016/S1074-5521(97)90289-1
4. Neville C, Murphy A, Kavanagh K, Doyle S. A 4’-phosphopantetheinyl transferase mediates non-ribosomal peptide synthetase activation in Aspergillus fumigatus. Chembiochem. 2005;6: 679–85. doi:10.1002/cbic.200400147
5. Reger AS, Wu R, Dunaway-Mariano D, Gulick AM. Structural characterization of a 140 degrees domain movement in the two-step reaction catalyzed by 4-chlorobenzoate:CoA ligase. Biochemistry. 2008;47: 8016–25. doi:10.1021/bi800696y
6. Yonus H, Neumann P, Zimmermann S, May JJ, Marahiel MA, Stubbs MT. Crystal structure of DltA. Implications for the reaction mechanism of nonribosomal peptide synthetase adenylation domains. J Biol Chem. 2008;283: 32484–91. doi:10.1074/jbc.M800557200
7. Marahiel MA, Stachelhaus T, Mootz HD. Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem Rev. American Chemical Society; 1997;97: 2651–2674. doi:10.1021/cr960029e
8. Weber T, Marahiel MA. Exploring the domain structure of modular nonribosomal peptide synthetases. Structure. 2001;9: R3–R9. doi:10.1016/S0969-2126(00)00560-8
9. Challis GL, Naismith JH. Structural aspects of non-ribosomal peptide biosynthesis. Curr Opin Struct Biol. 2004;14: 748–756. doi:10.1016/j.sbi.2004.10.005
10. Balibar CJ, Vaillancourt FH, Walsh CT. Generation of D amino acid residues in assembly of arthrofactin by dual condensation/epimerization domains. Chem Biol. 2005;12: 1189–200. doi:10.1016/j.chembiol.2005.08.010
11. Teruya K, Tanaka T, Kawakami T, Akaji K, Aimoto S. Epimerization in peptide thioester condensation. J Pept Sci. 2012;18: 669–77. doi:10.1002/psc.2452
12. Haslinger K, Peschke M, Brieke C, Maximowitsch E, Cryle MJ. X-domain of peptide synthetases recruits oxygenases crucial for glycopeptide biosynthesis. Nature. 2015;521: 105–9. doi:10.1038/nature14141
13. Mofid MR, Finking R, Essen LO, Marahiel MA. Structure-based mutational analysis of the 4’-phosphopantetheinyl transferases Sfp from Bacillus subtilis: carrier protein recognition and reaction mechanism. Biochemistry. 2004;43: 4128–36. doi:10.1021/bi036013h
14. Buchko GW, Kim CY, Terwilliger TC, Myler PJ. Solution structure of Rv2377c-founding member of the MbtH-like protein family. Tuberculosis. Elsevier Ltd; 2010;90: 245–251. doi:10.1016/j.tube.2010.04.002
15. Boll B, Taubitz T, Heide L. Role of MbtH-like proteins in the adenylation of tyrosine during aminocoumarin and vancomycin biosynthesis. J Biol Chem. 2011;286: 36281–36290. doi:10.1074/jbc.M111.288092
16. Felnagle EA, Barkei JJ, Park H, Podevels AM, McMahon MD, Drott DW, et al. MbtH-like proteins as integral components of bacterial nonribosomal peptide synthetases. Biochemistry. American Chemical Society; 2010;49: 8815–7. doi:10.1021/bi1012854
17. Zhang W, Heemstra JR, Walsh CT, Imker HJ. Activation of the pacidamycin pacl adenylation domain by MbtH-like proteins. Biochemistry. 2010;49: 9946–9947. doi:10.1021/bi101539b
18. Davidsen JM, Bartley DM, Townsend CA. Non-ribosomal propeptide precursor in nocardicin A biosynthesis predicted from adenylation domain specificity dependent on the MbtH family protein NocI. J Am Chem Soc. 2013;135: 1749–59. doi:10.1021/ja307710d
19. Herbst DA, Boll B, Zocher G, Stehle T, Heide L. Structural basis of the interaction of MbtH-like proteins, putative regulators of nonribosomal peptide biosynthesis, with adenylating enzymes. J Biol Chem. 2013;288: 1991–2003. doi:10.1074/jbc.M112.420182
20. Baltz RH. Function of MbtH homologs in nonribosomal peptide biosynthesis and applications in secondary metabolite discovery. J Ind Microbiol Biotechnol. 2011;38: 1747–1760. doi:10.1007/s10295-011-1022-8
21. Tatham E, Sundaram Chavadi S, Mohandas P, Edupuganti UR, Angala SK, Chatterjee D, et al. Production of mycobacterial cell wall glycopeptidolipids requires a member of the MbtH-like protein family. BMC Microbiol. 2012;12: 118. doi:10.1186/1471-2180-12-118
22. Wolpert M, Gust B, Kammerer B, Heide L. Effects of deletions of mbtH-like genes on clorobiocin biosynthesis in Streptomyces coelicolor. Microbiology. 2007;153: 1413–23. doi:10.1099/mic.0.2006/002998-0
23. Boll B, Heide L. A domain of RubC1 of rubradirin biosynthesis can functionally replace MbtH-like proteins in tyrosine adenylation. ChemBioChem. 2013;14: 43–44. doi:10.1002/cbic.201200633
24. Uguru GC, Milne C, Borg M, Flett F, Smith CP, Micklefield J. Active-site modifications of adenylation domains lead to hydrolysis of upstream nonribosomal peptidyl thioester intermediates. J Am Chem Soc. 2004;126: 5032–3. doi:10.1021/ja048778y
25. Zhang K, Nelson KM, Bhuripanyo K, Grimes KD, Zhao B, Aldrich CC, et al. Engineering the substrate specificity of the DhbE adenylation domain by yeast cell surface display. Chem Biol. Elsevier Ltd; 2013;20: 92–101. doi:10.1016/j.chembiol.2012.10.020
26. Wang M, Zhao H. Characterization and engineering of the adenylation domain of a NRPS-like protein: a potential biocatalyst for aldehyde generation. ACS Catal. 2014;4: 1219–1225. doi:10.1021/cs500039v
27. Stachelhaus T, Mootz HD, Marahiel MA. The specificity-conferring code of adenylation domains in nonribosomal peptide synthetases. Chem Biol. 1999;6: 493–505. doi:10.1016/S1074-5521(99)80082-9
28. Watanabe K, Oguri H, Oikawa H. Diversification of echinomycin molecular structure by way of chemoenzymatic synthesis and heterologous expression of the engineered echinomycin biosynthetic pathway. Curr Opin Chem Biol. 2009;13: 189–196. doi:10.1016/j.cbpa.2009.02.012
29. Thirlway J, Lewis R, Nunns L, Al Nakeeb M, Styles M, Struck AW, et al. Introduction of a non-natural amino acid into a nonribosomal peptide antibiotic by modification of adenylation domain specificity. Angew Chemie Int Ed. 2012;51: 7181–7184. doi:10.1002/anie.201202043
30. Kries H, Niquille DL, Hilvert D. A subdomain swap strategy for reengineering nonribosomal peptides. Chem Biol. 2015;22: 640–8. doi:10.1016/j.chembiol.2015.04.015
31. Tanovic A, Samel SA, Essen L-O, Marahiel MA. Crystal structure of the termination module of a nonribosomal peptide synthetase. Science. 2008;321: 659–63. doi:10.1126/science.1159850
32. Lee TV, Johnson LJ, Johnson RD, Koulman A, Lane GA, Lott JS, et al. Structure of a eukaryotic nonribosomal peptide synthetase adenylation domain that activates a large hydroxamate amino acid in siderophore biosynthesis. J Biol Chem. 2010;285: 2415–27. doi:10.1074/jbc.M109.071324
33. Mitchell CA, Shi C, Aldrich CC, Gulick AM. Structure of PA1221, a nonribosomal peptide synthetase containing adenylation and peptidyl carrier protein domains. Biochemistry. 2012;51: 3252–3263.
34. Beer R, Herbst K, Ignatiadis N, Kats I, Adlung L, Meyer H, et al. Creating functional engineered variants of the single-module non-ribosomal peptide synthetase IndC by T domain exchange. Mol Biosyst. 2014;10: 1709–18. doi:10.1039/c3mb70594c
35. Baldwin JE, Shiau CY, Byford MF, Schofield CJ. Substrate specificity of L-delta-(alpha-aminoadipoyl)-L-cysteinyl-D-valine synthetase from Cephalosporium acremonium: demonstration of the structure of several unnatural tripeptide products. Biochem J. 1994;301, Pt 2: 367–72.
36. Roongsawang N, Siew PL, Washio K, Takano K, Kanaya S, Morikawa M. Phylogenetic analysis of condensation domains in the nonribosomal peptide synthetases. FEMS Microbiol Lett. 2005;252: 143–151. doi:10.1016/j.femsle.2005.08.041
37. Rausch C, Hoof I, Weber T, Wohlleben W, Huson DH. Phylogenetic analysis of condensation domains in NRPS sheds light on their functional evolution. BMC Evol Biol. 2007;7: 78. doi:10.1186/1471-2148-7-78
38. Boettger D, Bergmann H, Kuehn B, Shelest E, Hertweck C. Evolutionary imprint of catalytic domains in fungal PKS-NRPS hybrids. ChemBioChem. 2012;13: 2363–2373. doi:10.1002/cbic.201200449
39. Sun X, Li H, Alfermann J, Mootz HD, Yang H. Kinetics profiling of gramicidin S synthetase A, a member of nonribosomal peptide synthetases. Biochemistry. 2014;53: 7983–9. doi:10.1021/bi501156m
40. Hubbard BK, Thomas MG, Walsh CT. Biosynthesis of L-p-hydroxyphenylglycine, a non-proteinogenic amino acid constituent of peptide antibiotics. Chem Biol. 2000;7: 931–942. doi:10.1016/S1074-5521(00)00043-0
41. Hoertz AJ, Hamburger JB, Gooden DM, Bednar MM, McCafferty DG. Studies on the biosynthesis of the lipodepsipeptide antibiotic ramoplanin A2. Bioorganic Med Chem. Elsevier Ltd; 2012;20: 859–865. doi:10.1016/j.bmc.2011.11.062
42. Caboche S, Pupin M, Leclère V, Fontaine A, Jacques P, Kucherov G. NORINE: a database of nonribosomal peptides. Nucleic Acids Res. 2008;36: D326–31. doi:10.1093/nar/gkm792
43. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39: W29–37. doi:10.1093/nar/gkr367
44. Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F, et al. HMMER web server: 2015 update. Nucleic Acids Res. 2015;43: W30–8. doi:10.1093/nar/gkv397
45. Röttig M, Medema MH, Blin K, Weber T, Rausch C, Kohlbacher O. NRPSpredictor2--a web server for predicting NRPS adenylation domain specificity. Nucleic Acids Res. 2011;39: W362–7. doi:10.1093/nar/gkr323
46. Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. England; 2016;44: D279–85. doi:10.1093/nar/gkv1344
47. Rausch C, Weber T, Kohlbacher O, Wohlleben W, Huson DH. Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs). Nucleic Acids Res. England; 2005;33: 5799–5808. doi:10.1093/nar/gki885
48. McMahon MD, Rush JS, Thomas MG. Analyses of MbtB, MbtE, and MbtF suggest revisions to the mycobactin biosynthesis pathway in Mycobacterium tuberculosis. J Bacteriol. 2012;194: 2809–2818. doi:10.1128/JB.00088-12
49. Stegmann E, Rausch C, Stockert S, Burkert D, Wohlleben W. The small MbtH-like protein encoded by an internal gene of the balhimycin biosynthetic gene cluster is not required for glycopeptide production. FEMS Microbiol Lett. 2006;262: 85–92. doi:10.1111/j.1574-6968.2006.00368.x
50. Doekel S, Coëffet-Le Gal M-F, Gu J-Q, Chu M, Baltz RH, Brian P. Nonribosomal peptide synthetase module fusions to produce derivatives of daptomycin in Streptomyces roseosporus. Microbiology. 2008;154: 2872–80. doi:10.1099/mic.0.2008/020685-0
51. Sosio M. Organization of the teicoplanin gene cluster in Actinoplanes teichomyceticus. Microbiology. 2004;150: 95–102. doi:10.1099/mic.0.26507-0
52. Pootoolal J, Thomas MG, Marshall CG, Neu JM, Hubbard BK, Walsh CT, et al. Assembling the glycopeptide antibiotic scaffold: The biosynthesis of A47934 from Streptomyces toyocaensis NRRL15009. Proc Natl Acad Sci U S A. 2002;99: 8962–7. doi:10.1073/pnas.102285099
53. Sosio M, Stinchi S, Beltrametti F, Lazzarini A, Donadio S. The gene cluster for the biosynthesis of the glycopeptide antibiotic A40926 by Nonomuraea species. Chem Biol. 2003;10: 541–549. doi:10.1016/S1074-5521(03)00120-0
54. Recktenwald J, Shawky R, Puk O, Pfenning F, Keller U, Wohlleben W, et al. Nonribosomal biosynthesis of vancomycin-type antibiotics: a heptapeptide backbone and eight peptide synthetase modules. Microbiology. 2002;148. doi:10.1099/00221287-148-4-1105
55. Bentley SD, Chater KF, Cerdeño-Tárraga A-M, Challis GL, Thomson NR, James KD, et al. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature. 2002;417: 141–7. doi:10.1038/417141a
56. Chiu HT, Hubbard BK, Shah AN, Eide J, Fredenburg RA, Walsh CT, et al. Molecular cloning and sequence analysis of the complestatin biosynthetic gene cluster. Proc Natl Acad Sci U S A. 2001;98: 8548–53. doi:10.1073/pnas.151246498
57. Banik JJ, Brady SF. Cloning and characterization of new glycopeptide gene clusters found in an environmental DNA megalibrary. Proc Natl Acad Sci U S A. 2008;105: 17273–7. doi:10.1073/pnas.0807564105
58. van Wageningen AA, Kirkpatrick PN, Williams DH, Harris BR, Kershaw JK, Lennard NJ, et al. Sequencing and analysis of genes involved in the biosynthesis of a vancomycin group antibiotic. Chem Biol. 1998;5: 155–162. doi:10.1016/S1074-5521(98)90060-6
59. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23: 2947–8. doi:10.1093/bioinformatics/btm404

Supplementary data

Supplementary data Table 1 — First selection of targets. Stachelhaus code was determined in NRPSpredictor. Targets for final selection are highlighted in grey.

| organism | enzyme | module | accession code | Stachelhaus motif | predicted specificity | module code |
|---|---|---|---|---|---|---|
| Actinoplanes ATCC 33076 | NRPS | 1 | gi\|112084466\|gb\|ABI05682.1\|_m1 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes ATCC 33076 | NRPS | 4 | gi\|112084466\|gb\|ABI05682.1\|_m4 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes ATCC 33076 | NRPS | 5 | gi\|112084466\|gb\|ABI05682.1\|_m5 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes ATCC 33076 | NRPS | 2 | gi\|112084469\|gb\|ABI05683.1\|_m2 | DAFHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes ATCC 33076 | NRPS | 4 | gi\|112084469\|gb\|ABI05683.1\|_m4 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes ATCC 33076 | NRPS | 8 | gi\|112084469\|gb\|ABI05683.1\|_m8 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Chlorobium phaeobacteroides DSM 266 | Amino acid adenylation domain | 1 | lcl\|UniRef100_A1BDX6_m1 | DVYYLGGICK | L/D-hydroxyphenylglycine | |
| Herpetosiphon aurantiacus ATCC 23779 | Amino acid adenylation domain | 1 | lcl\|UniRef100_A9AUJ0_m1 | DIFHFGLIIK | L/D-hydroxyphenylglycine | |
| Herpetosiphon aurantiacus ATCC 23779 | Amino acid adenylation domain | 2 | lcl\|UniRef100_A9B4Q2_m2 | DVYYLGGICK | L/D-hydroxyphenylglycine | |
| Streptomyces griseus subsp. griseus JCM 4626 | NRPS | 1 | lcl\|UniRef100_B1W2Q4_m1 | DMYHLGLMDK | L/D-hydroxyphenylglycine | |
| Adineta vaga | NRPS-like protein | 3 | lcl\|UniRef100_B3G4H6_m3 | DIQHLVLLVK | L/D-hydroxyphenylglycine | |
| Uncultured soil bacterium | NRPS | 1 | lcl\|UniRef100_B7T1B9_m1 | DAFHLGLLCK | D-hydroxyphenylglycine | |
| Uncultured soil bacterium | NRPS | 1 | lcl\|UniRef100_B7T1C0_m1 | DAFHLGLLCK | D-hydroxyphenylglycine | |
| Uncultured soil bacterium | NRPS | 1 | lcl\|UniRef100_B7T1C1_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | Hpg25/27 |
| Uncultured soil bacterium | NRPS | 2 | lcl\|UniRef100_B7T1C1_m2 | DAVHLGLLCK | D-hydroxyphenylglycine | |
| Uncultured soil bacterium | NRPS | 1 | lcl\|UniRef100_B7T1D0_m1 | DAFHLGLLCK | L/D-hydroxyphenylglycine | |
| Uncultured soil bacterium | NRPS | 1 | lcl\|UniRef100_B7T1D2_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | Hpg23 |
| Uncultured soil bacterium | NRPS | 2 | lcl\|UniRef100_B7T1D2_m2 | DALHLGLLCK | D-hydroxyphenylglycine | |
| Bacillus mycoides Rock 1–4 | ATP-dependent leucine adenylase | 1 | lcl\|UniRef100_C3AKA8_m1 | DARHLVLLAK | L/D-hydroxyphenylglycine | |
| Dickeya dadantii (strain Ech703) | Amino acid adenylation domain | 1 | lcl\|UniRef100_C6CCJ1_m1 | DIFHFGLILK | L/D-hydroxyphenylglycine | |
| Chitinophaga pinensis (strain ATCC 43595) | AMP-dependent synthetase and ligase | 1 | lcl\|UniRef100_C7PT83_m1 | DARHLALLVK | L/D-hydroxyphenylglycine | |
| Streptomyces sp. ACT-1 | Amino acid adenylation domain | 1 | lcl\|UniRef100_D1XA32_m1 | DMYHLGLMDK | L/D-hydroxyphenylglycine | |
| Streptomyces roseosporus NRRL 15998 | NRPS | 2 | lcl\|UniRef100_D6AU62_m2 | DIYHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces lividans TK24 | NRPS | 5 | lcl\|UniRef100_D6ES40_m5 | DVYHLGLLCK | D-hydroxyphenylglycine | |
| Segniliparus rotundus (strain DSM 44985) | Amino acid adenylation domain | 1 | lcl\|UniRef100_D6ZFQ6_m1 | DAFHPGFLAK | L/D-hydroxyphenylglycine | |
| Amycolatopsis orientalis | NRPS | 1 | lcl\|UniRef100_O52820_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | Hpg24 |
| Amycolatopsis orientalis | NRPS | 2 | lcl\|UniRef100_O52820_m2 | DAVHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces fungicidicus | NRPS | 2 | lcl\|UniRef100_Q06YZ1_m2 | DAYHLGMLCK | D-hydroxyphenylglycine | |
| Streptomyces fungicidicus | NRPS | 4 | lcl\|UniRef100_Q06YZ1_m4 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces fungicidicus | NRPS | 8 | lcl\|UniRef100_Q06YZ1_m8 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces fungicidicus | NRPS | 1 | lcl\|UniRef100_Q06YZ2_m1 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces fungicidicus | NRPS | 4 | lcl\|UniRef100_Q06YZ2_m4 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces fungicidicus | NRPS | 5 | lcl\|UniRef100_Q06YZ2_m5 | DAYHLGLLCK | D-hydroxyphenylglycine | |
| Shewanella denitrificans (strain DSM 15013) | Amino acid adenylation domain | 1 | lcl\|UniRef100_Q12IB7_m1 | DVRHLALLAK | L/D-hydroxyphenylglycine | |
| Frankia sp. (strain CcI3) | Amino acid adenylation domain | 3 | lcl\|UniRef100_Q2JA75_m3 | DAWHLGLMCK | D-hydroxyphenylglycine | |
| Nocardia farcinica | NRPS | 1 | lcl\|UniRef100_Q5Z1T0_m1 | DALHPGHVCK | L/D-hydroxyphenylglycine | |
| Actinoplanes teichomyceticus | NRPS | 1 | lcl\|UniRef100_Q6ZZJ4_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes teichomyceticus | NRPS | 2 | lcl\|UniRef100_Q6ZZJ4_m2 | DALHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes teichomyceticus | NRPS | 1 | lcl\|UniRef100_Q6ZZJ6_m1 | DAFHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes teichomyceticus | NRPS | 1 | lcl\|UniRef100_Q70AZ7_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | Hpg22 |
| Actinoplanes teichomyceticus | NRPS | 2 | lcl\|UniRef100_Q70AZ7_m2 | DALHLGLLCK | D-hydroxyphenylglycine | |
| Actinoplanes teichomyceticus | NRPS | 1 | lcl\|UniRef100_Q70AZ9_m1 | DAFHLGLLCK | D-hydroxyphenylglycine | Tcp9 |
| Nonomuraea sp. ATCC 39727 | NRPS | 1 | lcl\|UniRef100_Q7WZ66_m1 | DAFHLGLLCK | D-hydroxyphenylglycine | Hpg3/4 |
| Nonomuraea sp. ATCC 39727 | NRPS | 1 | lcl\|UniRef100_Q7WZ74_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | |
| Nonomuraea sp. ATCC 39727 | NRPS | 2 | lcl\|UniRef100_Q7WZ74_m2 | DALHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces toyocaensis | NRPS | 1 | lcl\|UniRef100_Q8KLL3_m1 | DAFHLGLLCK | D-hydroxyphenylglycine | Hpg1 |
| Streptomyces toyocaensis | NRPS | 1 | lcl\|UniRef100_Q8KLL5_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces toyocaensis | NRPS | 2 | lcl\|UniRef100_Q8KLL5_m2 | DAFHLGLLCK | D-hydroxyphenylglycine | Hpg21 |
| Amycolatopsis mediterranei DSM 5908 | NRPS | 1 | lcl\|UniRef100_Q939Z0_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | Hpg11 |
| Amycolatopsis mediterranei DSM 5908 | NRPS | 2 | lcl\|UniRef100_Q939Z0_m2 | DAVHLGLLCK | D-hydroxyphenylglycine | Hpg13/14 |
| Streptomyces lavendulae | NRPS | 1 | lcl\|UniRef100_Q93N86_m1 | DAFHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces lavendulae | NRPS | 1 | lcl\|UniRef100_Q93N87_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces lavendulae | NRPS | 2 | lcl\|UniRef100_Q93N87_m2 | DIFHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces lavendulae | NRPS | 1 | lcl\|UniRef100_Q93N88_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | Hpg20/26 |
| Streptomyces lavendulae | NRPS | 1 | lcl\|UniRef100_Q93N89_m1 | DIFHLGLLCK | D-hydroxyphenylglycine | |
| Streptomyces coelicolor | NRPS | 6 | lcl\|UniRef100_Q9Z4X6_m6 | DVYHLGLLCK | D-hydroxyphenylglycine | Hpg19 |
| Streptomyces coelicolor | NRPS | 1 | lcl\|UniRef100_UPI0001AF6D2A_m1 | DALHPGHVCK | L/D-hydroxyphenylglycine | |
| Streptomyces roseosporus NRRL 11379 | NRPS | 2 | lcl\|UniRef100_UPI0001AF7653_m2 | DIYHLGLLCK | D-hydroxyphenylglycine | |
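The Stachelhaus motifs listed in the table differ from one another at only a few positions. As a minimal sketch of how such motifs can be compared, the Python snippet below counts position-wise substitutions relative to a reference motif. The four motifs are copied from the table; the choice of `DAYHLGLLCK` (the motif of the first row) as reference and the helper name `hamming` are assumptions made for illustration only and are not part of the original analysis.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length motifs differ."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Motifs copied from the table, keyed by module code.
motifs = {
    "Tcp9": "DAFHLGLLCK",
    "Hpg13/14": "DAVHLGLLCK",
    "Hpg19": "DVYHLGLLCK",
    "Hpg24": "DIFHLGLLCK",
}

# Assumption: the motif of the first table row is used as the reference.
reference = "DAYHLGLLCK"

distances = {code: hamming(motif, reference) for code, motif in motifs.items()}
for code, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{code}: {d} substitution(s) relative to {reference}")
```

A simple substitution count like this ignores the chemical nature of the exchanged residues, so it can only serve as a first, coarse grouping of candidate domains.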

Supplementary data Table 2: Overview of all overexpression experiments. S, soluble expression; I, insoluble expression of the construct; N/A, not applicable; MbtH, the MbtH variants used for co-expression trials. Underlined MbtH variants enabled activity with the dedicated construct.

| Construct | pSCI 242/243 (S) | pSCI 242/243 (I) | pMAL-c5x (S) | pMAL-c5x (I) | MbtH |
|---|---|---|---|---|---|
| Tcp9 | N/A | N/A | Yes | Yes | Tcp13; Tcp17; Veg8 |
| Hpg1 | N/A | N/A | Yes | Yes | Tcp13; Veg8 |
| Hpg3 | N/A | N/A | Yes | Yes | Tcp13; Veg8 |
| Hpg11 | No | No | Yes | Yes | Tcp13; Bps |
| Hpg12 | No | No | Yes | Yes | Tcp13 |
| Hpg13 | Yes | Yes | Yes | Yes | Tcp13 |
| Hpg14 | Yes | Yes | Yes | Yes | Tcp13 |
| Hpg15 | Yes | Yes | N/A | N/A | N/A |
| Hpg16 | Yes | Yes | N/A | N/A | N/A |
| Hpg17 | Yes | Yes | N/A | N/A | N/A |
| Hpg18 | No | Yes | N/A | N/A | N/A |
| Hpg19 | Yes | Yes | Yes | Yes | Tcp13; CDAI |
| Hpg20 | Yes | Yes | Yes | Yes | Tcp13; Com |
| Hpg21 | No | No | Yes | Yes | Tcp13 |
| Hpg22 | No | No | Yes | Yes | Tcp13; Tcp17; Veg8 |
| Hpg23 | No | No | Yes | Yes | Tcp13; Teg |
| Hpg24 | No | No | Yes | Yes | Tcp13 |
| Hpg25 | Yes | Yes | Yes | Yes | Tcp13; Tcp17; Veg8 |
| Hpg26 | Yes | Yes | Yes | Yes | Tcp13; Com |
| Hpg27 | Yes | Yes | Yes | Yes | Tcp13; Tcp17; Veg8 |

Supplementary table 3 — Overview of the adenylation activity determinations for the different amino acid substrates and the different combinations of adenylation domain constructs with co-purified MbtH-like proteins. MbtHs enabling any adenylation activity are underlined. N/A not applicable.

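| Construct | L-hpg | D-hpg | L-pg | D-pg | L-phe | MbtH |
|---|---|---|---|---|---|---|
| Tcp9 | Yes | Yes | Yes | No | No | Tcp13; Tcp17; Veg8 |
| Hpg1 | Yes | Yes | Yes | No | No | Tcp13; Veg8 |
| Hpg3 | Yes | Yes | Yes | No | No | Tcp13; Veg8 |
| Hpg11 | Yes | No | No | No | No | Tcp13; Bps |
| Hpg12 | N/A | N/A | N/A | N/A | N/A | N/A |
| Hpg13 | Yes | No | No | No | No | Tcp13 |
| Hpg14 | Yes | No | No | No | No | Tcp13 |
| Hpg15 | N/A | N/A | N/A | N/A | N/A | N/A |
| Hpg16 | N/A | N/A | N/A | N/A | N/A | N/A |
| Hpg17 | N/A | N/A | N/A | N/A | N/A | N/A |
| Hpg18 | N/A | N/A | N/A | N/A | N/A | N/A |
| Hpg19 | Yes | No | No | No | No | Tcp13; CDAI |
| Hpg20 | Yes | No | No | No | No | Tcp13; Com |
| Hpg21 | Yes | Yes | No | No | No | Tcp13 |
| Hpg22 | Yes | Yes | Yes | No | No | Tcp13; Tcp17; Veg8 |
| Hpg23 | Yes | No | No | No | No | Tcp13; Teg |
| Hpg24 | Yes | Yes | Yes | No | No | Tcp13 |
| Hpg25 | Yes | Yes | Yes | No | No | Tcp13; Tcp17; Veg8 |
| Hpg26 | Yes | No | No | No | No | Tcp13; Com |
| Hpg27 | Yes | Yes | Yes | No | No | Tcp13; Tcp17; Veg8 |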


Supplementary figure 1 — Alignment of all adenylation domains in this study. ClustalX was used to align the sequences of all tested adenylation domains.


CHAPTER III

A golden gate based system for convenient assembly of chimeric nonribosomal peptide synthetases

Reto D. Zwahlen,1 Roel A.L. Bovenberg,2,3 and Arnold J.M. Driessen1,4

1Molecular Microbiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
2Synthetic Biology and Cell Engineering, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
3DSM Biotechnology Centre, Delft, The Netherlands
4Kluyver Centre for Genomics of Industrial Fermentations, Julianalaan 67, 2628BC Delft, The Netherlands

Abstract

Antimicrobial compounds have increasingly moved into the spotlight of modification and discovery efforts, especially since the emergence of various multi-resistant pathogenic strains. These compounds are frequently complex in structure, bearing multi-ring structures and various side branches in their scaffold. This complexity is mainly a result of their nonribosomal production route, which is guided by nonribosomal peptide synthetases (NRPS) and associated factors such as MbtH-like proteins (MLP). NRPS are large multi-modular enzymes, composed of modules that each incorporate a distinct substrate into a growing peptide. Due to their complexity and significant size, engineering NRPS has always been a delicate task; however, site-, domain- and module-directed efforts have led to feasible new NRPS designs exhibiting novel activities. On the basis of the Nocardia lactamdurans ACVS, in combination with a series of L-hydroxyphenylglycine (hpg) incorporating modules, we here seek to establish a golden gate based system for the dedicated exchange of NRPS parts. To this end, a small part library was constructed, allowing for the assembly of chimeric NRPS, which were subsequently co-transformed with their dedicated MLP, expressed in E. coli HM0079 and used in an in vitro peptide formation assay. Ultimately, all samples were analyzed by LC/MS and screened for the proposed novel hpg-cysteine-valine (DLD-hpgCV) tripeptide as well as other potentially interesting compounds. The configured system allowed for the formation of a multitude of hybrid NRPS constructs in a medium-throughput manner. However, we could not confirm the production of DLD-hpgCV or any other related compound. Nonetheless, we have successfully demonstrated how a system can be implemented that allows for the shuffling, overexpression and subsequent compound analysis of chimeric NRPS or other modular enzymes. It allows not only for the fast creation of an expandable library of domains, modules and hybrid NRPS, but also for their subsequent characterization in a medium-throughput manner.

Introduction

Nonribosomal peptide synthetases are large and complex multi-modular systems. As key enzymes in the production of important, pharmacologically relevant compounds, they have gradually moved into the spotlight of combinatorial engineering efforts [1–3]. However, since these multi-modular assembly lines contain a variety of under-investigated parts, previous trials on targeted engineering yielded only moderate success. Even though there have been successful trials on exchanging sub-domains [4], domains [5] and modules, the resulting chimeric enzymes predominantly function at low activities. In order to explore the full potential of these enzymes, a more systematic approach for the exchange of modules, domains and sub-domains is desired. However, different bottlenecks intrinsic to NRPS must be taken into account. Certain NRPS characteristics can be predetermined through bioinformatics, such as domain and module boundaries, and adenylation domain specificities based on conserved sequence motifs [6]. As for many large proteins, though, the major bottleneck is the sheer size of the underlying genes, which can reach 65 kb [7]. Even the ever-decreasing costs of DNA synthesis, as well as increasingly elaborate synthesis techniques, do not allow for the inexpensive construction of comprehensive hybrid NRPS libraries. This is a direct consequence of the number of NRPS that are derived from rather uncharacterized or uncultivated organisms [8], bearing further complexities such as a very high GC content (>80 %). Essentially, a system is required that allows for the exchange and incorporation of various numbers of DNA parts of different sizes in an extendable, convenient and specific manner.

In the era of synthetic biology, different alternative cloning systems have been established that enable the precise recombination of DNA. In addition to in vivo systems, such as yeast recombination [9–10], there is a variety of in vitro techniques that omit the use of an intermediate host. These techniques recombine DNA either on the basis of short homologous sequences or through the activity of the unusual type IIS class of restriction enzymes in combination with a ligase. The latter, the golden gate reaction [11], has advantages over homology-based methods, as it allows for a one-step assembly. Because type IIS enzymes cleave outside of their recognition sequence, a considerable number of dedicated overhangs can be generated, which ultimately allows a high number of potential fragments to be used. Furthermore, no additional oligonucleotides are required for this reaction, giving it an important advantage in terms of ease of handling and time consumption.
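The role of dedicated overhangs in a one-step assembly can be illustrated with a small sketch. The Python code below chains parts from a start overhang to an end overhang by matching each part's right overhang to the left overhang of the next part, and rejects designs with reused or palindromic overhangs. The part names, the 2-nt overhang sequences and the function `assembly_order` are hypothetical and chosen for illustration; they are not the overhangs used in this study.

```python
from typing import Dict, List, NamedTuple


class Part(NamedTuple):
    name: str
    left: str   # overhang at the upstream end after digestion (hypothetical)
    right: str  # overhang at the downstream end after digestion (hypothetical)


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def assembly_order(parts: List[Part], start: str, end: str) -> List[str]:
    """Return part names in the order dictated by matching overhangs."""
    by_left: Dict[str, Part] = {}
    for part in parts:
        if part.left in by_left:
            raise ValueError(f"overhang {part.left} is used twice")
        by_left[part.left] = part
    # A palindromic overhang can ligate to itself, so it is rejected.
    for overhang in list(by_left) + [end]:
        if overhang == reverse_complement(overhang):
            raise ValueError(f"overhang {overhang} is palindromic")

    order: List[str] = []
    current = start
    while current != end:
        part = by_left.pop(current, None)
        if part is None:
            raise ValueError(f"no part starts with overhang {current}")
        order.append(part.name)
        current = part.right
    if by_left:
        unused = sorted(p.name for p in by_left.values())
        raise ValueError(f"unused parts: {unused}")
    return order


# Hypothetical three-part design for one module (A, T(E) and C parts).
parts = [
    Part("T(E)", "CA", "GG"),
    Part("A_hpg", "AG", "CA"),
    Part("C", "GG", "TC"),
]
print(assembly_order(parts, start="AG", end="TC"))
```

The sketch only checks the logic of the design. It does not test whether two different overhangs are reverse complements of each other, which a real design would also need to exclude.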

A system based on the golden gate principle, adjusted for the incorporation of differently sized DNA fragments, would not only allow for the shuffling of larger genes, but may also enable the reprogramming of enzymes relevant for the production of bioactive compounds. For instance, the alteration of natural compound routes, such as those of the NRPS-derived compounds penicillin and cephalosporin, has a long history [12]. To overcome resistance, the production of a variety of novel compounds has been realized, including cephaloglycin and cephalexin from cephalosporin [13–15], and ampicillin and amoxicillin from penicillin [16]. Unfortunately, all of these compounds are still produced in a semi-synthetic manner, creating vast amounts of economically and ecologically non-beneficial side-products.

The following research describes an attempt at creating a novel NRPS that catalyzes the formation of the tripeptide D-hydroxyphenylglycyl-L-cysteinyl-D-valine (DLD-hpgCV), a precursor to amoxicillin. To this end, we make use of the Nocardia lactamdurans L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACVS), an NRPS responsible for the production of the penicillin precursor L-α-aminoadipyl-L-cysteinyl-D-valine (LLD-ACV). By exchanging the first, L-α-aminoadipic acid (aaa) activating module of this three-module NRPS with either L-phenylglycine (pg) or L-4-hydroxyphenylglycine (hpg) specific parts, we intend to shift the native LLD-ACV specificity towards the desired DLD-hpgCV. The parts required for the first module, a series of hpg activating modules, have been identified in chapter I and will serve as the foundation for a part library intended for the assembly of DLD-hpgCV specific chimeric NRPS. This study describes an attempt to generate such a chimeric NRPS using the golden gate system in a medium-throughput manner.

Materials and methods

Bacterial strains, plasmids and culture conditions

Cloning was performed using E. coli DH5α. Selection was with 25 µg/ml zeocin for pBAD destination plasmids, 100 µg/ml ampicillin for pMAL-c5x donor plasmids and 15 µg/ml for pACYCtac-MLP plasmids, respectively. The E. coli BL21 (DE3) strain was used for expression of domains and modules. All cultures were grown in 2xPY (bacto-tryptone 15 g/l, yeast extract 10 g/l, NaCl 10 g/l, pH 7.0) at 37 °C and 200 rpm (Innova shaker).

Domain prediction and separation

Domain border prediction on the sequences of all 12 putative hpg activating modules, along with the ACV synthetase, was done using HMMER (ref). The modules, consisting of the domain set A-T-(E)-C, were divided into three constructs: A, T(E) and C, respectively. Inter-domain linkers were delimited by the predicted up- and downstream domains and divided equally between the corresponding constructs. Constructs were either amplified from gDNA obtained from their corresponding strains (supplementary Table 1) or ordered as synthetic genes (Geneart). All donor constructs were cloned into the pMAL-c5x vector at HindIII, EcoRI or EcoRV, according to compatibility. The required BsrDI restriction sites were introduced as part of the primer sequences (Supplementary table 1). Correctness of the constructs was confirmed by restriction analysis and subsequent DNA sequencing (MacroGen). The acceptor plasmid pBAD-Nl-ACVS represents a truncated version of the full-length pBAD-Nl-ACVS-6×his plasmid, containing a gateway cloned version of the Nocardia lactamdurans ACV synthetase, and was kindly provided by DSM Biotechnology, Delft.
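The equal division of inter-domain linkers between neighbouring constructs can be sketched as follows. The Python function below takes predicted domain boundaries and assigns the first half of each linker to the upstream construct and the second half to the downstream construct. The residue coordinates in the example are invented for illustration and do not correspond to any module of this study; the function name `split_linkers` is likewise an assumption.

```python
from typing import List, Tuple

# (name, first residue, last residue), 1-based and inclusive
Domain = Tuple[str, int, int]


def split_linkers(domains: List[Domain], module_start: int,
                  module_end: int) -> List[Domain]:
    """Turn predicted domains into contiguous constructs.

    Each linker between two predicted domains is split in half; with an
    odd linker length the extra residue goes to the downstream construct.
    """
    constructs: List[Domain] = []
    left = module_start
    for i, (name, _start, end) in enumerate(domains):
        if i + 1 < len(domains):
            next_start = domains[i + 1][1]
            linker_length = next_start - end - 1
            right = end + linker_length // 2
        else:
            right = module_end
        constructs.append((name, left, right))
        left = right + 1
    return constructs


# Hypothetical coordinates for one A-T-(E)-C module of 1070 residues.
predicted = [("A", 20, 520), ("T(E)", 541, 610), ("C", 631, 1060)]
for name, first, last in split_linkers(predicted, 1, 1070):
    print(f"{name}: residues {first}-{last}")
```

In practice the predicted boundaries would come from the domain annotation, and the resulting coordinates would still need to be checked against compatible restriction and type IIS sites.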

Golden Gate reaction

Donor and destination plasmids were added in an equimolar ratio to a total volume of 4 µl, along with 1 µl T4 DNA ligase (HC 5 U/µl), 1 µl 10× T4 DNA ligase buffer and 0.5 µl BsrDI (5 U/µl), and adjusted to 10 µl using MilliQ water. All enzymes were obtained from Thermo Scientific. The reactions were incubated in a thermocycler (BioRad) using the program: 5 min 37 °C; (2 min 37 °C; 5 min 16 °C) × 50; 5 min 37 °C; 20 min 80 °C; ∞ 4 °C. Reactions were transformed and correct clones confirmed using restriction analysis. Correct constructs were transformed together with their corresponding pACYCtac-MbtH variant into E. coli HM0079.

Expression and purification of hybrid NRPS and hpg modules

Cultures were grown to an OD600 of 0.6, transferred to 18 °C and 200 rpm for 1 h and subsequently induced using 0.5 mM IPTG and 0.2 % L-arabinose. Cells were harvested 18 h after induction by centrifugation at 3500 g for 15 minutes. After resuspension in lysis buffer (HEPES 50 mM pH 7.0, NaCl 300 mM, DTT 2 mM, cOmplete™ EDTA-free protease inhibitor; Roche No. 04693159001), cells were disrupted using sonication (6 s/15 s; on/off, 50×, 10 µm amplitude) and cell-free lysate was obtained by centrifugation at 4 °C, 13000 g, 15 minutes. The enzyme was purified by means of Ni-NTA bead (Qiagen) supported his-tag affinity purification using gravity flow. Wash steps were performed using wash buffer (HEPES 50 mM pH 7.0, NaCl 300 mM, imidazole 20 mM), followed by a one-step elution using elution buffer (HEPES 50 mM pH 7.0, NaCl 300 mM, imidazole 250 mM). Imidazole was removed while simultaneously concentrating the sample with Amicon-100 spin filters (Amicon). Final concentration was determined using A280 (OD 1 = 1 mg/ml).
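As a small worked example of this final quantification step, the sketch below converts an A280 reading into mg/ml using the rule of thumb stated above (1 absorbance unit = 1 mg/ml) and then into a molar concentration. The 404 kDa mass is the value given for the N. lactamdurans ACVS elsewhere in this thesis; the function names, the dilution-factor argument and the example reading are illustrative assumptions and not part of the original protocol.

```python
def a280_to_mg_per_ml(a280, dilution_factor=1.0, mg_per_ml_per_au=1.0):
    """Convert an A280 reading to mg/ml with the 'OD 1 = 1 mg/ml' rule of thumb."""
    if a280 < 0:
        raise ValueError("absorbance must be non-negative")
    return a280 * dilution_factor * mg_per_ml_per_au


def mg_per_ml_to_micromolar(mg_per_ml, mass_kda):
    """mg/ml divided by kDa gives mmol/l; multiplying by 1000 gives µmol/l."""
    if mass_kda <= 0:
        raise ValueError("molecular mass must be positive")
    return mg_per_ml / mass_kda * 1000.0


# Hypothetical reading of 0.5 absorbance units for an undiluted sample.
conc = a280_to_mg_per_ml(0.5)
print(f"{conc:.2f} mg/ml = {mg_per_ml_to_micromolar(conc, 404.0):.2f} µM")
```

For a protein of this size, a concentrate of 0.5 mg/ml is only slightly above 1 µM, which gives a feeling for how little enzyme is available for assays run at 0.3–0.5 µM.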

In vitro product formation assay

Isolated enzymes were subjected to in vitro assays in order to determine product formation properties. Assay conditions were HEPES 50 mM pH 7.0, NaCl 300 mM, ATP 5 mM pH 7.0, CoA 100 µM, Sfp (NEB) 0.2 µM, L-cysteine 2 mM, L-valine 2 mM, MgCl2 5 mM, DTT 2 mM, hybrid NRPS 0.3–0.5 µM and either L- or D-hydroxyphenylglycine (L/D-hpg) 5 mM. Negative controls included reactions omitting hybrid NRPS or L/D-hpg. Full-length Nocardia lactamdurans ACVS served as positive control, using 5 mM L-α-aminoadipic acid instead of L/D-hpg. Reactions were run at 30 °C and samples were taken 0, 60, 120 and 240 minutes after initiation. Reactions were stopped with 0.1 M NaOH. Samples were subsequently reduced with 10 mM DTT at 4 °C for 16 h and stored at −80 °C until analysis.

Liquid chromatographic and mass spectrometric analysis (LC/MS)

Samples (50 µl) obtained from the in vitro reactions were subjected to LC/MS analysis. Two technical replicates were run per sample at 5 µl each. Analysis was performed using an LC/MS Orbitrap (Thermo Scientific) in combination with an RP-C18 column (Shimadzu Shim-pack XR-ODS 2.2; 3.0 × 75 mm). The scan range was set at 80–1600 m/z in positive ion mode (4.2 kV spray, 87.5 V capillary and 120 V tube lens), with the capillary temperature set at 325 °C. A gradient program with MilliQ water (A), acetonitrile (B) and 2 % formic acid (C) was run at a flow rate of 0.3 ml min⁻¹: 0 min, A 90 %, B 5 %, C 5 %; 4 min, A 90 %, B 5 %, C 5 %; 13 min, A 0 %, B 95 %, C 5 %; 16 min, A 0 %, B 95 %, C 5 %; 16 min, A 90 %, B 5 %, C 5 %; 21 min, A 90 %, B 5 %, C 5 %. Standards of LLD-ACV, LLD-hpgCV and DLD-hpgCV obtained from Bachem were used to identify peaks according to retention time and accurate mass.
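The gradient program can be written down as a small lookup table, which makes it easy to check the solvent composition at any retention time. The sketch below does so under two assumptions that are not stated in the text: that the composition changes linearly between listed time points, and that the two entries at 16 min describe an instantaneous step back to the starting conditions. The names `GRADIENT` and `composition_at` are illustrative.

```python
# Gradient table as listed in the methods: (time in min, %A, %B, %C).
GRADIENT = [
    (0.0, 90.0, 5.0, 5.0),
    (4.0, 90.0, 5.0, 5.0),
    (13.0, 0.0, 95.0, 5.0),
    (16.0, 0.0, 95.0, 5.0),
    (16.0, 90.0, 5.0, 5.0),  # step back to the starting conditions
    (21.0, 90.0, 5.0, 5.0),
]


def composition_at(t_min, table=GRADIENT):
    """Return (%A, %B, %C) at t_min, assuming linear ramps between entries.

    At a duplicated time point (the step at 16 min) the composition after
    the step is returned.
    """
    if not table[0][0] <= t_min <= table[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, *c0), (t1, *c1) in zip(table, table[1:]):
        if t0 <= t_min < t1:
            frac = (t_min - t0) / (t1 - t0)
            return tuple(a + frac * (b - a) for a, b in zip(c0, c1))
    return tuple(table[-1][1:])  # t_min equals the final time point


for t in (0.0, 8.5, 14.0, 18.0):
    a, b, c = composition_at(t)
    print(f"{t:5.1f} min: A {a:.0f} %, B {b:.0f} %, C {c:.0f} %")
```

Halfway through the ramp (8.5 min) this gives 45 % water and 50 % acetonitrile, with the formic acid channel held constant at 5 % throughout.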

Results

To establish a domain-targeted swapping mechanism for the reshuffling of NRPS, a series of different initiation modules was used in combination with two ACVS-derived di-modules. The specificities of the chosen enzymes were revealed in previous studies (chapter II), and the resulting chimeric NRPS are projected to produce hpgCV as well as potentially pgCV tripeptides, the ultimate precursor molecules for the production of amoxicillin and ampicillin, respectively. Therefore, we adapted a plasmid-based system, analogous to a golden gate reaction, which is expandable and allows for the dedicated exchange of previously determined NRPS domains (Figure 1). Domain boundaries were determined using a combinatorial approach focusing on the utilization of the NCBI conserved domain database (CDD). Subsequently, all obtained sequences were screened for the presence of type IIS restriction enzyme sites, ultimately leading to the choice of BsrDI, the enzyme with the least abundantly occurring site. With respect to this choice, two plasmids, a donor (pMAL-c5x) and an acceptor (pBAD), were developed and the remaining BsrDI sites removed (Supplementary table 2). Following the BsrDI site removal, the 33 donor plasmids and the acceptor plasmid were cloned, using either PCR amplification or synthetic DNA fragments (Supplementary table 1). In addition to the golden gate relevant plasmids, the 6 corresponding MbtH-like proteins (MLP) were cloned, completing the list of all required parts for the subsequent hybrid NRPS assembly.
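The screening step described above, counting type IIS recognition sites across all part sequences and picking the rarest, can be sketched in a few lines. This is a minimal illustration and not the procedure actually used: only BsaI and BsrDI are named in this chapter, so BsmBI and BbsI are added here purely as example candidates, and the function names and the toy input sequences are assumptions.

```python
# Recognition sequences (5'->3') of some common type IIS enzymes.
# Only BsaI and BsrDI are mentioned in the text; the others are examples.
TYPE_IIS_SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
    "BsrDI": "GCAATG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(seq, site):
    """Count occurrences of a recognition site on both strands."""
    seq = seq.upper()
    targets = {site, reverse_complement(site)}
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] in targets)


def rank_enzymes(sequences, sites=TYPE_IIS_SITES):
    """Return (enzyme, total site count) pairs, least abundant first."""
    totals = {
        name: sum(count_sites(s, site) for s in sequences)
        for name, site in sites.items()
    }
    return sorted(totals.items(), key=lambda item: item[1])


# Toy sequences standing in for the real part sequences.
toy_parts = ["GGTCTCAAGAGACC", "ATGCAATGCC"]
for enzyme, n in rank_enzymes(toy_parts):
    print(f"{enzyme}: {n} site(s)")
```

In practice the input would be the full set of donor and acceptor sequences, and any sites remaining for the chosen enzyme would still have to be removed by mutagenesis, as was done for BsrDI.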

Golden gate reaction optimization and chimeric NRPS assembly

The obtained donor plasmids, in combination with the acceptor vector, allow for the assembly of a maximum of 40 novel chimeric NRPS using either three or four NRPS parts, respectively. To achieve an efficient and specific reaction, the conditions were optimized using the ACVS as a model. Taking the ACVS A, T and C constructs and the destination pBAD-CV plasmid, conditions as described in [11] were tested initially. The initial reaction volume of 30 µl led to an overall efficiency of >50 % of correctly assembled constructs. Further adjustments, increasing the concentration of T4 DNA ligase and BsrDI, did not lead to a consistent or significant increase in correct constructs or in the total number of transformants (data not shown). However, a comparison of reaction volumes of 10 µl, 20 µl and 30 µl led to an improvement of correct constructs to >80 % when using a final volume of 10 µl (Figure 2). Using this as a protocol, all remaining golden gate reactions were carried out. Thereby, 34 out of 40 possible hybrid constructs could be obtained. To exclude the possibility of sequence anomalies due to BsrDI or ligase activity, constructs were sequenced at the hypothetical ligation sites. No alterations were detected and the constructs were subsequently co-transformed with their respective MLP into E. coli HM0079.

Figure 1. Experimental setup, part library and resulting hybrid NRPS. (A) Domain borders of hpg modules and ACVS were determined and transferred to the respective vectors. (B) The created donor (1–3) and acceptor vectors (4) contain BsrDI restriction sites which allow for the dedicated joining of generated overhangs (A–D). The reactions will ultimately lead to the production of one out of four dedicated hybrids (C). BsrDI sites shown in panel B (top strand/bottom strand): A, GCAATGCCNN/CGTTACGGNN; B, GCAATGAGNN/CGTTACTCNN; C, GCAATGCANN/CGTTACGTNN; D, GCAATGTTNN/CGTTACAANN.

Figure 2. Optimization of golden gate assembly. The reaction conditions of the golden gate reactions were optimized, based on the original setup. A decrease in the reaction volume to 10 µl had the most beneficial impact, though no improvements were achieved through the variation of BsrDI, T4 ligase or buffer conditions. Gel panels: 10 µl, 20 µl and 30 µl reaction volume, annotated 9/10, 8/10 and 7/10, respectively.

Figure 3. Sample of NRPS hybrid yield. (A) Expression of Hpg22 hybrids (I–IV) with and without the MLP VEG8. Hybrids including epimerization domains appear at 410 kD (I + II) and without E domain (III + IV) at 370 kD. (B) Expression, purification and enrichment of hybrid 22.I based on a 1000 ml culture.
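Directional assembly relies on the four junction overhangs (A–D in Figure 1) being distinguishable from one another. The sketch below checks a set of overhangs for the two usual problems: palindromic overhangs, which can ligate to themselves, and pairs that are identical or reverse complementary, which can ligate to each other. Taking the two nucleotides that follow GCAATG in Figure 1 as the overhang is an interpretation of the figure, and the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhang_conflicts(overhangs):
    """List problems that would allow parts to ligate in an unintended way."""
    problems = []
    names = list(overhangs)
    for i, name in enumerate(names):
        seq = overhangs[name].upper()
        if seq == reverse_complement(seq):
            problems.append(f"{name} ({seq}) is palindromic and can self-ligate")
        for other in names[i + 1:]:
            if overhangs[other].upper() in (seq, reverse_complement(seq)):
                problems.append(f"{name} and {other} are not orthogonal")
    return problems


# Two nucleotides following GCAATG at junctions A-D in Figure 1
# (interpreted here as the overhangs).
FIGURE_1_OVERHANGS = {"A": "CC", "B": "AG", "C": "CA", "D": "TT"}
print(overhang_conflicts(FIGURE_1_OVERHANGS) or "no conflicts found")
```

With two-nucleotide overhangs there are only 16 possible sequences, four of which are palindromic (AT, TA, CG, GC); the remaining twelve form six reverse-complementary pairs, so at most six mutually orthogonal junctions are available. This is the practical limit on how far a system of this kind can be extended.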

Expression, purification and in vitro assessment of hybrid NRPS

All 34 constructs were co-transformed with their respective MLP and sequentially expressed using screening volumes of 100 ml. Of this set of constructs, 23 hybrids showed traceable amounts of expression following SDS-PAGE analysis in combination with Coomassie brilliant blue staining (Figure 3). Constructs containing Hpg13 or Hpg26 did not reveal any expression. A mixed pattern, with expression of only certain hybrid variants, is seen for Hpg11, 21, 24 and 27. The observed levels, however, remain consistently beneath 1 mg l⁻¹ culture, requiring the use of 1000 ml cultures for the following in vitro analysis. This resulted in a final concentration of up to 0.5 mg/ml NRPS concentrate at a purity of >50 %. The concentrated chimeric NRPS were subsequently used in an in vitro product formation assay and the samples were ultimately analyzed using LC/MS. The resulting chromatographic profiles were screened for a series of di- and tripeptide compounds, including the desired hpgCV. However, none of the analyzed reactions, independent of substrate stereochemistry, revealed any traces of LLD- or DLD-hpgCV formation. In fact, no tripeptide was found in any of the chimeric NRPS setups. However, the dipeptide containing cysteine and valine, CV, was formed in significant amounts in all reactions. Furthermore, the reconstituted ACVS exhibits strong production of LLD-ACV in a linear fashion over the full experimental course of 4 h, in addition to minor amounts of a cysteine-valine dipeptide (data not shown).

Table 1. Overview of NRPS assembly and assessment.

Module    Hybrid  M1 parts (Hpg)         M2 parts (ACVS)  Assembly  Expression  Product
                  A (1)  T-E (2)  C (3)  T      C
11        I       x      -        x      x      -         -         -           -
11        II      x      -        -      x      x         ✓         ✓           -
11        III     x      x        x      -      -         ✓         -           -
11        IV      x      x        -      -      x         ✓         -           -
13        I       x      -        x      x      -         -         -           -
13        II      x      -        -      x      x         ✓         -           -
13        III     x      x        x      -      -         ✓         -           -
13        IV      x      x        -      -      x         -         -           -
21        I       x      -        x      x      -         ✓         -           -
21        II      x      -        -      x      x         ✓         -           -
21        III     x      x        x      -      -         ✓         ✓           -
21        IV      x      x        -      -      x         ✓         ✓           -
22        I       x      -        x      x      -         ✓         ✓           -
22        II      x      -        -      x      x         ✓         ✓           -
22        III     x      x        x      -      -         ✓         ✓           -
22        IV      x      x        -      -      x         ✓         ✓           -
23        I       x      -        x      x      -         ✓         ✓           -
23        II      x      -        -      x      x         ✓         ✓           -
23        III     x      x        x      -      -         ✓         ✓           -
23        IV      x      x        -      -      x         ✓         ✓           -
24        I       x      -        x      x      -         ✓         ✓           -
24        II      x      -        -      x      x         ✓         -           -
24        III     x      x        x      -      -         ✓         -           -
24        IV      x      x        -      -      x         ✓         -           -
26        I       x      -        x      x      -         -         -           -
26        II      x      -        -      x      x         ✓         -           -
26        III     x      x        x      -      -         ✓         -           -
26        IV      x      x        -      -      x         -         -           -
27        I       x      -        x      x      -         -         -           -
27        II      x      -        -      x      x         ✓         ✓           -
27        III     x      x        x      -      -         ✓         ✓           -
27        IV      x      x        -      -      x         ✓         ✓           -
1*        I       x      -        x      x      -         ✓         ✓           -
1*        II      x      -        -      x      x         ✓         ✓           -
1*        III     x      x        x      -      -         ✓         ✓           -
1*        IV      x      x        -      -      x         ✓         ✓           -
4**       I       x      -        x      -      -         ✓         ✓           -
4**       II      x      -        -      -      x         ✓         ✓           -
Tcp9**    I       x      -        x      -      -         ✓         ✓           -
Tcp9**    II      x      -        -      -      x         ✓         ✓           -
ACVS***           x      -        -      x      x         ✓         ✓           -
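The counts quoted in the text (34 of 40 constructs assembled, 23 expressed) can be cross-checked against Table 1 with a short tally. The dictionary below is a transcription of the Assembly and Expression columns for the hybrid constructs only, excluding the ACVS control; the variable names and data layout are illustrative, and the Product column is omitted because it is empty for every hybrid.

```python
# (assembled, expressed) per hybrid, transcribed from Table 1.
Y, N = True, False
TABLE_1 = {
    "11": {"I": (N, N), "II": (Y, Y), "III": (Y, N), "IV": (Y, N)},
    "13": {"I": (N, N), "II": (Y, N), "III": (Y, N), "IV": (N, N)},
    "21": {"I": (Y, N), "II": (Y, N), "III": (Y, Y), "IV": (Y, Y)},
    "22": {"I": (Y, Y), "II": (Y, Y), "III": (Y, Y), "IV": (Y, Y)},
    "23": {"I": (Y, Y), "II": (Y, Y), "III": (Y, Y), "IV": (Y, Y)},
    "24": {"I": (Y, Y), "II": (Y, N), "III": (Y, N), "IV": (Y, N)},
    "26": {"I": (N, N), "II": (Y, N), "III": (Y, N), "IV": (N, N)},
    "27": {"I": (N, N), "II": (Y, Y), "III": (Y, Y), "IV": (Y, Y)},
    "1": {"I": (Y, Y), "II": (Y, Y), "III": (Y, Y), "IV": (Y, Y)},
    "4": {"I": (Y, Y), "II": (Y, Y)},
    "Tcp9": {"I": (Y, Y), "II": (Y, Y)},
}

hybrids = [status for module in TABLE_1.values() for status in module.values()]
possible = len(hybrids)
assembled = sum(1 for ok, _ in hybrids if ok)
expressed = sum(1 for _, ok in hybrids if ok)
no_expression = [
    name for name, module in TABLE_1.items()
    if not any(expr for _, expr in module.values())
]

print(f"assembled {assembled}/{possible}, expressed {expressed}/{assembled}")
print("modules without any expressed hybrid:", ", ".join(no_expression))
```

The tally reproduces the figures in the text: nine modules with four hybrids each plus two modules with two hybrids give 40 possible constructs, and Hpg13 and Hpg26 are the only modules without a single expressed hybrid.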

Discussion

To establish a tool for the construction of large, multi-modular, chimeric NRPS, we initiated the development of a golden gate based method, allowing for a virtually scar-free synthesis of novel NRPS. As a foundation, we used a variety of hydroxyphenylglycine activating modules, including their associated MLPs (chapter II), alongside the two C-terminal modules of the Nocardia lactamdurans ACVS. Due to the high abundance of the classic type IIS restriction site BsaI within the chosen genes, a series of alternative enzymes was evaluated and BsrDI, a restriction enzyme (RE) generating a 2-nucleotide overhang, was subsequently used for the projected donor and acceptor plasmids. The proposed system is thus expandable for the utilization of novel parts within the boundaries of the dedicated nucleotide overhangs intrinsic to the plasmid system. The determined and generated NRPS parts were subsequently subjected to shuffling reactions, and the obtained chimeric NRPS constructs, co-transformed with their respective MLP, were expressed and screened for the production of hpgCV and related compounds using an in vitro assay in combination with LC/MS analysis.

The first step in the assembly of the NRPS was the implementation and refinement of the original golden gate method [11]. T4 ligase, BsrDI and buffer adjustments did not improve the reactions. In contrast, decreasing the reaction volume, without further variation in component concentration, significantly increased the efficiency of the reactions with respect to the share of correctly assembled constructs. A further bottleneck was identified concerning the short 212 bp ACVS T domain, which appeared to be a shared factor in a series of lower-efficiency reactions (data not shown). However, by simply increasing the partial molarity of this component, we ultimately managed to obtain almost all desired constructs. The subsequent expression trials revealed very low amounts of protein, though exclusively in soluble form (Supplementary data). Increasing the overall culture volume was sufficient to bypass this problem for over half of the constructs. The remaining constructs did not show any traceable amounts of chimeric NRPS. Due to the radical reshuffling of parts originating from a variety of organisms, as well as their partial codon optimization, this appears to be a logical consequence, since translational pauses and tRNA availability play a crucial role in NRPS expression [17–19]. Furthermore, an expression profile analysis showed overall equally low expression probability values across all hybrid constructs, which is also reflected in the overall low expression levels of the hybrids (data not shown).

The subsequent in vitro analysis of the isolated and partially purified protein revealed a series of noteworthy observations. First of all, the assay conditions allow for LLD-ACV formation by the ACVS. This indicates that the NRPS is in its active holo state; thus the conditions should be applicable to the chimeric NRPS as well. However, the intended DLD-hpgCV tripeptide was not detected in any of the analyzed samples, though we did detect significant amounts of a cysteine-valine dipeptide. This is also seen to a lesser degree with the ACVS and has been reported in previous studies (DSM). This suggests that the novel NRPS must be present in a minimally functional folded state, however not allowing for the incorporation of hpg. Due to the complexity of the enzyme, it is difficult to pinpoint what exactly prevents the chimeric NRPS from performing properly; only additional experiments focused on functional characterization could ultimately unravel this.
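The equimolar plasmid input used for the golden gate reactions, and the raised partial molarity of the T domain donor discussed above, come down to a simple mass-to-mole conversion. The sketch below uses the common approximation of 650 g/mol per base pair of double-stranded DNA. The plasmid names and lengths, the 20 fmol target and the twofold boost are hypothetical placeholders chosen for illustration, since none of these values are given in the text.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair, a common approximation for dsDNA


def ng_for_fmol(fmol, length_bp):
    """Mass in ng of double-stranded DNA of the given length holding `fmol` femtomoles."""
    if fmol < 0 or length_bp <= 0:
        raise ValueError("amount must be non-negative and length positive")
    grams = fmol * 1e-15 * length_bp * AVG_BP_MASS
    return grams * 1e9


def equimolar_masses(lengths_bp, fmol_each, boost=None):
    """ng of each plasmid for equimolar input; `boost` raises selected parts."""
    boost = boost or {}
    return {
        name: ng_for_fmol(fmol_each * boost.get(name, 1.0), length)
        for name, length in lengths_bp.items()
    }


# Hypothetical plasmid lengths in bp (not taken from the thesis).
plasmids = {"donor_A": 7500, "donor_T": 6000, "donor_C": 7000, "acceptor": 12000}

for name, ng in equimolar_masses(plasmids, 20, boost={"donor_T": 2.0}).items():
    print(f"{name}: {ng:.0f} ng")
```

Because the required mass scales with plasmid length, equal masses of a large acceptor and a small donor are far from equimolar, which is one reason to calculate the input in moles rather than nanograms.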

Conclusion and outlook

In this study, we propose a combinatorial approach for the shuffling of NRPS, including the downstream in vitro characterization and mass spectrometric analysis, in a streamlined fashion. Despite the lack of the anticipated activity in the constructed NRPS, we show how the adapted golden gate method can be used for the efficient shuffling of NRPS parts. If paired with the latest information concerning structural and dynamic properties of NRPS assembly lines, this system could be further improved by determining additional non-conserved regions, thereby increasing the number of potential shuffling sites. Furthermore, in an attempt to create functional chimeric enzymes, relevant NRPS sub-parts [4] could be considered for the creation of novel parts, rather than domains or full modules, representing a gentler and less invasive approach that likely maintains subdomain interactions. Overall, given the correct adjustments to the system, alongside the decreasing costs of synthetic genes, this approach could serve as a scaffold for the development of new hybrid NRPS. Ultimately, this may allow for the production of novel bioactive compounds in a more targeted fashion, thus increasing the reservoir of compounds to tackle and overcome a variety of contemporary societal challenges.

References

1. Awan AR, Shaw WM, Ellis T. Biosynthesis of therapeutic natural products using synthetic biology. Adv Drug Deliv Rev. 2016; doi:10.1016/j.addr.2016.04.010
2. Kries H, Hilvert D. Tailor-made peptide synthetases. Chem Biol. 2011;18: 1206–7. doi:10.1016/j.chembiol.2011.10.006
3. Winn M, Fyans JK, Zhuo Y, Micklefield J. Recent advances in engineering nonribosomal peptide assembly lines. Nat Prod Rep. 2016;33: 317–347. doi:10.1039/C5NP00099H
4. Kries H, Niquille DL, Hilvert D. A subdomain swap strategy for reengineering nonribosomal peptides. Chem Biol. 2015;22: 640–8. doi:10.1016/j.chembiol.2015.04.015
5. Beer R, Herbst K, Ignatiadis N, Kats I, Adlung L, Meyer H, et al. Creating functional engineered variants of the single-module nonribosomal peptide synthetase IndC by T domain exchange. Mol Biosyst. 2014;10: 1709–18. doi:10.1039/c3mb70594c
6. Rausch C, Weber T, Kohlbacher O, Wohlleben W, Huson DH. Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs). Nucleic Acids Res. 2005;33: 5799–5808. doi:10.1093/nar/gki885
7. Bučević-Popović V, Šprung M, Soldo B, Pavela-Vrančič M. The A9 core sequence from NRPS adenylation domain is relevant for thioester formation. ChemBioChem. 2012;13: 1913–1920. doi:10.1002/cbic.201200309
8. Müller CA, Oberauner-Wappis L, Peyman A, Amos GCA, Wellington EMH, Berg G. Mining for nonribosomal peptide synthetase and polyketide synthase genes revealed a high level of diversity in the Sphagnum bog metagenome. Appl Environ Microbiol. 2015;81: 5064–5072. doi:10.1128/AEM.00631-15
9. Degryse E, Dumas B, Dietrich M, Laruelle L, Achstetter T. In vivo cloning by homologous recombination in yeast using a two-plasmid-based system. Yeast. 1995;11: 629–640. doi:10.1002/yea.320110704
10. Iizasa E, Nagano Y. Highly efficient yeast-based in vivo DNA cloning of multiple DNA fragments and the simultaneous construction of yeast/Escherichia coli shuttle vectors. Biotechniques. 2006;40: 79–83.
11. Engler C, Kandzia R, Marillonnet S. A one pot, one step, precision cloning method with high throughput capability. PLoS One. 2008;3: e3647. doi:10.1371/journal.pone.0003647
12. Ochiai M, Morimoto A, Miyawaki T, Matsushita Y, Okada T, Natsugari H, et al. Synthesis and structure-activity relationships of 7 beta-[2-(2-aminothiazol-4-yl)acetamido]cephalosporin derivatives. V. Synthesis and antibacterial activity of 7 beta-[2-(2-aminothiazol-4-yl)-2-methoxyiminoacetamido]-cephalosporin derivates and related c. J Antibiot (Tokyo). 1981;34: 171–185.
13. Singh J, Arrieta AC. New cephalosporins. Semin Pediatr Infect Dis. 1999;10: 14–22. doi:10.1016/S1045-1870(99)80005-3
14. Sader HS, Jones RN. Historical overview of the cephalosporin spectrum: Four generations of structural evolution. Antimicrobic Newsletter. 1992. doi:10.1016/0738-1751(92)90022-3
15. Wick WE. Cephalexin, a new orally absorbed cephalosporin antibiotic. Appl Microbiol. 1967;15: 765–769. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC547059/
16. Hoeprich PD. The penicillins, old and new. Review and perspectives. Calif Med. 1968;109: 301–308. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1503260/
17. Wolin SL, Walter P. Ribosome pausing and stacking during translation of a eukaryotic mRNA. EMBO J. 1988;7: 3559–3569. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC454858/
18. Spencer PS, Siller E, Anderson JF, Barral JM. Silent substitutions predictably alter translation elongation rates and protein folding efficiencies. J Mol Biol. 2012;422: 328–335. doi:10.1016/j.jmb.2012.06.010
19. Mohammad F, Woolstenhulme CJ, Green R, Buskirk AR. Clarifying the translational pausing landscape in bacteria by ribosome profiling. Cell Rep. 2016;14: 686–694. doi:10.1016/j.celrep.2015.12.073

3 84 RE RV HindIII HindIII HindIII HindIII HindIII HindIII EcoRI HindIII HindIII HindIII - - HindIII - - - - - HindIII HindIII - - - - EcoRI - - - HindIII HindIII HindIII HindIII HindIII EcoRI

3 ACCAAGCTTACGCAATGCTCGGTGACCGGCTCCC ATTAAGCTTACGCAATGCTCACGACCCGCTGCTTTAC ATTGAATTCACGCAATGCTCCGCCTCGGTACGC ATTAAGCTTACGCAATGCTCCGCAGCCGTGGCCGGTG ATTAAGCTTACGCAATGCTCTGCTTCGGTCGCCGG - ATTAAGCTTACGCAATGCTCACGGCTGGCCGTTGC - ATTGAATTCACGCAATGCTCCGGTTCCGTGCCCGGTTC - ATTAAGCTTACGCAATGTGCACGGCCGCCAGCCGTG ATTAAGCTTACGCAATGTGCAGCTCCGCCACGTCCC ATTAAGCTTACGCAATGAAGCCGAGGCGGCCCACCG ATTAAGCTTACGCAATGTGGAACTCCGCCACCTCCC ATTAAGCTTACGCAATGAAGCCGACCTGGCCCACC ATTAAGCTTACGCAATGTGGATCTCCTCCACGTCCTG ATTAAGCTTACGCAATGAAGCCGACCCGGCCCACC ------ATTAAGCTTACGCAATGTGGTCCACCTCGTCCTGGTC - - - - - ATTAAGCTTACGCAATGAACCGGCCCCCGCCCATTC ATTAAGCTTACGCAATGCTCGGTGATTGCCAGGAGTG ATTAAGCTTTGCAATGTGGACCTCTTCGGCCACCACCTC ATTAAGCTTTGCAATGAAGGGCCGCGCGATGTCC GCGCGAAGCTTAAGCAATGGGGGTTAATTCCTCCTCTGC TTTTTTG RV (5’ -> 3’) RV EcoRI EcoRI EcoRV EcoRI EcoRI - EcoRI - EcoRV - EcoRI EcoRI EcoRI EcoRI EcoRI EcoRI EcoRI ------EcoRI - - - - - EcoRI EcoRI EcoRI EcoRI EcoRI RE FW ATTGAATTCACGCAATGCCATGGTGGCCGGCGTCGAGGTG ATTGAATTCACGCAATGCCATGGTAGGCAGACTGGGCGTGAC ATTGATATCACGCAATGCCATGCTTCCAGTCGGTCGCTTAG ATTGAATTCACGCAATGCCTATGCTCACCGTAGCCGCCATC ATTGAATTCACGCAATGCCATGCTGAGAGTTGCCGACG - ATTGAATTCACGCAATGCCATGTGGGTCCTGGAACAACTG - ATTGATATCACGCAATGCCATGGTAGCAAATCAGGCCAATC - ATTGAATTCACGCAATGCCATGAACTCCGCAGCGCAGG ATTGAATTCACGCAATGAGGCCGAACGAGTCCTTTG ATTGAATTCACGCAATGCAGGCCGCGGCGCCCGG ATTGAATTCACGCAATGAGCCGCGCACGGAGGCCGAG ATTGAATTCACGCAATGCAGGCCGCCGTGCCGGGCCTG ATTGAATTCACACGCAATGAGCGGGTGCTGTGCG ATTGAATTCACGCAATGCAGGCCGCCGTTCCGGGAC ------ATTGAATTCACGCAATGAGCCGCGTACTGCCGCTGAAAAG - - - - - ATTGAATTCACGCAATGCAGTTGGCGGCCGTGGCGCGTG ATTGAATTCACGCAATGCCATGACGTCAGCACGACACCTGAAG ATTGAATTCACGCAATGAGCAGCTGCGGGCGATCTG ATTGAATTCACGCAATGCACCCAACTTTTCTATACAAAG GAATTAAGCTTCCGCAATGTTCCGCGAACTGGACCTCATC FW (5’ -> 3’) A golden gate based system for convenient assembly of chimeric nonribosomal peptide synthetases chimeric nonribosomal peptide of assembly convenient for based system gate golden A DSM 44591 DSM 44591 S.toyocaensis pMAL-c5x-Hpg22 
pMAL-c5x-Hpg23 Synthetic pSCI242-Hpg26 Synthetic pMAL-c5x-Hpg1 Synthetic pBAD-MBP-tcp9AT DSM 44591 DSM 44591 DSM 44591 DSM 44591 S.toyocaensis S.toyocaensis Synthetic Synthetic Synthetic Synthetic Synthetic Synthetic pSCI242-Hpg26 Synthetic Synthetic Synthetic Synthetic Synthetic S.toyocaensis Synthetic MS his Nl ACVS pBAD MS his Nl ACVS pBAD MS his Nl ACVS pBAD MS his Nl ACVS pBAD Template A A A A A A A A A A-T A-T T-E C T-E C T-E C T-E C T-E C T-E C T-E C T-E C T C C C A T C Acceptor Part 11 13 21 22 23 24 26 27 1 4 Tcp9 ACVS Module Supplementary table 1 — primers used for NRPS part amplification. used for table 1 — primers Supplementary

85 RV (5’ -> 3’) RV CAGCTTCCACAGCAATTGCATCCTGGTCATCCAGC CGGTAATGGCGCGCATAGCGCCCAGCGCCATCTG CAACGTTGTTGCCATAGCTACAGGCATCGTGGTGTC TGGCCCCAGTGCTGCGATGATACCGCGAGACCCACG ATGTCTGATGCAATATGGACAATTGG TGTTGGTTTTGACGATCAACTC TATGTCACCACCTGCGGTTTC References FW (5’ -> 3’) GGATGACCAGGATGCAATTGCTGTGGAAGCTGCC ATGGCGCTGGGCGCTATGCGCGCCATTACCGAGTC CGATGCCTGTAGCTATGGCAACAACGTTGCGCAAAC GTCTCGCGGTATCATCGCAGCACTGGGGCCAGATG CGCCGTCACTGCGTCTTTTAC CTGCGACCGACGGTGG AGCTTACGTGATGTACACGAG Template pMAL-c5x pMAL-c5x pMAL-c5x pMAL-c5x pBAD Nl ACVS MS his Nl ACVS pBAD pBAD Nl ACVS MS his Nl ACVS pBAD pSCI242-Hpg26 mutation C479A A839T A3635T T3818C T33C T14212C T1773A construct pMAL-c5x donor pMAL-c5x pBAD-CV acceptor pBAD-CV Hpg26 Supplementary table 2 — primers used for BsrDI site removal in donor and acceptor vectors. and acceptor in donor BsrDI site removal used for table 2 — primers Supplementary


CHAPTER IV

Biochemical and structural characterization of the Nocardia lactamdurans L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase

Reto D. Zwahlen,1 Riccardo Iacovelli,1 Sathish N. Yadav Kadapalakere,2 Gert T. Oostergetel,2 Roel A.L. Bovenberg,3,4 and Arnold J.M. Driessen1,5

1 Molecular Microbiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
2 Electron Microscopy, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
3 Synthetic Biology and Cell Engineering, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
4 DSM Biotechnology Centre, Delft, The Netherlands
5 Kluyver Centre for Genomics of Industrial Fermentations, Julianalaan 67, 2628BC Delft, The Netherlands

Abstract

The L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACVS) is a nonribosomal peptide synthetase (NRPS) that fulfils a crucial role in the biosynthesis of β-lactams. Although some of the enzymological aspects of ACVS have been elucidated, the large size of the protein, over 400 kD, has hampered expression and stable purification. To biochemically and structurally characterize the enzyme, the Nocardia lactamdurans ACVS was expressed in E. coli HM0079. Using a two-step affinity and size exclusion approach, the protein was purified to homogeneity and characterized for peptide formation. Conditions were established allowing for the stabilization of the flexible ACVS protein and electron microscopic analysis, yielding a first low resolution structural model for a multi-modular NRPS.

Introduction

Nonribosomal peptides (NRP) represent a versatile group of low to medium molecular weight compounds that exhibit various biological activities. These peptides are exclusively produced by nonribosomal peptide synthetases (NRPS) and not only contain proteinogenic amino acids, but may also contain a wide variety of non-proteinogenic amino acids or carboxylic acids [1]. NRP often undergo a series of modifications in trans, whether through the action of the NRPS or by further tailoring enzymes. Due to its relative simplicity and overall significance, the β-lactam production pathway has been a paradigm for related research fields. Three distinct enzymatic steps are involved in the production of β-lactams, with the NRPS L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACVS) providing the tripeptide L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine (LLD-ACV) as the precursor for β-lactam antibiotics such as penicillins or cephalosporins (Figure 1). ACVS is a three-module NRPS responsible for the incorporation of L-α-aminoadipic acid, L-cysteine and L-valine into the LLD-ACV tripeptide. These amino acids are inserted into the final product in a co-linear fashion, thus the position of the incorporated substrate corresponds to the position of the respective module within the primary NRPS sequence [2]. Peptide formation itself is strictly determined by the selectivity of the domains of the ACVS. NRP synthesis universally starts, in every module, with the adenylation (A) domain, serving as a highly selective gatekeeper, which recruits and adenylates a distinct substrate, thereby forming an adenyl-substrate conjugate. Subsequently, the conjugate is transferred to the downstream thiolation (T) domain, which serves in its active, phosphopantetheinylated (ppant) state as a flexible linker domain, guiding the activated substrates to the donor and acceptor sites of the up- or downstream condensation (C) domains, where peptide formation occurs with the upstream substrate being released from the ppant moiety. Ultimately, the newly synthesized peptide is modified either in cis or in trans. In ACVS, L-valine is epimerized via an intrinsic epimerization (E) domain, and finally released in a thioesterification reaction by means of a highly selective thioesterase (Te) domain. LLD-ACV production takes place in the cytosol; however, downstream processing of the tripeptide is compartmentalized in filamentous fungi, where the cytosolic isopenicillin-N synthase (IPNS) forms the β-lactam ring, and the microbody-localized acyltransferase replaces the aminoadipate moiety of isopenicillin N (IPN) with an alternative side chain.

Due to the importance of the β-lactam biosynthetic pathway and the wide spread of the ACVS across a range of organisms [3–4], it has been the focus of functional studies. ACVS served as a model NRPS, helping to establish the NRPS thiotemplate mechanism [5], and has since been studied in fungi such as Penicillium, Aspergillus and Cephalosporium as well as bacterial Nocardia and Streptomyces species [6–12]. The biochemical reactions of NRPS sub-domains have been elucidated [13–14]; however, studies on the isolated and purified ACVS are scarce [4]. The large size of the protein, 404–425 kD [4], makes it a challenge, with respect to expression and purification, to obtain appropriate amounts of sufficient quality. In combination with the high conformational dynamics and flexibility that characterize NRPS enzymes, structural analysis has progressed only slowly [15]. Despite extensive efforts, including the solution of sub-domain, domain, di-domain and entire module structures [16], no reliable structure of an entire multi-modular NRPS has yet been solved. On the basis of available crystal structures of (sub-)domains and the determined interfaces, a model for multi-modular NRPS enzymes has been proposed [17]. Pilot studies using a combined crystallographic and electron microscopic approach underlined the validity of the proposed model and further illustrated the additional complexity originating in the highly dynamic conformation of the NRPS enzymes [18]. Here, we focus on the pcbAB gene of Nocardia lactamdurans, which encodes a 404 kD ACVS [12]. To characterize this enzyme, we expressed the protein using the E. coli strain HM0079 [19] as a platform and subsequently purified the enzyme to homogeneity. This allowed for the determination of fundamental biochemical parameters and the intrinsic substrate specificity of the enzyme. Through stabilization by co-factors, substrates and gradient fixation (Grafix) [20] conditions, a conformationally homogeneous sample was obtained that for the first time yielded a low resolution structure using negative stain electron microscopy.

Figure 1. ACVS domain organization and product formation. The ACVS consists of a total of 10 domains arranged in three modules with distinct specificities for the incorporation of L-α-aminoadipic acid (L-α-aaa), L-cysteine and L-valine into the tripeptide δ-(L-α-aminoadipoyl)-L-cysteinyl-D-valine (LLD-aaa-cys-val). The domain arrangement is conserved [4] and follows the order: N-1(AT)-2(CAT)-3(CATETe)-C. The resulting LLD-aaa-cys-val is converted into isopenicillin-N (IPN) by the isopenicillin-N synthase (IPNS). In the penicillin biosynthetic pathway, IPN is further converted into a distinct penicillin compound dependent upon the available side chain. Other routes can result in the formation of cephalosporins and related compounds [36], while semi-synthetic routes are applied to produce ampicillin or amoxicillin.
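The co-linearity rule and the domain arrangement described here lend themselves to a compact data model. The sketch below encodes the three ACVS modules, with the third module including the epimerization (E) domain mentioned in the text, and derives the expected peptide by reading the modules in order. Treating the E domain as acting on the residue of its own module is a deliberate simplification, and the data layout and function names are illustrative assumptions.

```python
# ACVS modules from N- to C-terminus: domains and the substrate each activates.
ACVS_MODULES = [
    {"domains": ["A", "T"], "substrate": "L-aaa"},
    {"domains": ["C", "A", "T"], "substrate": "L-cys"},
    {"domains": ["C", "A", "T", "E", "Te"], "substrate": "L-val"},
]


def domain_count(modules):
    """Total number of domains across all modules."""
    return sum(len(module["domains"]) for module in modules)


def predicted_peptide(modules):
    """Apply the co-linearity rule: one residue per module, in module order.

    Simplification: an E domain converts the L-residue of its own module
    into the D-form.
    """
    residues = []
    for module in modules:
        residue = module["substrate"]
        if "E" in module["domains"] and residue.startswith("L-"):
            residue = "D-" + residue[2:]
        residues.append(residue)
    return "-".join(residues)


print(domain_count(ACVS_MODULES), "domains ->", predicted_peptide(ACVS_MODULES))
```

Swapping the first entry for a module with a different substrate, for example hpg, changes only the first residue of the predicted product, which is the premise behind the module exchange attempted in the previous chapter.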

Material and methods

Strains, plasmids and general culturing conditions

All cloning procedures were performed using E. coli DH5α. Cultures were grown in LB medium at 37 °C and 200 rpm, and antibiotic selection was performed with 25 µg/ml zeocin. The Nocardia lactamdurans pcbAB gene was cloned using an intermediate Gateway vector and was subsequently subcloned into the pBAD plasmid (pBR322 ori; araC; pBAD, ZEO) using SbfI × NdeI sites, including the introduction of a 6×his tag at the C-terminal end. This construct was kindly provided by DSM Sinochem Pharmaceuticals BV.

Expression and his-tag affinity purification of ACVS

Cultures were grown to an OD600 of 0.6, transferred to 18 °C and 200 rpm for 1 h and subsequently induced using 0.3 mM IPTG and 0.2 % L-arabinose. Cells were harvested 18 h after induction by centrifugation at 3500 g for 15 minutes. After resuspension in lysis buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 2 mM DTT, Complete EDTA-free protease inhibitor; Roche No. 04693159001), cells were disrupted by sonication (6 s/15 s; on/off, 50x, 10 µm amplitude) and cell-free lysate was obtained by centrifugation at 4 °C, 13000 g, for 15 minutes. The enzyme was purified from the lysate by Ni-NTA bead (Qiagen) supported his-tag affinity purification using gravity flow. Wash steps were performed using two column volumes of wash buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 20 mM imidazole), followed by a three-step elution using one bed volume of each elution buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 50, 150 or 250 mM imidazole). Samples were concentrated, if necessary, using Amicon U-100 spin filters (Amicon). The final concentration was determined using A280 or the DC protein assay (BioRad).
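As a practical aid, the buffer compositions above can be converted from molar concentrations into weighed amounts. The following is a minimal sketch only: the molar masses are standard values for the anhydrous or free forms and are an assumption about the reagents used, and the names `MOLAR_MASS`, `grams_needed` and `recipe` are illustrative and do not come from the original protocol.

```python
# Molar masses in g/mol (anhydrous/free forms; assumed, not stated in the protocol).
MOLAR_MASS = {
    "HEPES": 238.30,
    "NaCl": 58.44,
    "imidazole": 68.08,
    "DTT": 154.25,
}

# Buffer compositions in mM, as listed in the purification protocol.
LYSIS = {"HEPES": 50, "NaCl": 300, "DTT": 2}
WASH = {"HEPES": 50, "NaCl": 300, "imidazole": 20}
ELUTION = [{"HEPES": 50, "NaCl": 300, "imidazole": c} for c in (50, 150, 250)]


def grams_needed(component, conc_mM, volume_l=1.0):
    """Mass in grams of a component for a given mM concentration and volume."""
    return conc_mM / 1000.0 * volume_l * MOLAR_MASS[component]


def recipe(buffer, volume_l=1.0):
    """Grams of each component (rounded to mg) for the given buffer volume."""
    return {name: round(grams_needed(name, mM, volume_l), 3)
            for name, mM in buffer.items()}


if __name__ == "__main__":
    print("lysis buffer, 1 l:", recipe(LYSIS))
    print("wash buffer, 1 l:", recipe(WASH))
    for step in ELUTION:
        print("elution buffer, 0.1 l:", recipe(step, volume_l=0.1))
```

The pH adjustment to 7.0 and the protease inhibitor tablets are not covered by this calculation.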

In vitro product formation assay

Isolated enzymes were subjected to in vitro assays in order to determine product formation kinetics. The initial assay conditions were 50 mM HEPES pH 7.0, 300 mM NaCl, 5 mM ATP pH 7.0, 100 µM CoA, 0.2 µM phosphopantetheinyl transferase (Sfp, NEB), 5 mM L-α-aminoadipic acid (aaa), 2 mM L-cysteine (cys), 2 mM L-valine (val), 5 mM MgCl2, 2 mM DTT and 0.17 µM ACVS. For velocity and affinity determination, amino acid concentrations of 0.1; 0.25; 0.5; 1; 2 and 5 mM were used; for specificity determination, the concentration of the variable amino acid was set at 5 mM. Reactions were run at 30 °C. Samples were taken after 0, 10, 20, 30, 45, 60, 120 and 240 minutes for dynamic measurements, and after 0 and 240 or 960 minutes for endpoint value determination. Reactions were stopped with 0.1 M NaOH. Samples were subsequently stored at −80 °C and reduced before LC/MS analysis using 10 mM DTT or TCEP.

Liquid chromatographic and mass spectrometric analysis (LC/MS)

Samples (50 µl) obtained from an in vitro reaction were subjected to LC/MS analysis. Two technical replicates were run per sample at 5 µl each. Analysis was performed using an LC/MS Orbitrap (Thermo Scientific) in combination with an RP-C18 column (Shimadzu Shim pack XR-ODS 2.2; 3.0 × 75 mm). The scan range was set at 80–1600 m/z in positive ion mode (4.2 kV spray, 87.5 V capillary and 120 V tube lens), with the capillary temperature set at 325 °C. A gradient program with MilliQ water (A), acetonitrile (B) and 2 % formic acid (C) was run: 0 min, A 90 %, B 5 %, C 5 %; 4 min, A 90 %, B 5 %, C 5 %; 13 min, A 0 %, B 95 %, C 5 %; 16 min, A 0 %, B 95 %, C 5 %; 16 min, A 90 %, B 5 %, C 5 %; 21 min, A 90 %, B 5 %, C 5 %, at a flow rate of 0.3 ml min−1. The Bis-aaa-cys-val standard was obtained from Bachem and used for quantification in a standard curve at concentrations of 0.1; 0.5; 1; 5; 10; 50 and 100 µM. Novel tripeptides were identified according to accurate monoisotopic mass, if not mentioned otherwise.
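The gradient program can be written down as a small table and checked for consistency. The sketch below is an illustration under assumptions: it assumes linear ramps between the listed time points and treats the two entries at 16 min as an instantaneous step back to the starting conditions; the names `GRADIENT`, `check_program`, `composition_at` and `formic_acid_pct` are hypothetical.

```python
# (time in min, % A water, % B acetonitrile, % C 2 % formic acid), as listed above.
GRADIENT = [
    (0, 90, 5, 5),
    (4, 90, 5, 5),
    (13, 0, 95, 5),
    (16, 0, 95, 5),
    (16, 90, 5, 5),  # assumed to be a step back to the starting conditions
    (21, 90, 5, 5),
]
FORMIC_ACID_STOCK_PCT = 2.0  # solvent C is 2 % formic acid


def check_program(program):
    """Verify that every step sums to 100 % and that time never decreases."""
    previous = program[0][0]
    for t, a, b, c in program:
        if a + b + c != 100:
            raise ValueError(f"composition at {t} min does not sum to 100 %")
        if t < previous:
            raise ValueError("time points must be non-decreasing")
        previous = t
    return True


def composition_at(t_min, program=GRADIENT):
    """Return (%A, %B, %C) at t_min, interpolating linearly between steps."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, *c0), (t1, *c1) in zip(program, program[1:]):
        if t0 <= t_min < t1:
            f = (t_min - t0) / (t1 - t0)
            return tuple(x0 + f * (x1 - x0) for x0, x1 in zip(c0, c1))
    return tuple(float(x) for x in program[-1][1:])  # exactly the final time


def formic_acid_pct(c_pct):
    """Effective formic acid content of the mobile phase for a given % C."""
    return c_pct / 100.0 * FORMIC_ACID_STOCK_PCT


if __name__ == "__main__":
    check_program(GRADIENT)
    print(composition_at(8.5))
    print(formic_acid_pct(5))
```

Because solvent C is held at 5 % throughout, the effective formic acid content of the mobile phase stays constant at 0.1 % under these assumptions.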

Affinity and Gel-filtration chromatography

His-tag affinity purified samples were used as the starting material for further purification on an FPLC system (Amersham) in combination with an XD-16 column (Amersham, L × D, volume 65 ml) and Superdex 200 column bed material (Superdex 200, Sigma Aldrich). The system was first equilibrated with MilliQ water at a flow rate of 1 ml/min for 1.5 h, followed by equilibration with sample buffer (50 mM HEPES pH 7.0, 300 mM NaCl) at the same flow rate for 2 h. Thereafter, 500–1000 µl of sample (elution fraction or concentrate) was injected (1 ml loop) and run over the equilibrated column at a flow rate of 0.2–0.5 ml/min, and fractions of 0.5–1 ml were collected. The protein concentration of the fractions was determined using A280 or the DC protein assay (BioRad). All collected fractions were then visualized on a 5 % SDS-PAGE gel, a 17.5 % Tricine gel or a precast 5–20 % gradient gel (BioRad Nupage, BioRad) to pinpoint the fraction with the highest purity and specific protein concentration. Purity was determined using 2-D densitometry, and gel images were processed with AIDA software (AIDA image analyzer). The peak fractions were furthermore analyzed using dynamic light scattering (DLS) to determine particle size and distribution as a parameter for sample homogeneity. DLS was conducted on a DynaPro Nanostar (Wyatt) at 4 °C and 80 % laser intensity.

Grafix gradient separation of purified ACVS

In parallel to the FPLC-based purification methods, a gradient centrifugation method derived from the GraFix method was applied. For this purpose, a 10–30 % glycerol gradient was used, supplemented with 50 mM HEPES pH 7.0 and 300 mM NaCl. Gradients were created from a series of glycerol-buffer solutions (10; 12.5; 15; 17.5; 20; 22.5; 25; 27.5 and 30 % glycerol), which were layered in an ultracentrifugation tube and frozen in liquid nitrogen after the addition of each solution. To apply the cross-linking agent glutaraldehyde (GA) to the gradient, 0.05–0.2 % GA was added to the 30 % glycerol-buffer solution. Frozen gradients were subsequently thawed on ice and 100–1500 pmol of his-tag purified ACVS was carefully added to the top. The tubes were then centrifuged at 250000 g for 60–90 minutes, removed and fractionated using a peristaltic pump in combination with a 110 × 0.8 mm needle at 0.1–0.2 ml/min. The resulting 250–500 µl fractions were analyzed on a 5 % SDS-PAGE gel, a 17.5 % Tricine gel or a precast 5–20 % gradient gel (BioRad Nupage, BioRad), and the concentration was measured using A280 or the DC protein assay (BioRad). Selected fractions were ultimately subjected to DLS and transferred to the EM facilities for negative stain analysis.
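The layer series and the amount of protein loaded follow from simple arithmetic, sketched below. This is an illustration only: the molecular weight of 404 kD is taken from the introduction, the assumption that the densest layer is pipetted first is ours, and all function names are hypothetical.

```python
ACVS_MW_DA = 404_000  # 404 kD, as stated for the N. lactamdurans ACVS


def glycerol_steps(start=10.0, stop=30.0, step=2.5):
    """Glycerol percentages of the individual gradient layers."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def layers_bottom_up(ga_pct=0.2):
    """(glycerol %, glutaraldehyde %) per layer, densest layer first.

    Assumption: the 30 % layer is frozen first at the bottom of the tube;
    glutaraldehyde is added to the 30 % solution only, as in the protocol.
    """
    steps = glycerol_steps()
    return [(g, ga_pct if g == steps[-1] else 0.0) for g in reversed(steps)]


def pmol_to_ug(pmol, mw_da=ACVS_MW_DA):
    """Convert pmol of protein to micrograms (pmol x g/mol = pg; 1e-6 gives ug)."""
    return pmol * mw_da * 1e-6


if __name__ == "__main__":
    print(layers_bottom_up())
    for amount in (100, 1500):
        print(f"{amount} pmol ACVS = {pmol_to_ug(amount):.1f} ug")
```

With these values, the loading range of 100–1500 pmol corresponds to roughly 40–606 µg of ACVS per gradient.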

Electron-microscopic (EM) and cryo-EM structural analysis of ACVS particles

Negatively stained specimens for transmission electron microscopy were prepared on carbon-coated copper grids using 2 % uranyl acetate. Images were recorded with a Tecnai G2 20 Twin transmission electron microscope (FEI, Eindhoven, the Netherlands), operated at 200 kV and equipped with an UltraScan 4000UHS CCD camera (Gatan, Pleasanton, CA, USA), with a pixel size of 2.24 Å at the specimen level after binning the images to 2048 × 2048 pixels. For cryo-electron microscopy, 3 µl aliquots of the sample were applied onto glow-discharged holey carbon grids (Quantifoil R2/2). Grids were blotted for 5 s at 100 % humidity before being vitrified in liquid ethane using an FEI Vitrobot. Grids were transferred to a G2-Polara (FEI), operating at 300 kV and equipped with a Gatan imaging energy filter (GIF) in zero-loss mode and a Gatan 2K CCD camera. Micrographs were recorded with a pixel size of 2.63 Å at the specimen level, with a defocus of 2 µm and a dose of 25 e− Å−2. Electron micrographs were processed using Relion-1.3 software [21]. The contrast transfer function parameters of the G2-Polara micrographs were determined with CTFFIND3 [22].
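The imaging parameters above imply a few derived quantities that are useful when judging the attainable resolution. The sketch below is a back-of-the-envelope calculation only; the function names are illustrative, and the Nyquist limit is a theoretical sampling bound, not a claim about the resolution actually reached.

```python
def nyquist_limit_angstrom(pixel_size_angstrom):
    """Finest resolvable spacing for a given sampling: twice the pixel size."""
    return 2.0 * pixel_size_angstrom


def field_of_view_nm(n_pixels, pixel_size_angstrom):
    """Edge length of the imaged area in nm (10 Angstrom = 1 nm)."""
    return n_pixels * pixel_size_angstrom / 10.0


def electrons_per_pixel(dose_e_per_A2, pixel_size_angstrom):
    """Electrons collected per pixel for a given dose per square Angstrom."""
    return dose_e_per_A2 * pixel_size_angstrom ** 2


if __name__ == "__main__":
    # Negative stain: 2.24 Angstrom per pixel after binning to 2048 x 2048.
    print(nyquist_limit_angstrom(2.24))     # sampling limit in Angstrom
    print(field_of_view_nm(2048, 2.24))     # field of view in nm
    # Cryo-EM: 2.63 Angstrom per pixel, 25 electrons per square Angstrom.
    print(nyquist_limit_angstrom(2.63))
    print(electrons_per_pixel(25, 2.63))
```

At 2.24 Å per pixel the sampling limit is about 4.5 Å, so pixel size alone would not restrict a reconstruction in the range of tens of ångström.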

Results

Expression, purification and biochemical characterization of the Nocardia lactamdurans ACVS

The gene encoding the Nocardia lactamdurans ACVS was overexpressed in E. coli HM0079 as a C-terminal 6×his-tagged protein, and purified by Ni2+ affinity purification followed by size exclusion chromatography. The overall yield from shaking cultures was 13.9 ± 3.4 mg pure ACVS per liter of culture. The first-step affinity purification led to a protein solution with a purity of 32.1 ± 4.8 %, which increased to 90.2 ± 0.5 % after size exclusion chromatography using gel filtration by FPLC (Figure 2). The purified ACVS was subjected to in vitro product formation assays using the conditions outlined in the methods section and in supplementary figure 1. A set of distinct assay variations was prepared using varying concentrations of the three substrate amino acids in combination with 0.17 µM ACVS (Figure 3). Reactions were followed over a 4 h time course and analyzed by LC/MS. The resulting LLD-aaa-cys-val levels were quantified and normalized, showing nearly linear product formation curves (Figure 3A–C) over the indicated time course. Maximal LLD-ACV product levels under the given conditions reached roughly 50 µM after an assay time of 4 h, which exceeds the enzyme concentration by more than two orders of magnitude and indicates multiple turnovers. The catalytic rate of the enzyme was determined and expressed as product formed per minute and µM ACVS. The calculated kcat value for the ACVS was 0.78 ± 0.14 min−1. KM values were determined from Michaelis–Menten kinetics with a >98 % curve fit. Values of 640 ± 16, 40 ± 1 and 150 ± 4 µM were determined for L-α-aminoadipic acid, L-cysteine and L-valine, respectively (Figure 3D).
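The turnover argument and the extraction of kinetic constants can be illustrated with a short calculation. The sketch below is not the fitting procedure used for Figure 3D: it uses the Hanes–Woolf linearization ([S]/v against [S]) instead of a direct non-linear Michaelis–Menten fit, and it is run on noise-free synthetic rates generated from the reported values for L-valine, not on the measured data. All function names are hypothetical.

```python
def turnovers(product_uM, enzyme_uM):
    """Number of product molecules formed per enzyme molecule."""
    return product_uM / enzyme_uM


def michaelis_menten(s, vmax, km):
    """Reaction rate at substrate concentration s."""
    return vmax * s / (km + s)


def hanes_woolf_fit(s_values, v_values):
    """Estimate (vmax, km) from the line s/v = s/vmax + km/vmax.

    Ordinary least squares on the linearized data; a simple stand-in
    for a non-linear curve fit.
    """
    xs = list(s_values)
    ys = [s / v for s, v in zip(s_values, v_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


if __name__ == "__main__":
    # About 50 uM product from 0.17 uM ACVS after 4 h.
    print(f"turnovers: {turnovers(50, 0.17):.0f}")

    # Synthetic example: substrate series from the methods (mM),
    # rates generated with kcat = 0.78 per min and KM = 0.15 mM.
    substrate_mM = [0.1, 0.25, 0.5, 1, 2, 5]
    rates = [michaelis_menten(s, 0.78, 0.15) for s in substrate_mM]
    vmax, km = hanes_woolf_fit(substrate_mM, rates)
    print(f"vmax = {vmax:.2f} per min, KM = {km * 1000:.0f} uM")
```

Roughly 50 µM product formed by 0.17 µM enzyme corresponds to close to 300 turnovers per enzyme molecule, in line with the statement that product exceeds enzyme by more than two orders of magnitude.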


Figure 2 — Two-step purification of the Nocardia lactamdurans ACVS. (A) ACVS was isolated from E. coli HM0079 cells harvested after overnight expression at 18 °C. A cell-free lysate was obtained through sonication and subsequently separated into a clear supernatant (CFL) and a pellet, which was resuspended in 8 M urea (CFL (i)). The clear lysate was further purified by his-tag affinity chromatography under gravity flow, using two washing steps (W1, W2) and elution with 50, 150 and 250 mM imidazole, respectively (E1, E2, E3). (B) Fractions E1 and E2 were subjected to gel filtration on FPLC. 1 ml of an elution fraction was injected and fractions were collected in the same volume (1–14). Fractions 10–12 were further analyzed using negative stain EM, cryo EM and biochemical characterization methods. ACVS yields were 13.9 ± 3.4 mg per liter of culture, and the purity was 32.1 ± 4.8 % and 90.2 ± 0.5 % for samples obtained from the first and second purification step, respectively. ▽ = single particle; ▼ = aggregation.

[Figure 3, plot panels: (A) L-α-aminoadipic acid, (B) L-valine and (C) L-cysteine show LLD-ACV (µM) against time (min); (D) velocity ACVS shows kcat (min−1) against substrate concentration (mM).]

Figure 3 — Substrate dependence of the rate of ACV synthesis. Three reaction series were conducted using 0.025, 0.05, 0.25, 1, 2 and 5 mM of L-α-aaa (A), L-cys (B) and L-val (C), each concentration being shown with a distinct line style, and analyzed by LC/MS to quantify the amounts of LLD-aaa-cys-val for Michaelis–Menten kinetics (D). ■ L-α-aaa; ▲ L-val; ● L-cys.

Substrate specificity of the N. lactamdurans ACVS

Next, we determined the substrate specificity of the enzyme. To this end, three sets of reactions were set up, varying the substrate for each of the three ACVS modules using structurally analogous substrates. The concentration of the variable amino acid was set at 5 mM. Product levels were determined as end points after 4 h and analyzed for the formation of the predicted tripeptides and related structures using LC/MS. In addition to the three native substrates, 14 analogues were tested in a total of 22 reaction setups (Figure 4). Next to the LLD-ACV tripeptide, we detected 10 of the proposed tripeptides (M1: 1; M2: 5; M3: 3) as well as a hypothetical aaa-cys-cys tripeptide in a reaction using aaa and cys only. Production levels of the novel tripeptides vary strongly, from 0.02 % up to 13.8 % relative to LLD-aaa-cys-val production levels. Significantly abundant tripeptides

Module  Analogue             Tripeptide   #    Mi        Rel. prod. (± err)
ACVS    -                    LLD-ACV      1    363.146   100 ± 10.3
M1      L-Aspartic acid      Asp-CV       2    335.115   0
M1      Phenoxyacetic acid   POA-CV       3    354.125   0
M1      L-Glutamine          Glu-CV       4    349.131   0.02 ± 0.009
M1      Adipic acid          AA-CV        5    348.136   0
M2      L-Alanine            A-Ala-V      6    331.174   0.98 ± 0.02
M2      L-Glycine            A-Gly-V      7    317.159   0
M2      DL-Homocysteine      A-Hcys-V     8    377.162   0
M2      L-Serine             A-Ser-V      9    347.169   0
M2      L-Threonine          A-Thr-V      10   361.185   0.02 ± 0.003
M2      L-Penicillamine      A-Pen-V      11   391.178   0.05 ± 0.005
M2      L-Methionine         A-Met-V      12   391.178   0.07 ± 0.01
M2      L-Leucine            A-Leu-V      13   373.221   1.58 ± 0.21
M2      L-Isoleucine         A-IsoL-V     14   373.221   0
M3      L-Norvaline          AC-NorVal    15   363.146   13.8 ± 0.59
M3      L-Leucine            AC-Leu       16   377.162   0.54 ± 0.01
M3      L-Isoleucine         AC-IsoL      17   377.162   1.21 ± 0.04
M3      L-Threonine          AC-Thr       18   365.126   0
M3      L-Methionine         AC-Met       19   395.118   0
M3      L-Penicillamine      AC-Pen       20   395.118   0
M3      L-Glycine            AC-Gly       21   321.099   0
M3      L-Alanine            AC-Ala       22   335.115   0
M3      L-Cysteine           AC-Cys*      23   367.087   13.6 ± 1.93

[The chemical structures of tripeptides 1–23 shown in the original figure are not reproduced here.]

Figure 4 — Substrate promiscuity of the N. lactamdurans ACVS and structures of produced tripeptides. Three sets of reactions were analyzed, varying the amino acid at one position within the tripeptide. Structurally analogous substrates were added to a concentration of 5 mM, replacing either L-α-aaa (M1), L-cys (M2) or L-val (M3) (table left). Reactions were evaluated using LC/MS and peaks of interest were assessed according to accurate monoisotopic mass (Mi)

(No. 6; 13; 15 and 17) were further characterized for β-lactam formation through the action of purified IPNS enzyme. However, none of the predicted β-lactam structures were detected, either due to product instability, levels below the detection limit or incompatibility with the IPNS specificity.
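Since the novel tripeptides were identified by accurate monoisotopic mass, the expected values can be reproduced from elemental compositions. The following sketch is an illustration under assumptions: it computes neutral monoisotopic masses of linear tripeptides (three free amino acids minus two water molecules), uses standard isotope masses rounded to eight decimals, and covers only a few residues; the names `AMINO_ACIDS`, `peptide_formula` and `monoisotopic_mass` are hypothetical.

```python
from collections import Counter

# Monoisotopic masses of the most abundant isotopes (u).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "S": 31.97207117,
}

# Elemental compositions of the free amino acids.
AMINO_ACIDS = {
    "aaa": {"C": 6, "H": 11, "N": 1, "O": 4},          # alpha-aminoadipic acid
    "cys": {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1},
    "val": {"C": 5, "H": 11, "N": 1, "O": 2},
    "nva": {"C": 5, "H": 11, "N": 1, "O": 2},          # norvaline, isomer of valine
    "ala": {"C": 3, "H": 7, "N": 1, "O": 2},
    "leu": {"C": 6, "H": 13, "N": 1, "O": 2},
}
WATER = {"H": 2, "O": 1}


def peptide_formula(residues):
    """Elemental composition of a linear peptide: one water lost per bond."""
    total = Counter()
    for residue in residues:
        total.update(AMINO_ACIDS[residue])
    for _ in range(len(residues) - 1):
        total.subtract(WATER)
    return dict(total)


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for an element-count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


if __name__ == "__main__":
    for name in (["aaa", "cys", "val"], ["aaa", "ala", "val"],
                 ["aaa", "cys", "nva"], ["aaa", "cys", "leu"],
                 ["aaa", "cys", "cys"]):
        mass = monoisotopic_mass(peptide_formula(name))
        print("-".join(name), f"{mass:.3f}")
```

The values obtained this way agree with the Mi column of Figure 4 for the corresponding entries. Note that isomeric products such as aaa-cys-val and aaa-cys-norvaline share the same mass and cannot be distinguished by mass alone.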

Single particle analysis and structural evaluation of the ACVS

To gain a more in-depth understanding of the functional dynamics and structure of the ACVS, we further optimized the purification conditions of ACVS for subsequent electron microscopic characterization. Initially, we


and the resulting levels were set relative to the production of LLD-aaa-cys-val (= 100), assuming similar ionization. The predicted structures of the novel tripeptides and their corresponding reactions are numbered 1–23. Italic numbers indicate production of the respective tripeptide. * = hypothetical aaa-cys-cys compound derived from L-α-aaa and L-cys only (23). Values derived from two biological and technical replicates ± standard deviation.

analyzed the ACVS obtained from the two-step purification using dynamic light scattering (DLS). The particle radius range of 8.4–15.7 nm suggests protein heterogeneity in the sample. A substantial amount of protein aggregation occurred, and different conformations were present in the sample, as shown in the particle size distribution (Figure 5 A). To eliminate the observed protein aggregation and to obtain a more homogeneous particle conformation, different additives at 1 mM concentration were tested (Figure 5 B–E). There was no significant change upon the addition of MgCl2 (D), but a significant improvement in particle homogeneity was obtained after addition of AMP (B), of aaa, cys and val (C), and of a combination of these two sets of molecules (E). To further stabilize the sample and prevent particle damage, we applied the conditions of (E) to all the buffers used in the two-step purification process, resulting in one predominant conformation (F). The latter sample was subsequently subjected to buffer exchange followed by negative staining and electron microscopy (Figure 6). In addition to intact particles, aggregated and partly dissociated particles were visible. It appeared that the previously compact particles enter a more disordered state upon longer incubation, making the sample less suitable for detailed electron microscopic analysis (Supplementary figure 3).

Achieving conformational unity assisted by gradient fixation

To overcome issues linked to the intrinsic flexibility and instability of the ACVS, we adapted the GraFix method [23] for the stabilization of ACVS particles. For this purpose, a 10–30 % glycerol gradient with and without 0.2 % glutaraldehyde (GA) was prepared and subsequently loaded with 100–1500 pmol of ACVS from a one-step purified sample. The samples retrieved after fractionation contained 0.15–0.40 mg/ml protein (Supplementary figure 2A). Although no distinct protein distribution profiles emerged after substrate addition, and the subsequent native gel analysis also showed no significant difference in the state of the ACVS, the previously observed unfolding vanished (Supplementary figure 2B). Samples were subsequently subjected to negative stain analysis, showing an overall homogeneous picture without pronounced differences between the 0 % GA and the 0.2 % GA samples (not shown). Negative stain samples allowed for the picking of a series of particles and their subsequent classification according

Figure 5 — Dynamic light scattering (DLS) of ACVS in the presence of different additives. ACVS protein derived from the two-step purification was supplemented with various additives (A–E) or purified in the presence of additives (F). Additive concentrations were used as indicated, and the amino acid (aa) mix contained all three native substrates (α-aaa, L-cys and L-val, 1 mM each). Samples were measured by DLS at a laser intensity of 80 % and a temperature of 4 °C. Particle radius (r in nm), polydispersity (PD in %), predicted molecular weight (MW in kD), % of total mass (mass in %) and total sample peaks were determined. The most heterogeneous samples (A & D) showed r 15.4–15.6 nm, PD 67.3–68.5 %, MW 2011–2106 kD, and a mass of 79.1–82.5 %. The highest homogeneity was observed in (F), showing r 8.4–8.6 nm, PD 4.8–11 %, MW 495–521 kD, and a mass of 95.8–99.7 %. Results derived from three biological replicates and three technical replicates, with 10 scans per read and 10 reads per sample. ▽ = single particle; ▼ = aggregation.


Figure 6 — Negative stain image of ACVS in the presence of AMP. Picture taken at a magnification of 100,000×; indicated are the different particle conformations and orientations present in the sample. ▼ = single particle; ▽ = aggregation; ▼ = dissociation/impurities. 1 = top view of compact particle; 2 = side view of compact particle; 3 = unfolded particle.

to their orientation on the grid, and resulted in the reconstruction of a 3-D model at a resolution of approx. 20 Å (Figure 7). At this limited resolution, it is not possible to assign domains or subdomains to the structure. However, the particle appears to consist of three modules, which are seemingly arranged in a pseudo three-fold symmetry, resembling a helix-like structure that encompasses three channels (Figure 8). The particle projections from cryo-electron micrographs showed many unfolded or aggregated proteins, and the resulting density map was of insufficient resolution to draw any significant conclusions on the 3-dimensional structure.

Figure 7 — Different particle orientations of the ACVS. The structures represent a 2-D average of automatically selected and subsequently grouped particles from negative stained samples.

Figure 8 — 2-D averaged top (A) and side (B) view of ACVS particle from two shifted angles. The corresponding angle of the 3-D reconstruction is shown on the right side of the respective negative stain image.

Discussion and conclusions

Here we report on the overexpression and purification of the Nocardia lactamdurans ACV synthetase for its biochemical characterization and structural analysis. The gene encoding the ACVS of N. lactamdurans was heterologously expressed in E. coli HM0079, which contains a genomic copy of sfp, essential to activate the ACVS with a phosphopantetheine moiety. An efficient two-step purification process was developed to obtain highly pure enzyme. Because the protein is over 400 kD in size, impurities could be minimized by size exclusion chromatography, as the host cell does not express any intrinsic proteins larger than 200 kD [24]. An initial enzymatic characterization was performed by following the production of the tripeptide LLD-aaa-cys-val. With respect to the three intrinsic ACVS modules, distinct differences in substrate affinities were noted, with the initiating module showing the lowest affinity for its substrate, i.e., aminoadipate. Previous studies on the formation of product intermediates and partial reactions of NRPS enzymes suggest that the initial amino acid thiolation reaction is a rate-limiting step in the assembly of nonribosomal peptides, necessary for the subsequent domains to adopt their distinct conformations for peptide bond formation and product release [25–26]. Overall, substrate affinity levels appear to be in line with those of other ACVS homologues, in particular those of prokaryotic origin [27]. Adenylation domain specificities, and especially the module-specific velocities, may also be determined in more detail using a radiolabel-based assay [28], which would allow for a more detailed comparison with other characterized ACV synthetases. We furthermore determined the substrate specificity of the ACVS modules within the context of the full-length enzyme by assessing the production of tripeptides (Figure 4).
Some ACVS homologues [11;29–31] exhibit a certain degree of tolerance towards substrates, despite considerably tight intrinsic control mechanisms that assure correct product formation. However, with the N. lactamdurans ACVS little production was observed when aaa was replaced by potential substrate analogues. Only trace amounts of a tripeptide resembling gln-cys-val were found. There seems to be no tolerance for side chain variation: neither the absence of two side chain carbons, nor bulky side chains, nor side chains lacking the amino group were accepted. Furthermore, traceable amounts of three aaa-X-val tripeptides were found when investigating the promiscuity of module 2 (L-cysteine). In addition, two tripeptides, aaa-ala-val and aaa-leu-val, were generated in substantial amounts. Due to the nature of the leucine and valine side chain, though, it is questionable whether any functional downstream conversion into a biologically active compound could be achieved. Trials using an isopenicillin-N synthetase indeed showed no β-lactam ring formation (data not shown). Finally, in the determination of the substrate specificity of the third module (L-valine), two novel products, aaa-cys-leu and aaa-cys-ile, were observed at a level of 0.5–1 % and, more importantly, aaa-cys-norvaline at over 13 % of the native amounts of aaa-cys-val. It appears that there is a significant level of tolerance towards the side chain length and the distribution of methyl groups; however, substrates with hydroxy or thio groups are not incorporated at all. Further, we observed the production of the aaa-cys-cys tripeptide, which is produced under valine-deficient conditions, at a level of over 13 %. Rather than cysteine incorporation by the third module, however, this may reflect an iterative double cysteine incorporation by module 2. The observed substrate specificity indicates further engineering potential for the N. lactamdurans ACVS. However, in order to conclusively determine the substrate specificity and potential, an extended set of natural and non-natural substrates, such as D- or “clickable” amino acids [32–33], should be included in subsequent studies. Possibly, by combinatorial methods, a powerful platform for custom NRP synthesis may be developed. Importantly, until the intra-NRPS reaction dynamics, conformational timing and substructure positioning have been conclusively elucidated, global engineering efforts will remain challenging. Thus, as part of the general characterization of the ACVS, we tried to evaluate and establish conditions for purification as well as stabilization, to ultimately retrieve a structurally homogeneous protein suitable for electron microscopic analysis. To this end, we initially attempted to use an ACVS extract obtained from the two-step purification.
Upon negative stain analysis of the particles after a buffer exchange reaction, a predominant conformation was observed, resembling in its crude shape a barrel or helix-like structure with a radius of about 10 nm. However, this sample is unstable and converts over time into a less ordered state or even an aggregated form. All potential co-factors as well as the three native substrates were added in a systematic manner in order to stabilize the compact particle conformation. Prediction of size and mass by DLS revealed that ~20 % of the total protein is compact (Figure 5). Addition of magnesium ions did not affect the particle size distribution, but the addition of amino acids, AMP and the combination thereof increased the compact particle category to 40–50 % of the total mass. When the beneficial co-factors are present throughout the two-step purification procedure, a sample with over 90 % of homogeneous compact particles is obtained. Using negative stain based data collection, a 3-D reconstruction of the ACVS structure at approximately 20 Å resolution was generated. Trials on improving the resolution by cryo electron microscopy failed, as samples slowly converted into disordered conformations. To solve this problem, a gradient fixation or GraFix method [20;23] was applied in order to stabilize the compact conformation of the particles. Although stabilization could not be directly linked to this fixation, a generic stabilizing effect was observed due to the increased glycerol concentration in the sample. This stabilizing effect appeared to rely on trace amounts of glycerol that remained after the buffer exchange step, but these interfered with cryo EM. Also, the ACVS enzyme appears to bind to the carbon film, leaving only low amounts of protein available in the holes for imaging. Although there are multiple ways in which the problem of unstable or conformationally heterogeneous particles can be solved, the use of NRPS sub-structures obtained by X-ray diffraction is an efficient means to interpret the low-resolution EM structure at a molecular level [18]. Product and substrate intermediates and analogues, respectively, have been used to lock the conformation of distinct domains. This allowed for a more precise determination of intra- and inter-modular interactions and the associated surfaces. Another approach would be to adjust the grid preparation method towards a more gentle technique, such as the use of antibody-associated sample support grids [34–35]. Either of those approaches may eventually lead to the determination of a full, multi-modular structure of an NRPS, thereby eliminating a crucial bottleneck for the systematic engineering of NRPS as well as the downstream associated, potentially promising, bioactive compounds.

Acknowledgements

The authors would like to express their gratitude towards DSM Sinochem Pharmaceuticals BV for the Nl ACVS construct, as well as for analytical and scientific support. This work was financially supported by the BE-Basic Foundation, an international public–private partnership.

References

1. Von Döhren H, Dieckmann R, Pavela-Vrancic M. The nonribosomal code. Chem Biol. 1999;6: 273–279.
2. Marahiel MA, Stachelhaus T, Mootz HD. Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem Rev. American Chemical Society; 1997;97: 2651–2674. doi:10.1021/cr960029e
3. Suring W, Meusemann K, Blanke A, Marien J, Schol T, Agamennone V, et al. Evolutionary ecology of beta-lactam gene clusters in animals. Mol Ecol. England; 2017;26: 3217–3229. doi:10.1111/mec.14109
4. Tahlan K, Moore MA, Jensen SE. δ-(l-α-aminoadipyl)-l-cysteinyl-d-valine synthetase (ACVS): discovery and perspectives. J Ind Microbiol & Biotechnol. 2017;44: 517–524. doi:10.1007/s10295-016-1850-7
5. Liempt H Van, Kleinkauf H. The first enzyme in penicillin biosynthesis. IS; 1989; 3680–3684.
6. Baldwin JE, Bird JW, Field RA, O’Callaghan NM, Schofield CJ. Isolation and partial characterisation of ACV synthetase from Cephalosporium acremonium and Streptomyces clavuligerus. The Journal of antibiotics. Japan; 1990. pp. 1055–1057.
7. Baldwin JE, Bird JW, Field RA, O’Callaghan NM, Schofield CJ, Willis AC. Isolation and partial characterisation of ACV synthetase from Cephalosporium acremonium and Streptomyces clavuligerus. Evidence for the presence of phosphopantothenate in ACV synthetase. J Antibiot (Tokyo). Japan; 1991;44: 241–248.
8. Jensen SE, Wong A, Rollins MJ, Westlake DW. Purification and partial characterization of delta-(L-alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase from Streptomyces clavuligerus. J Bacteriol. United States; 1990;172: 7269–7271.
9. Theilgaard HB, Kristiansen KN, Henriksen CM, Nielsen J. Purification and characterization of delta-(L-alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase from Penicillium chrysogenum. Biochem J. England; 1997;327 (Pt 1): 185–191.
10. van Liempt H, von Dohren H, Kleinkauf H. delta-(L-alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase from Aspergillus nidulans. The first enzyme in penicillin biosynthesis is a multifunctional peptide synthetase. J Biol Chem. United States; 1989;264: 3680–3684.
11. Coque JJ, de la Fuente JL, Liras P, Martin JF. Overexpression of the Nocardia lactamdurans alpha-aminoadipyl-cysteinyl-valine synthetase in Streptomyces lividans. The purified multienzyme uses cystathionine and 6-oxopiperidine 2-carboxylate as substrates for synthesis of the tripeptide. Eur J Biochem. England; 1996;242: 264–270.
12. Coque JJ, Martin JF, Calzada JG, Liras P. The cephamycin biosynthetic genes pcbAB, encoding a large multidomain peptide synthetase, and pcbC of Nocardia lactamdurans are clustered together in an organization different from the same genes in Acremonium chrysogenum and Penicillium chrysogenum. Mol Microbiol. England; 1991;5: 1125–1133.
13. Keating TA, Walsh CT. Initiation, elongation, and termination strategies in polyketide and polypeptide antibiotic biosynthesis. Curr Opin Chem Biol. 1999;3: 598–606. doi:10.1016/S1367-5931(99)00015-0
14. Stein T, Vater J, Kruft V, Otto A, Wittmann-Liebold B, Franke P, et al. The multiple carrier model of nonribosomal peptide biosynthesis at modular multienzymatic templates. J Biol Chem. United States; 1996;271: 15428–15435.
15. Payne JAE, Schoppet M, Hansen MH, Cryle MJ. Diversity of nature’s assembly lines — recent discoveries in non-ribosomal peptide synthesis. Mol Biosyst. The Royal Society of Chemistry; 2017; doi:10.1039/C6MB00675B
16. Drake EJ, Miller BR, Shi C, Tarrasch JT, Sundlov JA, Leigh Allen C, et al. Structures of two distinct conformations of holo-non-ribosomal peptide synthetases. Nature; 2016;529: 235–238. http://dx.doi.org/10.1038/nature16163
20. Kastner B, Fischer N, Golas MM, Sander B, Dube P, Boehringer D, et al. GraFix: sample preparation for single-particle electron cryomicroscopy. Nat Methods. United States; 2008;5: 53–55. doi:10.1038/nmeth1139
21. Scheres SHW. Semi-automated selection of cryo-EM particles in RELION-1.3. Journal of Structural Biology. 2015. pp. 114–122. doi:10.1016/j.jsb.2014.11.010
22. Rohou A, Grigorieff N. CTFFIND4: Fast and accurate defocus estimation from electron micrographs. bioRxiv. 2015; http://biorxiv.org/content/early/2015/08/13/020917.abstract
23. Stark H. GraFix: stabilization of fragile macromolecular complexes for single particle cryo-EM. Methods Enzymol. United States; 2010;481: 109–126. doi:10.1016/S0076-6879(10)81005-5
17. Marahiel MA. A structural model for
24. Kaur J, Kumar A, Kaur J. Strategies for optimization of heterologous protein expression in E. coli: Roadblocks

References multimodular NRPS assembly lines. Nat and reinforcements. Int J Biol Macro- Prod Rep. England; 2016;33: 136–140. mol. Netherlands; 2018;106: 803–822. doi:10.1039/c5np00082c doi:10.1016/j.ijbiomac.2017.08.080 18. Tarry MJ, Haque AS, Bui KH, Schmeing 25. Walsh CT. Insights into the chemical TM. X-ray crystallography and electron logic and enzymatic machinery of NRPS microscopy of cross- and multi-module­ assembly lines. Nat Prod Rep. The Royal nonribosomal peptide synthetase pro- Society of Chemistry; 2016;33: 127–135. teins reveal a flexible architecture. Struc- doi:10.1039/C5NP00035A ture. United States; 2017;25: 783–793.e4. 26. Sun X, Li H, Alfermann J, Mootz HD, Yang doi:10.1016/j.str.2017.03.014 H. Kinetics profiling of gramicidin S syn- 19. Gruenewald S, Mootz HD, Stehmeier P, thetase A, a member of nonribosomal Stachelhaus T. In vivo production of ar- peptide synthetases. Biochemistry. tificial nonribosomal peptide products 2014;53: 7983–9. doi:10.1021/bi501156m in the heterologous host Escherichia 27. Schwecke T, Aharonowitz Y, Palissa coli. Appl Environ Microbiol. 2004;70: H, von Dohren H, Kleinkauf H, van 3282–3291. Liempt H. Enzymatic characterisa- doi:10.1128/AEM.70.6.3282 tion of the multifunctional enzyme

4 110 delta-(L-alpha-aminoadipyl)-L-cyste- convert peptides to peptoid-based ther- inyl-D-valine synthetase from Strepto- apeutics. Chen P, editor. PLoS One. San myces clavuligerus. Eur J Biochem. En- Francisco, USA: Public Library of Sci- gland; 1992;205: 687–694. ence; 2013;8: e58874. 28. Keller U, Kleinkauf H, Zocher R. 4-Meth- doi:10.1371/journal.pone.0058874 yl-3-hydroxyanthranilic acid activating 34. Yu G, Li K, Jiang W. Antibody-based af- enzyme from actinomycin-producing finity cryo-EM grid. Methods. United Streptomyces chrysomallus. Biochemis- States; 2016;100: 16–24. try. United States; 1984;23: 1479–1484. doi:10.1016/j.ymeth.2016.01.010 29. Zhang J, Wolfe S, Demain AL. Bio- 35. Yu G, Vago F, Zhang D, Snyder JE, Yan chemical studies on the activity of R, Zhang C, et al. Single-step antibody-­ delta-(L-alpha-aminoadipyl)-L-cyste- based affinity cryo-electron microscopy inyl-D-valine synthetase from Strepto- for imaging and structural analysis of myces clavuligerus. Biochem J. England; macromolecular assemblies. J Struct 1992;283 ( Pt 3: 691–698. Biol. United States; 2014;187: 1–9. 30. Etchegaray A, Dieckmann R, Kennedy J, doi:10.1016/j.jsb.2014.04.006 Turner G, von Döhren H. ACV synthetase: 36. Liras P, Demain AL. Chapter 16 Enzy- expression of amino acid activating do- mology of beta-lactam compounds with mains of the Penicillium chrysogenum cephem structure produced by actino- enzyme in Aspergillus nidulans. Bio- mycete. 2009. pp. 401–429. chem Biophys Res Commun. 1997;237: doi:10.1016/S0076-6879(09)04816-2 4 166–169. 31. Baldwin JE, Shiau CY, Byford MF, Schofield CJ. Substrate specificity of L-delta-(alpha-aminoadipoyl)-L-cys- teinyl-D-valine synthetase from Cepha- losporium acremonium: demonstration of the structure of several unnatural tripeptide products. Biochem J. 1994;301, Pt 2: 367–72. http://www.pubmedcentral.nih.gov/ -aminoadipyl)-L-cysteinyl-D-valine synthetase -aminoadipyl)-L-cysteinyl-D-valine

articlerender.fcgi?artid=1137089&tool= -( α δ pmcentrez&rendertype=abstract L- 32. Zhu X, Zhang W. Tagging polyketides/ non-ribosomal peptides with a clickable functionality and applications. Front

Chem. Frontiers Media S.A.; 2015;3: 11. lactamdurans the Nocardia of characterization Biochemical and structural doi:10.3389/fchem.2015.00011 33. Park M, Wetzler M, Jardetzky TS, Bar- ron AE. A readily applicable strategy to

Supplementary material

[Supplementary figure 1: four time-course plots (panels A–D) of LLD-ACV formation (%) against time (min); panel C is titled "AMP:ATP" and panel D "Glycerol storage".]

Supplementary figure 1 – Overview of ACVS reaction optimization. The activity of ACVS in product formation was assessed under different reaction conditions. (A) Crowding effect of BSA: black = 1 mg/ml ACVS; ■ (■) = 0.75 mg/ml ACVS (+0.25 mg/ml BSA); ▲ (▲) = 0.5 mg/ml ACVS (+0.5 mg/ml BSA); ● (●) = 0.25 mg/ml ACVS (+0.75 mg/ml BSA); ◆ (◆) = 0.1 mg/ml ACVS (+0.9 mg/ml BSA). (B) Effects of different agents and conditions to terminate the reactions: black = 10 mM DTT; – ·· – ·· – = 75 °C, 10 min; – - – - – = 95 °C, 10 min; - · - · - = 55 °C, 10 min; ----- = HCl 0.1 M; ····· = NaOH 0.1 M; grey = EDTA 0.1 M. (C) Effect of AMP. Reactions were initiated upon addition of ATP and/or AMP. ATP/AMP (mM): ● = 5/0; ■ = 4/1; ◆ = 3/2; ▲ = 2/3. (D) Effect of glycerol on the ACVS activity after storage for 30 days at −80 °C; glycerol concentrations: 15 % – · · –; 25 % - - -; 35 % ······; 50 % -----.

Supplementary figure 2 – ACVS gradient fixation in the presence of substrate additions (A) and fixation with glutaraldehyde (B). Six gradients (10–30 % glycerol) were run, loading 1500 pmol of ACVS derived from a one-step purification. Collected fractions were measured at A280, as represented in (A). black = ACVS only; grey = +AMP 1 mM; black dashed = +amino acids 1 mM; grey dashed = +MgCl2 1 mM; black dotted = all additives; grey dotted = all additives in buffers. The glycerol concentration of the corresponding fractions is indicated with the blue gradient bar. ACVS protein was also subjected to glycerol gradients with and without 0.2 % glutaraldehyde (B). Fractions displayed (I–VI) represent peak ACVS particle concentration samples. Every condition was evaluated from at least 2 biological replicates. ▽ = single particle; ▼ = aggregation; ▼ = dissociation/impurities.

Supplementary figure 3 – Different 2-D averaged images taken from cryo-EM preparations, representing multiple top, side and intermediate angles of the ACVS.

Supplementary figure 4 – pBAD plasmid containing the Nocardia lactamdurans ACVS. Parts of the 14992 bp pBAD-NlACVS-6×his plasmid are annotated as follows: Nl A, C and V represent modules M1, M2 and M3 of the ACVS; pBAD promoter ▬; terminator ■; zeocin resistance marker ▭; pBR322 origin of replication ■; araC ▬; gateway sites ■.


CHAPTER V

An engineered two component nonribosomal peptide synthetase (NRPS) producing a novel peptide-like compound in Penicillium chrysogenum

Reto D. Zwahlen,1 Ciprian G. Crismaru,1 Fabiola Polli,2 Catalin M. Bunduc,3 Ulrike Muller,5 Remon Boer,5 Olaf Schouten,5 Roel A.L. Bovenberg,4,5 and Arnold J.M. Driessen1,6

1 Molecular Microbiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
2 Biochemical Laboratory, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
3 Molecular Microbiology, Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, The Netherlands
4 Synthetic Biology and Cell Engineering, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
5 DSM Biotechnology Centre, Delft, The Netherlands
6 Kluyver Centre for Genomics of Industrial Fermentations, Julianalaan 67, 2628BC Delft, The Netherlands

Manuscript in preparation

Abstract

Semi-synthetic penicillins such as ampicillin and amoxicillin are major antibiotics in use today. They are manufactured in a semi-synthetic production process involving fermentation, enzymatic and chemical conversion steps. To make the synthesis of semi-synthetic β-lactams more sustainable, a fully fermentative production process is desired. As a first step towards the development of such a process, the native nonribosomal peptide synthetase (NRPS) L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACVS) involved in β-lactam biosynthesis was engineered into a hybrid two-component NRPS system, which uses dedicated inter-NRPS communicating domains (COM). This enabled the replacement of the first, aminoadipate specific module by a heterologous hydroxyphenylglycine (hpg) specific module. The new hybrid NRPS system was expressed in a Penicillium chrysogenum strain lacking the entire penicillin biosynthesis cluster. Analysis of the engineered strains revealed the formation of a new product with the predicted mass of the expected hpgCV tripeptide, but the molecule detected contained a thioester instead of the desired peptide bond. The employed two-component system revealed the potential of COM domains in engineering strategies for the development of hybrid NRPS systems.

Introduction

Penicillium chrysogenum represents a valuable source of secondary metabolites and is a suitable host for pathway and strain engineering and refinement programs due to its metabolic versatility and long history of industrial production of penicillins, cephalosporins and statins [1–4]. Secondary metabolites are typically produced by enzymes whose genes are clustered in the genome and can readily be recognized by the presence of dedicated sequence motifs for multi-modular enzymes [5–8]. This class comprises nonribosomal peptide synthetases (NRPS), polyketide synthases (PKS), terpene synthetases (TRP) and fatty acid synthetases (FAS), which are all multi-modular enzymes consisting of a series of catalytically distinct domains, responsible for multiple reactions that lead to structurally diverse bioactive compounds [9–11]. Engineering efforts based on this multi-modular organization, however, have been cumbersome, despite significant progress in the resolution of partial structures and new approaches for engineering active sites [12–18]. An alternative approach for NRPS engineering involves the use of NRPS communicating domains (COM). NRPS frequently occur as clusters of functionally different, distinct modular subunits whose interactions are facilitated via COM domains [19]. COM domains are short, roughly 20 amino acid long, helix finger-like structures guiding the association of multi-domain NRPS systems, leading to the synthesis of compounds such as teicoplanin, ramoplanin or enduracidin [9; 20–21].

In order to develop more sustainable production methods for major classes of β-lactam antibiotics, we here explored the engineering potential of a two-component NRPS system. The proposed system relies on two distinct parts, consisting of one and two NRPS modules, respectively. Specifically, two NRPS modules specific for L- and D-hpg from teicoplanin (Tcp) synthesis in Actinoplanes teichomyceticus [22–23] serve as initiation components. Through COM domains (COMA/COMD), these modules were combined in Penicillium chrysogenum with a truncated version of the L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACVS) in order to synthesize a tripeptide consisting of D-hydroxyphenylglycine (hpg), L-cysteine and D-valine (DLD-hpgCV), which may act as a peptide precursor towards the biosynthesis of amoxicillin. The results show the potential of using COM domains for the engineering of hybrid NRPS systems for the production of novel β-lactam precursors.

Materials and methods

Penicillium chrysogenum strains, media, and culture conditions

Penicillium chrysogenum DS58912 (∆Pen-cluster) was kindly provided by DSM Sinochem Pharmaceuticals Netherlands B.V. To obtain mycelium of P. chrysogenum for transformation and DNA isolation, fresh spores were inoculated into YGG medium according to [24]. For the fermentation analysis, spores were inoculated in secondary metabolite production medium (PPM), consisting of (in g/l): glucose, 5.0; lactose, 36; urea, 4.5; Na2SO4, 2.9; (NH4)2SO4, 1.1; K2HPO4, 4.8; KH2PO4, 5.2; supplemented with 10 ml of a trace element solution containing (in g/l): FeSO4·7H2O, 24.84; MgSO4·7H2O, 0.0125; EDTA, 31.25; C6H6Na2O7, 43.75; ZnSO4·7H2O, 2.5; CaCl2·2H2O, 1.6; MgSO4·H2O, 3.04; H3BO3, 0.0125; CuSO4·5H2O, 0.625; Na2MoO4·2H2O, 0.0125; CoSO4·7H2O, 0.625. Racemic DL-hydroxyphenylglycine (L/D-hpg) (Sigma) was added to each culture to a final concentration of 50 µg/ml (300 µM). Solutions were adjusted to pH 6.5. The mycelium was grown in a shaking incubator at 200 rpm for 7 days at 25 °C.
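The equivalence of 50 µg/ml and roughly 300 µM for the hpg feed can be checked with a short calculation. The sketch below assumes the free amino acid 4-hydroxyphenylglycine (C8H9NO3) and standard average atomic masses; the function and variable names are illustrative only and do not stem from the original work.

```python
# Average atomic masses (g/mol), rounded standard values.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Assumed elemental composition of free 4-hydroxyphenylglycine (hpg).
HPG_FORMULA = {"C": 8, "H": 9, "N": 1, "O": 3}


def molar_mass(formula):
    """Sum the average atomic masses for an elemental composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def ug_per_ml_to_um(conc_ug_per_ml, mw_g_per_mol):
    """Convert a mass concentration (µg/ml, i.e. mg/l) to a molar one (µM)."""
    # (mg/l) / (g/mol) gives mmol/l; multiply by 1000 for µmol/l.
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


hpg_mw = molar_mass(HPG_FORMULA)
hpg_um = ug_per_ml_to_um(50.0, hpg_mw)
print(f"hpg: {hpg_mw:.2f} g/mol, 50 µg/ml = {hpg_um:.0f} µM")
```

Under these assumptions the feed concentration comes out just below 300 µM, in line with the rounded value given above.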

NRPS selection and setup

In order to create hpgCV using a COM domain based engineering approach, an initiation module, an elongation/termination di-module as well as a pair of communicating NRPS linkers were required. Two targets were selected, Tcp9 and Tcp11 (Figure 1), serving as initiation modules. Both are derived from the teicoplanin biosynthetic cluster and have been shown to be specific for D/L-hydroxyphenylglycine (Chapter II/unpublished data). In addition, the dedicated MbtH-like protein (MLP) Tcp13 associated with Tcp9 and Tcp11 was chosen. Furthermore, two di-modules activating L-cysteine and L-valine were selected, representing the two C-terminal modules of the ACVS (PcbAB) of P. chrysogenum (Pc) and Nocardia lactamdurans (Nl), respectively. Finally, the complementing NRPS communicating linkers, acceptor COMA and donor COMD of the Tcp9 and Tcp11 associated NRPS, were chosen to enable specific AHPG–CV association. COMD was fused C-terminally to Tcp9 and Tcp11 (Tcp9-COMD/Tcp11-COMD) and COMA to the N-terminus of the CV di-module (NlCV-COMA/PcCV-COMA). The setup is further illustrated in Figure 1. Constructs were subsequently cloned, analyzed and transformed into E. coli HM0079 for expression and in vitro activity assays.

NRPS expression, purification and in vitro activity evaluation

E. coli HM0079 cultures were grown in 2xPY medium to an OD600 of 0.6, transferred to 18 °C at 200 rpm for 1 h and subsequently induced using 0.3 mM IPTG and 0.2 % L-arabinose. Cells were harvested 18 h after induction by centrifugation at 3500 g for 15 minutes. After resuspension in lysis buffer (HEPES 50 mM pH 7.0, NaCl 300 mM, DTT 2 mM, complete EDTA-free protease inhibitor; Roche No. 04693159001), cells were disrupted by sonication (6 s/15 s on/off, 50x, 10 µm amplitude) and cell-free lysate was obtained by centrifugation at 4 °C, 13000 g for 15 minutes. The enzyme was purified by his-tag affinity purification on Ni-NTA beads (Qiagen) using gravity flow. Wash steps were performed using two column volumes of wash buffer (HEPES 50 mM pH 7.0, NaCl 300 mM, imidazole 20 mM), followed by a one-step elution using 5 bed volumes of elution buffer (HEPES 50 mM pH 7.0, NaCl 300 mM, imidazole 250 mM). Imidazole was removed while simultaneously concentrating the sample with Amicon-100 spin filters (Amicon). The final concentration was determined using A280 measurements or a colorimetric assay (DC protein assay, BioRad).

Isolated enzyme was subjected to in vitro assays in order to determine product formation properties. Assay conditions included HEPES 50 mM pH 7.0, NaCl 300 mM, ATP 5 mM pH 7.0, CoA 100 µM, Sfp (NEB) 0.2 µM, L-cysteine 2 mM, L-valine 2 mM, MgCl2 5 mM, DTT 2 mM, hybrid NRPS 0.3 µM and either L- or D-hydroxyphenylglycine (L/D-hpg) 5 mM. Negative controls included reactions omitting enzyme or L/D-hpg. Full length N. lactamdurans ACVS served as positive control, using 5 mM L-α-aminoadipic acid instead of L/D-hpg. Reactions were run at 30 °C, samples were taken after 0, 60, 120 and 240 minutes, and reactions were quenched using 0.1 M NaOH. Samples were subsequently stored at −80 °C and reduced with 10 mM DTT prior to LC/MS analysis.
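The assay composition listed above translates into a pipetting scheme via the dilution relation V_stock = V_total × c_final / c_stock. The following sketch uses the final concentrations from the text; the stock concentrations and the 100 µl reaction volume are hypothetical values chosen for illustration and are not taken from the original protocol. Buffer components carried over with the enzyme preparation are ignored.

```python
# Final assay concentrations as listed in the text (all in µM).
FINAL_UM = {
    "HEPES pH 7.0": 50_000, "NaCl": 300_000, "ATP pH 7.0": 5_000,
    "CoA": 100, "Sfp": 0.2, "L-cysteine": 2_000, "L-valine": 2_000,
    "MgCl2": 5_000, "DTT": 2_000, "hybrid NRPS": 0.3, "hpg": 5_000,
}

# Hypothetical stock concentrations (µM); adjust to the stocks at hand.
STOCK_UM = {
    "HEPES pH 7.0": 1_000_000, "NaCl": 5_000_000, "ATP pH 7.0": 100_000,
    "CoA": 10_000, "Sfp": 20, "L-cysteine": 100_000, "L-valine": 100_000,
    "MgCl2": 1_000_000, "DTT": 1_000_000, "hybrid NRPS": 5, "hpg": 50_000,
}


def pipetting_scheme(final, stock, total_ul):
    """Return the volume (µl) of each stock plus the water needed to fill up."""
    volumes = {name: total_ul * final[name] / stock[name] for name in final}
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested volume")
    volumes["water"] = water
    return volumes


scheme = pipetting_scheme(FINAL_UM, STOCK_UM, total_ul=100.0)
for component, volume in scheme.items():
    print(f"{component:>14}: {volume:6.2f} µl")
```

For a negative control omitting enzyme or hpg, the corresponding entry can be removed from the dictionary of final concentrations, so that its volume is replaced by water.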

P. chrysogenum transformation setup and strain selection

P. chrysogenum DS58912 [25] was co-transformed with different combinations of linear DNA fragments, encoding expression constructs for the MbtH-like protein (MLP) Tcp13, an initiation module (Tcp9/11), a CV di-module (Pc/NlCV), a modified version of the pcbC gene encoding an evolved isopenicillin N synthase variant (unpublished results) and the selection marker amdS, according to Supplementary table 1. Penicillium transformants were selected on regeneration plates containing 0.1 % acetamide as sole nitrogen source. To verify the presence of the different linear fragments integrated into the P. chrysogenum genome, genomic DNA (gDNA) was isolated after 48 h of growth in YGG medium using a modified yeast genomic DNA isolation protocol [26]. Fungal mycelium was broken in a FastPrep FP120 system (Qbiogene, Carlsbad, CA, USA). Diagnostic primers used to check for the presence of Tcp9/11, Tcp13, amdS and Nl/PcCV are listed in Supplementary table 2.

Small scale fermentation and sampling

The parental P. chrysogenum DS58912 and selected transformants were grown in duplicate using PPM in a volume of 25 ml. Samples were taken as indicated and centrifuged at 14000 g for 5 minutes. Supernatant was filtered using 0.2 µm syringe filters with polypropylene housing (VWR International Ltd.) and 60 µl of supernatant was transferred to an autosampler vial. For separation, an Accela 1250™ HPLC system coupled in-line to an ES-MS Orbitrap Exactive™ (Thermo Fisher Scientific, San Jose, CA) was used. The scan range was m/z 80 to m/z 1600 in positive ion mode (4.2 kV spray, 87.5 V capillary and 120 V tube lens), with the capillary temperature set at 325 °C. Separation was performed on a Shim-Pack XR-ODS™ C18 column (3.0 × 75 mm, 2.2 µm) (Shimadzu, Kyoto, Japan) at 40 °C. A linear gradient program with MilliQ water (A), acetonitrile (B) and 2 % formic acid (C) was used for elution: 0 min, A 90 %, B 5 %, C 5 %; 4 min, A 90 %, B 5 %, C 5 %; 13 min, A 0 %, B 95 %, C 5 %; 16 min, A 0 %, B 95 %, C 5 %; 16 min, A 90 %, B 5 %, C 5 %; 20 min, A 90 %, B 5 %, C 5 %; at a flow rate of 0.3 ml min−1. Standards (amoxicillin, hpgCV) were used to identify peaks according to retention time and accurate mass. A DLD-hpgCV standard curve (0.5; 1; 5; 10; 25; 50 µM) ensured accurate quantification of the produced compound levels (r > 0.98).
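Quantification against the DLD-hpgCV standard curve amounts to a linear fit of peak area against concentration, followed by inversion of the fit for unknown samples. The sketch below uses the six standard concentrations from the text; the peak areas are invented placeholder values, and an ordinary least-squares fit is assumed, since the fitting procedure is not specified in the original.

```python
# Standard concentrations from the text (µM).
STD_CONC_UM = [0.5, 1.0, 5.0, 10.0, 25.0, 50.0]

# Hypothetical integrated peak areas for the standards (arbitrary units).
STD_AREA = [1.1e3, 2.1e3, 1.02e4, 1.98e4, 5.05e4, 9.95e4]


def fit_line(x, y):
    """Ordinary least squares: return slope, intercept and Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


slope, intercept, r = fit_line(STD_CONC_UM, STD_AREA)


def quantify(area):
    """Convert a sample peak area into a concentration (µM) via the fit."""
    return (area - intercept) / slope


# Accept the curve only if it meets the r > 0.98 criterion from the text.
if r <= 0.98:
    raise ValueError("standard curve does not meet the r > 0.98 criterion")
print(f"r = {r:.4f}; area 2.0e4 corresponds to {quantify(2.0e4):.2f} µM")
```

Sample areas outside the range spanned by the standards (0.5 to 50 µM) would be extrapolations and should be treated with caution.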

Compound isolation

Spores of the P. chrysogenum transformant T1C and parental strain DS68530 were used to inoculate shake flask fermentations containing 75 ml of PPM medium in a 200 ml flask at 200 rpm and 25 °C. After 48 hours, the inoculum was transferred to a 2 liter flask containing 425 ml PPM medium, resulting in a total fermentation volume of approximately 500 ml. DL-hydroxyphenylglycine (hpg) was then supplemented to a concentration of 0.3 mM. Cultures without added DL-hpg served as control. Sampling of the supernatant was performed immediately before and after addition of DL-hpg, and subsequently every 24 hours for the following 5 days. For preparative HPLC analysis, the culture supernatants were harvested by centrifugation at 13000 g, 4 °C for 20 min and then passed through a 0.22 µm filter. The supernatant was concentrated by freezing in liquid nitrogen followed by lyophilization.

Compound enrichment and analysis using preparative HPLC, micro-scale NMR and MSX

Lyophilized samples of culture broths were dissolved in MilliQ water and the amount of the thioester compound present was measured by LC-MS analysis according to the protocol described in the previous section. After confirmation of the presence of the target compound, preparative HPLC was initiated for enrichment of the compound from the larger volume Penicillium culture supernatant. Separation was performed by direct injection of 100 µl of concentrated supernatant on an XTerra PREP MS C18 column (D × L 7.8 × 150 mm, 10 µm) (Waters) kept at 40 °C. Elution was done with water containing 0.1 % (v/v) trifluoroacetic acid (TFA, Merck) (solvent A) and acetonitrile containing 0.1 % TFA (solvent B) at a flow rate of 1 ml/min. Solvent A and solvent B were used with a gradient program as follows: 0–3 min, 90:10; 3–30.5 min, from 90:10 to 47.5:52.5; 30.5–37 min, from 47.5:52.5 to 5:95; 37–42 min, 5:95; 42–45 min, from 5:95 to 90:10; 45–65 min, re-equilibration at 90:10. Detection was performed using an SPD-20A Shimadzu UV/VIS detector at 272 nm. Enriched fractions were collected using a fraction collector. Fractions containing the compound of interest were further lyophilized, dissolved in an appropriate solvent and analyzed by means of LC-MS(X) and NMR. Similar analyses were performed using the DLD-hpgCV standard.
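The preparative gradient is a piecewise-linear program, so the nominal solvent composition at any time point follows from linear interpolation between the breakpoints given above. The following sketch encodes those breakpoints as (time, % solvent B) pairs; it is an illustration of the program as written and ignores instrument dwell volume, which is an assumption made here.

```python
# Breakpoints of the preparative gradient: (time in min, % solvent B).
GRADIENT = [
    (0.0, 10.0), (3.0, 10.0), (30.5, 52.5), (37.0, 95.0),
    (42.0, 95.0), (45.0, 10.0), (65.0, 10.0),
]


def percent_b(t_min):
    """Nominal % solvent B at time t_min, by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time lies outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min):
    """Nominal % solvent A; the two solvents always sum to 100 %."""
    return 100.0 - percent_b(t_min)


for t in (0.0, 16.75, 30.5, 40.0, 43.5):
    print(f"{t:5.2f} min: A {percent_a(t):5.2f} %, B {percent_b(t):5.2f} %")
```

The same function can be used to look up the nominal composition at the retention time of a peak of interest.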

Results

In order to explore the feasibility of a novel, more sustainable biosynthetic pathway for the production of amoxicillin, a novel NRPS is required. Therefore, we chose two D-/L-hydroxyphenylglycine recruiting modules, Tcp9 and Tcp11, and their associated MLP Tcp13 from the teicoplanin biosynthetic cluster, in combination with two L-cysteine and L-valine specific di-modules, PcCV and NlCV of P. chrysogenum and N. lactamdurans, respectively. Two complementing NRPS communicating domains, the donor COMD and acceptor COMA, were integrated C-terminally in the Tcp9 and Tcp11 constructs (Tcp9/11-COMD) and N-terminally in the CV constructs (Nl/PcCV-COMA), respectively. Different combinations of Tcp and CV parts, together with a DLD-hpgCV specific amoxicillin synthase (unpublished results), were included in the experimental design (Figure 1). The different transformations resulted in 22 strains containing all desired DNA expression constructs. A selection of strains was subsequently subjected to small scale fermentations and the culture supernatant was screened for the pathway relevant products DLD-/LLD-hpgCV and DL-amoxicillin. Neither the screened transformant strains nor any of the controls produced amoxicillin. However, in a series of the transformants we were able to detect significant amounts of a compound resembling DLD-hpgCV in retention time on HPLC and in accurate mass on LC/MS (compound X). The strains producing compound X showed significant differences in their metabolite profile (Figure 2A). Other observed metabolites appear to contain cysteine and valine according to an accurate mass reference search via METLIN [27]. The presence of a reduced compound in addition to the oxidized bis-form of each hypothetical peptide further points towards cysteine containing compounds, linked via a side-chain disulfide bond (Figure 2B). Compound X levels were further quantified in all available T1–7 strains and peak production levels were compared (Figure 3). Product levels were recorded in a range of 0.07 µM up to 13.94 µM in the top producer strain T1C.

The NlCV containing strains T2 and T4, however, did not produce compound X. Furthermore, no effect was observed upon the genomic presence of Tcp13, the MLP for Tcp9 and Tcp11, as shown for strains T5 and T7. Only the combination of either Tcp9 or Tcp11 with PcCV results in average compound X levels of 5.8–6.7 µM. A distinct inter-strain variation resulted from the non-targeted genome insertion strategy, leading to different genomic integration sites in addition to variations in copy numbers of the transformed genes. This was further reflected in the diverse compound production profiles of the different compound X producer strains (Figure 4A), peaking at days 3, 4 or 5 after DL-hpg addition, respectively. Subsequent experiments omitting the addition of 300 µM racemic DL-hpg to the culture at day 0 revealed a significant increase in compound X levels of up to nearly 2-fold (Figure 4B). A 10-fold increase in DL-hpg addition to 3 mM did not show any difference in compound X formation (data not shown). In addition to the transformant strains T1–7, the control strains T11 and T21, both lacking Tcp9 and Tcp11, showed traceable amounts of compound X in their supernatant. In order to further elucidate the origin and chemical structure of compound X, focus was subsequently laid on its purification and structural characterization.

Figure 1 – NRPS parts and pathway setup used in this study. (A) A schematic representation of Tcp9, Tcp11, Tcp13 and the NlCV and PcCV di-modules. The NRPS is displayed in an active state, having the amino acids loaded onto the phosphopantetheine arm. (B) Proposed pathway, including the subsequent condensation reactions between the three substrates hpg, cysteine and valine and the ultimate cyclization into amoxicillin performed by an amoxicillin synthase (IPNS*). A = adenylation domain; T = thiolation domain/peptidyl-carrier-protein; C = condensation domain; E = epimerization domain; Te = thioesterase domain.

Figure 2 – Overview of total ion chromatograms of different producing and non-producing strains. (A) Selection of different metabolic profiles of producer strains (T1C, T5C, T7X) and non-producer strains (T2B and wild type DS58912). Additionally, a 100 µM DLD-hpgCV standard is shown. The most prominent peaks are additionally annotated and displayed in (B). The list includes the retention time, the protonated monoisotopic mass of potential monomers and dimers, the molecular composition and the METLIN compound prediction. nd = not determined.

[Figure 3: bar chart of compound levels (µM, axis 0 to 12) for strains T1, T2, T3, T4, T5, T7, T11 and T21, with the genetic setup of each strain tabulated below the chart.]

Strain   T1   T2   T3   T4   T5   T7   T11  T21
Tcp9     -    -    X    X    -    X    -    -²
Tcp11    X    X    -    -    X    -    -    -²
PcCV     X    -    X    -    X    X    X    X
NlCV     -    X    -    X    -¹   -¹   -¹   -¹
Tcp13    X    X    X    X    -    -    -    -
n        4    2    2    2    2    3    4    3

Figure 3 – Overview of analyzed strains and production levels of the m/z 370.14 [M+H]+ compound. Average compound levels at the peak day of production are shown, grouped by potential producer strains (T1–T4) and controls (T5–T21). The precise genetic setup is shown under the respective strains. n = number of different purified and analyzed transformants. ¹ NlCV appeared inactive and has not been further tested. ² PcA, the first module of the ACVS, substitutes Tcp9 and Tcp11.

Figure 4 – Production profiles of selected strains and hpg feed dependency. (A) Independent strains were analyzed daily over a time frame of 5 days, revealing different compound production profiles. (B) Peak production levels observed in strains grown with and without the addition of 300 µM L-/D-hpg. All values represent averages of at least two biological and two technical replicates. Error bars indicate the standard deviation.

Compound X characterization indicates a thioester-bond peptide

Compound X was isolated using the supernatant of the transformant strain T1C. Supernatant was collected, processed and subsequently analyzed using HPLC and LC/MS(x) in combination with a mass fragmentation analysis. Under preparative HPLC conditions the standard DLD-hpgCV tripeptide eluted at 17.5 min and had a mass of 370.1428 M/Z. Preparative-HPLC fractions were subsequently collected between elution times 15.5 min and 18.2 min (Fig- ure 5A). The LC-MS analysis performed on the collected fractions revealed An engineered two component nonribosomal peptide synthetase (NRPS) (NRPS) synthetase nonribosomal peptide component two An engineered chrysogenum in Penicillium compound a novel peptide-like producing that fractions 6 to 10 contained a compound, with matching M/Z (370.1428) and elution time (17.5 min) values as the DLD-hpgCV standard. For further analysis fractions 8 to 10 were subsequently lyophilized (Fig- ure 5B), dissolved in an appropriate amount of solvent and subjected to MS2 5 and MS3 analysis. The DLD-hpgCV tripeptide standard was analyzed under the same conditions. The obtained MS2 and MS3 data of the DLD-hpgCV standard and compound X were used for the possible structure elucidation of compound X. The MS2 spectrum of DLD-hpgCV standard clearly differs from that of compound X suggesting that these are different compounds (Figure 5C). The MS2 spectrum of DLD-hpgCV is difficult to elucidate since cyclisation seems to occur during MS2 analysis (Figure 5D). On the other hand, the MS2 spectrum of compound X shows two fragments, with M/Z values of 120.0542 and 221.0946. Subsequent fragment assignment of com- pound X and DLD-hpgCV ultimately lead to the conclusion that the observed compound is composed of hpg, cysteine and valine in which hpg and cysteine are coupled via a thioester bond, while cysteine and valine are connected via a peptide bond. 
Subsequent µ-scale NMR results were inconclusive due to the intrinsic instability and low abundance of the purified compound X.
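The accurate masses reported above can be cross-checked with a short calculation. The following is a minimal sketch, not the analysis used in this study: the elemental compositions of the free amino acids, the monoisotopic masses, and the assignment of the 221.0946 fragment to protonated Cys-Val are assumptions introduced here for illustration.

```python
# Monoisotopic masses (u) of the lightest stable isotopes; standard values.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
PROTON = 1.00727647  # proton mass, used for [M+H]+ ions

# Elemental compositions of the free building blocks (assumed, not taken from the text).
BUILDING_BLOCKS = {
    "hpg": {"C": 8, "H": 9, "N": 1, "O": 3},  # 4-hydroxyphenylglycine
    "Cys": {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1},
    "Val": {"C": 5, "H": 11, "N": 1, "O": 2},
}
WATER = {"H": 2, "O": 1}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in an elemental formula."""
    return sum(MONO[element] * count for element, count in formula.items())


def protonated_mz(blocks):
    """M/Z of [M+H]+ for building blocks joined by len(blocks) - 1 condensations.

    Each condensation (peptide or thioester bond) releases one water.
    """
    neutral = sum(monoisotopic_mass(BUILDING_BLOCKS[b]) for b in blocks)
    neutral -= (len(blocks) - 1) * monoisotopic_mass(WATER)
    return neutral + PROTON


def ppm_error(observed, calculated):
    """Relative deviation of an observed M/Z from the calculated one, in ppm."""
    return (observed - calculated) / calculated * 1e6


if __name__ == "__main__":
    # Observed values are those quoted in the text; the Cys-Val assignment is hypothetical.
    for label, blocks, observed in [
        ("hpg-Cys-Val", ["hpg", "Cys", "Val"], 370.1428),
        ("Cys-Val", ["Cys", "Val"], 221.0946),
    ]:
        calculated = protonated_mz(blocks)
        print(f"{label}: calculated {calculated:.4f}, observed {observed:.4f}, "
              f"error {ppm_error(observed, calculated):+.1f} ppm")
```

Because a peptide bond and a thioester bond each form with the loss of one water, this calculation gives the same parent mass for both isomers. It therefore cannot distinguish DLD-hpgCV from compound X, which is why the fragmentation data are needed.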


Figure 5 — LC/MS2 fragmentation of DLD-hpgCV and the compound isolated from T1C supernatant. (A) Chromatogram derived from preparative HPLC of strain T1C supernatant. Displayed is the time window of 15.25–18.25 minutes of the 60-minute method. Fractions are additionally indicated as green or blue peak areas including an assigned number (1–10). (B) Fractions displayed as total retrieved volume (ml) per fraction (y-axis, right); compound X content is overlaid as blue bars (y-axis, left). (C) MS2 fragmentation of compound X and DLD-hpgCV, showing clearly diverging patterns. (D) Expected tripeptide structure of DLD-hpgCV with indicated peptide bond (red) versus probable structure of compound X, highlighting the thioester bond (green).

Figure 6 — Production of compound X by P. chrysogenum strains containing the S497A or H130V mutation in the thiolation domain of Tcp11 and the condensation domain of the cysteine module of PcCV, respectively. Two mutations were introduced into Tcp11 and PcCV, which result in reduced product levels (S497A) and a virtually complete interruption of compound X production (H130V).

Enzymatic nature of compound X

To further investigate the involvement of the hpg activating initiation module as well as the PcCV di-module, a series of mutations was introduced into functional sequence motifs of the Tcp11 T domain as well as the first PcCV C domain. First, Tcp11 was mutated at the conserved serine (S497) in the sequence LGGDSISSM, which serves as the anchor for phosphopantetheine (ppant) group attachment, thus rendering the domain, and hence the module, inactive.
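The residue numbering of the motif can be made explicit with a small sketch. It assumes that the conserved serine S497 is the first serine of LGGDSISSM (index 4 of the motif); the function names are hypothetical and serve only to illustrate how positions and point mutations map onto the motif.

```python
MOTIF = "LGGDSISSM"     # T domain core motif quoted in the text
ANCHOR_INDEX = 4        # assumed 0-based index of the conserved serine in the motif
ANCHOR_POSITION = 497   # residue number of that serine in Tcp11


def residue_positions(motif, residue, anchor_index=ANCHOR_INDEX,
                      anchor_position=ANCHOR_POSITION):
    """Return the protein residue numbers of every occurrence of `residue`."""
    offset = anchor_position - anchor_index
    return [index + offset for index, aa in enumerate(motif) if aa == residue]


def mutate(motif, position, new_residue, anchor_index=ANCHOR_INDEX,
           anchor_position=ANCHOR_POSITION):
    """Return the motif with the residue at protein position `position` replaced."""
    index = position - (anchor_position - anchor_index)
    if not 0 <= index < len(motif):
        raise ValueError(f"position {position} lies outside the motif")
    return motif[:index] + new_residue + motif[index + 1:]


if __name__ == "__main__":
    print(residue_positions(MOTIF, "S"))  # serine positions within the motif
    print(mutate(MOTIF, 497, "A"))        # motif carrying the S497A substitution
```

Under this assumption the motif contains serines at positions 497, 499 and 500, so a single substitution at 497 leaves two further serines in the immediate vicinity.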

In addition, the two downstream serine residues of LGGDSISSM, at positions 499 and 500, were targeted for mutation to exclude the possibility that they functionally replace the conserved serine residue at position 497. The different single, double and triple serine mutants were subsequently expressed in E. coli BL21 (DE3), purified and tested in vitro for phosphopantetheine attachment, using fluorescent CoA647 as ppant donor (Supplementary figure 2). Ppant attachment was absent only in mutants carrying the S497A substitution. Thus, the downstream serine residues 499 and 500 could not substitute for the mutated serine in the functional motif. Tcp11-COMD S497A was further transformed to P. chrysogenum, where sequential fermentations revealed a 6.9-fold decrease in compound X levels relative to strain T1C, or on average a decline from 9.85 µM to 1.42 µM (Figure 6). The second functional motif we targeted is at the active site of the first PcCV condensation domain, which is hypothesized to enable the formation of a peptide bond between the upstream and downstream activated substrates. The conserved histidine at position 130 in the condensation domain sequence SCHHAILDGWSL was mutated (H130V) and the resulting construct was transformed to and analyzed in P. chrysogenum. Quantification of compound X indicated a nearly complete abolition of production in H130V strains. Compound X levels exhibited a small peak at a concentration of 0.03 µM, essentially leaving the PcCV di-module inactivated (Figure 6). These data demonstrate that the thioester-bonded hpgCV peptide results from enzymatic catalysis involving the two-component hybrid NRPS.

Discussion

The feasibility of a new pathway for the production of the β-lactam antibiotic amoxicillin was evaluated. The new process aims at incorporating hydroxyphenylglycine, cysteine and valine into the amoxicillin precursor tripeptide DLD-hpgCV, and thus requires a novel nonribosomal peptide synthetase activity. Therefore, we created four genes encoding two hpg activating modules (Tcp9/11-COMD) and two cys-val activating di-modules (Nl/PcCV-COMA), including two complementing communicating domains (COMA/D), projected to facilitate the association of the two protein products. The ultimate conversion of DLD-hpgCV into amoxicillin requires an amoxicillin synthase, which was developed in a separate study through mutagenesis of the native isopenicillin N synthase (IPNS), rendering it capable of converting DLD-hpgCV into amoxicillin (unpublished results). The corresponding genes were subsequently transformed to P. chrysogenum and genetically characterized. Fermentation of the strains and analysis of the extracellular broth revealed the presence of a novel compound with the same accurate mass and retention time as the DLD-hpgCV standard. However, neither L-amoxicillin nor D-amoxicillin was detected in any of the samples. This was not entirely unexpected, given the low activity of the amoxicillin synthase, at <2 % of the native IPNS enzyme (unpublished results). Furthermore, the low abundance of the potential substrate might interfere with amoxicillin formation. Intriguingly, increased levels of the novel compound were observed when the DL-hpg precursor was omitted from the fermentation. Although P. chrysogenum is not known to contain an hpg biosynthesis pathway [28], the unequivocal detection of hpg as a constituent of compound X implies the presence of such a pathway. Further analytical characterization revealed that the novel product does not correspond to DLD-hpgCV but rather to a unique thioester-bonded hpgCV structure.

To gain additional information on the structure of compound X, a preparative enrichment and purification was conducted. During the purification process, compound X appeared temperature sensitive (data not shown) and readily decomposed. MS fragmentation patterns of compound X pointed towards a weak bond, leading to essentially two fragments which by accurate mass could be identified as hpg and cys-val. NMR analysis was not conclusive due to the low abundance and lability of the compound. Overall, it appeared that compound X is the product of an incomplete peptide bond formation, which may leave the thiol group of the cysteine tethered to the carboxyl group of the hpg. This structure remains reactive and is probably an intrinsic intermediate in the peptide bond formation process of an NRPS condensation domain.
Due to the nature of the NRPS setup or composition, however, the geometry of our altered NRPS modules, and in particular the interaction with the first condensation domain of the CV di-module, may prevent this process from completing. The truncation of the ACVS into the CV di-module potentially interfered with crucial inter-domain interactions of the native ACVS, leaving the cysteine side chain, rather than the amino group, in the position more favorable for bond formation.

Because of the unusual chemical characteristics of compound X, its enzymatic origin was further elucidated through additional transformants. These data showed that the PcCV di-module is strictly required for any compound X production, while the addition of either Tcp9 or Tcp11 is crucial for high product levels. No activity was observed with the NlCV module. Remarkably, the activity was independent of Tcp13, the MLP of Tcp9 and Tcp11, despite its beneficial effects on Tcp9 and Tcp11 adenylation activity in in vivo and in vitro studies (Chapter II/unpublished data).

To unravel further details on the precise role of Tcp11 and PcCV, two mutations were introduced, targeting conserved residues in the functional motifs of the thiolation domain of Tcp11 (S497A) and the N-terminal condensation domain of the PcCV di-module (H130V), respectively. Analysis of the S497A mutants revealed drastically reduced amounts of compound X, approximately to the levels observed with T11 strains, which contain the PcCV di-module only. Further mutations of the downstream serine residues at positions 499 and 500 in the core T domain sequence of Tcp11, FFALGGDSISSMQ, demonstrated that these two positions cannot complement the conserved S497 residue. In vitro experiments using fluorescent CoA 647 as a phosphopantetheine donor (Supplementary figure 2) supported the in vivo data, and the assay was successfully used as a new tool for the in vitro analysis of NRPS apo- and holo-states [29] (this study).

The second mutation, PcCV H130V, was expected to render PcCV inactive, which was confirmed using P. chrysogenum H130V transformant strains. Compound X production was essentially abolished, which ultimately demonstrates that the catalytic activity of the first condensation domain of PcCV is essential for the formation of the observed thioester-containing compound X and that the hpg activating module leads to increased compound X levels. However, the origin of the low but significant levels of compound X seen in Penicillium strains containing only PcCV remained elusive. Two possible explanations could be considered: (i) an inefficient chemical conversion of NRPS-released cys-val leading to compound X formation or (ii) the PcCV- or COMA-dependent recruitment of an intrinsic Penicillium “orphan” NRPS module [30].

With respect to the first hypothesis, chemical conversion is difficult to examine in a complex cellular setup, especially where the resulting product, the thioester-bonded compound X, is chemically unstable. The high amount of potentially cys-val containing compounds observed in the culture broth may suggest that a substantial level of compound X is produced that, due to its instability, decomposes into the CV product.
With respect to the second hypothesis, NRPS module or domain recruitment is a phenomenon that has only recently been observed in several studies [31–32]. Inspection of all annotated single NRPS modules in P. chrysogenum revealed one interesting target, Pc20g12670 (Supplementary figure 1), which was predicted to accept a bulky substrate. However, to further analyze and potentially prove the origin and mechanism of compound X formation, and to accomplish the desired DLD-hpgCV formation, a redesign of the COM linker based NRPS genes, including a subsequent targeted gene integration in Penicillium, and a systematic compound characterization need to be conducted. Based on this work, we present clear evidence for the production of a novel compound guided by a two-component NRPS system, which effectively associates using NRPS inter-module communicating domains, representing the first example of successful in vivo implementation in a heterologous host.

Conclusion and perspectives

Here we have employed a COM linker strategy to generate hybrid NRPS enzymes in vivo, yielding a novel NRPS product, a thioester-bonded hpgCV peptide. In a wider context, NRPS COM domains, as laid out in this research, have the potential to complement extensive site-, region- or domain-directed engineering approaches in strategies to create NRPSs with novel activities. This is not only crucial for the production and discovery of new compounds, but may also lead to the replacement of semi-synthetic or synthetic production routes in favor of more sustainable, fully fermentative ones.

Acknowledgements

The authors would like to express their gratitude to DSM Sinochem Pharmaceuticals BV for the construction of plasmids, transfer of strains and further materials, as well as for analytical and intellectual support. The research has been financially supported by the research program of the biobased ecologically balanced sustainable industrial chemistry (BE-BASIC).

References

1. Cantwell C, Beckmann R, Whiteman P, Queener SW, Abraham EP. Isolation of deacetoxycephalosporin C from fermentation broths of Penicillium chrysogenum transformants: construction of a new fungal biosynthetic pathway. Proc Biol Sci. 1992;248: 283–9. doi:10.1098/rspb.1992.0073
2. Harris DM, Westerlaken I, Schipper D, van der Krogt ZA, Gombert AK, Sutherland J, et al. Engineering of Penicillium chrysogenum for fermentative production of a novel carbamoylated cephem antibiotic precursor. Metab Eng. 2009;11: 125–37. doi:10.1016/j.ymben.2008.12.003
3. McLean KJ, Hans M, Meijrink B, van Scheppingen WB, Vollebregt A, Tee KL, et al. Single-step fermentative production of the cholesterol-lowering drug pravastatin via reprogramming of Penicillium chrysogenum. Proc Natl Acad Sci U S A. 2015;112: 2847–52. doi:10.1073/pnas.1419028112
4. Veiga T, Nijland JG, Driessen AJM, Bovenberg RAL, Touw H, van den Berg MA, et al. Impact of velvet complex on transcriptome and penicillin G production in glucose-limited chemostat cultures of a β-lactam high-producing Penicillium chrysogenum strain. OMICS. 2012;16: 320–33. doi:10.1089/omi.2011.0153
5. Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, et al. antiSMASH 3.0 — a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 2015;43: W237–43. doi:10.1093/nar/gkv437
6. Blin K, Medema MH, Kottmann R, Lee SY, Weber T. The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res. 2017;45: D555–D559. doi:10.1093/nar/gkw960
7. Umemura M, Koike H, Machida M. Motif-independent de novo detection of secondary metabolite gene clusters-toward identification from filamentous fungi. Front Microbiol. 2015;6: 371. doi:10.3389/fmicb.2015.00371
8. Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, Wolfe KH, et al. SMURF: Genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol. 2010;47: 736–741. doi:10.1016/j.fgb.2010.06.003
9. Hoertz AJ, Hamburger JB, Gooden DM, Bednar MM, McCafferty DG. Studies on the biosynthesis of the lipodepsipeptide antibiotic ramoplanin A2. Bioorganic Med Chem. 2012;20: 859–865. doi:10.1016/j.bmc.2011.11.062
10. Finking R, Marahiel MA. Biosynthesis of nonribosomal peptides. Annu Rev Microbiol. 2004;58: 453–488. doi:10.1146/annurev.micro.58.030603.123615
11. Cane DE, Walsh CT. The parallel and convergent universes of polyketide synthases and nonribosomal peptide synthetases. Chem Biol. 1999;6: R319–R325. doi:10.1016/S1074-5521(00)80001-0
12. Bloudoff K, Fage CD, Marahiel MA, Schmeing TM. Structural and mutational analysis of the nonribosomal peptide synthetase heterocyclization domain provides insight into catalysis. Proc Natl Acad Sci U S A. 2016; doi:10.1073/pnas.1614191114
13. Winn M, Fyans JK, Zhuo Y, Micklefield J. Recent advances in engineering nonribosomal peptide assembly lines. Nat Prod Rep. 2016;33: 317–347. doi:10.1039/C5NP00099H
14. Nielsen ML, Isbrandt T, Petersen LM, Mortensen UH, Andersen MR, Hoof JB, et al. Linker Flexibility Facilitates Module Exchange in Fungal Hybrid PKS-NRPS Engineering. PLoS One. 2016;11: e0161199. doi:10.1371/journal.pone.0161199
15. Doekel S, Marahiel MA. Dipeptide formation on engineered hybrid peptide synthetases. Chem Biol. 2000;7: 373–84. Available: http://www.ncbi.nlm.nih.gov/pubmed/10873839
16. Eppelmann K, Stachelhaus T, Marahiel MA. Exploitation of the selectivity-conferring code of nonribosomal peptide synthetases for the rational design of novel peptide antibiotics. Biochemistry. 2002;41: 9718–26.
17. Stachelhaus T, Schneider A, Marahiel MA. Rational design of peptide antibiotics by targeted replacement of bacterial and fungal domains. Science. 1995;269: 69–72.
18. Mootz HD, Schwarzer D, Marahiel MA. Construction of hybrid peptide synthetases by module and domain fusions. Proc Natl Acad Sci U S A. 2000;97: 5848–53. doi:10.1073/pnas.100075897
19. Hahn M, Stachelhaus T. Harnessing the potential of communication-mediating domains for the biocombinatorial synthesis of nonribosomal peptides. Proc Natl Acad Sci U S A. 2006;103: 275–80. doi:10.1073/pnas.0508409103
20. Yin X, Zabriskie TM. The enduracidin biosynthetic gene cluster from Streptomyces fungicidicus. Microbiology. 2006;152: 2969–2983. doi:10.1099/mic.0.29043-0
21. Wang H, Fewer DP, Holm L, Rouhiainen L, Sivonen K. Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes. Proc Natl Acad Sci. 2014;111: 9259–9264. doi:10.1073/pnas.1401734111
22. Pootoolal J, Thomas MG, Marshall CG, Neu JM, Hubbard BK, Walsh CT, et al. Assembling the glycopeptide antibiotic scaffold: The biosynthesis of A47934 from Streptomyces toyocaensis NRRL15009. Proc Natl Acad Sci U S A. 2002;99: 8962–8967.
23. Sosio M, Kloosterman H, Bianchi A, de Vreugd P, Dijkhuizen L, Donadio S. Organization of the teicoplanin gene cluster in Actinoplanes teichomyceticus. Microbiology. 2004;150: 95–102. doi:10.1099/mic.0.26507-0
24. Polli F, Meijrink B, Bovenberg RAL, Driessen AJM. New promoters for strain engineering of Penicillium chrysogenum. Fungal Genet Biol. 2015; doi:10.1016/j.fgb.2015.12.003
25. Kovalchuk A, Weber SS, Nijland JG, Bovenberg RAL, Driessen AJM. Fungal ABC transporter deletion and localization analysis. Methods Mol Biol. 2012;835: 1–16. doi:10.1007/978-1-61779-501-5_1
26. Harju S, Fedosyuk H, Peterson KR. Rapid isolation of yeast genomic DNA: Bust n’ Grab. BMC Biotechnology. 2004. p. 8. doi:10.1186/1472-6750-4-8
27. Smith CA, O’Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, et al. METLIN: a metabolite mass spectral database. Ther Drug Monit. 2005;27: 747–751.
28. van den Berg MA, Albang R, Albermann K, Badger JH, Daran JM, Driessen AJM, et al. Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum. Nat Biotechnol. 2008;26: 1161–1168. doi:10.1038/nbt.1498
29. Beld J, Sonnenschein EC, Vickery CR, Noel JP, Burkart MD. The phosphopantetheinyl transferases: Catalysis of a posttranslational modification crucial for Life. Nat Prod Rep. 2014;31: 61–108. doi:10.1039/c3np70054b
30. Samol MM, Salo O, Lankhorst P, Bovenberg RAL, Driessen AJM. Secondary metabolite formation by the filamentous fungus Penicillium chrysogenum in the post-genomic era. Aspergillus Penicillium Post-genomic Era. 2016; 145–172. doi:10.21775/9781910190395.08
31. Haslinger K, Peschke M, Brieke C, Maximowitsch E, Cryle MJ. X-domain of peptide synthetases recruits oxygenases crucial for glycopeptide biosynthesis. Nature. 2015;521: 105–9. doi:10.1038/nature14141
32. Ulrich V, Peschke M, Brieke C, Cryle MJ. More than just recruitment: the X-domain influences catalysis of the first phenolic coupling reaction in A47934 biosynthesis by Cytochrome P450 StaH. Mol Biosyst. 2016;12: 2992–3004. doi:10.1039/c6mb00373g

Supplementary material

Supplementary table 1 — DNA used to set up different P. chrysogenum COM-linker pathway transformations. Genetic setup of the strains analyzed in this study. Indicated are the genetic fragments used and the respective amounts used for transformation of P. chrysogenum.

Strain  hpg module    Amount  CV module  Amount  IPNS    Amount  MbtH-like  Amount  Marker  Amount
T1      Tcp11-1-ComD  3 µg    ComA-PcCV  5 µg    AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T2      Tcp11-1-ComD  3 µg    ComA-NlCV  5 µg    AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T3      Tcp9AT-ComD   3 µg    ComA-PcCV  5 µg    AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T4      Tcp9AT-ComD   3 µg    ComA-NlCV  5 µg    AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T5      Tcp11-1-ComD  3 µg    ComA-PcCV  5 µg    AnI101  2 µg    -          -       Amds    0.25 µg
T6      Tcp11-1-ComD  3 µg    ComA-NlCV  5 µg    AnI101  2 µg    -          -       Amds    0.25 µg
T7      Tcp9AT-ComD   3 µg    ComA-PcCV  5 µg    AnI101  2 µg    -          -       Amds    0.25 µg
T8      Tcp9AT-ComD   3 µg    ComA-NlCV  5 µg    AnI101  2 µg    -          -       Amds    0.25 µg
T9      Tcp11-1-ComD  3 µg    -          -       AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T10     Tcp9AT-ComD   3 µg    -          -       AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T11     -             -       ComA-PcCV  5 µg    AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T12     -             -       ComA-NlCV  5 µg    AnI101  2 µg    Tcp13      0.8 µg  Amds    0.25 µg
T13     Tcp11-1-ComD  3 µg    ComA-PcCV  5 µg    -       -       Tcp13      0.8 µg  Amds    0.25 µg
T14     Tcp11-1-ComD  3 µg    ComA-NlCV  5 µg    -       -       Tcp13      0.8 µg  Amds    0.25 µg
T15     Tcp9AT-ComD   3 µg    ComA-PcCV  5 µg    -       -       Tcp13      0.8 µg  Amds    0.25 µg
T16     Tcp9AT-ComD   3 µg    ComA-NlCV  5 µg    -       -       Tcp13      0.8 µg  Amds    0.25 µg
T21     PcA-COM       3 µg    Coma-PcCV  5 µg    -       -       Tcp13      0.8 µg  Amds    0.25 µg
T22     NlA-Com       3 µg    Coma-NlCV  5 µg    -       -       Tcp13      0.8 µg  Amds    0.25 µg
T23     PcA-COM       3 µg    Coma-PcCV  5 µg    -       -       -          -       Amds    0.25 µg
T24     NlA-Com       3 µg    Coma-NlCV  5 µg    -       -       -          -       Amds    0.25 µg
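As read from Supplementary table 1, strains T1 to T8 appear to follow a full factorial layout over the MbtH-like protein, the hpg module and the CV module, with IPNS (AnI101) and the Amds marker always present. The sketch below reproduces that layout. The nesting order of the factors is inferred from the table, and the function and dictionary keys are hypothetical names used only for illustration.

```python
from itertools import product

# Factor levels as named in Supplementary table 1; the nesting order is inferred.
MLP_OPTIONS = ["Tcp13", None]                  # outermost factor: with / without MLP
HPG_MODULES = ["Tcp11-1-ComD", "Tcp9AT-ComD"]
CV_MODULES = ["ComA-PcCV", "ComA-NlCV"]        # innermost factor


def enumerate_strains():
    """Build the T1-T8 design as a dict mapping strain name to its genetic parts."""
    strains = {}
    combos = product(MLP_OPTIONS, HPG_MODULES, CV_MODULES)
    for number, (mlp, hpg, cv) in enumerate(combos, start=1):
        strains[f"T{number}"] = {
            "hpg": hpg,
            "cv": cv,
            "ipns": "AnI101",   # present in all of T1-T8
            "mlp": mlp,         # None means no MbtH-like protein was added
            "marker": "Amds",
        }
    return strains


if __name__ == "__main__":
    for name, parts in enumerate_strains().items():
        print(name, parts)
```

The remaining strains (T9 to T24) omit individual components or use the PcA/NlA modules and would need to be listed explicitly.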

Gene        Accession       Class      Length (AA)  Domains         A domain range  Predicted A specificity  Prediction reliability
Pc20g12670  CAP86596.1      NRPS-like  1241         [A]-[CNAD]      235-719         Phenylalanine-(Acetate)  +
Pc12g13170  XP_002558125    NRPS-like  1056         [A]-T-[CNAD]    1-545           Proline                  +
Pc13g12570  XP_002559673.1  NRPS-like  1000         A-T-[CNAD]      1-507           Serine                   +/-
Pc21g22530  CAP97150.1      NRPS-like  878          [A]-T           265-774         Aminoadipic acid         +
Pc22g09430  CAP98231.1      NRPS-like  1030         [A]-T-[CNAD]    1-538           Alanine                  +/-
Pc14g01790  XP_002560173.1  NRPS-like  1038         [A]-T-[CNAD]    1-537           Leucine                  +
Pc06g01540  CAP79147        NRPS-like  1025         A-T-[CNAD]      1-530           Alanine                  +
Pc20g02590  CAP85588.1      NRPS-like  1011         A-T-[CNAD]      1-516           Alanine                  ++
Pc20g02260  CAP85555.1      NRPS-like  1080         [A]-T-[CNAD]    1-575           Alanine                  +
Pc22g09430  CAP98231.1      NRPS-like  1030         [A]-T-[CNAD]    1-538           Alanine                  +/-
Pc12g09980  XP_002557823    NRPS-like  1619         [A]-T-[Te]      93-620          Alanine                  +
Pc16g09930  XP_002561303.1  NRPS-like  967          [A]-T-[Te/X]    1-573           Alanine                  +
Pc18g00380  XP_002561885.1  NRPS-like  1276         A-T-[CNAD]-[X]  1-525           Alanine                  +
Pc20g09690  CAP86298.1      PKS-like   1038         [A]-T-[CNAD]    1-535           Valine                   +/-
Pc21g22650  CAP97162.1      PKS-like   1234         A-T-[CNAD]-[X]  1-548           Alanine                  +

Supplementary figure 1 — Orphan NRPS modules overview. List of functionally unassigned NRPS modules of P. chrysogenum [30]. Protein size and proposed adenylation domain placement are indicated. Furthermore, adenylation domain sequences were used for in silico specificity determination, including an indication of the reliability of the prediction (-; +/-; +; ++).

Supplementary table 2 — Primers used in this study. List of diagnostic primers used for the detection of genomically integrated expression cassettes.

Name       Sequence 5' → 3'          Purpose
FW Pipns   CTTATACTGGGCCTGCTG        Promoter IPNS integration check
RV Pipns   TGATATCCTGTCTTCAGTCTTA    Promoter IPNS integration check
FW Ipns    ATGGGTTCAGTCAGCAAAG       IPNS integration check
RV Ipns    CTAGGTCTGGCCGTTCTTG       IPNS integration check
FW Tcp13   ATGACGAACCCATTTGACAAC     Tcp13 integration check
RV Tcp13   ATCGCTAATCTGGGAGATCAGG    Tcp13 integration check
FW NlA     ATGACGTCAGCACGACAC        Nocardia lactamdurans AT module integration check
RV NlA     TTACTGCTCGATTTCCCACTC     Nocardia lactamdurans AT module integration check
FW PcA     ATGACTCAACTGAAGCCACC      P. chrysogenum AT module integration check
RV PcA     TTACTGCTCGATCTCCCATTC     P. chrysogenum AT module integration check
FW Tcp9    ATGAATTCCGCAGCGCAG        Tcp9 integration check
RV Tcp9    TTACTGCTCGATCTCCCATTC     Tcp9 integration check
FW Tcp11   ATGGCGGAACGGCTC           Tcp11 integration check
RV Tcp11   TTACTGCTCGATCTCCCATTC     Tcp11 integration check
FW-1 NlCV  ATGTCGCAGCAGCCGACC        Nocardia lactamdurans CV module integration check
FW-2 NlCV  TCGAGCAGCTGGTCAAGGAAC     Nocardia lactamdurans CV module integration check
RV-1 NlCV  ACGTCGCCCGAGGTCAG         Nocardia lactamdurans CV module integration check
RV-2 NlCV  TTAGTCGCTCCCTAGGCTCGTC    Nocardia lactamdurans CV module integration check
FW-1 PcCV  ATGAGCCAGCAGCCCACC        P. chrysogenum CV module integration check
FW-2 PcCV  CTGAGCAAGAAGGAAACGGAGAAC  P. chrysogenum CV module integration check
RV-1 PcCV  CTGGTTCAACGACGCTACTTTCTG  P. chrysogenum CV module integration check
RV-2 PcCV  TTAATAGCGAGCGAGGTGTTCC    P. chrysogenum CV module integration check
FW Amds    ATGTTAGACCTCCGCCTC        Amds integration check
RV Amds    CTATGGAGTCACCACATTTCCC    Amds integration check
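Basic properties of the diagnostic primers listed above can be estimated with a few lines of code. This is an illustrative sketch only: the Wallace rule (2 °C per A/T, 4 °C per G/C) is a rough approximation that is swapped in here and is not the method used in this study, and only a subset of the primers is included.

```python
# A subset of the primers from Supplementary table 2 (sequences 5' -> 3').
PRIMERS = {
    "FW Ipns": "ATGGGTTCAGTCAGCAAAG",
    "RV Ipns": "CTAGGTCTGGCCGTTCTTG",
    "FW Tcp13": "ATGACGAACCCATTTGACAAC",
    "RV Tcp13": "ATCGCTAATCTGGGAGATCAGG",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement, e.g. to locate a reverse primer on the coding strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, "
              f"Tm ~{wallace_tm(seq)} C, revcomp {reverse_complement(seq)}")
```

For actual primer design, a nearest-neighbor model with salt correction would give more reliable melting temperatures than this estimate.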

Supplementary figure 2 — CoA 647 labelling of Tcp11 mutants. Incorporation of CoA647 by sfp into Tcp11 constructs. Shown are mutants targeting the functional T domain motif GGDS (S497) as well as two additional downstream serine residues (S499/S500). The four panels show the wildtype (Tcp11 WT), single mutant S497A, double mutant S499A/S500A and the triple mutant S497A/S499A/S500A. Pictures on the left display DIA light exposure of the gel and on the right fluorescent images at excitation of 647 nm. Furthermore, no-sfp controls (sfp-), a no protein control (-hpg) and a no CoA control (-CoA) were conducted, and different ratios of CoA647:CoA (2:1; 1:1; 1:2) were tested.

[Plasmid maps with restriction sites, not reproduced here: pI-Tcp9AT-COMD-AT-zeo (5096 bps), pI-tcp11-1-COMD-AT-zeo (6353 bps), pI-COMA-NlCV-AT-zeo (11640 bps), pI-COMA-PcCV-AT-zeo (11812 bps), pIAT-AnI101-zeo (4278 bps), pIAT-tcp13-zeo (3492 bps) and pDONR221-AMDS (5710 bps).]

Supplementary figure 3 — Plasmids used in this study.


CHAPTER VI

Prokaryotic MbtH like proteins stimulate secondary metabolism in the filamentous fungus Penicillium chrysogenum

Reto D. Zwahlen,1 Carsten Pohl,1 Roel A.L. Bovenberg,2,3 and Arnold J.M. Driessen1,4

1 Molecular Microbiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
2 Synthetic Biology and Cell Engineering, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
3 DSM Biotechnology Centre, Delft, The Netherlands
4 Kluyver Centre for Genomics of Industrial Fermentations, Julianalaan 67, 2628BC Delft, The Netherlands

Manuscript in preparation

Abstract

MbtH-like proteins, or MLPs, are small (~10 kDa) proteins which associate non-covalently with adenylation domains of nonribosomal peptide synthetases (NRPS). MLPs increase folding speed and stability, and improve the kinetic properties of the interacting NRPS adenylation domain. MLPs are highly conserved amongst a wide range of prokaryotic species; however, they appear to be absent from all fungal species. Here we transform a selection of five prokaryotic MLPs into two industrially relevant P. chrysogenum strains in order to evaluate potential changes in nonribosomal peptide metabolite profiles. The more than 60 strains obtained showed significantly changed dynamics of nonribosomal peptide formation, with the majority of the strains showing increased concentrations of intermediate and final products of the roquefortine, chrysogine and penicillin biosynthetic pathways. These findings demonstrate that the expression of bacterial MLPs can be highly instrumental in stimulating the secondary metabolism of various fungal NRPS biosynthetic gene clusters.

Introduction

Penicillium chrysogenum is a filamentous fungus, most prominently known as a production host for β-lactam antibiotics [1–4]. In addition, P. chrysogenum generates numerous other secondary metabolites, such as the mycotoxin roquefortine and the yellow pigment chrysogine [5–10]. However, in high-yielding penicillin production strains, the production of undesired secondary metabolites was reduced by mutations introduced during several decades of classical strain improvement, resulting in an increased flux of metabolic resources into penicillin production [11–12]. This redistribution of metabolic fluxes efficiently eliminated low-abundance secondary metabolites under penicillin production conditions. As is typical for secondary metabolism, the core enzymes of biosynthetic gene clusters (BGCs) are either nonribosomal peptide synthetases (NRPS), polyketide synthases (PKS) or a fusion thereof, termed NRPS-PKS hybrid enzymes. These enzymes belong to a family of poly-catalytic molecular nano-machineries, composed of functionally distinct domains and modules [13–14]. The genome of P. chrysogenum encodes 10 NRPS, 20 PKS and 2 NRPS-PKS hybrid BGCs [15]. Enzymes and associated compounds have been identified for several of the BGCs [5–6;10;16–17]. The most prominent example is pcbAB, encoding the three-module NRPS L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACVS), which synthesizes L-δ-(α-aminoadipyl)-L-cysteinyl-D-valine (LLD-ACV), a tripeptide precursor of penicillin [18–20]. Another well-described pathway includes RoqA, an NRPS synthesizing histidyltryptophanyl-diketopiperazine (HTD), which is further converted into a series of roquefortine and meleagrin derivatives [5–6;21–22]. The two-module NRPS ChyA is yet another example of an NRPS; it is responsible for the formation of 2-aminobenzamide structures [23], the precursors for chrysogine biosynthesis. RoqA, ChyA and PcbAB are all multi-modular NRPS and, given their complex nature and typically low turnover rates, they may represent potential bottlenecks in their biosynthetic pathways [24–26].

NRPS have a high degree of functional conservation with a very strict modular setup, much like a molecular assembly line. Each module possesses a minimal domain requirement, comprising an adenylation domain (A), consisting of an N-terminal Acore part and a C-terminal ~110 residue Asub part, a thiolation domain (T) and a condensation domain (C), which select and activate, transfer and condense a specific substrate amino acid, respectively. In addition, optional domains can be part of the NRPS, such as domains for methylation and epimerization, and thioesterase domains for final product release. To facilitate the complex biochemical process of nonribosomal peptide (NRP) formation, activating and chaperoning factors are additionally required. Most notable are the 4'-phosphopantetheinyl transferase (PPTase) [27], an essential thiolation domain activator, and the MbtH-like proteins (MLPs), A domain-associating, catalytically inactive chaperones.

PPTases are essential for NRPS activation (Weber and Marahiel 2001)[28], and different classes of PPTase genes are conserved in one or more genomic copies across a majority of organisms. Sfp-type PPTases are essential for NRPS phosphopantetheinylation and show an outstanding degree of promiscuity in both bacterial and fungal species [29–30]. MLPs, however, are more elusive, as they are not essential to fungal NRPSs, though partially essential to prokaryotic NRPSs [31]. MLPs are relatively small (~10 kDa), catalytically inactive and associate with the A domain of NRPS enzymes [32–33]. Since their initial discovery in Mycobacterium tuberculosis, it has become increasingly apparent that they play a key role in bacterial NRPS function as well as NRP formation [34–36]. MLPs often cluster among NRPS-relevant genes, although one genomic copy is often sufficient to facilitate a generic NRPS activating function [37]. Upon binding, the MLP not only enables or enhances the adenylation activity [38], but also increases soluble NRPS levels [39]. Despite the essential function they fulfill for some bacterial NRPSs, MLPs appear completely absent from any fungal genome analyzed to date. Despite a high level of MLP conservation [40], only two structures are available displaying an A-MLP interface, in SlgNI (PDB: 4GR4) and EntF (4ZXJ) [33;41]. Both structures highlight the same potential interaction region, differing only in the absence of an Asub domain in the structure of SlgNI, as well as a covalently linked versus unlinked MLP in SlgNI versus EntF, respectively. Thus, there seems to be a defined and rather conserved part of the Acore surface which enables the dynamic association of different MLP variants.

In this study, we use five prokaryotic MLPs from various sources to analyze their potential for fungal NRPS activation upon introduction into the filamentous fungus Penicillium chrysogenum. Remarkably, the bacterial MLPs broadly impact the activity of various NRPSs, changing the kinetics and levels of secondary metabolites, including the primary NRPS products. This study shows for the first time that MLPs can be utilized to enhance secondary metabolism in a fungal host, providing a potentially new approach for host strain development and pathway engineering.

Materials and methods

Cloning, plasmids and culture conditions for E. coli

Cloning was performed using E. coli DH5α. Plasmid selection was conducted with 25 µg/ml zeocin for pIAT and pBAD plasmids and 15 µg/ml chloramphenicol for pACYCtac-MbtH (p15Aori; cat; pTac; lacIq). All cultures were grown in 2xPY (15 g/l bacto-tryptone, 10 g/l yeast extract, 10 g/l sodium chloride, pH 7.0) at 37 °C and 200 rpm (New Brunswick Innova 44R, Eppendorf, Hamburg, Germany). MLPs were derived from 4 different sources as indicated in Table 1. Genes were codon optimized for P. chrysogenum and ordered as synthetic DNA (gBlocks, IDT, Leuven, Belgium). Subcloning into the pIAT or pACYCtac vector was done using restriction with NdeI and NsiI. Assembled constructs were sequence-verified using Sanger sequencing (Macrogen, Amsterdam, The Netherlands). The N. lactamdurans ACVS gene was kindly provided by DSM-Sinochem Pharmaceuticals BV (DSM Biotechnology, Delft, The Netherlands). (Supplementary data: primers and constructs)

Model creation and evaluation

To establish a basis for interaction probabilities of prokaryotic MLPs and eukaryotic NRPS, we utilized the P. chrysogenum ACV synthetase (ACVS) as a model NRPS and focused on interactions concerning the adenylation domain of the second module (M2 A). Adenylation domain boundaries were determined using a prediction by the Conserved Domain Database (CDD) of the NCBI [42].

Table 1 — Overview of MLP variants in this study. The displayed MLP variants were identified and in part characterized for their essential status to NRPS. A full list of all considered homologues can be found in chapter I.

Name    Organism                       Biosynthetic cluster           Accession      Reference
COM     Streptomyces lavendulae        Complestatine                  WP_030951033
CDAI    Streptomyces coelicolor        Calcium dependent antibiotic   WP_003978376
Tcp13   Actinoplanes teichomyceticus   Teicoplanin                    CAE53354.1
VEG8    uncultured                     VEG                            -
TEG     uncultured                     TEG                            -

The obtained sequences were subsequently used to generate models using RaptorX [43] and SWISS-MODEL [44]. Models were classified according to model quality (global model quality estimation, GMQE) and were structurally aligned to SlgNI (PDB: 4GR4) [33] in a global and an interface-targeted fashion using PyMOL (Schrödinger 2015). The resulting models were then used for superpositioning and visualization in combination with 4GR4. Finally, a docking analysis was performed using 3DIANA [46].

Protein expression analysis

In order to evaluate interactions, N. lactamdurans ACVS was co-expressed with each MLP variant separately. Expression of the proteins was performed using the E. coli K12 derivative strain HM0079 [26]. Cells were grown at 37 °C and 200 rpm to an OD600 of 0.6, transferred to 18 °C and 200 rpm for 1 h and subsequently induced using 0.2 % L-arabinose (pBAD-ACVS) and 0.3 mM IPTG (pACYCtac-MLP). Cells were collected 18 hours after induction by centrifugation at 3500 g for 15 minutes at 4 °C. After resuspension in lysis buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 2 mM DTT, complete EDTA-free protease inhibitor; Roche, Basel, Switzerland), cells were disrupted using sonication (6 s/15 s on/off, 60 cycles, 10 µm amplitude) and cell-free lysate was obtained by centrifugation for 15 minutes at 13000 g and 4 °C. Enzymes were purified by means of His-tag affinity purification using Ni-NTA beads (Qiagen, Venlo, The Netherlands). Wash steps were performed using two column volumes of wash buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 20 mM imidazole), followed by a one-step elution using 5 bed volumes of elution buffer (50 mM HEPES pH 7.0, 300 mM NaCl, 250 mM imidazole). Protein concentrations were determined using the BioRad DC assay kit (BioRad, Utrecht, The Netherlands). All fractions were analyzed on 5 % SDS-PAGE gels and stained using 0.025 % Coomassie blue solution. Images were acquired using a FUJIFILM LAS-4000 scanner (Fujifilm, Tilburg, The Netherlands). Protein levels were additionally compared in-gel by 2-D densitometry using LAS-4000 AIDA software.

Transformation of MLP into P. chrysogenum

To introduce the MLPs into P. chrysogenum, pIAT-MLP plasmids were cut with NotI and SmaI, recovered from gel, purified and subsequently concentrated by desalting. Protoplasts were prepared as described [47]. Co-transformation was performed using the obtained MLP expression cassette together with linearized pDONR221AMDS marker vector in a ratio of 1:10. After 5 to 7 days, colonies were screened for integration of the MLP expression cassette by colony PCR (Phire Plant Direct PCR Kit, Thermo Fisher Scientific, Bleiswijk, The Netherlands) with primers (Sigma Aldrich, Zwijndrecht, The Netherlands) (Supplementary table 2) targeting the IPNS promoter and the AT terminator of each MLP expression cassette. Positive colonies were purified using three sporulation-selection transfer cycles.

Small scale fermentations of P. chrysogenum

Penicillium strains carrying the MLP genes were subjected to shake-flask cultivation experiments for up to 5 days. For preculture, spores stored on rice grains were inoculated in 25 ml YGG medium. After 24 h, the sporulated preculture was 10-fold diluted in a total volume of 30 ml penicillin production medium [48]. All cultures were grown at 25 °C and 200 rpm under semi-dark conditions in standard laboratory shakers (Innova44, New Brunswick Scientific, Nijmegen, The Netherlands). Sampling was performed on day 2 and day 5 after preculture transfer by aseptically harvesting 2 ml of culture from the shake flask, followed by centrifugation at 4 °C and 14000 g for 10 minutes to pellet the mycelium. From 4 strains expressing the MLP COM, mycelium was mixed with Trizol reagent (Thermo Fisher Scientific, Bleiswijk, The Netherlands), transferred to screw-cap tubes containing glass beads (0.75–1 mm, Sigma Aldrich, Zwijndrecht, The Netherlands) and stored at −80 °C until analysis. The supernatant was subsequently filtered using a 0.2 µm PTFE membrane syringe filter (VWR, Amsterdam, The Netherlands) and stored at −80 °C until analysis. Samples were reduced using 10 mM TCEP prior to analysis. The remaining volume of the culture after 5 days was additionally used for dry weight determination. In all experiments two biological and two technical replicates were used.

qPCR analysis of gene copy number and NRPS expression in MLP strains

Mycelium stored in Trizol was disrupted with a FastPrep FP120 system (Qbiogene, France). Total RNA was isolated from the aqueous phase by chloroform/isopropanol extraction and purified using the Ambion Turbo DNA‐free kit (Thermo Fisher Scientific, Bleiswijk, The Netherlands). RNA degradation was assessed by electrophoresis on a 2 % agarose gel and concentrations were determined with a NanoDrop ND‐1000 (ISOGEN, Utrecht, The Netherlands).

The iScript cDNA synthesis kit (Bio‐Rad, Veenendaal, The Netherlands) was used for reverse transcription with 500 ng total RNA as input. The primers used for qPCR of Pc21g21390 (pcbAB), Pc21g15480 (roqA) and Pc21g12630 (chyA), as well as for the expression level of com, are listed in Table 2. The γ‐actin gene (Pc20g11630) was used as internal standard for data normalization. SensiMix SYBR Hi‐ROX (Bioline Reagents, London, England) was used as master mix for qPCR. All runs were performed on a MiniOpticon system (Bio‐Rad). The following conditions were employed for amplification: 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s, followed by an acquisition step. Raw Ct data were exported and relative gene expression was analyzed with the 2^−ΔΔCT method [49]. The expression analysis was performed with three technical replicates per strain. The copy number of pcbAB and the MLP homologue COM was determined from isolated genomic DNA of a selection of strains, following the same qPCR protocol as described above. γ‐Actin was used for normalization and as the single-copy reference.
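As an illustration of the 2^−ΔΔCT calculation and the copy number estimate described above, the following Python sketch shows one way such values could be computed from Ct data. It assumes an amplification efficiency of 100 % (one doubling per cycle) for all primer pairs; the function names and all Ct values are hypothetical and are not taken from this study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values. The sample is
    compared with a calibrator sample; the reference gene would be
    gamma-actin in the setup described in the text.
    Assumption: 100 % amplification efficiency for both primer pairs.
    """
    d_ct_sample = mean(ct_target) - mean(ct_reference)
    d_ct_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


def copy_number(ct_target_gdna, ct_single_copy_gdna):
    """Estimate gene copy number on genomic DNA relative to a single-copy gene."""
    d_ct = mean(ct_target_gdna) - mean(ct_single_copy_gdna)
    return 2.0 ** (-d_ct)


# Hypothetical Ct values, three technical replicates each.
fold = relative_expression(
    ct_target=[20.1, 20.0, 19.9],
    ct_reference=[18.0, 18.1, 17.9],
    ct_target_cal=[22.0, 22.1, 21.9],
    ct_reference_cal=[18.0, 18.0, 18.0],
)
copies = copy_number([19.0, 19.1, 18.9], [22.0, 22.0, 22.0])
print(f"relative expression: {fold:.2f}")
print(f"estimated copy number: {copies:.1f}")
```

If primer efficiencies deviate from 100 %, an efficiency-corrected model would be needed instead of the fixed base of 2 used in this sketch.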

LC/MS analysis and evaluation of metabolite profiles

Clarified culture broth samples from fermentations were subjected to LC/MS analysis. Analysis was performed by injecting 5 µl sample on a C18 column

Table 2 — Primers used in this study. Primers were ordered in standard desalt quality.

Name        Sequence (5’ to 3’)            Purpose
pIPNS-FW    GTCTGCCATTGCAGGGTATATATGGC     Verification of MLP-expression cassette integration
tAT-R       TAGGTACGGTGCAGGGTAACGC         Verification of MLP-expression cassette integration
BPS-CN-FW   CTTGTTCTTGTTAACGACGAGGGCC      Calculation of BPS copy number
BPS-CN-R    GAAGTGAAGACGGAGTCGGTTGAAGC     Calculation of BPS copy number
act-CN-FW   ATGGAGGGTATGTTATTCCAGTTGTGG    Actin (reference for copy number/normalization of qPCR data)
act-CN-R    TGCGGTGAACGATGGAAGGACC         Actin (reference for copy number/normalization of qPCR data)
pcbAB-FW    CGATATCAGCTCGCATCGACTG         Calculation of ACVS copy number/expression
pcbAB-R     CGCTTCGATACTCTCGACC            Calculation of ACVS copy number/expression
roqA-FW     CTTGGTGGATGCAGCGAAGG           roqA expression
roqA-R      CTGTGAGAGAGGCTCTTGAGTA         roqA expression
chyA-FW     GAGCCAACTCTGTTGTCTACG          chyA expression
chyA-R      CAGGGCAATTTGCCTCATTCTG         chyA expression

(Shim-pack XR-ODS 2.2; 3.0×75 mm, Shimadzu, Japan) coupled to an LC/MS Orbitrap device (Thermo Fisher Scientific, Bleiswijk, The Netherlands) operated in positive ionization mode. A gradient program with water (A), acetonitrile (B) and 2 % formic acid in water (C) was run: 0 min, A 90 %, B 5 %, C 5 %; 4 min, A 90 %, B 5 %, C 5 %; 13 min, A 0 %, B 95 %, C 5 %; 16 min, A 0 %, B 95 %, C 5 %; 16 min, A 90 %, B 5 %, C 5 %; 20 min, A 90 %, B 5 %, C 5 %, at a flow rate of 0.3 ml min⁻¹. Two technical replicates were recorded for each sample. Available standards were used to identify peaks according to retention time and accurate mass. A complete list of all identified metabolites and intermediates is given in Supplementary table 1.
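Peak identification by retention time and accurate mass, as mentioned above, can be sketched as a simple tolerance match. The Python example below is only an illustration: the tolerances (5 ppm, 0.2 min), the function names and the standard entries are hypothetical placeholders and do not reflect the settings or compounds used in this study.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def identify_peak(peak, standards, ppm_tol=5.0, rt_tol_min=0.2):
    """Return the names of all standards matching a peak in m/z and retention time.

    peak:      dict with keys "mz" and "rt" (retention time in minutes)
    standards: dict mapping name -> (theoretical m/z, retention time in minutes)
    The default tolerances are assumptions chosen for illustration.
    """
    matches = []
    for name, (mz, rt) in standards.items():
        mass_ok = abs(ppm_error(peak["mz"], mz)) <= ppm_tol
        rt_ok = abs(peak["rt"] - rt) <= rt_tol_min
        if mass_ok and rt_ok:
            matches.append(name)
    return matches


# Hypothetical standards and one hypothetical observed peak.
standards = {
    "standard_A": (300.1000, 6.50),
    "standard_B": (300.1000, 9.80),  # same mass, different retention time
    "standard_C": (450.2000, 6.50),  # same retention time, different mass
}
peak = {"mz": 300.1006, "rt": 6.55}
print(identify_peak(peak, standards))
```

Requiring both criteria at once is what separates isobaric compounds (same mass, different retention time) from co-eluting compounds of different mass.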

Results

MLP selection, model evaluation and in vitro analysis

In order to examine whether a prokaryotic MLP has the potential to interact with a eukaryotic NRPS, possible co-complexes were modeled based on the SlgNI structure in complex with various MLPs. For this purpose, 5 MLPs derived from 4 sources were selected. TEG and VEG8 were extracted from metagenomic data, and Tcp13, CDAI and COM from various prokaryotic sources (Table 1). Moreover, 7 adenylation domains were selected, derived from the three fungal NRPS enzymes ACVS (3), RoqA (2) and ChyA (2), associated with the penicillin, roquefortine and chrysogine pathways, respectively. The two platforms RaptorX and SWISS-MODEL served as a basis for the creation of over 100 in silico models, using 23 PDB templates of 18 proteins. Global model quality estimates (GMQE, 0–1) for adenylation domain models range from 0.44 for ChyA up to 0.71 for ACVS, and from 0.59 to 0.82 for the MLPs (Supplementary table 2). The obtained models were structurally aligned to SlgNI (PDB: 4GR4) using PyMOL, in a global as well as an interface-directed fashion. Due to the absence of an Asub domain in 4GR4, only the ~340 Acore residues were aligned, covering the hypothetical A-MLP interface.

The process allowed for a precise alignment of up to 305 (340) Cα atoms in ACVS models and 49 (61) in COM MLP models, at RMSD values of <1 (Supplementary table 2). All sequences of the adenylation domains, MLPs and the corresponding templates were subsequently aligned and the secondary structural data extracted and overlaid (Figure 1A/B). In this way, all hypothetically interface-forming (IF) and interactive (IA) residues were determined according to the SlgNI structure, which counts 29 IF and 6 IA residues on the MLP domain and 32 IF and 12 IA residues on the adenylation domain. The alignment reveals that the selected MLPs are conserved at 20–23 (29) IF positions as well as at all 6 (6) IA residues (Figure 1C). With respect to the adenylation domains, 8–13 (32) IF and 4–10 (12) IA residues are conserved (Figure 1D). The region around α-helix 11 and β-sheet 22 (4GR4) of the adenylation domains furthermore includes two core alanine residues at positions 428 and 433, centered at the interface area. Although A433 seems to be completely absent from all investigated domains, ACVS M2 and RoqA M2 both contain A428. ACVS M2 A in particular appears to have a high similarity to SlgNI. With respect to the MLPs, not only are the A428/A433 interacting partners Ser23 and Lys24 conserved, but all MLPs also contain the minimal MLP functionality motif (Figure 1D). Overall, the MLP structures are highly conserved, exhibiting a ubiquitous structural pattern composed of three anti-parallel β-sheets and one α-helix. The most significant differences are found in the disordered N-terminus, which is, however, not thought to be involved in A-MLP association. A more diverse secondary structure topology is observed in the adenylation domains. Even though the region upstream of α-helix 11 is fairly conserved, there is a 15 amino acid stretch downstream of β-sheet 22, containing a short insertion, that is only observed in the P. chrysogenum adenylation domains. However, this does not involve any hypothetical interacting residues. Compared to the given templates, the local structure seems well preserved, which is further underlined by the comparison of ACVS M2 A and COM MLP with SlgNI (Figure 1A). In spite of the absent Asub domain in SlgNI, the remaining analogous structures align well and the local interface alignment reveals a good positioning of α-helix 11 and β-sheet 22 (Figure 1B) at the core interface.

In order to test the presumed A-MLP interaction, in vitro co-purification experiments were conducted, employing A-domains of the Nocardia lactamdurans ACVS. The N. lactamdurans ACVS is a functionally conserved ACVS with a high homology to the P. chrysogenum ACVS. However, the latter is poorly expressed in E. coli (Unpublished data/Chapter II). Neither of these proteins has known interactions with MLPs. Modules were determined using the Conserved Domain Database (CDD) prediction tool or were split at the center of the inter-module linkers (Unpublished data/Chapter II). Subsequently, the modules were co-expressed with one of the MLPs, purified using a C-terminal His-tag and visualized using SDS-PAGE. Despite the considerably high level of MLP conservation, a distinct co-purification pattern emerged (Figure 2). COM, TEG and Tcp13 co-purify significantly with their respective co-expressed ACVS module, while neither CDAI nor VEG8 co-purified with the ACVS module or the full-length protein (Supplementary Data).
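The conservation counts reported in the alignment analysis above, written as "conserved (total)" for interface-forming and interactive positions, can be illustrated with a short Python sketch. It assumes that "conserved" means residue identity with the reference at a given alignment column, which is a simplification, since the text does not state the exact criterion; the sequences and positions are toy data, not the SlgNI interface.

```python
def count_conserved(reference, query, positions):
    """Count alignment columns in `positions` where the query equals the reference.

    reference, query: aligned sequences of equal length ('-' marks a gap)
    positions:        0-based alignment columns of the interface residues
    Assumption: conservation is scored as identity; gaps never count.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for i in positions
        if reference[i] != "-" and reference[i] == query[i]
    )


# Toy alignment and hypothetical interface columns.
reference = "MSTNPFDDE"
query = "MSSNPF-DE"
interface_positions = [0, 2, 3, 6, 8]

n = count_conserved(reference, query, interface_positions)
print(f"{n} ({len(interface_positions)}) interface positions conserved")
```

A similarity-based criterion (for example, grouping residues by physicochemical class) would give higher counts than the identity criterion used here.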


MLP strain selection, analysis and evaluation

To examine the impact of a set of bacterial MLPs in vivo, the genes of all five proteins were transformed into P. chrysogenum using a non-targeted genome insertion strategy. Two strains served as a basis for this purpose, P. chrysogenum DS47274 and DS17690. Strain DS17690 emerged from the classical strain improvement (CSI) program to optimize penicillin production and contains eight copies of the penicillin biosynthetic gene cluster. Strain DS47274 was derived from strain DS17690 and differs only in containing a single penicillin biosynthetic gene cluster, but is otherwise genetically identical. The genomic integration of the different MLP variants was determined using colony PCR (Supplementary figures 3 and 4). MLP transformation yielded 62 strains, of which 32 originated from DS17690 and 30 from DS47274 as parental strain. Shake flask cultivations were subsequently performed and the supernatant was subjected to secondary metabolite profiling using LC-MS (Supplementary table 1). The metabolites of three pathways were analyzed in more detail: penicillin (ACVS, Pc21g21390), chrysogine (ChyA, Pc21g12630) and roquefortine (RoqA, Pc21g15480). At least 26 metabolites are known to be associated with these pathways. Metabolite levels represent peak area values, normalized for dry weight and expressed as a ratio relative to the parental strain, and were determined after two and five days of fermentation.
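The normalization described above (peak area per dry weight, expressed relative to the parental strain) can be written out as a minimal Python sketch. The function name, strain labels and all numbers are hypothetical and serve only to show the order of the two normalization steps.

```python
from statistics import median


def relative_level(peak_area, dry_weight, parent_peak_area, parent_dry_weight):
    """Dry-weight-normalized peak area of a strain relative to its parental strain."""
    normalized = peak_area / dry_weight
    parent_normalized = parent_peak_area / parent_dry_weight
    return normalized / parent_normalized


# Hypothetical (peak area, dry weight) pairs for one metabolite.
parent = (1.0e6, 1.0)
strains = {
    "strain_1": (3.0e6, 1.5),
    "strain_2": (0.8e6, 1.0),
    "strain_3": (2.4e6, 1.2),
}

ratios = {
    name: relative_level(area, weight, *parent)
    for name, (area, weight) in strains.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}-fold relative to parent")
print(f"median across strains: {median(ratios.values()):.2f}")
```

A ratio of 1 corresponds to an unchanged level; a median above 1 across strains indicates that most strains produce more of the metabolite than the parent.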

Figure 1 — Structural model comparison and alignment of SlgNI with ACVS and COM MLP.

Alignment of the SlgNI structure (PDB: 4GR4, red) with the most refined A domain model retrieved from SWISS-MODEL (GMQE = 0.80), showing ACVS M2 A (rose-pink) (A). Indicated are the MLP domain (green), the MLP-A linker region (blue) as well as the two adenylation domain parts (Asub and Acore). The structural alignment of SlgNI and ACVS M2 A indicates a structurally conserved interface, shown in an enlarged representation (B). The core binding structures α-11 and β-22 as well as the important interactive residues S23, L24, A428 and A433 are labelled and are also indicated in the A domain and MLP alignments. The alignments cover all relevant residues for interface formation, IF (■), and interaction, IA (X), as compared to SlgNI (PDB: 4GR4). Conservation of IA and IF is indicated in the rightmost column. The list includes the five MLPs in this study (D) and the seven A domains of 3 NRPS (C), alongside all relevant templates used for the generation of the models, including common gene names and PDB entries. In addition, the essential MLP motif is shown in A [37]. Sequences in C are further annotated according to fungal origin (*) as well as MLP dependency (**) according to STRING [51]. A second layer containing the extracted secondary structures of all proteins was added, showing α-helices (red), β-sheets (green) and 3₁₀-helices (yellow).

Figure 2 — Co-purification of ACVS M2 with various MLP proteins. ACVS M2 (CAT domain composition) and MLPs were co-expressed and purified using the ACVS M2 C-terminal His-tag. The NRPS module as well as the MLP are indicated with red arrows. The distinct co-purification pattern favoring COM, Tcp13 and TEG emerges clearly. M = marker; CFE = cell free extract; E = elution (2-fold (II) or 5-fold (V) diluted).

LC-MS analysis of total ion chromatograms revealed large-scale changes in the metabolic profile of P. chrysogenum DS17690 and DS47274 (Figure 3). These concerned in particular chrysogine and the associated metabolite chrysogine 13, as well as the penicillin precursor LLD-ACV and further derivatives of the penicillin pathway. In addition to these NRPS products, a series of lower-abundance products appear, including all key metabolites of the three investigated clusters (penicillin, chrysogine and roquefortine) and a series of unidentified compounds. With respect to the data of all 62 strains in both genetic backgrounds, product levels associated with all three pathways


Figure 3 — Example of total ion current (TIC) of DS47274 (A) and DS17690 (B) with and without the COM MLP. The region displayed spans 4–12 minutes of the 20-minute total run and contains the majority of all investigated metabolites, as well as predicted but non-assigned metabolites. Indicated in numerical order are (1) LLD-ACV, (2) Chrysogine 13, (3) Chrysogine, (4) Monocerin*, (5) Penicilloic G acid (Penicillin G degradation product), (6) Penicillin G.

Figure 4 — Overview of the first round of MLP strain analysis. The values indicated represent relative fold changes in peak area of a given metabolite. Peaks were identified beforehand according to accurate mass and retention time (Supplementary table 1). Numbers are further normalized for dry weight of the cultures, and unchanged levels (= 1) are marked (- - - -). Strains are separated according to strain background, DS17690 (top) and DS47274 (bottom), and day of fermentation, 2 and 5. The transformed MLP variant is indicated as COM (●), CDAI (●), TEG (●), VEG8 (●) and Tcp13 (●). Each point represents the average of data derived from 3 experiments, each using two biological and two technical replicates.

increased (Figure 4). This trend is supported by median levels of >1 for a variety of pathway-associated products, underlining the consistency of the effect across a majority of the MLP-expressing strains. However, multiple strains show varying effects, including lower product levels or even the abolishment of any secondary metabolic activity in one strain (VEG8, Figure 4). This is likely a side effect of the non-targeted genomic insertion strategy used for the MLPs. Moreover, a marked difference between the two strain backgrounds DS47274 and DS17690 arises, with overall stronger effects in the lower penicillin producing DS47274 strains than in the DS17690 strains. DS47274 strains expressing MLPs showed positive effects on penicillin, roquefortine and chrysogine biosynthesis, while DS17690-derived strains experience the strongest effects on chrysogine-derived metabolites. The NRPS-targeting effect is highly evident for the core NRPS-derived metabolites LLD-ACV and HTD, which are significantly increased in many DS47274 strains and are also stimulated, but to a lesser degree, in DS17690 strains. In addition, a series of low-abundance metabolites of roquefortine biosynthesis (Roq12, Roq13, Roq15, Roq16) seem to experience very high relative gains (>1000-fold), but it should be stressed that these are minor compounds in the WT strains. Increased levels of penicillin- and roquefortine-related compounds were already evident after two days, while the chrysogine-related compound that is already detected early in fermentation does not show an altered time-dependent trend (Figure 4).

The results were further analyzed according to the specific transformed MLP (Supplementary figure 1). The 62 MLP-expressing strains can be grouped, with counts given as (DS17690/DS47274), into 22 strains containing COM (10/12), 10 containing CDAI (4/6), 12 containing Tcp13 (7/5), 10 containing TEG (6/4) and 8 containing VEG8 (5/3). Because the experimental setup involves random genomic integration of the MLP genes, it is not possible to clearly define differences in the abilities of the various MLPs to stimulate the different pathways. However, strains containing Tcp13, TEG and COM seem to be more stimulated than the CDAI- and VEG8-bearing strains. An exceptional and coherent pattern is only observed with the COM MLP transformed strains, which experience a consistent increase in the chrysogine metabolites chrysogine B and N-pyruvoylanthranilamide after 2 days of fermentation.

COM MLP elevates the production of LLD-ACV and HTD in P. chrysogenum

Another transformation using the COM MLP was conducted and the obtained strains, 3 DS17690 and 6 DS47274, were further characterized for pcbAB and com copy numbers as well as com expression at 2 and 5 days of the fermentation (Figure 5). All strains contained one copy of the com gene, except for DS47274 7.1 and 8.1, which contain 5 and 8 copies of com,


Figure 5 — Overview and characterization of obtained strains with the COM MLP. Determination of pcbAB (A) and MLP (B) copy number as well as com expression levels in selected strains (C). Strains are displayed separately by strain background, DS17690 (8xpcbAB, left) and DS47274 (1xpcbAB, right). Expression levels are additionally divided into two days (light grey) and five days (dark grey) after the start of the fermentation. Expression levels were normalized using α-actin and compared in a relative manner, setting the lowest value (7-3) to 1.

Figure 6 — Heat map of relative fold changes of the COM MLP strain set. The data of the nine strains are separated by strain background, day and observed cluster. Data are derived from two independent experiments using two biological and two technical replicates. The effects targeting the penicillin and roquefortine cluster appear to be well transferable.

respectively (Figure 5B). Furthermore, DS47274 strains consistently contain one copy of pcbAB, while DS17690 strains show 6 to 8 copies (Figure 5A). The com expression profile of the strains 1.2 and 2.4 (8xpcbAB) and 7.3 and 8.1 (1xpcbAB) indicates that the gene is expressed after 2 and 5 days and that the expression increases with the copy number (Figure 5C). The strains were subsequently subjected to small-scale fermentations and metabolite profiling (Figure 6). Again, various effects in all three secondary metabolite clusters can be observed. In particular, the penicillin- and roquefortine-related metabolites increased in both the DS17690 and DS47274 strains, with the most pronounced effects in DS47274 strains. In the case of the chrysogine metabolites, levels increased most notably in DS47274 strains. When analyzing the increase in metabolite levels over time, and compared to the parental strain

[Figure 7 plot: panels LLD-ACV (y-axis apparently µmol/g dry weight) and HTD (y-axis apparently fold change); x-axis strains WT, 1-2, 1-5, 2-4 (upper panels) and WT, 2-412, 2-414, 7-1, 7-3, 8-1, 8-4 (lower panels).]

Figure 7 — LLD-ACV and HTD levels in COM MLP strains. Levels of the secreted NRPS-derived compounds LLD-ACV (ACVS) and HTD (RoqA) in P. chrysogenum strains DS47274 and DS17690. Panels are grouped by metabolite and strain background, and day 2 and day 5 values are shown as separate bars. Numbers are derived from two biological and two technical replicates; error bars indicate the standard deviation.

(Supplementary figure 2), the COM-dependent effect on penicillin production levels mostly concerns an earlier onset of production, i.e., after 2 days of fermentation, whereupon it stabilizes at WT levels after 5 days. A similar behavior is observed for the chrysogine-related metabolites, whereas the production of roquefortine-related compounds is stimulated in particular in the later stages of fermentation, i.e., after 5 days. The RoqA- and ACVS-derived products HTD and LLD-ACV increased up to 4.5- and 4.7-fold in strain DS47274 and 2.3- and 2.4-fold in strain DS17690, respectively (Figure 7B). ACV levels were impacted in particular during the earlier stages (2 days) of fermentation, whereas HTD levels increased more dramatically during the later stages (5 days). The levels of ACV were additionally quantified and corrected for the cell dry weight (CDW) (Figure 7A). Extracellular levels increased from 0.144 µmol/g (CDW) in DS47274 WT up to 0.700 µmol/g (CDW) in the DS47274 COM MLP strain 2-4_12 at day 2, and from 0.608 to 1.476 µmol/g (CDW) in DS17690 WT and DS17690 COM MLP strain 2-4, respectively. None of these observed effects is related to changes in expression of the NRPSs, since the changes measured by qPCR are negligible (Supplementary figure 4).
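As a quick arithmetic check on the quantified extracellular ACV levels quoted above, the ratios of the COM MLP strains to their parental strains follow directly from the stated µmol/g (CDW) values. These ratios are computed here only from the numbers given in the text, and they need not coincide exactly with the relative fold changes reported for Figure 7B.

```latex
\[
\frac{0.700}{0.144} \approx 4.9
\quad (\text{DS47274: strain 2-4\_12 vs. WT}),
\qquad
\frac{1.476}{0.608} \approx 2.4
\quad (\text{DS17690: strain 2-4 vs. WT})
\]
```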

Discussion

MbtH-like proteins (MLPs) improve the solubility and catalytic properties of NRPS complexes or NRPS adenylation domains. Even though MLPs are not intrinsic to fungal species, fungi contain a multitude of secondary metabolite gene clusters harboring NRPS. Due to a significant degree of structural conservation among MLPs as well as NRPS adenylation domains, these NRPS may serve as potential interaction targets for prokaryotic MLP variants. Here, 5 MLPs from various sources (Table 1) were used alongside the adenylation domains of the P. chrysogenum ACVS (PcbAB) to create a series of structural models, allowing for an evaluation of potential NRPS-MLP interactions. The models were derived from SlgNI (PDB: 4GR4), the first resolved structure showing an adenylation domain-MLP interaction. Despite a high functional conservation of all known adenylation domains, only prokaryotic domains partially rely on, or even strictly require, the presence of an MLP. Despite significant changes in the hypothetical interface composition, many directly interacting residues are conserved in the analyzed fungal adenylation domains (Figure 1). In conjunction with the strict MLP conservation and the very precise NRPS A-SlgNI-MLP superposition shown in Figure 1A, these data hint that fungal NRPS are also potentially capable of interacting with bacterial MLPs. To address this experimentally, we used the N. lactamdurans ACVS, a homologue of the P. chrysogenum ACVS, though with significantly improved heterologous expression properties in E. coli. Each of the three PcbAB modules was found to interact with the MLPs COM, TEG and Tcp13, while CDAI and VEG8 appeared less effective. Surprisingly, the chosen MLPs show neither a high divergence in their amino acid sequences nor any specific motifs for the two classes of interacting and non-interacting MLPs [41]. Only detailed structural studies of A-MLP complexes may unveil precise mechanisms as to why CDAI and VEG8 performed poorly.

Subsequently, all five MLP variants were tested in vivo for their impact on secondary metabolite formation. For this purpose, two P. chrysogenum strain backgrounds were used, DS47274 and DS17690, containing one and eight penicillin biosynthetic clusters, respectively. The resulting strains were screened for genomic MLP integration and subjected to small-scale fermentations and metabolite profiling. The obtained profiles revealed a truly diverse effect. Not only could we observe changes in the levels of cluster-related compounds in the penicillin, chrysogine and roquefortine biosynthetic pathways, but other unrelated metabolites also appear to be affected by the MLP presence (Figure 3). Even though a majority of strains seem to experience beneficial effects, some strains underperform relative to the respective wildtype strain. Given the random genomic integration approach, however, this seems to be largely caused by disruptive insertions of the MLPs, which can be avoided by targeted integration in the future. Nonetheless, for the NRPS-derived products LLD-ACV (PcbAB) and HTD (RoqA), a general increase in metabolite levels is observed. The effect, especially on LLD-ACV, strongly depends on the strain background and appears lower in the high penicillin yielding DS17690 strains than in the single-copy DS47274 strains, suggesting that the extent of stimulation is limited by the secondary metabolic capacity of the respective strain.

In an attempt to determine more specific effects of the MLPs, the strains were separated by MLP variant and strain background. Even though some specific effects were seen for N-Pyr and ChrB at day 2 in the DS17690 COM MLP strains, overall the effect appears not to be limited to a specific MLP. Because of the coherent effects observed for a variety of NRPS-derived products and related metabolites, it is evident that prokaryotic MLPs interact in a generic fashion with fungal NRPS. Novel strains bearing the ambivalent COM MLP were created, resulting in 9 independent COM MLP containing strains. Further characterization of com and pcbAB copy number as well as com expression levels (Figure 5) revealed a single copy of the com MLP, with the exception of two DS47274 strains (Figure 5B). Despite significantly higher expression levels after 5 days, there was no additional effect linked to copy number or expression levels. MLP proteins are highly conserved and have a high intrinsic promiscuity for different NRPS [50], thus it

Conclusion and perspectives Conclusion seems that no additional benefits result from higher intracellular levels. The expression of the NRPS

Conclusion and perspectives

Using five selected bacterial MLPs expressed in two strains of the filamentous fungus Penicillium chrysogenum, we have shown that fungal secondary metabolism can be effectively stimulated by prokaryotic MLPs. The presence of the MLPs not only increased metabolite levels in three individual NRP pathways, but consistent effects were also observed across multiple strain backgrounds and with all MLP variants used in this study. This suggests a transferable effect, applicable to a wide range of other NRPS-containing organisms, making the MLP-dependent effect not only of interest from a fundamental point of view, but also of value for biotechnological manufacturing processes or even as a tool for the discovery of low-abundance metabolites of potential therapeutic relevance.

Acknowledgments

We would like to thank the BE-Basic consortium for the financial support of this project. Furthermore, we thank DSM Sinochem Pharmaceuticals for their technical and intellectual support in this study. Carsten Pohl was supported by the European Commission Marie Curie Initial Training Network Quantfung (FP7-People-2013-ITN, grant no. 607332).

References

1. Wang F-Q, Zhong J, Zhao Y, Xiao J, Liu J, Dai M, et al. Genome sequencing of high-penicillin producing industrial strain of Penicillium chrysogenum. BMC Genomics. England; 2014;15 Suppl 1: S11. doi:10.1186/1471-2164-15-S1-S11
2. Houbraken J, Frisvad JC, Seifert KA, Overy DP, Tuthill DM, Valdez JG, et al. New penicillin-producing Penicillium species and an overview of section Chrysogena. Persoonia. Netherlands; 2012;29: 78–100. doi:10.3767/003158512X660571
3. Raper KB, Alexander DF, Coghill RD. Penicillin: II. Natural variation and penicillin production in Penicillium notatum and allied species. J Bacteriol. United States; 1944;48: 639–659.
4. Fleming A. On the antibacterial action of cultures of a penicillium, with special reference to their use in the isolation of B. influenzæ. Br J Exp Pathol. 1929;10: 226–236. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2048009/
5. Ries MI, Ali H, Lankhorst PP, Hankemeier T, Bovenberg RAL, Driessen AJM, et al. Novel key metabolites reveal further branching of the roquefortine/meleagrin biosynthetic pathway. J Biol Chem. United States; 2013;288: 37289–37295. doi:10.1074/jbc.M113.512665
6. Ali H, Ries MI, Nijland JG, Lankhorst PP, Hankemeier T, Bovenberg RAL, et al. A branched biosynthetic pathway is involved in production of roquefortine and related compounds in Penicillium chrysogenum. PLoS One. United States; 2013;8: e65328. doi:10.1371/journal.pone.0065328
7. Matsuda Y, Awakawa T, Abe I. Reconstituted biosynthesis of fungal meroterpenoid andrastin A. Tetrahedron. 2013;69: 8199–8204. doi:10.1016/j.tet.2013.07.029
8. Frisvad JC, Smedsgaard J, Larsen TO, Samson RA. Mycotoxins, drugs and other extrolites produced by species in penicillium subgenus penicillium. Stud Mycol. 2004;49: 201–241.
9. Scott PM, Kennedy PC. Analysis of blue cheese for roquefortine and other alkaloids from Penicillium roqueforti. J Agric Food Chem. United States; 1976;24: 865–868.
10. Hikino H, Nabetani S, Takemoto T. Structure and biosynthesis of chrysogine, a metabolite of Penicillium chrysogenum. Yakugaku Zasshi. Japan; 1973;93: 619–623.
11. Salo O V, Ries M, Medema MH, Lankhorst PP, Vreeken RJ, Bovenberg RAL, et al. Genomic mutational analysis of the impact of the classical strain improvement program on beta-lactam producing Penicillium chrysogenum. BMC Genomics. England; 2015;16: 937. doi:10.1186/s12864-015-2154-4
12. Weber SS, Bovenberg RAL, Driessen AJM. Biosynthetic concepts for the production of beta-lactam antibiotics in Penicillium chrysogenum. Biotechnol J. Germany; 2012;7: 225–236. doi:10.1002/biot.201100065
13. Smith JL, Sherman DH. An enzyme assembly line. Science. 2008;321: 1304–5. doi:10.1126/science.1163785
14. Weber T, Marahiel MA. Exploring the domain structure of modular nonribosomal peptide synthetases. Structure. 2001;9: R3–R9. doi:10.1016/S0969-2126(00)00560-8
15. van den Berg MA, Albang R, Albermann K, Badger JH, Daran J-M, Driessen AJM, et al. Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum. Nat Biotechnol. United States; 2008;26: 1161–1168. doi:10.1038/nbt.1498
16. Nielsen J, Jorgensen HS. Metabolic control analysis of the penicillin biosynthetic pathway in a high-yielding strain of Penicillium chrysogenum. Biotechnol Prog. United States; 1995;11: 299–305. doi:10.1021/bp00033a010
17. Aharonowitz Y, Cohen G, Martin JF. Penicillin and cephalosporin biosynthetic genes: structure, organization, regulation, and evolution. Annu Rev Microbiol. United States; 1992;46: 461–495. doi:10.1146/annurev.mi.46.100192.002333
18. Tahlan K, Moore MA, Jensen SE. δ-(l-α-aminoadipyl)-l-cysteinyl-d-valine synthetase (ACVS): discovery and perspectives. J Ind Microbiol & Biotechnol. 2017;44: 517–524. doi:10.1007/s10295-016-1850-7
19. Byford MF, Baldwin JE, Shiau C-Y, Schofield CJ. The Mechanism of ACV Synthetase. Chem Rev. 1997;97: 2631–2650. http://www.ncbi.nlm.nih.gov/pubmed/11851475
20. van Liempt H, von Dohren H, Kleinkauf H. delta-(L-alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase from Aspergillus nidulans. The first enzyme in penicillin biosynthesis is a multifunctional peptide synthetase. J Biol Chem. United States; 1989;264: 3680–3684.
21. Zheng CJ, Sohn M-J, Lee S, Kim W-G. Meleagrin, a new FabI inhibitor from Penicillium chryosogenum with at least one additional mode of action. PLoS One. Public Library of Science; 2013;8: e78922. https://doi.org/10.1371/journal.pone.0078922
22. Garcia-Estrada C, Ullan R V, Albillos SM, Fernandez-Bodega MA, Durek P, von Dohren H, et al. A single cluster of coregulated genes encodes the biosynthesis of the mycotoxins roquefortine C and meleagrin in Penicillium chrysogenum. Chem Biol. United States; 2011;18: 1499–1512. doi:10.1016/j.chembiol.2011.08.012
23. Ames BD, Walsh CT. Anthranilate-activating modules from fungal nonribosomal peptide assembly lines. Biochemistry. American Chemical Society; 2010;49: 3351–3365. doi:10.1021/bi100198y
24. Winn M, Fyans JK, Zhuo Y, Micklefield J. Recent advances in engineering nonribosomal peptide assembly lines. Nat Prod Rep. The Royal Society of Chemistry; 2016;33: 317–347. doi:10.1039/C5NP00099H
25. Beer R, Herbst K, Ignatiadis N, Kats I, Adlung L, Meyer H, et al. Creating functional engineered variants of the single-module non-ribosomal peptide synthetase IndC by T domain exchange. Mol Biosyst. 2014;10: 1709–18. doi:10.1039/c3mb70594c
26. Gruenewald S, Mootz HD, Stehmeier P, Stachelhaus T. In vivo production of artificial nonribosomal peptide products in the heterologous host Escherichia coli. Appl Environ Microbiol. 2004;70: 3282–3291. doi:10.1128/AEM.70.6.3282
27. Nakano MM, Corbell N, Besson J, Zuber P. Isolation and characterization of sfp: a gene that functions in the production of the lipopeptide biosurfactant, surfactin, in Bacillus subtilis. MGG Mol Gen Genet. Springer-Verlag; 1992;232: 313–321.
28. Marahiel MA, Stachelhaus T, Mootz HD. Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem Rev. American Chemical Society; 1997;97: 2651–2674. doi:10.1021/cr960029e
29. Beld J, Sonnenschein EC, Vickery CR, Noel JP, Burkart MD. The phosphopantetheinyl transferases: catalysis of a posttranslational modification crucial for life. Nat Prod Rep. 2014;31: 61–108. doi:10.1039/c3np70054b
30. Chen D, Wu R, Bryan TL, Dunaway-Mariano D. In vitro kinetic analysis of substrate specificity in enterobactin biosynthetic lower pathway enzymes provides insight into the biochemical function of the hotdog-fold thioesterase EntH. Biochemistry. 2009;48: 511–513. doi:10.1021/bi802207t
31. Stegmann E, Rausch C, Stockert S, Burkert D, Wohlleben W. The small MbtH-like protein encoded by an internal gene of the balhimycin biosynthetic gene cluster is not required for glycopeptide production. FEMS Microbiol Lett. 2006;262: 85–92. doi:10.1111/j.1574-6968.2006.00368.x
32. Davidsen JM, Bartley DM, Townsend CA. Nonribosomal propeptide precursor in nocardicin A biosynthesis predicted from adenylation domain specificity dependent on the MbtH family protein NocI. J Am Chem Soc. 2013;135: 1749–59. doi:10.1021/ja307710d
33. Herbst DA, Boll B, Zocher G, Stehle T, Heide L. Structural basis of the interaction of MbtH-like proteins, putative regulators of nonribosomal peptide biosynthesis, with adenylating enzymes. J Biol Chem. 2013;288: 1991–2003. doi:10.1074/jbc.M112.420182
34. Boll B, Taubitz T, Heide L. Role of MbtH-like proteins in the adenylation of tyrosine during aminocoumarin and vancomycin biosynthesis. J Biol Chem. 2011;286: 36281–36290. doi:10.1074/jbc.M111.288092
35. Tatham E, Sundaram Chavadi S, Mohandas P, Edupuganti UR, Angala SK, Chatterjee D, et al. Production of mycobacterial cell wall glycopeptidolipids requires a member of the MbtH-like protein family. BMC Microbiol. 2012;12: 118. doi:10.1186/1471-2180-12-118
36. Wolpert M, Gust B, Kammerer B, Heide L. Effects of deletions of mbtH-like genes on clorobiocin biosynthesis in Streptomyces coelicolor. Microbiology. 2007;153: 1413–23. doi:10.1099/mic.0.2006/002998-0
37. Baltz RH. Function of MbtH homologs in nonribosomal peptide biosynthesis and applications in secondary metabolite discovery. J Ind Microbiol Biotechnol. 2011;38: 1747–1760. doi:10.1007/s10295-011-1022-8
38. Felnagle EA, Barkei JJ, Park H, Podevels AM, McMahon MD, Drott DW, et al. MbtH-like proteins as integral components of bacterial nonribosomal peptide synthetases. Biochemistry. American Chemical Society; 2010;49: 8815–7. doi:10.1021/bi1012854
39. Zhang W, Heemstra JR, Walsh CT, Imker HJ. Activation of the pacidamycin PacL adenylation domain by MbtH-like proteins. Biochemistry. 2010;49: 9946–9947. doi:10.1021/bi101539b
40. Caboche S, Pupin M, Leclère V, Fontaine A, Jacques P, Kucherov G. NORINE: a database of nonribosomal peptides. Nucleic Acids Res. 2008;36: D326–31. doi:10.1093/nar/gkm792
41. Miller BR, Drake EJ, Shi C, Aldrich CC, Gulick AM. Structures of a nonribosomal peptide synthetase module bound to MbtH-like proteins support a highly dynamic domain architecture. J Biol Chem. 2016; doi:10.1074/jbc.M116.746297
42. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, et al. CDD: NCBI's conserved domain database. Nucleic Acids Res. England; 2015;43: D222–6. doi:10.1093/nar/gku1221
43. Källberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, et al. Template-based protein structure modeling using the RaptorX web server. Nat Protoc. 2012;7: 1511–1522. http://dx.doi.org/10.1038/nprot.2012.085
44. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. England; 2014;42: W252–8. doi:10.1093/nar/gku340
45. Schrödinger, LLC. The PyMOL Molecular graphics system, Version 1.8. 2015 Nov.
46. Segura J, Sanchez-Garcia R, Tabas-Madrid D, Cuenca-Alba J, Sorzano COS, Carazo JM. 3DIANA: 3D domain interaction analysis: A toolbox for quaternary structure modeling. Biophys J. United States; 2016;110: 766–775. doi:10.1016/j.bpj.2015.11.3519
47. Kovalchuk A, Weber SS, Nijland JG, Bovenberg RAL, Driessen AJM. Fungal ABC transporter deletion and localization analysis. In: Bolton MD, Thomma BPHJ, editors. Plant Fungal Pathogens: Methods and Protocols. Totowa, NJ: Humana Press; 2012. pp. 1–16. doi:10.1007/978-1-61779-501-5_1
48. Kovalchuk A, Weber SS, Nijland JG, Bovenberg RAL, Driessen AJM. Fungal ABC transporter deletion and localization analysis. Methods Mol Biol. 2012;835: 1–16. doi:10.1007/978-1-61779-501-5_1
49. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. United States; 2001;25: 402–408. doi:10.1006/meth.2001.1262
50. Boll B, Heide L. A domain of RubC1 of rubradirin biosynthesis can functionally replace MbtH-like proteins in tyrosine adenylation. ChemBioChem. 2013;14: 43–44. doi:10.1002/cbic.201200633
51. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. England; 2015;43: D447–52. doi:10.1093/nar/gku1003

Supplementary data

[Figure not reproduced: relative fold change (y-axis) per metabolite (x-axis), grouped by pathway (penicillin, chrysogine, roquefortine), with one panel per MLP, strain background and day of fermentation; n is given per panel.]

Supplementary figure 1. First experiment separated by MLP. Separated view, displaying all strains of one MLP, divided by strain background and day of fermentation. The number n indicates the number of different strains that were analyzed. Each point represents the average of data derived from 3 experiments, each using two biological and two technical replicates.

[Figure not reproduced: rotated plot of relative fold change per metabolite for the penicillin, chrysogine and roquefortine pathways by day of fermentation (n = 32 and n = 30).]

Supplementary figure 3. Colony PCR results confirm integration of MLP expression cassettes. Colony PCR was performed on various colonies that were transformed with MLP expression cassettes using pIPNS-FW and AT-R. The integration of the helper protein expression cassette was confirmed in most of the colonies. The results suggest that only few colonies were not co-transformed with the helper protein construct. Expected product sizes: BPS = 1353 bp; COM = 1353 bp; CDAI = 1344 bp; Tcp13 = 1338 bp; TEG = 1341 bp; VEG8 = 341 bp. Marker: 1kb ladder (Fermentas). As positive control (+), pIAT-COM was used with the primers used for colony PCR.

Supplementary figure 2. Overview of the second round of COM MLP strain analysis. The values indicated represent relative fold changes in peak area of a given metabolite. Peaks were identified beforehand according to accurate mass and retention time. Numbers are further normalized for dry weight of the cultures, and unchanged levels (= 1) are marked (- - - -). Strains are separated according to strain background, DS17690 (top) and DS47274 (bottom), and day of fermentation, 2 (A) and 5 (B), and finally overlaid using day 2 (●) and day 5 (●) values (C). Individual strains are represented as DS17690 WT (●), 1-2 (X), 1-5 (+), 2-4 (∆) and DS47274 WT (●), 2-4_12 (■), 2-4_14 ( ), 7-1 (○), 7-3 (□), 8-1 (X) and 8-4 (∆). Each point represents the average of data derived from 3 experiments, each using two biological and two technical replicates.
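The normalization described in the caption of Supplementary figure 2 (peak area corrected for culture dry weight, then expressed relative to the wild type so that unchanged levels equal 1) could be computed along the following lines. This is a minimal sketch only: the exact order of operations is an assumption, the function names are hypothetical, and the numbers are invented for illustration and are not data from this study.

```python
from statistics import mean


def normalized_fold_change(peak_area, dry_weight, wt_peak_area, wt_dry_weight):
    """Fold change of a dry-weight-normalized peak area relative to wild type.

    Assumption: normalization is a simple division by dry weight.
    """
    if dry_weight <= 0 or wt_dry_weight <= 0 or wt_peak_area <= 0:
        raise ValueError("dry weights and wild-type peak area must be positive")
    return (peak_area / dry_weight) / (wt_peak_area / wt_dry_weight)


def mean_fold_change(replicates, wt_peak_area, wt_dry_weight):
    """Average the fold change over (peak_area, dry_weight) replicate pairs."""
    return mean(
        normalized_fold_change(area, dw, wt_peak_area, wt_dry_weight)
        for area, dw in replicates
    )


# Hypothetical replicate values (peak area, dry weight in g); not thesis data.
replicates = [(4.0e6, 2.0), (8.0e6, 2.0)]
print(mean_fold_change(replicates, wt_peak_area=2.0e6, wt_dry_weight=2.0))
```

A value above 1 would indicate a higher dry-weight-corrected level than in the wild type, a value below 1 a lower one.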

Supplementary figure 4. Expression changes in NRPS genes measured by qPCR. A qPCR was performed to measure the expression of pcbAB, chyA and roqA in 4 strains that were transformed with the COM expression cassette. No strong change in gene expression was observed in either strain, suggesting that the effect is entirely kinetic and does not stimulate expression changes.
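Relative expression from qPCR data such as that in Supplementary figure 4 is commonly derived with the 2^(-ΔΔCt) method of Livak and Schmittgen, which appears in the reference list of this chapter. The sketch below assumes that this method was applied and that amplification efficiency is close to 100 %; the function name, the choice of reference gene and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^(-ddCt) method.

    ct_target / ct_reference: Ct values of the gene of interest and of the
    reference gene in the sample (e.g. an MLP-containing strain).
    ct_target_cal / ct_reference_cal: the same two values in the calibrator
    (e.g. the wild-type strain).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


# Hypothetical Ct values; identical delta-Ct in sample and calibrator gives 1.0,
# i.e. no change in expression.
print(relative_expression(22.0, 18.0, 22.0, 18.0))
```

A result close to 1 for pcbAB, chyA and roqA would correspond to the observation in the caption that no strong change in gene expression was detected.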

Supplementary table 1. Measured metabolites and their corresponding biosynthetic pathways. Compound names, monoisotopic ionized masses (m/z, [M+H]+), molecular composition and retention times (RT) for the applied LC program are indicated.

Pathway        Compound                                   m/z [M+H]+   Formula        RT (min)
Penicillin     6-Aminopenicillanic acid                   217.06       C8H12N2O3S     1.84
Penicillin     δ-(L-α-Aminoadipyl)-L-cysteinyl-D-valine   364.15       C14H25N3O6S    7.80
Penicillin     Isopenicillin N                            360.12       C14H21N3O6S    2.44
Penicillin     Penicillin G                               335.11       C16H18N2O4S    10.47
Chrysogine     N-pyruvoylanthranilamide                   207.08       C10H10N2O3     7.48
Chrysogine     Chrysogine VII                             277.08       C13H12N2O5     7.94
Chrysogine     Chrysogine XI                              338.13       C15H19N3O6     8.17
Chrysogine     Chrysogine XII                             337.15       C15H20N4O5     7.07
Chrysogine     Chrysogine XIII                            295.11       C13H14N2O6     7.95
Chrysogine     Chrysogine XIV                             276.10       C13H13N3O4     8.28
Chrysogine     Chrysogine XV                              336.11       C15H18N3O6     9.43
Chrysogine     Chrysogine XVI                             413.15       C20H20N4O6     8.90
Chrysogine     Chrysogine                                 191.08       C10H10N2O2     8.24
Chrysogine     Chrysogine B                               250.12       C13H15N3O3     7.99
Chrysogine     Chrysogine C                               294.11       C13H15N3O5     7.95
Roquefortine   HTD                                        324.15       C17H17N5O2     5.51
Roquefortine   DHTD                                       322.13       C17H15N5O2     6.62
Roquefortine   Roquefortine C                             390.19       C22H23N5O2     9.49
Roquefortine   Roquefortine D                             392.21       C22H25N5O2     8.94
Roquefortine   Roquefortine F                             420.20       C23H25N5O3     9.75
Roquefortine   Roquefortine M                             422.18       C22H23N5O4     8.92
Roquefortine   Roquefortine N                             440.19       C22H25N5O5     8.19
Roquefortine   Glandicoline A                             404.17       C22H21N5O3     9.72
Roquefortine   Glandicoline B                             420.17       C22H21N5O4     8.93
Roquefortine   Meleagrine                                 434.18       C23H23N5O4     9.17
Roquefortine   Neoxaline                                  436.19       C23H26N5O4     9.17
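The [M+H]+ values listed in Supplementary table 1 can be cross-checked from the molecular formulas. The following is a minimal sketch, assuming standard monoisotopic atomic masses and a singly protonated ion; the function names are hypothetical, and the ppm helper simply shows how a mass deviation between an observed and a theoretical m/z would be expressed.

```python
import re

# Monoisotopic masses (u) of the elements occurring in the table.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
PROTON = 1.00727647  # mass of a proton (u)


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C8H12N2O3S'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def mz_protonated(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


def ppm_error(observed, theoretical):
    """Mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


for name, formula in [
    ("6-Aminopenicillanic acid", "C8H12N2O3S"),
    ("Penicillin G", "C16H18N2O4S"),
    ("Roquefortine C", "C22H23N5O2"),
]:
    print(name, round(mz_protonated(formula), 2))
```

For these three compounds the rounded results agree with the tabulated values (217.06, 335.11 and 390.19).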


CHAPTER VII

Summary and outlook

Reto D. Zwahlen

Microbial bioactive compounds such as antibiotics and anti-cancer drugs have significant economic value in global food and health markets, and there is a continuous need for more efficient and sustainable methods for their production. In addition, microbial bioactive compounds play an important role in the advancement of many scientific disciplines, including medicine and biology. Especially in the context of the rapid global spread of microbial resistance to antibiotics, the discovery, development and engineering of bioactive compounds with antimicrobial activity has emerged as an extremely high priority. Bioactive compounds can predominantly be attributed to the secondary metabolism of microorganisms and plants. Typically, the genes essential for secondary metabolite production are linked in the genome and arranged in so-called biosynthetic gene clusters (BGCs), each dedicated to the production of one compound or a family of closely related compounds. Each cluster comprises a series of genes that encode proteins needed for the synthesis, modification and/or activation of the compound ultimately produced. The core element of a BGC is either a nonribosomal peptide synthetase (NRPS), a polyketide synthase (PKS) or an NRPS-PKS hybrid [1], all large, modular and complex multi-domain enzymes with comprehensive molecular machineries.

The multi-modular NRPS enzymes work with a thiotemplate mechanism, incorporating single amino acids as building blocks stepwise into a growing chain, thereby constructing the backbone of the specific secondary metabolite [2]. The sequence of substrate incorporation is closely linked to the modular setup and the domains present within the modules of the NRPS. Generally, every module is responsible for the selection, activation and incorporation of a distinct building block, which in the case of NRPS can be a proteinogenic or non-proteinogenic amino acid [3]. Product formation happens either in a linear synthesis mode, specified by the order of the NRPS modules from the N- to the C-terminus, or in a non-linear or iterative manner. However, the minimal composition of each functional module remains the same and must include an adenylation domain (A), a thiolation domain (T/PCP) and, in the case of a multi-modular setup, a condensation domain (C) [4]. The A domain works in a bifunctional manner, catalyzing an adenylation and a thioesterification reaction, which take place in this sequence and require a structural rearrangement [5]. Initially, based upon the A domain specificity, an amino acid is selected and, using ATP, an aminoacyl-adenylate conjugate is formed. Subsequently, the conjugate is transferred to the T domain and covalently linked, by means of a thioesterification reaction, to a phosphopantetheine (ppant) moiety attached to a conserved serine residue. The ppant attachment occurs after NRPS expression as an essential post-translational modification of the NRPS and is performed by a phosphopantetheinyl transferase (PPTase) [6]. After the amino acid is attached to the ppant arm, the dynamic movement of the A domain, reverting to the adenylation position, pushes the ppant arm into the donor active site of the C domain, where proofreading as well as peptide-bond formation with the acceptor substrate takes place. Depending on the number of modules, these steps are repeated until the fully matured polypeptide reaches the termination module, where another proofreading step takes place, carried out by a thioesterase domain (Te), which catalytically releases the compound from the NRPS.

The released product may not yet have reached its bioactive or final state. Besides alterations prior to or during the assembly [7], carried out by modifying domains or standalone tailoring enzymes, the released product may be further modified in trans [8]. The combined complexity is tremendous: the backbone structure may comprise as many as 15 amino acids [9], and the compound may undergo various side group exchanges or modifications, epimerization, side branching or (macro-)cyclization. Moreover, multi-NRPS systems [10–11] as well as product multimerization add yet another layer of complexity to the resulting product, which is fully reflected in the diverse structures and functions of the final secondary metabolites. A more comprehensive explanation, with examples of secondary metabolites, is given in chapter I. Furthermore, the overall NRPS biochemistry as well as the domain-specific biochemistry is discussed there, highlighting the incredible complexity of the reactions. In the context of engineering, the difficulties and bottlenecks encountered when working with these large and complex enzymes are discussed, including the strategies and available tools and their respective advantages and disadvantages.
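The modular logic of linear NRPS synthesis described above can be illustrated with a small data model. This is a deliberately simplified sketch and not a biochemical simulation: the class and function names are hypothetical, the rule that every module after the first must carry a C domain is a simplification of the minimal composition given in the text, and the domain layout of the ACVS-like example (including the E and Te labels on the last module) is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    substrate: str  # building block selected by the A domain
    domains: tuple  # domain labels, e.g. ("C", "A", "T")


def has_minimal_domains(module, is_first):
    """A and T are always required; C is required for elongation modules."""
    required = {"A", "T"} if is_first else {"C", "A", "T"}
    return required.issubset(module.domains)


def linear_product(modules):
    """Return the backbone in module order (N- to C-terminus)."""
    for index, module in enumerate(modules):
        if not has_minimal_domains(module, is_first=(index == 0)):
            raise ValueError(f"module {index + 1} lacks a required domain")
    return "-".join(module.substrate for module in modules)


# Illustrative three-module, ACVS-like layout (aminoadipate, cysteine, valine).
acvs_like = [
    Module("Aad", ("A", "T")),
    Module("Cys", ("C", "A", "T")),
    Module("Val", ("C", "A", "T", "E", "Te")),
]
print(linear_product(acvs_like))  # prints Aad-Cys-Val
```

In this picture, exchanging the first module for one with a different A domain specificity changes only the first building block of the product, which is the idea behind the hybrid NRPS design summarized in the following paragraphs.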
Some of these tools were selected and applied for the design and engineering of a novel hybrid NRPS projected to synthesize the hpgCV tripeptide, a potential pre- cursor to D-amoxicillin, as discussed in chapters II and III. A series of hpg activating NRPS modules from a variety of genomic and meta-genomic sources were selected (chapter II) and in parallel a semi high throughput pro- cess was developed to successfully identify modules and domains, respec- tively, with the predicted specificity. We also succeeded in implementing a non-radiolabel dependent adenylation activity assay based on pyrophos- Summary and outlook Summary phate detection. Additionally, we have demonstrated, that the in some cases essential MbtH-like protein (MLP), an adenylation domain associated chap- eroning protein, is necessary for the activity of selected hpg activating do- mains. This has not been reported in any previous studies, thus representing a novel observation for hpg activating domains and modules. Hereby identi- fied modules were subsequently embedded into a system creating a series of hybrid NRPS or hpgCV-synthetase (Chapter III). Using the two C-terminal modules of the ACVS, specific to cysteine and valine, we developed a system

7 182 for the generation of different hybrid NRPS. Although the system was effi- cient in generating the desired hybrids, the subsequent expression of the obtained hybrids was very difficult. Even though some expression of most hybrid NRPS could be achieved, it required MLP co-expression and resulted in low levels of purified protein. Further biochemical evaluation failed to demonstrate the production of the desired hpgCV tripeptide in vitro. None- theless, an in vitro product formation assay as well as a suitable analytical mass spectrometry methodology to detect product formation were estab- lished, and successfully applied in chapters IV–VI. Extensive in vitro produc- tion assays were employed for the characterization of the Nocardia lactamdu- rans ACVS (Chapter VI). In this chapter we established basic kinetic parameters of this ACVS and its substrate specificity, revealing several novel, non-recorded, tripeptides. In addition to this biochemical evaluation, we es- tablished a two-step purification and stabilization procedure for ACVS sam- ple preparation. Following this procedure, we obtained ACVS enzyme at high purity which allowed us to generate electron microscopic images, data and models, reaching an unprecedented detailed picture for a multi-modular NRPS structure. The crude structure, is in line with a recently proposed multi-modular NRPS model predicting a domain arrangement which resem- bles a helix like structure resolving around a central, T domain lined chan- nel [13]. The cumulative knowledge obtained handling the ACVS was further employed in developing an hpgCV synthetase with another strategy, focus- ing on the implementation of a bi-modular NRPS system, which works on the basis of NRPS communicating domains, or COM domains [14], rather than intrinsically linked hybrid NRPS (Chapter V). 
In contrast to the in vitro efforts discussed in chapter III, we decided to introduce the entire new pathway in a model production host Penicillium chrysogenum (DS68530) devoid of its en- dogenous penicillin biosynthetic pathway. The developed parts, based on the pre-selection in chapter II, were integrated in the genome and the resulting strains showed indeed the production of a compound of the predicted mass Summary and outlook Summary and retention time, as verified with a chemically synthesized standard. Un- fortunately, further characterization of the observed compound revealed that the predicted chemical structure was not formed. The projected peptide bond between the hydroxyphenylglycine and the cysteine residue was in a 7 thioester rather than the desired ester. Nevertheless, we have shown that the implemented COM domains do have the potential to create functional NRPS pairs in an in vivo setup and the suitability of P. chrysogenum as a host for the production and engineering of novel compounds and their pathways. To further evaluate the native secondary metabolite production capacities of P.chrysogenum, we furthermore introduced the previously characterized MLP,

183 associated with the modules and domains in chapter II. The results of these experiments, shown in chapter VI display a clear pattern, in which elevated secondary metabolite levels in P. chrysogenum strains with a single penicillin biosynthetic cluster (DS47274) and strongly improved kinetics, leading to earlier penicillin production in strains bearing multiple penicillin biosynthetic clusters (DS17690). Even though this effect seems to be specific for NRPS systems, it represents the first report of a prokaryotic MLP improving fungal NRPS peptide formation. This thesis has shown, that there is an enormous potential for the engineering and production of novel (NRPS derived) sec- ondary metabolites, but also shows that many of the critical requirements for NRPS substrate specificity and modification remain to be resolved. The tools described in this thesis, for engineering of a novel NRPS, are broadly applicable. The described adenylation assay fulfils a very distinct function is very sensitive and specific, while, the genomic MLP integration has the po- tential to improve the yield and kinetics of industrial strains, enhance the production of low abundant metabolites and may even be implemented to overexpress silent or cryptic NRPS clusters. Furthermore, with respect to NRPS engineering, it has become clear, that the exchange of entire domains or modules is very complicated and alternative more directed approaches, targeting sub-domains or distinct sites should be considered for future engi- neering attempts [8]. Presumably, the most straight forward path to suc- cessfully engineer NRPS, requires structural information, including potential interfaces, dynamics and positioning of domains and sub-structures within the multi-modular enzyme. Thus future methods should take potentially specific domain interactions into account. This will require the 3D-solution of multi-modular NRPS structures, preferably in various metabolic intermedi- ate states. 
This will remain a significant task for future research, owing to the large size and high flexibility of NRPS enzymes. The strategies described in this thesis to obtain a semi-refined structure of the ACVS will undoubtedly aid in solving its complete structure, as well as the structures of other multi-modular NRPS. This will eventually make NRPS engineering easier and more accessible, leading to the development of an entirely new generation of novel enzymes, pathways and bioactive compounds to tackle the challenges of more sustainable manufacturing, new antibiotics and new bioactive compounds for a wide range of applications.

References

1. Wang H, Fewer DP, Holm L, Rouhiainen L, Sivonen K. Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes. Proc Natl Acad Sci. 2014;111: 9259–9264. doi:10.1073/pnas.1401734111
2. Marahiel MA, Stachelhaus T, Mootz HD. Modular Peptide Synthetases Involved in Nonribosomal Peptide Synthesis. Chem Rev. American Chemical Society; 1997;97: 2651–2674. http://www.ncbi.nlm.nih.gov/pubmed/11851476
3. Caboche S, Pupin M, Leclère V, Fontaine A, Jacques P, Kucherov G. NORINE: a database of nonribosomal peptides. Nucleic Acids Res. 2008;36: D326-31. doi:10.1093/nar/gkm792
4. Weber T, Marahiel MA. Exploring the domain structure of modular nonribosomal peptide synthetases. Structure. 2001;9: R3–R9. doi:10.1016/S0969-2126(00)00560-8
5. Reger AS, Wu R, Dunaway-Mariano D, Gulick AM. Structural characterization of a 140 degrees domain movement in the two-step reaction catalyzed by 4-chlorobenzoate:CoA ligase. Biochemistry. 2008;47: 8016–25. doi:10.1021/bi800696y
6. Beld J, Sonnenschein EC, Vickery CR, Noel JP, Burkart MD. The phosphopantetheinyl transferases: Catalysis of a posttranslational modification crucial for life. Nat Prod Rep. 2014;31: 61–108. doi:10.1039/c3np70054b
7. Sundaram S, Hertweck C. On-line enzymatic tailoring of polyketides and peptides in thiotemplate systems. Curr Opin Chem Biol. England; 2016;31: 82–94. doi:10.1016/j.cbpa.2016.01.012
8. Winn M, Fyans JK, Zhuo Y, Micklefield J. Recent advances in engineering nonribosomal peptide assembly lines. Nat Prod Rep. The Royal Society of Chemistry; 2016;33: 317–347. doi:10.1039/C5NP00099H
9. Bode HB, Brachmann AO, Jadhav KB, Seyfarth L, Dauth C, Fuchs SW, et al. Structure elucidation and activity of kolossin A, the D-/L-pentadecapeptide product of a giant nonribosomal peptide synthetase. Angew Chem Int Ed Engl. Germany; 2015;54: 10352–10355. doi:10.1002/anie.201502835
10. Hoertz AJ, Hamburger JB, Gooden DM, Bednar MM, McCafferty DG. Studies on the biosynthesis of the lipodepsipeptide antibiotic ramoplanin A2. Bioorganic Med Chem. Elsevier Ltd; 2012;20: 859–865. doi:10.1016/j.bmc.2011.11.062
11. Yin X, Zabriskie TM. The enduracidin biosynthetic gene cluster from Streptomyces fungicidicus. Microbiology. 2006;152: 2969–2983. doi:10.1099/mic.0.29043-0
12. Engler C, Gruetzner R, Kandzia R, Marillonnet S. Golden gate shuffling: A one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS One. 2009;4: e5553. doi:10.1371/journal.pone.0005553
13. Marahiel MA. A structural model for multimodular NRPS assembly lines. Nat Prod Rep. England; 2016;33: 136–140. doi:10.1039/c5np00082c
14. Hahn M, Stachelhaus T. Harnessing the potential of communication-mediating domains for the biocombinatorial synthesis of nonribosomal peptides. Proc Natl Acad Sci U S A. National Academy of Sciences; 2006;103: 275–80. doi:10.1073/pnas.0508409103


German summary (Deutsche Zusammenfassung)

Microbial bioactive compounds, such as antibiotics or anticancer agents, are of significant value to the global food and health industries, which creates a continuing demand for more sustainable and more efficient methods of producing them. In addition, these microbial bioactive compounds play a key role in the advancement of many scientific fields, such as medicine and biology. Particularly in the context of the global antibiotic resistance crisis, the discovery, development and modification of bioactive compounds with antimicrobial properties has emerged as an indispensable approach to generating new products. Bioactive compounds originate predominantly from the secondary metabolism of microorganisms and plants. The genes essential for the production of a compound are usually organized in a biosynthetic gene cluster (BGC), which is responsible for the production of one compound or of several related compounds. Each BGC consists of a set of genes encoding proteins for the synthesis, modification and/or activation of the compound to be produced. The core of every BGC is either a nonribosomal peptide synthetase (NRPS), a polyketide synthetase (PKS) or a hybrid NRPS-PKS, all of which are large, complex and modular enzymes [1].

These multimodular NRPS enzymes operate through a thiol-based coupling mechanism in which individual building blocks are joined stepwise into a longer chain, forming the backbone of the desired compound [2]. The sequence of building blocks within the chain depends on the arrangement of the modules and domains of the NRPS and on its functional logic. In general, each module is responsible for the selection, activation and incorporation of one building block, which in the case of NRPS is either a proteinogenic or a non-proteinogenic amino acid [3].
The production process of the compound within the NRPS is determined by the module specificities from the N- to the C-terminus of the enzyme and follows a linear, non-linear or iterative mechanism. Regardless of the mechanism, every active NRPS module has the same minimal domain composition, consisting of an adenylation (A) domain, a thiolation (T/PCP) domain and, in the case of a multimodular NRPS, a condensation (C) domain [4]. The A domain has a dual function consisting of two separate reactions, adenylation and thiolation, which take place stepwise and are accompanied by large structural rearrangements of the A domain [5]. First, an amino acid is selected according to the specificity of the A domain and converted, with the help of ATP, into an adenyl-amino acid conjugate. The conjugate is then bound to a thiol group located at the end of a phosphopantetheine (ppant) arm, which in turn is anchored to a conserved serine in the T domain. The ppant arm is attached post-translationally by a PPTase [6], an essential step in the activation of an NRPS. Once the amino acid has been attached to the ppant arm, the arm is pushed by the backward movement of the A domain towards the active site of the C domain, where incorrectly loaded amino acids are removed and correct ones are added to the growing chain of building blocks. These steps are repeated until the growing product reaches the last, or terminal, module of the NRPS. There, a further control mechanism is carried out by a thioesterase (Te) domain, which finally releases correct products catalytically from the NRPS.

At this point, however, the released product is not necessarily complete or active. In addition to the product modifications that arise during the NRPS-bound process through NRPS modification domains or external factors [7], a product already released from the NRPS can be further modified in trans [8]. The complexity of the resulting product is impressive, since the backbone of a product alone can consist of 15 amino acids [9]. The additional exchange of functional groups, epimerization, cyclization and further modifications give rise to an even larger and more diverse range of products. Moreover, multi-NRPS systems [10;11] and the multimerization of a product add a further level of complexity, which is reflected in the diverse structures and biological properties of the products. Various examples of secondary metabolites and more detailed explanations of the production and modification mechanisms can be found in chapter I.
In addition, the general and domain-specific biochemistry is discussed further, which underlines the considerable complexity of these processes. In the context of enzyme engineering, the difficulties and limiting factors associated with these large and complex enzymes are also explained, and strategies and available tools are discussed together with their advantages and disadvantages. Some of the strategies and tools discussed were then selected for the development of a novel NRPS hybrid enzyme. This system was intended to synthesize an hpgCV tripeptide that is ultimately converted into amoxicillin, as discussed in chapters II and III. To this end, hpg-activating NRPS modules were selected from genomic and metagenomic sources (chapter II). In parallel, a process was developed that allows module and domain specificities to be analysed at medium to high throughput. In addition, a non-radioactive adenylation activity assay was established, and it was shown that the MLP protein, which interacts with the adenylation domain and is essential in specific cases, is indispensable for the activity of some of the domains tested. Since this is a novel observation, it can be regarded as a precedent. The modules identified by this process were subsequently integrated into a system that resulted in the creation of hpgCV hybrid NRPS (chapter III). Using the two C-terminal modules of the ACV synthetase (ACVS), which have cysteine and valine specificities, a system was developed to create different hybrid NRPS variants [12]. The system proved to be efficient, but the production of the resulting NRPS hybrid proteins was considerably more complicated. Nevertheless, the majority of the NRPS hybrids could be expressed, although this required MLP proteins. In the end, only very small amounts of purified protein could be isolated. The biochemical experiments that followed showed no production of the desired hpgCV tripeptide in vitro. However, an in vitro assay was developed which, in combination with mass spectrometric analysis, can detect the formation of products and was therefore also applied in the experiments of chapters IV–VI. Extensive in vitro studies were carried out primarily to characterize the Nocardia lactamdurans ACVS (chapter VI). In the same chapter, the fundamental biochemical parameters of this ACVS variant were determined, as well as its substrate specificity, through which the production of novel tripeptides could also be observed. In addition to the biochemical parameters, a two-step purification and stabilization procedure for the preparation of ACVS was established.
Applying this procedure yielded a highly pure ACVS preparation, which allowed us to produce electron microscopy images and models of unprecedented quality for a multimodular NRPS. The structure obtained appears to support the recently proposed model of a multimodular NRPS, which suggests a helix-like structure with a central tunnel [13].

The knowledge gathered from the ACVS studies was subsequently applied to develop a two-component NRPS system that uses NRPS communication-mediating (COM) domains [14] instead of a covalently integrated NRPS (chapter V). In contrast to the in vitro experiments of chapter III, this system and its associated factors were tested in the production organism Penicillium chrysogenum (DS68530), which contains no penicillin BGC. The two-component NRPS systems developed, based on the selection in chapter II, were integrated into the genome, and the resulting strains produced a compound that matched the chemically synthesized hpgCV product standard in mass and retention time. Unfortunately, a follow-up analysis showed that the structure of the product was incorrect: it contained a thioester instead of the ester bond between hpg and cysteine. Nevertheless, it was shown that the selected COM domains have the potential to enable functional two-component NRPS systems, which could be refined further using the P. chrysogenum strain DS68530. To assess the further potential of P. chrysogenum as a production platform for secondary metabolites, we integrated the MLP proteins evaluated in chapter II into the genome of P. chrysogenum strains. The results of these experiments are described in chapter VI and show a clear pattern: after integration of an MLP into the respective strain, P. chrysogenum DS47274 strains, which carry one penicillin BGC, show higher penicillin production rates, and P. chrysogenum DS17690 strains, which carry eight penicillin BGCs, show earlier production peaks. Although this effect is most likely restricted to NRPS, it represents the first case in which a prokaryotic MLP benefits the secondary metabolism of a fungus.

This work has demonstrated the potential of redesigning NRPS for the production of new, potentially interesting compounds, as well as the associated difficulties regarding strategies for investigating and determining NRPS specificity. The tools presented here for modifying NRPS are generally suitable for a wide range of applications, not only for NRPS but also for the modification of related enzymes.
Nevertheless, the breadth of application varies considerably; for example, the adenylation assay is designed to measure exclusively reactions that generate pyrophosphate. The genomic integration of an MLP helper protein, however, can not only increase the production rates of active compounds but also improve their production kinetics, and may even lead to the activation of cryptic genes that could potentially produce new compounds. Furthermore, with regard to NRPS design, it has been confirmed that the exchange of entire domains and modules is extremely difficult, as has also been observed in many other studies. Such an approach should therefore be avoided in future in favour of a more targeted strategy restricted to the exchange of integral substructures. The approach should also take into account subdomains or specific interaction sites, as well as the upstream and downstream domains and their dynamics. For this, it is essential to determine 3D NRPS structures that show the enzymes in a multimodular context, preferably in various metabolic intermediate states. For the time being, however, this remains an enormous challenge because of the large size and flexibility of NRPS complexes. Nevertheless, the methods for determining NRPS structures presented in this work can undoubtedly simplify and accelerate further research significantly. This should ultimately simplify the discovery, modification and application of new NRPS, leading to the development of a new generation of enzymes, genetic networks and new bioactive products that are aimed not only at the treatment of various diseases but could also contribute to the sustainable production of already established products.

References

1. Wang H, Fewer DP, Holm L, Rouhiainen L, Sivonen K. Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes. Proceedings of the National Academy of Sciences of the United States of America. 2014. pp. 9259–9264. doi:10.1073/pnas.1401734111
2. Marahiel MA, Stachelhaus T, Mootz HD. Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem Rev. American Chemical Society; 1997;97: 2651–2674. doi:10.1021/cr960029e
3. Caboche S, Pupin M, Leclère V, Fontaine A, Jacques P, Kucherov G. NORINE: a database of nonribosomal peptides. Nucleic Acids Res. 2008;36: D326-31. doi:10.1093/nar/gkm792
4. Weber T, Marahiel MA. Exploring the Domain Structure of Modular Nonribosomal Peptide Synthetases. Structure. 2001;9: R3–R9. doi:10.1016/S0969-2126(00)00560-8
5. Reger AS, Wu R, Dunaway-Mariano D, Gulick AM. Structural characterization of a 140 degrees domain movement in the two-step reaction catalyzed by 4-chlorobenzoate:CoA ligase. Biochemistry. 2008;47: 8016–25. doi:10.1021/bi800696y
6. Beld J, Sonnenschein EC, Vickery CR, Noel JP, Burkart MD. The phosphopantetheinyl transferases: catalysis of a posttranslational modification crucial for life. Nat Prod Rep. 2014;31: 61–108. doi:10.1039/c3np70054b
7. Sundaram S, Hertweck C. On-line enzymatic tailoring of polyketides and peptides in thiotemplate systems. Curr Opin Chem Biol. England; 2016;31: 82–94. doi:10.1016/j.cbpa.2016.01.012
8. Winn M, Fyans JK, Zhuo Y, Micklefield J. Recent advances in engineering nonribosomal peptide assembly lines. Nat Prod Rep. The Royal Society of Chemistry; 2016;33: 317–347. doi:10.1039/C5NP00099H
9. Bode HB, Brachmann AO, Jadhav KB, Seyfarth L, Dauth C, Fuchs SW, et al. Structure elucidation and activity of kolossin A, the D-/L-pentadecapeptide product of a giant nonribosomal peptide synthetase. Angew Chem Int Ed Engl. Germany; 2015;54: 10352–10355. doi:10.1002/anie.201502835
10. Yin X, Zabriskie TM. The enduracidin biosynthetic gene cluster from Streptomyces fungicidicus. Microbiology. 2006;152: 2969–2983. doi:10.1099/mic.0.29043-0
11. Hoertz AJ, Hamburger JB, Gooden DM, Bednar MM, McCafferty DG. Studies on the biosynthesis of the lipodepsipeptide antibiotic ramoplanin A2. Bioorganic Med Chem. Elsevier Ltd; 2012;20: 859–865. doi:10.1016/j.bmc.2011.11.062
12. Engler C, Gruetzner R, Kandzia R, Marillonnet S. Golden gate shuffling: A one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS One. 2009;4: e5553. doi:10.1371/journal.pone.0005553
13. Marahiel MA. A structural model for multimodular NRPS assembly lines. Nat Prod Rep. England; 2016;33: 136–140. doi:10.1039/c5np00082c
14. Hahn M, Stachelhaus T. Harnessing the potential of communication-mediating domains for the biocombinatorial synthesis of nonribosomal peptides. Proc Natl Acad Sci U S A. National Academy of Sciences; 2006;103: 275–80. doi:10.1073/pnas.0508409103


Dutch summary (Nederlandse samenvatting)

Microbial biologically active compounds, such as antibiotics and anticancer drugs, have significant economic value in the global food and health industries. There is therefore a constant demand for more efficient and more sustainable production methods for these molecules. These compounds also play an important role in scientific research in medicine and biology. Because microbial resistance to antibiotics is increasing and spreading rapidly worldwide, the discovery and development of biologically active compounds with antimicrobial activity is of the utmost importance. These biologically active compounds are mainly secondary metabolites derived from microorganisms and plants. In general, the genes that are essential for the production of a secondary metabolite are linked in a shared biosynthetic gene cluster (BGC). Such a BGC is responsible for the production of a single compound or a family of closely related compounds. Each cluster consists of a set of genes encoding proteins involved in the synthesis, modification and/or activation of the final product. The central element of a BGC is either a nonribosomal peptide synthetase (NRPS), a polyketide synthetase (PKS) or an NRPS-PKS hybrid. These large, modular and complex multi-domain enzymes [1] function via a thio-template mechanism. In NRPS enzymes, individual amino acids serve as building blocks that are incorporated stepwise into a growing chain, which ultimately results in the formation of the backbone of a specific secondary metabolite [2]. The order in which the substrates are incorporated is tightly linked to the organization of the different modules and sub-modules (domains) of the NRPS. In general, each module is responsible for the selection, activation and incorporation of a specific building block.
In the case of NRPS enzymes, these building blocks can be proteinogenic or non-proteinogenic amino acids [3]. The product can be formed via linear (governed by the NRPS modules from the N- to the C-terminus), non-linear or iterative synthesis. However, every functional module must consist at least of an adenylation domain (A), a thiolation domain (T/PCP) and, in the case of a multi-modular organization, a condensation domain (C) [4]. The action of the A domain can be subdivided into an adenylation reaction and a thio-esterification reaction [5]. The latter reaction requires a structural reorganization within the protein. First, depending on the specificity of the A domain, a specific amino acid is selected and, via the hydrolysis of ATP, bound to the enzyme. The adenyl-amino acid intermediate formed is then transferred to the T domain, where it is covalently bound, via the thio-esterification mentioned above, to the phosphopantetheine (Ppant) arm, which is anchored to a conserved serine residue. The Ppant arm is a separate part of the enzyme that is attached to the protein only after synthesis of the NRPS, through a post-translational modification dependent on a phosphopantetheinyl transferase (PPTase) [6]. After the amino acid has been coupled to the Ppant arm, a dynamic movement of the A domain rotates the adenylation position, pushing the Ppant arm towards the active site of the C domain. In this domain a check takes place, after which the amino acid is coupled to the acceptor substrate by a peptide bond. Depending on the number of modules, these steps are repeated until the complete polypeptide has been formed. The chain is then transferred to the thioesterase (Te) domain. Here a final check takes place, after which the peptide can be released catalytically from the NRPS. After release, the product is in some cases not yet in its final (bio)active state [7]. For this reason the product can be modified further in trans after release [8]. Depending on the NRPS, however, these additional modifications can also take place during or before amino acid coupling. Ultimately this can result in a peptide backbone of as many as 15 amino acids [9], to which various side chains, modifications, epimerizations, branches and (macro)cyclizations can be added. The complexity of the final product can therefore be considerable. Moreover, many NRPS systems add yet another layer of complexity through the formation of multimers of the product [10;11]. Together, this results in an enormous diversity in the structure and function of secondary metabolites.
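The linear assembly-line logic summarized above (selection and activation by the A domain, chain extension by the C domain, release by the Te domain) can be illustrated with a small Python sketch. This is a toy model under simplifying assumptions, not a method taken from the thesis: the names `Module`, `assemble` and `has_c_domain` are hypothetical, the proofreading steps and tailoring reactions are ignored, and the choice to model the first module without a C domain is made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Module:
    """Toy NRPS module (hypothetical model, not taken from the thesis)."""
    substrate: str              # amino acid recognized by the A domain
    has_c_domain: bool = True   # C domain forms the bond to the upstream chain


def assemble(modules: List[Module], pool: Set[str]) -> Optional[List[str]]:
    """Walk the modules from N- to C-terminus and return the released chain.

    Returns None when a module cannot load its substrate or when an
    elongation module lacks the C domain needed to extend the chain.
    """
    chain: List[str] = []
    for position, module in enumerate(modules):
        # A domain: select and activate the amino acid, load it onto the T domain.
        if module.substrate not in pool:
            return None
        # C domain: required from the second module onwards to form the peptide bond.
        if position > 0 and not module.has_c_domain:
            return None
        chain.append(module.substrate)
    # Te domain: release the completed chain from the last module.
    return chain


# Illustrative three-module line for an Hpg-Cys-Val tripeptide.
hybrid = [Module("Hpg", has_c_domain=False), Module("Cys"), Module("Val")]
print(assemble(hybrid, {"Hpg", "Cys", "Val"}))  # prints ['Hpg', 'Cys', 'Val']
```

In this sketch the order of the modules fixes the order of the residues, which is the property that makes module exchange attractive in principle; the model deliberately leaves out the domain interfaces and dynamics that make such exchanges difficult in practice.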
Chapter I gives a detailed explanation of secondary metabolite production on the basis of specific examples. In addition, both the general NRPS biochemistry and the domain-specific biochemistry are discussed in detail. In the context of engineering, the difficulties and bottlenecks encountered when working with these large and complex enzymes are also identified, and the advantages and disadvantages of particular strategies and tools are discussed at length. Some of these tools were then selected and applied to the design and development of a new hybrid NRPS project aimed at the synthesis of the HpgCV tripeptide, a potential precursor of D-amoxicillin (chapters II and III). Using a newly developed semi-high-throughput method, several hydroxyphenylglycine (Hpg)-activating NRPS modules from various genomic and metagenomic sources were selected on the basis of the predicted function of these specific modules and domains (chapter II). We also succeeded in implementing a radiolabel-free adenylation activity assay based on the formation of pyrophosphate from the hydrolysis of ATP to AMP. Furthermore, we showed that in some cases an MbtH-like protein (MLP), which acts as a chaperone associated with the adenylation domain, is required for the activation of the selected Hpg-activating domains. Since this has not been reported in previous studies, it is an entirely new observation concerning Hpg-activating domains and modules. The selected modules and domains were then built into a system with the ultimate goal of creating a series of hybrid NRPS enzymes acting as HpgCV synthetases (chapter III). Using the two C-terminal modules of the ACVS, specific for cysteine and valine, we developed a system for the production of different hybrid NRPS [12]. Although the system produced the desired hybrid NRPS efficiently, their subsequent overexpression was very difficult. Most hybrid NRPS could be expressed at least partially, but this yielded only small amounts of purified protein and required co-expression with an MLP. Unfortunately, further biochemical evaluation did not result in in vitro production of the desired HpgCV tripeptide. Nevertheless, both a method for the in vitro synthesis of product and a suitable mass spectrometry method for analysing the synthesized product were developed (chapters IV–VI). Extensive in vitro experiments were used to characterize the Nocardia lactamdurans ACVS (chapter VI). In this chapter we determined the kinetic parameters and the substrate specificity of this ACVS, which ultimately led to the discovery of new tripeptides. We also developed a two-step purification and stabilization method for the ACVS enzyme.
Applying this method yielded extremely pure ACVS suitable for electron microscopy, which ultimately led to a highly detailed structure of a multi-modular NRPS. This rough structure is consistent with a recently published multi-modular NRPS model, which suggests that the domains are arranged in a helix-like structure around a central channel made up of T domains [13]. The knowledge gained on the use of ACVS was further applied to develop an HpgCV synthetase via a different strategy, focused on the implementation of a bi-modular NRPS system. This system relies on NRPS communication-mediating (COM) domains [14] rather than on intrinsically linked hybrid NRPS (chapter V). In contrast to the earlier in vitro attempts in chapter III, we decided to reconstitute the developed pathway entirely in the production strain Penicillium chrysogenum (DS68530), which no longer has its own penicillin biosynthetic pathway. The parts selected and developed earlier, described in chapter II, were therefore integrated into the genome. This resulted in the synthesis of a new product which, according to mass spectrometry, had exactly the same mass and retention time as the standard. Further characterization showed that the chemical structure of the synthesized product did not fully match the predicted structure: the designed peptide bond between hydroxyphenylglycine and cysteine was a thioester instead of an ester. Nevertheless, we have shown that implementing COM domains for secondary metabolite production led to a hybrid NRPS enzyme that is functional in vivo in P. chrysogenum. To investigate the natural capacity of P. chrysogenum further, we introduced the previously characterized MLP (involved with the modules and domains of chapter II) into this organism. The results of these experiments, discussed in chapter VI, show a clear pattern. Introduction of MLPs leads to increased concentrations of secondary metabolites in strains carrying a single penicillin cluster (DS47274), whereas in strains with multiple biosynthetic clusters it leads to earlier penicillin production through strongly improved kinetics. This effect appears to be specific to NRPS systems and is the first observation of improved NRPS peptide formation in fungi after co-expression with a prokaryotic MLP.

In this thesis we have shown, on the one hand, that there is enormous potential in the engineering and production of new (NRPS-derived) secondary metabolites and, on the other hand, that many critical requirements for NRPS substrate specificity and modification remain unknown. The techniques described in this thesis for the engineering of a new NRPS are very broadly applicable.
The adenylation experiments described show that the domains are highly specific and sensitive to modification, so that activity is readily lost. In addition, genomic integration of an MLP can potentially increase the yield and kinetics of secondary metabolite formation in industrial strains, improve the production of metabolites, and could even be used for the overexpression of silent or cryptic NRPS clusters. Furthermore, our work on NRPS engineering has shown that replacing entire domains and/or modules is very complicated. A more targeted approach directed at sub-domains, or at specific sites within these sub-domains, could therefore lead to better results in the future. The availability of structural information on a given NRPS, such as potential interfaces, dynamics and the positioning of domains and substructures within a multi-modular enzyme, will of course facilitate the success of such approaches. In other words, future methods should take the potentially specific interactions between domains into account. This will require a 3D structure of a multi-modular NRPS, preferably in several metabolic intermediate states. This will remain a challenging task, however, because NRPS are large, dynamic complexes whose structures are difficult to determine. Even so, the methods described in this thesis, which led to a semi-refined structure, will undoubtedly contribute to the elucidation of a complete structure of this and other multi-modular NRPS. Ultimately, these new insights will simplify NRPS engineering and make it accessible for the development of an entirely new generation of novel enzymes, synthetic routes and bioactive compounds. This brings the sustainable production of new antibiotics and other bioactive compounds with broad applicability within reach.

References

1. Wang H, Fewer DP, Holm L, Rouhiainen L, Sivonen K. Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes. Proceedings of the National Academy of Sciences of the United States of America. 2014. pp. 9259–9264. doi:10.1073/pnas.1401734111
2. Marahiel MA, Stachelhaus T, Mootz HD. Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem Rev. American Chemical Society; 1997;97: 2651–2674. doi:10.1021/cr960029e
3. Caboche S, Pupin M, Leclère V, Fontaine A, Jacques P, Kucherov G. NORINE: a database of nonribosomal peptides. Nucleic Acids Res. 2008;36: D326-31. doi:10.1093/nar/gkm792
4. Weber T, Marahiel MA. Exploring the Domain Structure of Modular Nonribosomal Peptide Synthetases. Structure. 2001;9: R3–R9. doi:10.1016/S0969-2126(00)00560-8
5. Reger AS, Wu R, Dunaway-Mariano D, Gulick AM. Structural characterization of a 140 degrees domain movement in the two-step reaction catalyzed by 4-chlorobenzoate:CoA ligase. Biochemistry. 2008;47: 8016–25. doi:10.1021/bi800696y
6. Beld J, Sonnenschein EC, Vickery CR, Noel JP, Burkart MD. The phosphopantetheinyl transferases: catalysis of a posttranslational modification crucial for life. Nat Prod Rep. 2014;31: 61–108. doi:10.1039/c3np70054b
7. Sundaram S, Hertweck C. On-line enzymatic tailoring of polyketides and peptides in thiotemplate systems. Curr Opin Chem Biol. England; 2016;31: 82–94. doi:10.1016/j.cbpa.2016.01.012
8. Winn M, Fyans JK, Zhuo Y, Micklefield J. Recent advances in engineering nonribosomal peptide assembly lines. Nat Prod Rep. The Royal Society of Chemistry; 2016;33: 317–347. doi:10.1039/C5NP00099H
9. Bode HB, Brachmann AO, Jadhav KB, Seyfarth L, Dauth C, Fuchs SW, et al. Structure elucidation and activity of kolossin A, the D-/L-pentadecapeptide product of a giant nonribosomal peptide synthetase. Angew Chem Int Ed Engl. Germany; 2015;54: 10352–10355. doi:10.1002/anie.201502835
10. Yin X, Zabriskie TM. The enduracidin biosynthetic gene cluster from Streptomyces fungicidicus. Microbiology. 2006;152: 2969–2983. doi:10.1099/mic.0.29043-0
11. Hoertz AJ, Hamburger JB, Gooden DM, Bednar MM, McCafferty DG. Studies on the biosynthesis of the lipodepsipeptide antibiotic ramoplanin A2. Bioorganic Med Chem. Elsevier Ltd; 2012;20: 859–865. doi:10.1016/j.bmc.2011.11.062
12. Engler C, Gruetzner R, Kandzia R, Marillonnet S. Golden gate shuffling: A one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS One. 2009;4: e5553. doi:10.1371/journal.pone.0005553
13. Marahiel MA. A structural model for multimodular NRPS assembly lines. Nat Prod Rep. England; 2016;33: 136–140. doi:10.1039/c5np00082c
14. Hahn M, Stachelhaus T. Harnessing the potential of communication-mediating domains for the biocombinatorial synthesis of nonribosomal peptides. Proc Natl Acad Sci U S A. National Academy of Sciences; 2006;103: 275–80. doi:10.1073/pnas.0508409103

Acknowledgements

It has been a lengthy but thoroughly exciting path that I came to follow over the past years. A path which has enlightened me not only scientifically, but also socially, revealing what the most important aspects of life are. I have had the chance to cross paths with many interesting people in a variety of more or less probable setups, creating unforgettable memories. Writing these lines still feels surreal, and it seems an impossible task to do justice to all your contributions to this work and to my wellbeing and entertainment during this period, though I will try!

Without an opportunity there is no place to start. Thank you, Arnold, for giving me a place in which I could grow as a scientist and person alike. The hours of constructive talks and meetings, but also travel tale telling, have tremendously contributed to the evolution of this work and mind.

Roel, your unique way of thinking, explaining and rationalizing has always given me the comfort to approach you with new ideas, and your input was absolutely invaluable. Never have you failed to keep a cool head and a positive attitude, not even at the peak of the most heated scientific discourse. Thank you for everything.

Today would also be entirely impossible without the continuous help and friendship of my two paranymphs. Thank you for being there for me in times of need, Sabrina and Riccardo. I met you during very different times of my journey. Sabrina, from a “you, the German girl, wanna join our BBQ?” to “Ermaghherd!”, you have always been full of energy, ambition and enthusiasm. Thank you for all the good times, and always be yourself. Riccardo, I can't imagine anyone more suitable for continuing this line of work; your intellectual input and quick ability to learn and adapt, along with your wit, have in many ways contributed to this thesis. Amazing how you found the time to so meticulously keep your beard groomed and in shape!
A big thank you also goes to all the people I had the chance to meet during my time in Delft. Even though the beginning of the work at DSM, in the middle of an ongoing project, was not the most structured project start of my life, Ulrike and Remon never lost confidence in me and were always there to reach out to in case I was overwhelmed by the collection of constructs, reports and ongoing experiments. We got very close to solving the AmoxiGreen mystery, but unfortunately only very close. I enjoyed the constructive time with you and wish you all the best for your coming challenges!

After this short intermezzo in a new place, getting to know company life and many new people, it was time to join my new group in Groningen. It is never easy to become part of a grown group, but the MolMic evergreens, in the persons of Stefan, Sasha and Jan-Pieter, definitely made me feel welcome, even if they also left me in doubt about the state of my sanity from time to time. As I still have distant memories of the old biological centre in Haren, I was more than happy that window-free offices were a thing of the past! Either way, the people you share said space with make the experience. Ciprian, Fabiola and Min, you were not only my Pen group members, but also my office mates, and you made every ever so grey and rainy day more fun and pleasant. Our collaboration was fruitful not only scientifically, but also during countless trips to the supermarket in preparation for the next Borrel.

I furthermore want to thank Carsten and Susan for the countless hours spent on experiments putting this story together! It has been a pleasure to collaborate with you on every level. My greatest appreciation also goes to the entire Pen group: Zsofie, Fernando, Annarita, Laszlo and Ahsen, you have been great colleagues and even more wonderful friends. Also a big thank you to Marten: thanks for all the shared time drinking coffee and worrying about the MS… I hope it will be going smoothly, at least till the end of your research :-) and many thanks for helping me with the Dutch translations in this book!

I would also like to thank my lab mates over the years. Jeroen, you were not only the all-knowing oracle of the lab, but also a first-class Dutch tutor, dank je wel! Many thanks also to Aleksandra and Niels; even though you only shared a space with me briefly, it was great having you around. The years in this lab have also given me the chance to work with and supervise a considerable number of students. Sven, Fleur, Simone, Arjen, Catalin, Kate, Sebas (and all the ones on shorter projects), thank you all for contributing to this work in one way or another. You have taught me that teaching is not only essential, but also fun!
I would also like to express my gratitude to Greetje and Janny; this group would not work in such a fine manner without you two! Also a big thank you to Bea and Anmara, you have always been incredibly helpful and patient with even the silliest questions. Overall, a huge thank you to all parts of the MolMic group. It has been an honour to be part of this construct with all its facets for these years, and I would not be the person and scientist I am now without this outstanding environment!

Moving on to where my roots lie: thank you, my Swissies, for making my homecomings always freakin' awesome! Jasi, thank you for all the time, discussions and bullshit we exchanged, and of course for always giving us a refuge for our stays in Basel :-) Ivo, you have always been there for me and you were never shy of catching a drink... I fondly remember (more or less) our trips to Croatia, Ghent, Hamburg and so many more. Never change, man! Nikki, we go way back, and I am more than happy that we still have the chance to talk and see each other regularly. Thank you for all the “Schabernack” as well as all the discussions on any topic! Adi, your humour was always on the fine line between inappropriate and awesome; thank you for making me laugh on countless occasions.

Even though Groningen gives the impression of a little student town, it is home to so many international people, of whom I had the chance to meet and get to know at least a few. Aga, we have known each other since the very early Groningen times; thank you for making my first internship much better, despite a window-less office… Kasia, you have always been keen on and more than well prepared for my BBQs. Thank you for being a part of my life over the years, and all the best in London. Ina, you have been a breath of fresh air and I will never forget the most fun GBB symposium ever. Thank you for bringing fun and randomly awesome moments into my life! I would also like to thank Stef, Cyrus and the Grande Familia for all the great get-togethers, theme evenings and overall anything that made the most depressing midwinter days fun!

Next to work, BBQs, travels and more work, I found my balance in doing sports, foremost floorball, which gave me the unique chance to meet some fantastic players and even better human beings. Simon, I still remember the disbelieving stare across the room when we met at a random house party; the rest is history :-) Thank you for all the chit-chat, the beers we shared and all the times you have been there for me! Mike, I am happy to have met such a fine, if also chaotic, floorball player. Thank you for all the random nights out, I would never have experienced KEI week without you! I would furthermore like to thank Joey, Klaas, Reinier, Andy and Paul; you have made every training, game and especially the drives to the latter into a hilarious experience. I hope to see you and the rest of Face Off at the GFO again!
At this point I would also like to thank my parents, Mum and Pesche: without the two of you, none of this would have been possible! You have supported me all my life and always believed in me! Without your continuous support on every level and at all times, I would never have managed this. You always gave me the chance to recharge my energy with you in the mountains and to enjoy your company and the surroundings. Thank you so much for everything, I am glad that you exist!

I would also like to thank Roland! You are an incredibly important figure in my life and I have been able to learn so much from you. Even though we no longer see each other quite as often as we used to, I want to thank you for all your support, and I hope that we will get the chance to see each other a bit more often again!

My (half-)brother Philip, we have only known each other for a short while and have unfortunately only met a few times, but I would still like to thank you for becoming part of our family. I hope that, despite your stressful everyday life and the distance, we will see a bit more of each other in the future.

Karin, although we have only seen each other sporadically in recent years, you run through my life like a common thread. You have always believed in me and supported me on my way! Thank you for being there for me, especially after Mark's passing. You helped me to resolve many unanswered questions and to see and understand my father in a context in which I personally never had the chance to. Thank you!

Dad, although we did not always quite understand each other and had our differences, you nevertheless gave me a great deal for my journey. Despite everything, I want to thank you wholeheartedly. In my memories you will always be the adventurous and slightly disorganized nature lover, and I hope you have found your eternal place in the universe.

These years have been more than a wild ride; not only professionally, but also personally I have experienced so, so many things. And I would do it all again, go through all the struggle, disappointment and uncomfortable situations, to meet you again, my Tonia! I never really believed that there was a person out there who would be my counterpart, sharing the same attitude, interests, humour and appreciation of art, culture, party and travel, but now I know. You are an amazing person, caring, loyal, funny, smart and always cute ;-) Without you I would never have discovered so many corners of this world and of myself… and no, there are no bears in the Ticino Alps :-D Your enthusiasm and eternal love have made me the man I am today. I could never have walked this path without you at my side, and I hope that this was just a little taste of what we still have ahead of us. I love you, my cutie. You are the most important and best thing that has ever happened to me, and I hope that you will continue to believe in yourself and see what an outstanding and amazing person you really are!

List of publications and patents

Haq U.I., Zwahlen R.D., Yang P., van Elsas J.D., The response of Paraburkholderia terrae strains to two soil fungi and the potential role of oxalate, Frontiers in Microbiology, 2018, accepted

Haq U.I., Zwahlen R.D., Yang P., van Elsas J.D., A signaling molecule or a potential carbon source: the conundrum of oxalic acid in the interactions of Burkholderia terrae strains with two soil fungi, PhD thesis: Interactions of Burkholderia terrae with soil fungi, ISBN 978-90-367-9045-1, 2016, pp. 153-173

Zwier M.V., Verhulst E.C., Zwahlen R.D., van de Zande L., DNA methylation plays a crucial role during early Nasonia development, Insect Mol Bio, 21 (1), 2012, pp. 129-138.

Guzmán-Chávez F., Zwahlen R.D., Bovenberg R.A.L., Driessen A.J.M., Engineering of the filamentous fungus Penicillium chrysogenum as cell factory for natural products, 2018, review to be submitted to Frontiers in Microbiology (abstract accepted)

Zwahlen R.D., Boer R., Müller U.M., Bovenberg R.A.L., Driessen A.J.M., Identification and characterization of Nonribosomal peptide synthetase modules that activate 4-Hydroxyphenylglycine, submitted to PLOS ONE

Zwahlen R.D., Pohl C., Bovenberg R.A.L., Driessen A.J.M., Prokaryotic MbtH like proteins stimulate secondary metabolism in the filamentous fungus Penicillium chrysogenum, submitted to ACS Synthetic Biology

Zwahlen R.D., Crismaru C., Polli F., Bunduc C.M., Müller U.M., Boer R., Schouten O., Bovenberg R.A.L., Driessen A.J.M., An engineered two component nonribosomal peptide synthetase (NRPS) producing a novel peptide-like compound in Penicillium chrysogenum, manuscript in preparation
