UvA-DARE (Digital Academic Repository)

Metabolomic and genomic investigation into the actinomycete genus Planomonospora

Zdouc, M.M.

Publication date 2021 Document Version Final published version

Link to publication

Citation for published version (APA): Zdouc, M. M. (2021). Metabolomic and genomic investigation into the actinomycete genus Planomonospora.

General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

Download date:09 Oct 2021 Mitja M. Zdouc Mitja Genus Planomonospora the Actinomycete into Investigation Genomic and Metabolomic

Metabolomic and Genomic Investigation into the Actinomycete Genus Planomonospora

Mitja M. Zdouc Metabolomic and Genomic Investigation into the Actinomycete Genus Planomonospora

ACADEMISCH PROEFSCHRIFT

ter verkrijging van de graad van doctor

aan de Universiteit van Amsterdam

op gezag van de Rector Magnificus

prof. dr. ir. K.I.J. Maex

ten overstaan van een door het College voor Promoties ingestelde commissie,

in het openbaar te verdedigen

op maandag 28 juni 2021, te 14.00 uur

door

Mitja Maximilian Zdouc

geboren te Klagenfurt Promotiecommissie

Promotor: dr. T. den Blaauwen Universiteit van Amsterdam

Copromotores: dr. M. Sosio NAISCONS prof. dr. L.W. Hamoen Universiteit van Amsterdam

Overige leden: prof. dr. G.P. van Wezel Universiteit Leiden prof. dr. L.W. Hamoen Universiteit van Amsterdam prof. dr. J. Hugenholtz Universiteit van Amsterdam dr. M.A. Fernández Ibáñez Universiteit van Amsterdam dr. J.J.J. van der Hooft Wageningen University & Research prof. dr. C.R. Berkers Universiteit Utrecht

Faculteit der Natuurwetenschappen, Wiskunde en Informatica

II

III

Copyright © 2020 Mitja Maximilian Zdouc. All rights reserved. No part of this publication may be reproduced in any form without permission from the author.

The studies described in this thesis were carried out at in the research laboratories of Naicons Srl in Viale Ortles 22/4, 20139 Milan, Italy and at the Institut für Pharmazeutische Biologie, Rheinische Friedrich-Wilhelms-Universität, Nußallee 6, 53115 Bonn, Germany. Mitja Maximilian Zdouc was financially supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No.721484 (Train2Target).

Cover Design: The cover shows an electron microscope scan of Planomonospora parontospora ATCC 23863. Image taken from the Digital Atlas of Actinomycetes (http://atlas.actino.jp/), with courtesy of the Society for Actinomycetes Japan. Image by G. Vobis, from Thiemann et al. 1967, 15, 27-38.

IV

Table of Contents

LIST OF ABBREVIATIONS VII

GENERAL INTRODUCTION 1 1.1 NATURAL PRODUCT PRODUCERS: ACTINOMYCETES 4 1.1.1 RARE ACTINOMYCETES 6 1.2 ACTINOMYCETE GENETICS 8 1.3 GENOMICS AND METABOLOMICS IN THE FIELD OF NATURAL PRODUCTS 16 1.4 SCOPE OF THE THESIS 21 1.5 REFERENCES 22

CREATION OF AN IN SILICO MS2 FRAGMENTATION LIBRARY FOR AUTOMATED DEREPLICATION 31 2.1 ABSTRACT 32 2.2 INTRODUCTION 33 2.3 RESULTS & DISCUSSION 35 2.4 CONCLUSION 40 2.5 ACKNOWLEDGEMENTS 42 2.6 EXPERIMENTAL SECTION 42 2.7 REFERENCES 43 2.8 SUPPLEMENTAL INFORMATION 45

PLANOMONOSPORA: A METABOLOMICS PERSPECTIVE ON AN UNDEREXPLORED GENUS 47 3.1 ABSTRACT 48 3.2 INTRODUCTION 49 3.3 RESULTS & DISCUSSION 51 3.4 CONCLUSION 70 3.5 AUTHOR CONTRIBUTIONS & NOTES 71 3.6 ACKNOWLEDGEMENTS 71 3.7 EXPERIMENTAL SECTION 72 3.8 REFERENCES 76 3.9 SUPPLEMENTAL INFORMATION 84

A BIARYL-LINKED TRIPEPTIDE FROM PLANOMONOSPORA REVEALS A WIDESPREAD CLASS OF MINIMAL RIPP GENE CLUSTERS 117 4.1 ABSTRACT 118 4.2 INTRODUCTION 119

V 4.3 RESULTS & DISCUSSION 120 4.4 CONCLUSION 129 4.5 AUTHOR CONTRIBUTIONS 129 4.6 ACKNOWLEDGEMENTS 130 4.7 EXPERIMENTAL SECTION 130 4.8 REFERENCES 134 4.9 SUPPLEMENTAL INFORMATION 137

GENERAL DISCUSSION 151 5.1 REFERENCES 158

SUMMARY 159 SAMENVATTING 163 ZUSAMMENFASSUNG 167 POVZETEK 171 ACKNOWLEDGEMENTS 175 LIST OF PUBLICATIONS 179

VI List of Abbreviations

ABL-ISDB - Antibiotic Literature In Dha - Dehydroalanine Silico Database DH Domain - Dehydratase Domain ACE - Angiotensin II Converting Enzyme DMSO - Dimethylsulfoxid

ACP - Acyl Carrier Protein DNA - Deoxyribonucleic Acid

A Domain - Adenylation Domain E Domain - Epimerase Domain antiSMASH - Antibiotics & ER Domain - Enoylreductase Secondary Metabolite Analysis Domain Shell ESI - Electrospray Ionization ARTS - Antibiotic Resistant Target GC Content - Guanine-Cytosine Seeker Content AT - Acyltransferase Domain GCF - Gene Cluster Family ATP - Adenosine Triphosphate GNPS - Global Natural Product BAGEL - Bayesian Analysis of Social Molecular Networking Gene Essentiality HMBC - Heteronuclear Multiple- BGC - Biosynthetic Gene Cluster Bond Correlation Spectroscopy

BiG-SCAPE - Biosynthetic Gene HPLC - High Performance Liquid Similarity Clustering and Chromatography Prospecting Engine HSQC - Heteronuclear Single C Domain - Condensation Domain Quantum Coherence

CDS - Coding DNA Sequences KEGG - Kyoto Encyclopedia of Genes and Genomes CFM - Competitive Fragmentation Modeling KS Domain - Ketosynthetase Domain CLUSEAN - Cluster Sequence Analyzer KR Domain - Keto-Reductase Domain COSY - Homonuclear Correlation Spectroscopy LC-MS - Liquid Chromatography - Mass Spectrometry Da - Dalton

VII m/z - Mass-to-Charge Ratio O Domain - Oxidase Domain

Mb - Megabases PCP - Peptide Carrier Protein

MeCN - Acetonitrile PK - Polyketide

MiBIG - Minimum Information about PKS - Polyketide Synthase a Biosynthetic Gene Cluster PRPS - Post-Ribosomal Peptide MS - Mass Spectrometry Synthesis

MS2 - Tandem Mass Fragmentation R Domain - Reduction Domain

MT Domain - Methyltransferase RBS - Ribosomal Binding Site Domain RiPP - Ribosomally Synthesized MUSCLE - Multiple Sequence and Post-Translationally Modified Comparison by Log- Expectation Peptide

NMR - Nuclear Magnetic RNA - Ribonucleic Acid Resonance rRNA - Ribosomal RNA NOESY - Nuclear Overhauser Effect Spectroscopy TE Domain - Thioesterase Domain

NP - Natural product TOCSY - Total Correlation Spectroscopy NRP - Non-Ribosomal Peptide tRNA - Transfer RNA NRPS - Non-Ribosomal Peptide- Synthetase tmRNA - Transfer-Messenger RNA UV - Ultraviolet

VIII General Introduction

1

General Introduction

1 General Introduction

any , fungi and are capable to produce specialized M metabolites (also known as natural products (NPs) or secondary metabolites). From a medicinal chemistry perspective, they are defined as low molecular weight (<3000 Da) organic molecules that usually have no primary role in the metabolism of the producing organism and do not immediately partake in processes essential for cell survival. [1-3] Many NPs can induce specific biological responses from organisms by selectively interacting with their cellular machinery. An example is nicotine, a NP most prominently occurring in the tobacco , which acts as a strong insecticide and protects plant from insect infestation. [4] Still, in many cases, the ecological function of NPs is unclear. However, their cellular energy cost and elaborate structures suggest that they benefit the producing organism in some way and grant an evolutionary advantage. [3,5] Due to their diverse and potent bioactivities, many NPs or their derivatives have found their way into pharmacopoeias. Over the last four decades alone, 66% of approved small-molecule drugs were NPs or derivatives thereof (see Figure 1). They encompass antitumor drugs, antibiotics, immunosuppressants, analgesics and lipid-lowering agents, to name a few. In the field of antibacterial drugs, 71% of approved drugs are derived from microbial sources. [6]

Fig. 1. Origin of small molecule drugs (1981-2019, n=1394). S: synthetic drug; S/NM: synthetic drug/mimic of NP; S*: synthetic drug with NP pharmacophore; S*/NM: synthetic drug with NP pharmacophore/NP mimic; N/NB: unaltered NP/botanical drug; ND: NP derivate. Adapted from Newman and Cragg, 2020. [6]

2 General Introduction

Several thousand bioactive NPs were isolated from filamentous actinomycetes, prolific producers of various classes of metabolites. [1] Examples are the antibiotics streptomycin (1) from Streptomyces griseus [7,8] and vancomycin (2) from Amycolatopsis orientalis, [9,10] the anti-cancer drug daunorubicin (3) from Streptomyces peucetius, [11] the anti-fungal amphotericin B (4) from Streptomyces nodosus, [12] or the immunosuppressant rapamycin (5) from Streptomyces hygroscopicus (see Figure 2). [13-15] Despite the impressive track record of important drugs discovered from nature, large pharmaceutical companies have largely withdrawn from natural product research in the last decades. The loss in popularity of NPs has been explained in part by the supposedly higher productivity of large, synthetic combinatorial libraries. Further, high cost of structural elucidation and severe rediscovery rate of NPs in traditional “top-down” screening approaches, such as bioassay- or chemical signature- guided isolation, have contributed to the reduction in interest. [16-18] However, microbial NP research is currently undergoing a renaissance. The sequencing of microbial genomes showed that only a small fraction of potential microbial metabolites has been identified. [19] The recent development of new tools in bioinformatics and analytical chemistry has allowed to expedite the pace of NP discovery and development. The combined and increased use of genomics and metabolomics-inspired methods have enabled the detection of promising metabolites that have remained unnoticed so far. [17,20–22] In this chapter, the biology and genetics of filamentous actinomycetes, as well as state-of-the-art approaches for the discovery of new NPs are introduced, which set the ground for the next chapters on this thesis.

3 General Introduction

Fig. 2. Examples for various natural products from actinomycetes. (1) streptomycin; (2) vancomycin; (3) daunorubicin; (4) amphotericin B; (5) rapamycin.

1.1 Natural Product Producers: Actinomycetes

Microbes of the order , commonly called actinomycetes, are Gram-positive filamentous bacteria belonging to the phylum Actinobacteria. Mostly aerobic, mesophilic, chemoheterotrophic and saprophytic, they play important roles in environmental ecology. [23] They are ubiquitous and able to colonize a variety of different environments, ranging from soil to sea and fresh water, marine sediments and the surfaces of plants and fungi. Some actinomycetes are even found as pathogens in humans, where they cause the

4 General Introduction disease actinomycosis. So far, more than 220 genera were assigned to actinomycetes, some of them only described recently. This makes them one of the most diversified group of microorganisms. [24] Among actinomycetes, members of the genus Streptomyces are best investigated, partly due to their abundance in soil, versatility and proficiency in producing NPs. [24] Streptomyces have provided much of the insights we have in actinomycetes biology, with closely investigated model organisms such as Streptomyces coelicolor A3(2). In soil, Streptomyces and most actinomycetes undergo complex life cycles with slow, filamentous growth. In their vegetative stage, actinomycetes spores germinate and form hyphae, which burrow in the substrate and form the mycelium (see Figure 3). Hyphae elongate by tip extension and branching, making cross walls at irregular intervals to form connected compartments, in which multiple copies of the chromosome are present. It has been observed that transport of nutrients or even plasmids occurs over long distances. [25,26] Under physiological stress, such as nutrient depletion, the vegetative mycelium undergoes autolysis by programmed cell death. Nutrients are re-distributed to fuel the development of aerial mycelium, which extends from the substrate mycelium (see Figure 3D). [27] The distal regions of the aerial hyphae divide into compartments by the formation of septa, each holding a single copy of the genome. [28] Eventually, the compartments become exospores, coated with a hydrophobic, mesh-like structure of proteins. Upon contact with water, exospores are dispersed and remain dormant until conditions improve. [26] Exospores can stay viable for a long time, with accounts of successful germination after 70 years of soil sample storage. [29] The number and form of exospores attached to the mycelium has long been used to distinguish actinomycete genera and influencing their designation, such as Micromonospora, Planotetraspora, Planobispora or Planomonospora. More recently, various genetic markers for their differentiation have been employed, such as 16S, rpoB and ssgB sequences. [30] Lately, the advent of next generation sequencing methods has led to the development of overall genomes relatedness indices such as the average

5 General Introduction nucleotide identity, allowing for an unprecedented, fine-grained discrimination even on a strain-level. [31]

Fig. 3. The life cycle of Streptomyces coelicolor A3(2), adapted from [23]. (A) From a spore, (B) one or two germ tubes protrude, which burrow in the substrate. (C) The substrate mycelium is formed by tip extension and branching. (D) After a few days, the aerial mycelium growths from the substrate, tipped with unicellular exospores.

1.1.1 Rare Actinomycetes

As mentioned above, over 220 other actinomycetes genera are described, many of them only recently. [32-35] Often called “rare actinomycetes”, they are defined as “non-Streptomyces” strains that are less frequently isolated than the more “common” Streptomyces ssp. [1,36] However, they have represented and still represent a rich source for metabolites with novel scaffolds. [20,36,37] Examples are the proteasome inhibitor salinosporamide A from Salinispora tropica, [38] the antibiotic NAI-107/microbisporicin from Microbispora sp. [39,40] or the analgesic NAI-112 from Actinoplanes sp. [41] Among the “rare actinomycetes” is the poorly characterized genus Planomonospora from the Streptosporangiaceae family. Only few species have been described since its first

6 General Introduction report in 1967, [42,43] despite their proven potential to produce interesting molecules. The Naicons library, a proprietary strain collection of approximately 45,000 microbes, mainly “rare actinomycetes”, [20] comprises circa 300 isolates designated as Planomonospora, based on their morphological characteristics. The abundance of members of this underexploited genus in the Naicons collection, in contrast to their poor representation in public libraries, prompted further research into this genus. According to Bergey’s Manual of Systematic Bacteriology, Planomonospora is chemoorganotrophic, aerobic, mesophilic (28–37 °C) and preferably grows on medium with pH 7.0-8.0. Morphologically, it is characterized by cylindrical sporangia, formed in bundles on the aerial mycelium, resembling rows of bananas, or in an open palm pattern (see Figure 4 and thesis cover). A single, motile spore, equipped with a flagellum, is contained in each sporangium, which is reflected by its name: in ancient greek, planos means wanderer, monos means solitary or single and spora means seed. [43] Validly described Planomonospora species and sub-species are included in Table 1.

Tab. 1. Validly described species of the genus Planomonospora. Species Year Ref. Planomonospora parontospora ssp. parontospora 1967 [42] Planomonospora parontospora ssp. antibiotica 1967 [42] Planomonospora venezuelensis 1970 [44] Planomonospora alba 1994 [45] Planomonospora sphaerica 1994 [45] Planomonospora corallina 2016 [46] Planomonospora algeriensis 2017 [47]

The following NPs were reported for Planomonospora-strains: the lantibiotic 97518, a member of a family of class I lanthipeptides and also known as planosporicin; [48,49] the thiopeptides thiostrepton [45,50] and siomycin/sporangiomycin; [51] the ureylene-containing oligopeptide antipain, [52] which is also produced by Streptomyces; [53] and the more recently described lassopeptide sphaericin. [54]

7 General Introduction

Fig 4. Electron microscope scan of Planomonospora parontospora ATCC 23863. Image taken from the Digital Atlas of Actinomycetes (http://atlas.actino.jp/), with courtesy of the Society for Actinomycetes Japan. Image by M. Hayakawa, H. Iino & H. Nonomura, from Thiemann et al. 1967, 15, 27-38.

1.2 Actinomycete Genetics

Actinomycetes are characterized by their large genomes in the range of 8-10 Mb and a high guanine-cytosine (GC) content of over 70%, compared to other Gram-positive bacteria such as Staphylococcus aureus or Bacillus subtilis, with less than 50% GC content. The number of coding DNA sequences (CDS) is proportional to the genome size and averages at over 7,000. A putative function can be assigned to 70-80% of CDS, while 20-30% are annotated as hypothetical proteins. [55] Currently, Nonomuraea sp. ATCC55076 harbors the largest genome of the actinomycetes, consisting of a 13.1 Mb linear chromosome with 72.1% GC content and approximately 11,600 predicted coding sequences. Interestingly, 10% of its genome is dedicated to secondary metabolite synthesis. [56] Generally, actinomycete genomes consist of a 5-6 Mb “core” region that

8 General Introduction contains essential genes, while variable “arms” contain non-essential ones. Some Streptomyces strains can undergo rearrangements, such as deletions or amplifications, of up to one fourth of their chromosome, usually near to its ends/arms. Sometimes, these rearrangements can result in the formation of circular chromosomes. [23] On average, genomes contain six rRNA operons, one tmRNA and more than 60 tRNA genes. 9-15% of genes are predicted to have regulatory functions, in agreement with the complex life cycle of actinomycetes, which requires tight regulation. [55] Since the release of the first full genome sequences in the early 2000s, it has become apparent that actinomycetes carry the genetic potential to produce many more NPs than can be detected under laboratory conditions. NPs are synthesized by a battery of dedicated enzymes, usually encoded in close vicinity in a region of the bacterial genome known as a biosynthetic gene cluster (BGC). Usually, more than 20 BGCs can be found in a single actinomycete genome, meaning that they can produce at least 20 different NPs. [23,55] A BGC usually contains all necessary genes for the production of a specific NP, alongside of genes responsible for auto-resistance, export and regulation. The co-location of biosynthetic genes in clusters improves translation and thus, production of the NP, which commonly grants an evolutionary advantage to the producer. [57,58] It is estimated that actinomycetes devote 0.8-3.0 Mb of their genome toward the production of NPs, which translates to approximately 10% of their coding capacity. [59] Actinomycetes are the source of numerous structurally diverse metabolites, which result from a complex biosynthetic machinery. A recent study on 1,100 Streptomyces genomes and similar studies on Salinispora and Amycolatopsis reported that the most common BGC classes were the ones responsible for the synthesis of polyketides (PKs), non-ribosomal peptides (NRPs), ribosomally synthesized and post-translationally modified peptides (RiPPs), terpenes and combinations of some of these classes, also called hybrid BGCs. For the latter, BGCs containing PK and NRPS elements have been found most commonly. [60-63] Due to their relevance, the modus operandi of polyketide

9 General Introduction synthases (PK), non-ribosomal peptide synthetases (NRPS), post-ribosomal peptide synthesis (PRPS) and terpene biosynthesis is briefly described below. Of note, less commonly distributed classes of BGCs exist in actinomycetes. [60,64] Polyketides. Similarly to fatty acid biosynthesis, polyketide synthases (PKSs) polymerize simple acyl building blocks to create the carbon backbone of the PK, that can undergo further modification. The most frequent and best- investigated are the PKS type I, assembly lines enzymes, which can be further differentiated into a modular and an iterative type. Less frequent are PKS type II and type III, as well as trans-AT PKS, which are not further discussed here. Modular PKS I are composed of separate functional units called modules, consisting of catalytic domains that recognize, condense and modify a number of acyl monomers (acyl-CoAs) and eventually release the assembled molecule. From N- to C-terminus, PKS I are organized into a loading module, a series of elongation/extension modules and a termination module (see Figure 5). The order of the biosynthetic modules and the number and type of catalytic domains within each module determine the final structure and functionality of the NP. This well- defined modularity allows for a straightforward prediction of products. [65] PKS I modules have three mandatory domains: the acyltransferase domain (AT), which selects an appropriate monomer from the cellular environment, the acyl carrier protein (ACP), which carries the monomer selected by the AT domain, and the ketosynthetase domain (KS), responsible for the formation of the C-C bond by Claisen condensation. In the loading module, a starter group such as acetyl-CoA is loaded onto the thiol-moiety of the ACP by the AT domain. In the elongation/extension modules, the nascent polyketide chain is handed over to the thiol moiety of the KS domain. Next to it, an elongation group (e.g. malonyl-CoA) is loaded onto an ACP domain by its associated AT domain. The nascent polyketide chain on the KS domain reacts with the elongation group on the ACP, undergoing the chain extension step. The module can include additional domains that perform processing reactions on the β-keto moiety of the newly attached residue, such as reduction by a keto-reductase (KR) domain that results in a hydroxyl group, dehydration by a dehydratase (DH) domain or a

10 General Introduction complete DH-enoylreductase(ER)-KR domain that leads to a fully reduced methylene group. Following this modification, the nascent polyketide is passed on to the downstream elongation module, which is ready for the next step of elongation. Eventually, in the termination module, a thioesterase (TE) domain releases the mature polyketide chain by hydrolysis or, more commonly, by macrocyclization, which may undergo further modifications. An example for a type PKS I product is the macrolide antibiotic erythromycin (6), with its biosynthetic pathway visualized in Figure 5. [66]

Fig. 5. Biosynthesis of erythromycin (6), adapted from Donadio et al. [66]

11 General Introduction

Nonribosomal peptides. NRPs are synthesized by nonribosomal peptide synthases (NRPSs), which show a modular assembly line organization similar to PKSs. In NRPSs, proteinogenic or modified amino acids replace acyl monomer building blocks. They produce a single type of peptide, defined by the sequence of NRPS modules, each of which adds and possibly modifies a distinct amino acid to the nascent peptide. Analogous to PKS, NRPS are organized into a starting module, several elongation/extension modules and a termination module, from N- to C-terminus (see Figure 6). In the starting module, the first amino acid is selected and activated by the adenylation (A) domain and loaded onto the thiol moiety of the peptide carrier protein (PCP). In the adjacent elongation module, while the next amino acid is activated and loaded on its PCP by the successive A domain, the condensation (C) domain catalyzes the amide bond formation between the amino acid from the previous module and the activated new amino acid, transferring the nascent peptide chain to the PCP. Similarly to PKS I, additional domains can be included in NRPS modules, such as epimerase (E), N- and C-methyltransferase (MT) and oxidase (Ox) domains. In the termination module, either a thioesterase (TE) or a reduction (R) domain terminates the chain extension and releases the mature peptide, which may be submitted to further modifications, such as acylation, oxidation or glycosylations. An example for NRP biosynthesis is the assembly line for the glycopeptide antibiotics such as vancomycin (2), with its biosynthetic pathway visualized in Figure 6. [67] RiPPs. RiPPs are ribosomally synthesized and post-translationally modified peptides. Their biosynthesis is known as post-ribosomal peptide synthesis (PRPS), in line with the designation of NRPS enzymes. Synthesis initiates on a ribosomally translated precursor peptide, typically 20-110 residues long, encoded by a structural gene. [69] Generally, this precursor peptide contains a N-terminal leader peptide, important for post-translational modification enzymes and export from the cell, a core peptide that is eventually turned into the mature

12 General Introduction

Fig. 6. Vancomycin biosynthesis. Adapted from Schmartz et al. [68] product, and, in some cases, a C-terminal follower sequence. While some RiPP precursor peptides contain both leader and follower peptides, a few rare examples contain only a follower sequence. [70] After translation, the core of the precursor peptide is modified by tailoring enzymes. [69] Modifications include formation of non-proteinogenic amino acids such as lanthionin or dehydroalanine, cyclization of amino acids, for example formation of thiazoles or oxazoles, and conformationally restrictive macrocyclization. After the modifications have taken place, the leader and/or follower peptide are cleaved off in one or more steps, resulting in the mature product (see Figure 7). [69] The various RiPP classes are

13 General Introduction defined by the types of post-translational modifications that are installed on the core peptide (see Figure 8). Thiopeptides, such as siomycin A (7), [71] contain thiazol- and piperidine rings, often with dehydroamino acid such as dehydroalanine. Lassopeptides, such as sphaericin (8), [54] consist of an N- terminal macrolactam, which traps the C-terminal tail in the macrocycle, leading to a highly compact and stable structure. Lanthipeptides, such as nisin (9), contain lanthionine-moieties, which consist of two alanines, connected by a thioether- bridge. Such cross-links restrict conformational flexibility and increase the stability of the compound against degradation by enzymes. [69] Since the first comprehensive literature overview in 2013, [69] several new RiPP families have been uncovered, mostly through searching precursor peptide-related features and RiPP biosynthetic enzymes in genome sequences. [72] Putative RiPP BGCs can be discovered by computational genome mining, and their predictive biosynthetic logic allows to hypothesize their chemical structures. [70] Such “bottom-up” in silico methods will be discussed in greater detail in the next section.

Fig. 7. Conceptual visualization of post-ribosomal peptide synthesis, featuring the synthesis of a theoretical lantipeptide. Adapted from Hetrick et al. [70]

14 General Introduction

Fig. 8. Examples of RiPPs: (7) siomycin A; (8) sphaericin; (9) nisin.

Terpenes. All terpenoids are constructed from the universal C5 isoprenoid precursor isopentenyl diphosphate and dimethylallyl diphosphate. Oligomerization by different types of isoprenyl transferases and consecutive tailoring by enzymes leads to an enormous amount of structural diversity. Many terpenoids are volatile and odoriferous, such as geosmin (10), which is produced by many actinomycetes and responsible for the musty odor of moist soils. Others, such as the sesquiterpene pentalenolactone (11), exhibit antibiotic activity (see Figure 9). [73]

15 General Introduction

Fig. 9. Examples of terpene NPs: (10) geosmin; (11) pentalenolactone.

1.3 Genomics and Metabolomics in the Field of Natural Products

The re-discovery of known metabolites is a frequently occurring problem in NP research. [16] In actinomycetes, NPs are not distributed uniformly. Streptothricin, one of the first antibiotics ever discovered, occurs in approximately 10% of randomly isolated strains. Similarly, streptomycin, actinomycin and tetracyclin occur in 1%, 0.1% and 0.1% of randomly isolated strains, respectively. On the other hand, vancomycin is estimated to occur in a frequency of 1.5 x 10-5, erythromycin at 5 x 10-6 and the recently approved antibiotic daptomycin at only 1-2 x 10-7. [74] Therefore, the probability to randomly find novel bioactive metabolites in traditional NP discovery approaches, such as screening for antibiotic activity, is very low. According to Pye et al., “traditional natural product discovery platforms implemented on traditional source organisms will lead predominantly to the isolation of traditional, well-known chemical entities”. [75] To increase the odds to find new NPs, novel sources have to be accessed, but also new methods and approaches have to be applied. [20,76–78] Such approaches can be broadly classified as “top-down”, which start at the organism level, without prior knowledge of the genes responsible for the production of metabolites, and “bottom-up”, which start with the genome sequence at hand. Classical “top-down” approaches prioritize metabolites mainly due to their activity in a biological assay, such as antibiotic activity, while more modern ones select promising metabolites by comparative analysis of metabolomics data. With the advent of high-throughput genome sequencing, “bottom-up” approaches were developed that start with the identification of a BGC in a genome, using bioinformatics methods. Subsequently,

16 General Introduction the corresponding metabolite is identified and characterized using metabolomics and genetic manipulation methods. While starting from different perspectives, both “bottom-up” and “top-down” approaches attempt to identify promising NPs from a large quantity of data. [79] The combinations of different disciplines such as genomics, bioinformatics and metabolomics are an area of active research and represent a unique opportunity for the reinvigoration of natural product discovery. Generally addressed as data mining approaches, these methods are discussed below. Genomics and Genome Mining. The breakthrough sequencing of the complete genomes of the model organisms Streptomyces coelicolor A3(2) [80] and Streptomyces avermitilis [81] revealed the hidden biosynthetic capacity of Streptomyces. For Streptomyces coelicolor A3(2), 22 BGCs could be identified, with only four of them being associated to known detected metabolites. [80] Similarly, for Streptomyces avermitilis, 25 BGCs were detected. [81] This unexpected wealth in unknown biosynthetic potential in strains that had been model organisms for decades led to the idea that access to the genome sequence of a strain could be used to decipher the biosynthetic potential of the microorganism. The strategy was called “genome mining”: prediction and isolation of NPs without a priori knowledge of the underlying structure, only based on genetic information. [78] The surge in new DNA sequencing technologies, with the development of platforms such as PacBio, [82] Illumina [83] and Nanopore [84] has made the high-throughput sequencing of bacterial genomes affordable. [85] This prompted the development of various computational genome mining approaches. In “classical” genome mining, genomes are screened for homologs of conserved genes involved in natural product biosynthesis, such as PKS and NRPS domains or tailoring enzymes. Commonly used tools include BAGEL, [86] CLUSEAN [87] and antiSMASH. [64,88] Based on gene similarity, BGCs can be further grouped in gene cluster families (GCFs). The tool BiG-SCAPE allows the comparison of the biosynthetic potential of several genomes and enables the annotation of determined GCFs using the MiBIG repository of experimentally annotated BGCs. [89] Integration of the CORASON algorithm allows to construct

17 General Introduction multi-locus phylogenies of BGCs within and across GCFs and to analyze the phylogenetic relationship of many strains at once. [90] Further, phylogeny-based genome mining can investigate lineages of productive strains or BGCs of special interest, which aids in the prioritization of gene clusters for experimental characterization. [91] Resistance/regulator-based mining tools such as the tool ARTS identifies BGCs with known self-resistance genes, which allows to predict the putative bioactivity of the encoded product. [92,93] These programs allow to conduct searches (semi-)automatically, making genome mining accessible even to non-experts. Comparisons against experimentally established BGC databases, such as the MiBIG-repository, allows to annotate BGCs without additional wet laboratory work. [89] Genome mining can even support structural elucidation by predicting the substrate specificity of involved biosynthetic enzymes, especially in NRPS and PKS I systems. [94] However, a general disadvantage of genome mining is that only known classes BGCs can be identified, so far. Further, a high-quality genome sequence is the prerequisite for all analyses and has to be available beforehand. [95] Still, genome mining is an excellent tool to identify and prioritize interesting BGCs or strains, which can be investigated further, using heterologous expression or metabolomic studies. Metabolomics and Metabolome Mining. Microbial extracts are complex matrices, containing primary and secondary metabolites, media components and possibly, impurities from the extraction process. Therefore, extract analysis requires efficient analytical separation and detection techniques. In recent years, high pressure liquid chromatography, coupled to mass spectrometry (hereby referred to as LC-MS), has become a popular method due to is sensitivity, accurateness and rapidness. It allows to characterize even minor constituents of an extract, based on exact mass, adduct formation and tandem mass (MS2, MS/MS or MS2) fragmentation pattern. [96] LC-MS can be used for the high- throughput chemical profiling of extracts, enabling the systematic detection of contained metabolites. The comprehensive identification and quantification of all low-molecular weight metabolites present in a biological system is known as

18 General Introduction metabolomics. [97,98] LC-MS-based metabolomic technologies have been widely applied in the field of natural products. Targeted metabolomics focuses on a pre- defined set of metabolites, while untargeted metabolomics attempts to identify all metabolites in a biological sample, without a priori knowledge. Identification initiates usually on comparisons of the spectroscopic or spectrometric characteristics of metabolites, such as NMR or tandem mass spectra, against databases of experimental data, a method also known as dereplication. [96] Originally defined as “a process of quickly identifying known chemotypes”, [99] dereplication differentiates known from unknown metabolites. In recent years, this process has evolved significantly and will be discussed in greater extent in chapter 2. [100] Systematic LC-MS analysis of complex extracts generates large quantities of raw data, which are usually processed using computational metabolomics tools. Among the various methods, tools integrating MS2 fragmentation spectra have become increasingly popular. MS2 spectra result from the collision-induced fragmentation of the parent ion and act as reproducible molecular fingerprint. Due to common fragmentation pathways, similar molecules usually yield similar MS2 spectra. Vice versa, similar MS2 spectra can indicate related molecules. [101] Still, different types of mass spectrometers can lead to differences in fragmentation pattern, which limits comparison of data obtained on different instruments. [102] Based on the observation that similar MS2 fragmentations can indicate structural relatedness, an approach known as molecular networking has been developed. Here, the pairwise similarities between all MS2 fragmentation spectra in a dataset are calculated. Molecules with highly similar fragmentation spectra are assumed to be related and grouped into molecule clusters, also known as molecular families. Such clusters are networks (= graphs), in which compounds are represented as nodes, with links between them indicating relatedness. This allows to visually correlate the chemistry detected in a dataset. Compounds can also be singletons, if their fragmentation spectra are sufficiently different from any other compound. [101,103] To facilitate exploratory data analysis, molecular

19 General Introduction networking was developed into the Global Natural Product Social Molecular Networking (GNPS) web infrastructure, which automatizes molecular networking of user uploaded MS2 data. Users can map metadata, such as bioactivity or phylogenetic information, on the molecular network, allowing to prioritize samples and pursue mass-spectrometry based isolation of the detected metabolites. [104] Further, GNPS allows for automated annotation against a public library containing MS2 fragmentation spectra of standards. [105] Another approach, called topic modeling, considers both fragments and neutral losses, occurring during MS2 fragmentation. Neutral losses contain additional information and may correspond to functional groups, such as glycosyl moieties, which do not appear as ions in tandem mass fragmentation. Using data mining, topic modeling identifies frequently occurring subsets of fragmentation spectra, so-called “mass motifs”, which may indicate common substructures of metabolites. Therefore, topic modeling can identify similarities between compounds that would not have been noticed in molecular networking. This approach has been developed into the MS2LDA-platform, where user can upload data and calculate similarity networks based on topic modeling. [106,107] Metabolomics is an interdisciplinary field, combining a multitude of different approaches, including chemical analysis, multivariate statistics and systems biology. [96] However, to reach its full potential, metabolomic and genomic datasets have to be connected and interpreted together. Recently, the iOMEGA Paired Omics Data Platform has been initiated to provide infrastructure for the standardized linking of existing and novel genomic and metabolomic data, as well as to serve as a centralized database for paired datasets. An example for a paired-omics workflow is DeepRiPP, which was published during the writing of this thesis. DeepRiPP automatically detects RiPP BGCs in genomic data, scores them for their putative novelty and links them to compounds, detected in a corresponding metabolomics mass spectrometry dataset. [108] While such automated workflows still have to prove their value, they represent an exciting glance in the future of natural product discovery.

20 General Introduction

1.4 Scope of the Thesis

In this thesis, we aimed to identify novel natural products from the rare actinomycete genus Planomonospora. Although several antibiotics have been reported from this genus, it has not yet been investigated systematically. The underexplored nature of Planomonospora explains the lack of dedicated database for the genus. To improve annotation of metabolites, we generated an in silico tandem mass fragmentation library from an in-house database of 13 000 actinobacterial metabolites. We described the creation of this library, including an investigation into the prediction fidelity of the applied algorithm, in chapter 2 of this thesis. For our exploratory analysis, we applied state-of-the-art metabolomics and genomics tools. We retrieved 72 Planomonospora strains from the proprietary Naicons isolate library, determined their phylogenetic affiliation and cultivated them in different growth conditions. Strain extracts were profiled by high resolution LC-MS2 spectrometry and analyzed using a metabolomics workflow. We also elaborated on our ratio for - and rareness-based prioritization of natural products, which is not biased by bioactivity testing, as described in chapter 3 of this thesis. In chapter 4, we described the discovery of a family of novel metabolites called biarylitides, selected in the previous metabolomic analysis. The biarylitides are N-acetylated cyclic tripeptides with an uncommon carbon-carbon cross-link. Genomic investigation showed that these peptides are encoded by a minimal, two- gene BGC, including an 18 base pair gene that is the smallest peptide-coding gene ever reported. Due to the minimal nature of the BGC, current bioinformatics tools were unable to detect it. We developed an ad hoc search algorithm and uncovered similar putative genes in other bacterial genomes. The biological characterization of these metabolites awaits completion. In chapter 5, we discuss all research efforts and address future perspectives.

21 General Introduction

1.5 References

1 Berdy, J. Bioactive Microbial Metabolites. J. Antibiot. 2005, 58 (1), 1–26. 2 Lemke, T. L.; Williams, D. A. Foye’s Principles of Medicinal Chemistry, Seventh Edition, 2012. 3 Dewick, P. M. Medicinal Natural Products: A Biosynthetic Approach; Second Edition, 2002. 4 Steppuhn, A.; Gase, K.; Krock, B.; Halitschke, R.; Baldwin, I. T. Nicotine’s Defensive Function in Nature. PLoS Biol. 2004, 2 (8), e217. 5 Davies, J. Specialized Microbial Metabolites: Functions and Origins. J. Antibiot. 2013, 66 (7), 361-364. 6 Newman, D. J.; Cragg, G. M. Natural Products as Sources of New Drugs over the Nearly Four Decades from 01/1981 to 09/2019. J. Nat. Prod. 2020, 83 (3), 770–803. 7 Schatz, A.; Bugle, E.; Waksman, S. A. Streptomycin, a Substance Exhibiting Antibiotic Activity Against Gram-Positive and Gram-Negative Bacteria. Proc. Soc. Exp. Biol. Med. 1944, 55 (1), 66–69. 8 Waksman, S. A.; Reilly, H. C.; Johnstone, D. B. Isolation of Streptomycin- producing Strains of Streptomyces griseus. J. Bacteriol. 1946, 52 (3), 393- 397. 9 Geraci, J. E.; Heilman, F. R.; Nichols, D. R.; Wellman, W. E.; Ross, G. T.; Warren, E.; Dorothy, R. Some Laboratory and Clinical Experiences with a New Antibiotic, Vancomycin. Proc. Staff. Meet. Mayo Clinic 1956, 31 (21), 564–582. 10 Levine, D. P. Vancomycin: A History. Clin. Infect. Dis. 2006, 42, S5–S12. 11 Di Marco, A.; Cassinelli, G.; Arcamone, F. The Discovery of Daunorubicin. Cancer Treat. Rep. 1981, 65, 3–8. 12 Dutcher, J. D. The Discovery and Development of Amphotericin B. Diseases of the Chest 1968, 54, 296–298. 13 Vezina, C.; Kudelski, A.; Sehgal, S. N. Rapamycin (AY-22, 989), a New Antifungal Antibiotic. J. Antibiot. 1975, 28 (10), 721–726. 14 Sehgal, S. N.; Baker, H.; Vezina, C. Rapamycin (AY-22, 989), a New Antifungal Antibiotic. J. Antibiot. 1975, 28 (10), 727–732. 15 Sehgal, S. N. Sirolimus: Its Discovery, Biological Properties, and Mechanism of Action. Transplant. Proc. 2003, 35 (3), S7–S14. 16 Ortholand, J.-Y.; Ganesan, A. Natural Products and Combinatorial Chemistry: Back to the Future. Curr. Opin. Chem. Biol. 2004, 8 (3), 271–280. 17 Jensen, P. R.; Chavarria, K. L.; Fenical, W.; Moore, B. S.; Ziemert, N. Challenges and Triumphs to Genomics-Based Natural Product Discovery. J. Ind. Microbiol. Biotechnol. 2014, 41 (2), 203–209. 18 Amirkia, V.; Heinrich, M. Natural Products and Drug Discovery: A Survey of Stakeholders in Industry and Academia. Front. Pharmacol. 2015, 6, 237. 19 Da Silva, R. R.; Dorrestein, P. C.; Quinn, R. A. Illuminating the Dark Matter in Metabolomics. Proc. Natl. Acad. Sci. U.S.A. 2015, 112 (41), 12549–12550.

22 General Introduction

20 Monciardini, P.; Iorio, M.; Maffioli, S.; Sosio, M.; Donadio, S. Discovering New Bioactive Molecules from Microbial Sources. Microb. Biotechnol. 2014, 7 (3), 209–220. 21 Bachmann, B. O.; Van Lanen, S. G.; Baltz, R. H. Microbial Genome Mining for Accelerated Natural Products Discovery: Is a Renaissance in the Making? J. Ind. Microbiol. Biotechnol. 2014, 41 (2), 175–184. 22 Wolfender, J.-L.; Litaudon, M.; Touboul, D.; Queiroz, E. F. Innovative Omics- Based Approaches for Prioritisation and Targeted Isolation of Natural Products – New Strategies for Drug Discovery. Nat. Prod. Rep. 2019, 36 (6), 855–868. 23 Kieser, T.; Bibb, M. J.; Buttner, M. J.; Chater, K. F.; Hopwood, D. A.; Practical Streptomyces Genetics, 2000. 24 Barka, E. A.; Vatsa, P.; Sanchez, L.; Gaveau-Vaillant, N.; Jacquard, C.; Klenk, H.-P.; Clément, C.; Ouhdouch, Y.; Van Wezel, G. P. Taxonomy, Physiology, and Natural Products of Actinobacteria. Microbiol. Mol. Biol. Rev. 2016, 80 (1), 1–43. 25 Wildermuth, H.; Hopwood, D. A. Septation During Sporulation in Streptomyces coelicolor. Microbiology 1970, 60 (1), 51–59. 26 Claessen, D.; Rozen, D. E.; Kuipers, O. P.; Søgaard-Andersen, L.; Van Wezel, G. P. Bacterial Solutions to Multicellularity: A Tale of Biofilms, Filaments and Fruiting Bodies. Nat. Rev. Microbiol. 2014, 12 (2), 115–124. 27 Miguélez, E. M.; Hardisson, C.; Manzanal, M. B. Hyphal Death During Colony Development in Streptomyces antibioticus: Morphological Evidence for the Existence of a Process of Cell Deletion in a Multicellular Prokaryote. J. Cell Biol. 1999, 145 (3), 515–525. 28 Jakimowicz, D.; Van Wezel, G. P. Cell Division and DNA Segregation in Streptomyces: How to Build a Septum in the Middle of Nowhere? Mol. Microbiol. 2012, 85 (3), 393–404. 29 Morita, R. Y. Starvation and Miniaturization of Heterotrophic Bacteria with Special Emphasis on Maintenance of the Starved Viable State. Bacteria in their Natural Environments: The Effect Of Nutrient Conditions 1985, 111–130. 30 Girard, G.; Traag, B. A.; Sangal, V.; Mascini, N.; Hoskisson, P. A.; Goodfellow, M.; Van Wezel, G. P. A Novel Taxonomic Marker That Discriminates Between Morphologically Complex Actinomycetes. Open Biol. 2013, 3 (10), 130073. 31 Kirby, R. Chromosome Diversity and Similarity Within the Actinomycetales. FEMS Microbio. Lett. 2011, 319 (1), 1–10. 32 Cavaletti, L.; Monciardini, P.; Bamonte, R.; Schumann, P.; Rohde, M.; Sosio, M.; Donadio, S. New Lineage of Filamentous, Spore-Forming, Gram-Positive Bacteria from Soil. Appl. Environ. Microbiol. 2006, 72 (6), 4360–4369. 33 Monciardini, P.; Cavaletti, L.; Ranghetti, A.; Schumann, P.; Rohde, M.; Bamonte, R.; Sosio, M.; Mezzelani, A.; Donadio, S. Novel Members of the Family Micromonosporaceae, Rugosimonospora acidiphila Gen. Nov., Sp. Nov. and Rugosimonospora africana Sp. Nov. Int. J. Syst. Evol. Microbiol. 2009, 59 (11), 2752–2758. 34 Busti, E.; Monciardini, P.; Cavaletti, L.; Bamonte, R.; Lazzarini, A.; Sosio, M.; Donadio, S. Antibiotic-Producing Ability by Representatives of a Newly Discovered Lineage of Actinomycetes. Microbiology 2006, 152 (3), 675–683.

23 General Introduction

35 Tiwari, K.; Gupta, R. K. Diversity and Isolation of Rare Actinomycetes: An Overview. Crit. Rev. Microbiol. 2013, 39 (3), 256–294. 36 Lazzarini, A.; Cavaletti, L.; Toppo, G.; Marinelli, F. Rare Genera of Actinomycetes as Potential Producers of New Antibiotics. Antonie van Leeuwenhoek 2000, 78 (3), 399–405. 37 Tiwari, K.; Gupta, R. K. Bioactive Metabolites from Rare Actinomycetes. Stud. Nat. Prod. Chem. 2014, 41, 419–512. 38 Feling, R. H.; Buchanan, G. O.; Mincer, T. J.; Kauffman, C. A.; Jensen, P. R.; Fenical, W. Salinosporamide A: a Highly Cytotoxic Proteasome Inhibitor from a Novel Microbial Source, a Marine Bacterium of the New Genus Salinospora. Angew. Chem. Int. Ed. 2003, 42 (3), 355–357. 39 Maffioli, S. I.; Iorio, M.; Sosio, M.; Monciardini, P.; Gaspari, E.; Donadio, S. Characterization of the Congeners in the Lantibiotic NAI-107 Complex. J. Nat. Prod. 2014, 77 (1), 79–84. 40 Castiglione, F.; Lazzarini, A.; Carrano, L.; Corti, E.; Ciciliato, I.; Gastaldo, L.; Candiani, P.; Losi, D.; Marinelli, F.; Selva, E.; Parenti, F. Determining the Structure and Mode of Action of Microbisporicin, a Potent Lantibiotic Active Against Multiresistant Pathogens. Chem. Biol. 2008, 15 (1), 22–31. 41 Iorio, M.; Sasso, O.; Maffioli, S. I.; Bertorelli, R.; Monciardini, P.; Sosio, M.; Bonezzi, F.; Summa, M.; Brunati, C.; Bordoni, R.; Corti, G.; Tarozzo, G.; Piomelli, D.; Reggiani, A.; Donadio, S. A Glycosylated, Labionin-Containing Lanthipeptide with Marked Antinociceptive Activity. ACS Chem. Biol. 2014, 9 (2), 398–404. 42 Thiemann, J. E. A New Genus of the Actinoplanaceae: Planomonospora Gen. Nov. G. Microbiol. 1967, 15, 27–38. 43 Goodfellow, M.; Kämpfer, P.; Busse, H.-J.; Trujillo, M. E.; Suzuki, K.-i.; Ludwig, W.; Whitman, W. B. Bergey’s Manual of Systematic Bacteriology, 2012. 44 Thiemann, J. E. Study of Some New Genera and Species of the Actinoplanaceae. The actinomycetales. Jena international symposium on taxonomy, 1970. 45 Mertz, F. P. Planomonospora alba Sp. Nov. and Planomonospora sphaerica Sp. Nov., Two New Species Isolated from Soil by Baiting Techniques. Int. J. Syst. Evol. Microbiol. 1994, 44 (2), 274–281. 46 Suriyachadkun, C.; Ngaemthao, W.; Chunhametha, S. Planomonospora corallina Sp. Nov., Isolated from Soil. Int. J. Syst. Evol. Microbiol. 2016, 66 (8), 3224–3229. 47 Chaouch, F. C.; Bouras, N.; Mokrane, S.; Bouznada, K.; Zitouni, A.; Pötter, G.; Spröer, C.; Klenk, H.-P.; Sabaou, N. Planomonospora algeriensis Sp. Nov., an Actinobacterium Isolated from a Saharan Soil of Algeria. Antonie van leeuwenhoek 2017, 110 (2), 245–252. 48 Maffioli, S. I.; Potenza, D.; Vasile, F.; De Matteo, M.; Sosio, M.; Marsiglia, B.; Rizzo, V.; Scolastico, C.; Donadio, S. Structure Revision of the Lantibiotic 97518. J. Nat. Prod. 2009, 72 (4), 605–607. 49 Castiglione, F.; Cavaletti, L.; Losi, D.; Lazzarini, A.; Carrano, L.; Feroggio, M.; Ciciliato, I.; Corti, E.; Candiani, G.; Marinelli, F.; Selva, E. A Novel Lantibiotic

24 General Introduction

Acting on Bacterial Cell Wall Synthesis Produced by the Uncommon Actinomycete Planomonospora Sp. Biochemistry 2007, 46 (20), 5884–5895. 50 Richard, D.; Pagano, J. F.; John, V. Thiostrepton, Its Salts, and Production, U.S. Patent No. 2,982,689, 1961. 51 Thiemann, J.; Coronelli, C.; Pagani, H.; Beretta, G.; Tamoni, G.; Arioli, V. Sporangiomycin, an Antibacterial Agent Isolated from Planomonospora parontospora Var. antibiotica Var. Nov. J. Antibiot 1968, 21, 525–531. 52 Wingender, W.; Von Hugo, H.; Frommer, W.; Schäfer, D. A Protease Inhibitor Isolated from Planomonospora parontospora. J. Antibiot. 1975, 28 (8), 611– 612. 53 Suda, H.; Aoyagi, T.; Hamada, M.; Takeuchi, T.; Umezawa, H. Antipain, a New Protease Inhibitor Isolated from Actinomycetes. J. Antibiot. 1972, 25 (4), 263–265. 54 Kodani, S.; Inoue, Y.; Suzuki, M.; Dohra, H.; Suzuki, T.; Hemmi, H.; Ohnishi- Kameyama, M. Sphaericin, a Lasso Peptide from the Rare Actinomycete Planomonospora sphaerica. Eur. J. Org. Chem. 2017, 8, 1177–1183. 55 Wink, J.; Mohammadipanah, F.; Hamedi, J. Biology and Biotechnology of Actinobacteria, 2017. 56 Nazari, B.; Forneris, C. C.; Gibson, M. I.; Moon, K.; Schramma, K. R.; Seyedsayamdost, M. R. Nonomuraea Sp. ATCC 55076 Harbours the Largest Actinomycete Chromosome to Date and the Kistamicin Biosynthetic Gene Cluster. Med. Chem. Comm. 2017, 8 (4), 780–788. 57 Cimermancic, P.; Medema, M. H.; Claesen, J.; Kurita, K.; Brown, L. C. W.; Mavrommatis, K.; Pati, A.; Godfrey, P. A.; Koehrsen, M.; Clardy, J.; Birren, B. W.; Takano, E.; Sali, A.; Linington, R. G.; Fischbach, M. A. Insights into Secondary Metabolism from a Global Analysis of Prokaryotic Biosynthetic Gene Clusters. Cell 2014, 158 (2), 412–421. 58 Chevrette, M. G.; Gutiérrez-García, K.; Selem-Mojica, N.; Aguilar-Martínez, C.; Yañez-Olvera, A.; Ramos-Aboites, H. E.; Hoskisson, P. A.; Barona- Gómez, F. Evolutionary Dynamics of Natural Product Biosynthesis in Bacteria. Nat. Prod. Rep. 2020, 37 (4), 566–599. 59 Baltz, R. H. Gifted Microbes for Genome Mining and Natural Product Discovery. J. Ind. Microbiol. Biotechnol. 2017, 44 (4-5), 573–588. 60 Belknap, K. C.; Park, C. J.; Barth, B. M.; Andam, C. P. Genome Mining of Biosynthetic and Chemotherapeutic Gene Clusters in Streptomyces Bacteria. Sci. Rep. 2020, 10, 2003. 61 Ziemert, N.; Lechner, A.; Wietz, M.; Millán-Aguiñaga, N.; Chavarria, K. L.; Jensen, P. R. Diversity and Evolution of Secondary Metabolism in the Marine Actinomycete Genus Salinispora. Proc. Natl. Acad. Sci. U. S. A. 2014, 111 (12), E1130–E1139. 62 Adamek, M.; Alanjary, M.; Sales-Ortells, H.; Goodfellow, M.; Bull, A. T.; Winkler, A.; Wibberg, D.; Kalinowski, J.; Ziemert, N. Comparative Genomics Reveals Phylogenetic Distribution Patterns of Secondary Metabolites in Amycolatopsis Species. BMC Genom. 2018, 19 (1), 1–15. 63 Doroghazi, J. R.; Metcalf, W. W. Comparative Genomics of Actinomycetes with a Focus on Natural Product Biosynthetic Genes. BMC Genom. 2013, 14 (1), 611.

25 General Introduction

64 Blin, K.; Shaw, S.; Steinke, K.; Villebro, R.; Ziemert, N.; Lee, S. Y.; Medema, M. H.; Weber, T. AntiSMASH 5.0: Updates to the Secondary Metabolite Genome Mining Pipeline. Nucleic Acids Res. 2019, 47 (W1), W81–W87. 65 Hopwood, D. A. Genetic Contributions to Understanding Polyketide Synthases. Chem. Rev. 1997, 97 (7), 2465–2498. 66 Donadio, S.; Staver, M. J.; McAlpine, J. B.; Swanson, S. J.; Katz, L. Modular Organization of Genes Required for Complex Polyketide Biosynthesis. Science 1991, 252 (5006), 675–679. 67 Van Wageningen, A. A.; Kirkpatrick, P. N.; Williams, D. H.; Harris, B. R.; Kershaw, J. K.; Lennard, N. J.; Jones, M.; Jones, S. J. M.; Solenberg, P. J. Sequencing and Analysis of Genes Involved in the Biosynthesis of a Vancomycin Group Antibiotic. Chem. Biol. 1998, 5 (3), 155–162. 68 Schmartz, P. C.; Zerbe, K.; Abou-Hadeed, K.; Robinson, J. A. Bis-Chlorination of a Hexapeptide–PCP Conjugate by the Halogenase Involved in Vancomycin Biosynthesis. Org. Biomol. Chem. 2014, 12 (30), 5574–5577. 69 Arnison, P. G.; Bibb, M. J.; Bierbaum, G.; Bowers, A. A.; Bugni, T. S.; Bulaj, G.; Camarero, J. A.; Campopiano, D. J.; Challis, G. L.; Clardy, J.; Cotter, P. D.; Craik, D. J.; Dawson, M.; Dittmann, E.; Donadio, S.; Dorrestein, P. C.; Entian, K.-D.; Fischbach, M. A.; Garavelli, J. S.; Göransson, U.; Gruber, C. W.; Haft, D. H.; Hemscheidt, T. K.; Hertweck, C.; Hill, C.; Horswill, A. R.; Jaspars, M.; Kelly, W. L.; Klinman, J. P.; Kuipers, O. P.; Link, A. J.; Liu, W.; Marahiel, M. A.; Mitchell, D. A.; Moll, G. N.; Moore, B. S.; Müller, R.; Nair, S. K.; Nes, I. F.; Norris, G. E.; Olivera, B. M.; Onaka, H.; Patchett, M. L.; Piel, J.; Reaney, M. J. T.; Rebuffat, S.; Ross, R. P.; Sahl, H.-G.; Schmidt, E. W.; Selsted, M. E.; Severinov, K.; Shen, B.; Sivonen, K.; Smith, L.; Stein, T.; Süssmuth, R. D.; Tagg, J. R.; Tang, G.-L.; Truman, A. W.; Vederas, J. C.; Walsh, C. T.; Walton, J. D.; Wenzel, S. C.; Willey, J. M.; Van der Donk, W. A. Ribosomally Synthesized and Post-Translationally Modified Peptide Natural Products: Overview and Recommendations for a Universal Nomenclature. Nat. Prod. Rep. 2013, 30 (1), 108–160. 70 Hetrick, K. J.; Van der Donk, W. A. Ribosomally Synthesized and Post- Translationally Modified Peptide Natural Product Discovery in the Genomic Era. Curr. Opin. Chem. Biol. 2017, 38, 36–44. 71 Ebata, M.; Miyazaki, K.; Otsuka, H. Studies on Siomycin. I. J. Antibiot. 1969, 22 (8), 364–368. 72 Kloosterman, A. M.; Shelton, K. E.; Van Wezel, G.; Medema, M. H.; Mitchell, D. A. RRE-Finder: A Genome-Mining Tool for Class-Independent RiPP Discovery. bioRxiv 2020. 73 Reddy, G. K.; Leferink, N. G. H.; Umemura, M.; Ahmed, S. T.; Breitling, R.; Scrutton, N. S.; Takano, E. Exploring Novel Bacterial Terpene Synthases. PloS one 2020, 15 (4), e0232220. 74 Baltz, R. H. Antimicrobials from Actinomycetes: Back to the Future. mBio 2007, 2 (3), 125. 75 Pye, C. R.; Bertin, M. J.; Lokey, R. S.; Gerwick, W. H.; Linington, R. G. Retrospective Analysis of Natural Products Provides Insights for Future Discovery Trends. Proc. Natl. Acad. Sci. U.S.A. 2017, 114 (22), 5601–5606.

26 General Introduction

76 Russell, A. H.; Truman, A. W. Genome Mining Strategies for Ribosomally Synthesised and Post-Translationally Modified Peptides. Comput. Struct. Biotechnol. J. 2020, 18, 1838-1851 77 Hug, J. J.; Bader, C. D.; Remškar, M.; Cirnski, K.; Müller, R. Concepts and Methods to Access Novel Antibiotics from Actinomycetes. Antibiotics 2018, 7 (2), 44. 78 Ziemert, N.; Alanjary, M.; Weber, T. The Evolution of Genome Mining in Microbes–a Review. Nat. Prod. Rep. 2016, 33 (8), 988–1005. 79 Luo, Y.; Cobb, R. E.; Zhao, H. Recent Advances in Natural Product Discovery. Curr. Opin. Biotechnol. 2014, 30, 230–237. 80 Bentley, S. D.; Chater, K. F.; Cerdeño-Tárraga, A.-M.; Challis, G. L.; Thomson, N. R.; James, K. D.; Harris, D. E.; Quail, M. A.; Kieser, H.; Harper, D.; Bateman, A.; Brown, S.; Chandra, G.; Chen, C. W.; Collins, M.; Cronin, A.; Fraser, A.; Goble, A.; Hidalgo, J.; Hornsby, T.; Howarth, S.; Huang, C.; Kieser, T.; Larke, L.; Murphy, L.; Oliver, K.; O’Neil, S.; Rabbinowitsch, E.; Rajandream, M. K.; Rutherford, K.; Rutter, S.; Seeger, D.; Saundern, S.; Sharp, S.; Squares, R.; Squares, S.; Taylor, K.; Warren, T.; Wietzorrek, A.; Woodward, J.; Barrell, B.; Parkhill, J.; Hopwood, D. Complete Genome Sequence of the Model Actinomycete Streptomyces coelicolor A3 (2). Nature 2002, 417 (6885), 141–147. 81 Omura, S.; Ikeda, H.; Ishikawa, J.; Hanamoto, A.; Takahashi, C.; Shinose, M.; Takahashi, Y.; Horikawa, H.; Nakazawa, H.; Osonoe, T.; Kikuchi, H.; Shiba, T.; Sakaki, Y.; Hattori, M. Genome Sequence of an Industrial Microorganism Streptomyces avermitilis: Deducing the Ability of Producing Secondary Metabolites. Proc. Natl. Acad. Sci. U.S.A. 2001, 98 (21), 12215–12220. 82 English, A. C.; Richards, S.; Han, Y.; Wang, M.; Vee, V.; Qu, J.; Qin, X.; Muzny, D. M.; Reid, J. G.; Worley, K. C.; Gibbs, R. A. Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PloS one 2012, 7 (11), e47768. 83 Kozich, J. J.; Westcott, S. L.; Baxter, N. T.; Highlander, S. K.; Schloss, P. D. Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform. Appl. Environ. Microbiol. 2013, 79 (17), 5112–5120. 84 Karlsson, E.; Lärkeryd, A.; Sjödin, A.; Forsman, M.; Stenberg, P. Scaffolding of a Bacterial Genome Using MinION Nanopore Sequencing. Sci. Rep. 2015, 5, 11996. 85 Koren, S.; Treangen, T. J.; Hill, C. M.; Pop, M.; Phillippy, A. M. Automated Ensemble Assembly and Validation of Microbial Genomes. BMC Bioinf. 2014, 15 (1), 1–9. 86 De Jong, A.; Van Hijum, S. A.; Bijlsma, J. J. E.; Kok, J.; Kuipers, O. P. BAGEL: A Web-Based Bacteriocin Genome Mining Tool. Nucleic Acids Res. 2006, 34, W273–W279. 87 Weber, T.; Rausch, C.; Lopez, P.; Hoof, I.; Gaykova, V.; Huson, D. N.; Wohlleben, W. CLUSEAN: A Computer-Based Framework for the Automated Analysis of Bacterial Secondary Metabolite Biosynthetic Gene Clusters. J. Biotechnol. 2009, 140 (1-2), 13–17.

27 General Introduction

88 Medema, M. H.; Blin, K.; Cimermancic, P.; De Jager, V.; Zakrzewski, P.; Fischbach, M. A.; Weber, T.; Takano, E.; Breitling, R. AntiSMASH: Rapid Identification, Annotation and Analysis of Secondary Metabolite Biosynthesis Gene Clusters in Bacterial and Fungal Genome Sequences. Nucleic Acids Res. 2011, 39, W339–W346. 89 Kautsar, S. A.; Blin, K.; Shaw, S.; Navarro-Muñoz, J. C.; Terlouw, B. R.; Van Der Hooft, J. J. J.; Van Santen, J. A.; Tracanna, V.; Suarez Duran, H. G.; Pascal Andreu, V.; Selem-Mojica, N.; Alanjary, M.; Robinson, S. L.; Lund, G.; Epstein, S. C.; Sisto, A. C.; Charkoudian, L. K.; Collemare, J.; Linington, R. G.; Weber, T.; Medema, M. H. MIBiG 2.0: A Repository for Biosynthetic Gene Clusters of Known Function. Nucleic Acids Res. 2020, 48 (D1), D454–D458. 90 Navarro-Muñoz, J. C.; Selem-Mojica, N.; Mullowney, M. W.; Kautsar, S. A.; Tryon, J. H.; Parkinson, E. I.; De Los Santos, E. L.; Yeong, M.; Cruz-Morales, P.; Abubucker, S.; Roeters, A.; Lokhorst, W.; Fernandez-Guerra, A.; Dias Cappelini, L. T.; Goering, A. W.; Thomson, R. J.; Metcalf, W. W.; Kelleher, N. L.; Barona-Gomez, F.; Medema, M. H. A Computational Framework to Explore Large-Scale Biosynthetic Diversity. Nat. Chem. Biol. 2020, 16 (1), 60– 68. 91 Ziemert, N.; Podell, S.; Penn, K.; Badger, J. H.; Allen, E.; Jensen, P. R. The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity. PloS one 2012, 7 (3), e34064. 92 Alanjary, M.; Kronmiller, B.; Adamek, M.; Blin, K.; Weber, T.; Huson, D.; Philmus, B.; Ziemert, N. The Antibiotic Resistant Target Seeker (ARTS), an Exploration Engine for Antibiotic Cluster Prioritization and Novel Drug Target Discovery. Nucleic Acids Res. 2017, 45 (W1), W42–W48. 93 Mungan, M. D.; Alanjary, M.; Blin, K.; Weber, T.; Medema, M. H.; Ziemert, N. ARTS 2.0: Feature Updates and Expansion of the Antibiotic Resistant Target Seeker for Comparative Genome Mining. Nucleic Acids Res. 2020, 48 (W1), W546–W552. 94 Von Döhren, H.; Dieckmann, R.; Pavela-Vrancic, M. The Nonribosomal Code. Chem. Biol. 1999, 6 (10), R273–R279. 95 Zerikly, M.; Challis, G. L. Strategies for the Discovery of New Natural Products by Genome Mining. ChemBioChem 2009, 10 (4), 625–633. 96 Hubert, J.; Nuzillard, J.-M.; Renault, J.-H. Dereplication Strategies in Natural Product Research: How Many Tools and Methodologies Behind the Same Concept? Phytochem. Rev. 2017, 16 (1), 55–95. 97 Fiehn, O. Metabolomics - the Link Between Genotypes and Phenotypes. Funct. Genom. 2002; 155–171. 98 Zhang, A.; Sun, H.; Wang, P.; Han, Y.; Wang, X. Modern Analytical Techniques in Metabolomics Analysis. Analyst 2012, 137 (2), 293–300. 99 Beutler, J. A.; Alvarado, A. B.; Schaufelberger, D. E.; Andrews, P.; McCloud, T. G. Dereplication of Phorbol Bioactives: Lyngbya majuscula and Croton cuneatus. J. Nat. Prod. 1990, 53 (4), 867–874. 100 Wolfender, J.-L.; Nuzillard, J.-M.; Van Der Hooft, J. J. J.; Renault, J.-H.; Bertrand, S. Accelerating Metabolite Identification in Natural Product Research: Toward an Ideal Combination of Liquid Chromatography–high-

28 General Introduction

Resolution Tandem Mass Spectrometry and NMR Profiling, in silico Databases, and Chemometrics. Anal. Chem. 2019, 91 (1), 704–742. 101 Yang, J. Y.; Sanchez, L. M.; Rath, C. M.; Liu, X.; Boudreau, P. D.; Bruns, N.; Glukhov, E.; Wodtke, A.; De Felicio, R.; Fenner, A.; Wong, W. R.; Linington, R. G.; Zhang, L.; Debonsi, H. M.; Gerwick, W. H.; Dorrestein, P. C. Molecular Networking as a Dereplication Strategy. J. Nat. Prod. 2013, 76 (9), 1686– 1699. 102 Niessen, W. M.; Correa C. R. A. Interpretation of MS-MS Mass Spectra of Drugs and Pesticides, 2017. 103 Nguyen, D. D.; Wu, C.-H.; Moree, W. J.; Lamsa, A.; Medema, M. H.; Zhao, X.; Gavilan, R. G.; Aparicio, M.; Atencio, L.; Jackson, C.; Ballesteros, J.; Sanchez, J.; Watrous, J. D.; Phelan, V. V.; Van De Wiel, C.; Kersten, R. D.; Mehnaz, S.; De Mot, R.; Shank, E. A.; Charusanti, P.; Nagarajan, H.; Duggan, B. M.; Moore, B. S.; Bandeira, N.; Palsson, B. Ø.; Pogliano, K.; Gutiérrez, M.; Dorrestein, P. C. MS/MS Networking Guided Analysis of Molecule and Gene Cluster Families. Proc. Natl. Acad. Sci. U.S.A. 2013, 110 (28), E2611–E2620. 104 Olivon, F.; Allard, P.-M.; Koval, A.; Righi, D.; Genta-Jouve, G.; Neyts, J.; Apel, C.; Pannecouque, C.; Nothias, L.-F.; Cachet, X.; Marcourt, L.; Roussi, F.; Katanaev, V. L.; Touboul, D.; Wolfender, J.-L.; Litaudon, M. Bioactive Natural Products Prioritization Using Massive Multi-Informational Molecular Networks. ACS Chem. Bio. 2017, 12 (10), 2644-2651. 105 Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; Porto, C.; Bouslimani, A.; Melnik, A. V.; Meehan, M. J.; Liu, W.-T.; Crüsemann, M.; Boudreau, P. D.; Esquenazi, E.; Sandoval-Calderón, M.; Kersten, R. D.; Pace, L. A.; Quinn, R. A.; Duncan, K. R.; Hsu, C.-C.; Floros, D. J.; Gavilan, R. G.; Kleigrewe, K.; Northen, T.; Dutton, R. J.; Parrot, D.; Carlson, E. E.; Aigle, B.; Michelsen, C. F.; Jelsbak, L.; Sohlenkamp, C.; Pevzner, P.; Edlund, A.; McLean, J.; Piel, J.; Murphy, B. T.; Gerwick, L.; Liaw, C.-C.; Yang, Y.-L.; Humpf, H.-U.; Maansson, M.; Keyzers, R. A.; Sims, A. C.; Johnson, A. R.; Sidebottom, A. M.; Sedio, B. E.; Klitgaard, A.; Larson, C. B.; Boya, C. A. P.; Torres-Mendoza, D.; Gonzalez, D. J.; Silva, D. B.; Marques, L. M.; Demarque, D. P.; Pociute, E.; O’Neill, E. C.; Briand, E.; Helfrich, E. J. N.; Granatosky, E. A.; Glukhov, E.; Ryffel, F.; Houson, H.; Mohimani, H.; Kharbush, J. J.; Zeng, Y.; Vorholt, J. A.; Kurita, K. L.; Charusanti, P.; McPhail, K. L.; Nielsen, K. F.; Vuong, L.; Elfeki, M.; Traxler, M. F.; Engene, N.; Koyama, N.; Vining, O. B.; Baric, R.; Silva, R. R.; Mascuch, S. J.; Tomasi, S.; Jenkins, S.; Macherla, V.; Hoffman, T.; Agarwal, V.; Williams, P. G.; Dai, J.; Neupane, R.; Gurr, J.; Rodríguez, A. M. C.; Lamsa, A.; Zhang, C.; Dorrestein, K.; Duggan, B. M.; Almaliti, J.; Allard, P.-M.; Phapale, P.; Nothias, L.-F.; Alexandrov, T.; Litaudon, M.; Wolfender, J.-L.; Kyle, J. E.; Metz, T. O.; Peryea, T.; Nguyen, D.-T.; VanLeer, D.; Shinn, P.; Jadhav, A.; Müller, R.; Waters, K. M.; Shi, W.; Liu, X.; Zhang, L.; Knight, R.; Jensen, P. R.; Palsson, Bernhard O; Pogliano, K.; Linington, R. G.; Gutiérrez, M.; Lopes, N. P.; Gerwick, W. H.; Moore, B. S.; Dorrestein, P. C.; Bandeira, N. Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 2016, 34 (8), 828–837.

29 General Introduction

106 Van Der Hooft, J. J. J.; Wandy, J.; Barrett, M. P.; Burgess, K. E.; Rogers, S. Topic Modeling for Untargeted Substructure Exploration in Metabolomics. Proc. Natl. Acad. Sci. U.S.A. 2016, 113 (48), 13738–13743. 107 Wandy, J.; Zhu, Y.; Van Der Hooft, J. J. J.; Daly, R.; Barrett, M. P.; Rogers, S. Ms2lda.org: Web-Based Topic Modelling for Substructure Discovery in Mass Spectrometry. Bioinformatics 2018, 34 (2), 317–318. 108 Merwin, N. J.; Mousa, W. K.; Dejong, C. A.; Skinnider, M. A.; Cannon, M. J.; Li, H.; Dial, K.; Gunabalasingam, M.; Johnston, C.; Magarvey, N. A. DeepRiPP Integrates Multiomics Data to Automate Discovery of Novel Ribosomally Synthesized Natural Products. Proc. Natl. Acad. Sci. U.S.A. 2020, 117 (1), 371–38

30 Chapter 2

2

Creation of an In Silico MS2 Fragmentation Library for Automated Dereplication

31 Chapter 2

2.1 Abstract n natural product research, dereplication, the fast recognition of known molecules, is a crucial step to avoid redundant compound I 2 characterization. Advancements in LC-MS -analytical instrumentation have enabled high-throughput processing of extracts, making manual dereplication the rate-limiting step of analysis. Therefore, automated annotation methods, such as matching of experimental MS2 fragmentation spectra against databases of standards, have increased in importance. Unfortunately, such spectral databases are vastly incomplete, hindering automated dereplication. To bridge the gap between structural and spectral databases, tools for the computational prediction of MS2 fragmentation spectra from structural information have been developed. In this study, the in silico fragmentation tool CFM-ID has been applied to amend an in-house database containing structural data of circa 13 000 actinobacterial metabolites with MS2 fragmentation data. While CFM-ID proved to be suited for the prediction of some molecular classes, it failed to predict the fragmentation pattern of macrocyclic molecules correctly, as exemplified. Despite this drawback, the generated in silico database can be a valuable addition to dereplication workflows.

32 Chapter 2

2.2 Introduction

To date, more than 300 000 NPs have been isolated and reported in the literature. This large catalog of known compounds makes the re-discovery of metabolites a frequently occurring issue. [1] Compound isolation and structure elucidation is a lengthy and expensive task and the risk of redundant characterization of NPs should be reduced to a minimum. [2] Hence, in NP discovery workflows, the recognition of known metabolites, known as dereplication, should be performed as early as possible. [3] Usually, the dereplication of a molecule requires chemical profiling, such as UV profile, NMR and MS, which are compared against in-house or external curated spectral databases of literature-reported metabolites. While NMR is universally applicable and non-destructive, MS is fast and orders of magnitude more sensitive. Usually, MS is combined with a chromatography separation method (e.g. LC) to routinely analyze complex extracts and to separate structural isomers. MS2 yields additional structural information by fragmentation of ions. Among the various fragmentation methods, electrospray-ionization (ESI), coupled to collision-induced dissociation (CID), is of special importance to natural product research. In ESI-CID, the stream of liquid coming from LC is electrostatically charged to create a fine aerosol of microdroplets, from which solvent is evaporated until discrete analyte-ion adducts (ions) remain. These ions are introduced into a collision cell containing an inert gas. Upon collision, part of the kinetic energy of the ions is converted into vibrational energy, which leads to specific fragmentations due to the chemical structure of the analyte. [21] The resulting fragmentation spectra can be considered molecular fingerprints and used for increased annotation confidence when compared to fragmentation spectra of authentic standards in spectral databases. [4] These features, alongside of its compatibility with high-throughput approaches, makes LC-MS2 the method of choice in NP screening workflows. [5] Dereplication relies heavily on databases, which should contain as much annotated data from known analytes as possible. Ideally, it should contain metadata, such as phylogeny of the producing organism, bioactivity, and spectral

33 Chapter 2 data, such as NMR or MS2 fragmentation spectra. [1] While some spectra reference databases exist, they are still vastly incomplete, especially for less- investigated microbial genera. [6] Further, many NPs discovered in the past are missing associated spectral data, mainly because they were not recorded due to historical limitations in analytical instrumentation. [1] To bridge the gap between structural and spectral data, the in silico prediction of MS2 data has received broad attention. [5-7] Starting from a chemical structure, MS2 spectra can be predicted either from first-principles (ab initio, using quantum-mechanical methods) or from the chemometric analysis of high-quality reference data. [1] While the former is universally applicable, it has the disadvantage of being slow and is therefore not feasible for large datasets. The latter applies a set of empirical, combinatorial or machine-learned rules for the simulation of compound fragmentation, which is fast, but constrained by the extent of the training data set, meaning that less-represented molecular classes are predicted less precisely. [5,6] Chemometric-based approaches are currently the method of choice, with several tools commonly used, such as CSI:FingerID, [8], MetFrag, [9,10] MAGMa [11] or CFM-ID. [12] The latter has been demonstrated to outperform several other in silico fragmentation tools and has been used previously for the prediction of MS2 spectra of NPs. [12-14] Unconstrained searches in NP databases can lead to false positive annotations, due to the large number of similar metabolites. However, an NP that is produced by terrestrial bacteria is unlikely to be produced by marine fungi and vice versa. Therefore, targeted libraries, tailored for the research objective in question, are favored to reduce the “noise” of the system. [6,15,16] While several microbial NP databases exist, only a minority is focused on actinobacteria. [1] One such database is the proprietary Antibiotic Literature (ABL) library, first compiled by Biosearch Italia and further developed by Naicons Srl. This specialized database contains a wealth of information of over 13 300 compounds, derived from actinomycetes, such as taxonomic and physicochemical characteristics, biological activity and molecular structures (see Figure S1). Of the compounds, 18.2% are attributed to “rare actinomycetes” [17]. Regrettably, this important

34 Chapter 2 database does not include MS2 spectra, which limits its usability in automated dereplication workflows. In this study, we used the computational fragmentation tool CFM-ID to predict in silico MS2 spectra for the compounds contained in the ABL library, creating the ABL in silico database (ABL-ISDB). To validate the prediction fidelity of the tool, we compared the in silico MS2 spectra of a number of reference actinomycetes metabolites against their experimental fragmentation spectra. Further, we integrated the ABL-ISDB into an untargeted metabolomics workflow and were able to annotate the ureylene-containing peptides chymostatinol A and GE-20372 A/B in a real-world dataset.

Fig. 1. Metabolomic tools and workflow utilized in this study. (A) Structural information of molecules contained in the ABL library is used to predict MS2 spectra, using the tool CFM-ID, and leading to the generation of the ABL-ISDB. (B) Extracts are analyzed by LC-MS and processed with the software MZmine 2, where also spectral matching against the ABL-ISDB takes place, leading to putative annotations that can be investigated further.

2.3 Results & Discussion

ABL-ISDB Generation and Investigation. Using the command line version of the freely available CFM-ID tool (https://sourceforge.net/projects/cfm- id/), in silico MS2 fragmentation spectra were predicted for 13 300 entries for

35 Chapter 2 actinomycete metabolites from the ABL library. CFM-ID is a hybrid rule- based/machine learning tool that takes a chemical structure in SMILES-format and uses a pre-trained statistical model to predict its MS2 fragmentation pattern. The CFM-ID algorithm only predicts electrospray ionization collision induced dissociation (ESI-CID) but is otherwise independent of instrument type. By default, CFM-ID calculates the prediction for the [M+H]+ ion. [12] To access the prediction fidelity of CFM-ID, the in silico fragmentation spectra of a sample of actinomycetes metabolites were compared with experimentally determined ones. For our qualitative analysis, in silico predicted fragments were classified in three groups: (i) True positive fragments, i.e. fragments that were predicted and were also observed in the experimental spectrum; (j) false positive fragments, which were predicted but were not observed in the experimental spectrum; (k) false negative fragments, which were not predicted but were observed in the experimental spectrum. The ratio between these fragments (i:j:k) can give an idea of the prediction quality of the fragmentation algorithm, even though it does not account for fragment intensity. As first case study, the hydroxamate siderophore deferoxamine B [18] and the catechol siderophore benarthin [19,20] were examined. They were chosen as proof of concept due to their linear structures, containing several amide bonds, which represent preferred points of fragmentation. [21] For deferoxamine B (Figure 2), eight fragments were true positive (i), both predicted and observed in the experimental spectrum. 13 fragments were false positives (j), thus predicted but not experimentally observed. Six fragments were false negatives (k), not predicted but still experimentally observed. With a ratio of 8:13:6, the fragmentation prediction for deferoxamine B was considered satisfactory, also because all high-intensity fragments were predicted correctly (Figure 2). Fragmentation prediction for benarthin (Figure 3) was similar, with eight i- fragments, eight j-fragments and 13 k-fragments. The majority of the false- negative (k) fragments was of low intensity and therefore, of lesser importance. All fragments except one in the experimental spectrum were accounted for, which was considered a good result.

36 Chapter 2

Fig. 2. (A) theoretical fragmentation of deferoxamine B. (B) Comparison of the experimental MS2 fragmentation spectra of deferoxamine B (top) with the in silico-predicted one (bottom).

Fig. 3. (A) theoretical fragmentation of benarthin. (B) Comparison of the experimental MS2 fragmentation spectra of benarthin (top) with the in silico- predicted one (bottom).

37 Chapter 2

For the polyketide daunorubicin (Figure 4), [22] the CFM-ID tool predicted five true-positive (i), five false-positive (j) and four false-negative (k) fragments. The low overall number of fragments can be explained by the chemical structure of daunorubicin, which has a highly stable aromatic anthracene core. Only few residues, such as the hydroxyl-group or sugar moiety, can be split off effectively during fragmentation, which was reflected well in the fragmentation prediction. The lipopeptide daptomycin (Figure 5) consists of a peptide macrocycle of 10 amino acids and an exocyclic tail of three amino acids, to which a fatty acid chain is attached. [23] 6 fragments were true-positive (i), 13 false-positive (j) and 20 false-negative (k). Closer examination showed that only the fragmentation of the exocyclic tail was predicted correctly (Figure 5), which explains the high number of false-negative (k) fragments. Based on this observation, we assumed be that the tool has some limitations regarding the fragmentation of macrocycles. To test this assumption, further two macrocyclic compounds were examined: thiostrepton A [24] and rifampicin. [25] For thiostrepton A, a thiopeptide with two macrocycles and an exocyclic tail of two amino acids, no true-positive fragments (i) were predicted by the algorithm, despite a high number of j- and k- fragments (Figure S2). The situation was similar for rifampicin, a macrocyclic polyketide, with zero i-fragments, but many j- and k-fragments. Overall, the case studies demonstrate the possible shortcomings of the CFM-ID tool, due to improper fragmentation prediction of macrocyclic molecules. However, the examples also show that the tool is suited for the prediction of MS2 fragmentation spectra of linear molecules, as corroborated by a related study that reported the annotation of several stilbenes and schweinfurthins. [14] To our knowledge, correct fragmentation prediction for complex, cyclic molecules by CFM-ID has yet to be reported. Thus, the dereplication of molecules by CFM-ID- generated in silico MS2 spectra has to consider these shortcomings. However, the limited sample size in our investigation allows only to observe trends, and a larger, systematic study would be necessary to fully access the performance of CFM-ID.

38 Chapter 2

Fig. 4. (A) theoretical fragmentation of daunorubicin. (B) Comparison of the experimental MS2 fragmentation spectra of daunorubicin (top) with the in silico-predicted one (bottom).

Fig. 5. (A) theoretical fragmentation of daptomycin. (B) Comparison of the experimental MS2 fragmentation spectra of daptomycin (top) with the in silico- predicted one (bottom).

39 Chapter 2

Integration into Metabolomics Workflow and Annotation. Considering its limitations, we integrated the ABL-ISDB into our metabolomics workflow for automated dereplication, as described in chapter 3. Briefly, LC-MS data is processed by the software MZmine 2 (http://mzmine.github.io/), a peak-picking metabolomics software that aligns spectra and decreases redundancy of features, allowing for data reduction. [26] MZmine 2 is also capable to perform spectral matching against a user-provided spectral library, for example the ABL-ISDB. As reported in chapter 3, 286 microbial extracts were processed by MZmine 2 and searched against the ABL-ISDB. Two molecules were successfully identified. A signal with m/z 596.32 was identified that matched the [M+H]+ ion of chymostatinol A. This match was confirmed by comparison to an authentic standard (see Figure 6 and chapter 3). Further, a signal with m/z 612.31 was annotated that matched the [M+H]+ ion of GE-20372 A/B (see Figure 7 and chapter 3). Both metabolites were not previously known to be produced by Planomonospora. Prompted by these results, a literature search and manual annotation of MS signals led to the annotation of further four ureylene-containing peptides, as described in detail in chapter 3.

2.4 Conclusion

In this study, we integrated and applied the tool CFM-ID for the computational prediction of MS2 fragmentation data, using structural information from the proprietary ABL library. In our sample, CFM-ID was well-suited for the prediction of some molecular classes but failed to predict the proper fragmentation pattern of macrocyclic molecules correctly. Integration of the in silico library in a metabolomics workflow allowed to de-replicate the ureylene-containing peptides chymostatinol A and GE-20372 A/B in extracts of Planomonospora-strains. We conclude that despite its drawbacks, CFM-ID can be a valuable tool to provide spectral databases for automated dereplication. However, further advances are necessary to increase prediction confidence.

40 Chapter 2

Fig. 6. Automated spectral matching by MZmine 2 against the ABL-ISDB annotated the metabolite chymostatinol A in Planomonospora. See chapter 3 for details.

Fig. 7. Automated spectral matching by MZmine 2 against the ABL-ISDB annotated the metabolite GE20372 in Planomonospora. See chapter 3 for details.

41 Chapter 2

2.5 Acknowledgements

The author is grateful to Samo Lešnik and Janez Konc for provision of computation time.

2.6 Experimental Section

General Experimental Procedures. LR-ESIMS data were acquired using a Dionex UltiMate 3000 HPLC system (Thermo Scientific) coupled to an LCQ Fleet (Thermo Scientific) mass spectrometer. Separation was achieved on - an Atlantis T3 C18 column (5 µm, 4.6 mm x 50 mm) at a flow rate of 0.8 mL x min 1 and maintained at 40 °C, as described elsewhere. [27] HR-ESIMS spectra were recorded using a Dionex UltiMate 3000 HPLC system (Thermo Scientific) coupled to a micrOTOF III (Bruker) mass spectrometer. Separation was achieved on an -1 Atlantis T3 C18 column (5 µm, 4.6 mm x 50 mm) at a flow rate of 0.3 mL x min and maintained at 25 °C, as described elsewhere. [28] Generation of the In Silico Library. The ABL-ISDB was generated following the protocol established by Allard et al., [14] with slight modifications. From the in-house ABL database, SMILES-strings were extracted, yielding 17 604 entries. A subset of actinomycetes-associated compounds was prepared, containing 13 300 entries. The cfm-predict module of the in silico fragmentation tool CFM-ID v.1.10 (available at sourceforge.net/projects/cfm-id/ as of 01/2018) was used with the default pre-trained Combined-Energy model. The probability threshold was set to 0.001, with no post-processing applied. Spectra of protonated species at low, medium and high energy (corresponding to ESI-MS2 at 10, 20, 40 eV) were calculated in command line batch mode. The generated in silico fragmentation spectra were converted to the MZmine 2 readable .msp (NIST MSP) format by an ad hoc written Python script. Spectral Matching in MZmine 2. LC-MS data was processed as described elsewhere. [28] Spectral matching was performed with MZmine 2.51, using the module Local Spectra Database Search with following settings: MS level, 2; precursor tolerance, 0.02 m/z; minimum ion intensity, 1E1; crop spectra to m/z overlap, yes; spectral m/z tolerance, 0.02 m/z; minimum matched signals, 5; similarity, weighted dot-product cosine; weights, Mass Bank; minimum cos similarity, 0.6. LC-MS-Analysis of Standards. Standard substances were acquired from regular commercial sources or from Naicons collections. For LR-ESIMS2 analyses, standards were dissolved in 10% DMSO/water at a concentration of 1 mg x mL-1. LC-MS analyses of standards were performed on a Dionex UltiMate 3000 (Thermo Scientific) coupled with an LCQ Fleet (Thermo scientific) ion trap mass spectrometer, as described in the General Experimental Procedures. Fragmentation spectra of analyzed standards were selected in ThermoFisher Xcalibur and extracted using the export function of the program. The resulting spectra in the .raw format were converted to .mgf files using the MSconvert

42 Chapter 2 module of the ProteoWizard suite. [29] Files in the .mgf-format were visualized using the spectrum_utils Python package. [30] Only the 30 most intense peaks per spectrum were shown.

2.7 References

1 Wolfender, J.-L.; Nuzillard, J.-M.; Van Der Hooft, J. J. J.; Renault, J.-H.; Bertrand, S. Accelerating Metabolite Identification in Natural Product Research: Toward an Ideal Combination of Liquid Chromatography-High- Resolution Tandem Mass Spectrometry and NMR Profiling, in Silico Databases, and Chemometrics. Anal. Chem. 2019, 91 (1), 704–742. 2 Harvey, A. L.; Edrada-Ebel, R.; Quinn, R. J. The Re-Emergence of Natural Products for Drug Discovery in the Genomics Era. Nat. Rev. Drug Discov. 2015, 14 (2), 111–129. 3 Hubert, J.; Nuzillard, J.-M.; Renault, J.-H. Dereplication Strategies in Natural Product Research: How Many Tools and Methodologies Behind the Same Concept? Phytochem. Rev. 2017, 16 (1), 55–95. 4 Nielsen, K. F.; Larsen, T. O. The Importance of Mass Spectrometric Dereplication in Fungal Secondary Metabolite Analysis. Front. Microbiol. 2015, 6, 71. 5 Blaženović, I.; Kind, T.; Ji, J.; Fiehn, O. Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics. Metabolites 2018, 8 (2), 31. 6 Hufsky, F.; Scheubert, K.; Böcker, S. New Kids on the Block: Novel Informatics Methods for Natural Product Discovery. Nat. Prod. Rep. 2014, 31 (6), 807–817. 7 Tsipouras, A.; Ondeyka, J.; Dufresne, C.; Lee, S.; Salituro, G.; Tsou, N.; Goetz, M.; Singh, S. B.; Kearsley, S. K. Using Similarity Searches over Databases of Estimated 13C NMR Spectra for Structure Identification of Natural Product Compounds. Anal. Chim. Acta 1995, 316 (2), 161–171. 8 Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI: FingerID. Proc. Natl. Acad. Sci. U.S.A. 2015, 112 (41), 12580–12585. 9 Ruttkies, C.; Schymanski, E. L.; Wolf, S.; Hollender, J.; Neumann, S. MetFrag Relaunched: Incorporating Strategies Beyond In Silico Fragmentation. J. Cheminformatics 2016, 8 (1), 3. 10 Wolf, S.; Schmidt, S.; Müller-Hannemann, M.; Neumann, S. In Silico Fragmentation for Computer Assisted Identification of Metabolite Mass Spectra. BMC Bioinf. 2010, 11 (1), 148. 11 Ridder, L.; Van Der Hooft, J. J. J.; Verhoeven, S.; Vos, R. C. H. de; Bino, R. J.; Vervoort, J. Automatic Chemical Structure Annotation of an LC–MSn Based Metabolic Profile from Green Tea. Anal. Chem. 2013, 85 (12), 6033– 6040. 12 Allen, F.; Greiner, R.; Wishart, D. Competitive Fragmentation Modeling of ESI- MS/MS Spectra for Putative Metabolite Identification. Metabolomics 2015, 11 (1), 98–110.

43 Chapter 2

13 Allen, F.; Pon, A.; Wilson, M.; Greiner, R.; Wishart, D. CFM-ID: A Web Server for Annotation, Spectrum Prediction and Metabolite Identification from Tandem Mass Spectra. Nucleic Acids Res. 2014, 42 (W1), W94–W99. 14 Allard, P.-M.; Péresse, T.; Bisson, J.; Gindro, K.; Marcourt, L.; Pham, V. C.; Roussi, F.; Litaudon, M.; Wolfender, J.-L. Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication. Anal. Chem. 2016, 88 (6), 3317–3323. 15 Böcker, S. Searching Molecular Structure Databases Using Tandem MS Data: Are We There Yet? Curr. Opin. Chem. Biol. 2017, 36, 1–6. 16 Kind, T.; Tsugawa, H.; Cajka, T.; Ma, Y.; Lai, Z.; Mehta, S. S.; Wohlgemuth, G.; Barupal, D. K.; Showalter, M. R.; Arita, M.; Fiehn, O. Identification of Small Molecules Using Accurate Mass MS/MS Search. Mass Spectrom. Rev. 2018, 37 (4), 513–532. 17 Lazzarini, A.; Cavaletti, L.; Toppo, G.; Marinelli, F. Rare Genera of Actinomycetes as Potential Producers of New Antibiotics. Antonie van Leeuwenhoek 2000, 78, 399–405. 18 Feistner, G. J.; Stahl, D. C.; Gabrik, A. H. Proferrioxamine Siderophores of Erwinia Amylovora. A Capillary Liquid Chromatographic/Electrospray Tandem Mass Spectrometric Study. J. Mass Spectrom. 1993, 28 (3), 163– 175. 19 Hatsu, M.; Tudo, M.; Muraoka, Y.; Aoyagi, T.; Takeuchi, T. Benarthin: A New Inhibitor of Pyroglutamyl Peptidase. J. Antibiot. 1992, 45 (7), 1088–1095. 20 Matsuo, Y.; Kanoh, K.; Jang, J.-H.; Adachi, K.; Matsuda, S.; Miki, O.; Kato, T.; Shizuri, Y. Streptobactin, a Tricatechol-Type Siderophore from Marine- Derived Streptomyces Sp. YM5-799. J. Nat. Prod. 2011, 74 (11), 2371–2376. 21 Niessen, W. M. A.; Ricardo Correa, A. C. Interpretation of MS-MS Mass Spectra of Drugs and Pesticides, 2017. 22 Di Marco, A.; Cassinelli, G.; Arcamone, F. The Discovery of Daunorubicin. Cancer Treat. Rep. 1981, 65, 3–8. 23 Steenbergen, J. N.; Alder, J.;Thorne, G. M.; Tally, F. P. Daptomycin: a Lipopeptide Antibiotic For The Treatment Of Serious Gram-Positive Infections. J. Antimicrob. Chemother. 2005, 55 (3), 283-288 24 Anderson, B.; Hodgkin, D. C.; Viswamitra, M. A. The Structure of Thiostrepton. Nature 1970, 225 (5229), 233-235. 25 Maggi, N.; Pasqualucci, C. R.; Ballotta, R.; Sensi, P. Rifampicin: a New Orally Active Rifamycin. Chemotherapy 1966, 11(5), 285-292. 26 Pluskal, T.; Castillo, S.; Villar-Briones, A.; Orešič, M. MZmine 2: Modular Framework for Processing, Visualizing, and Analyzing Mass Spectrometry- Based Molecular Profile Data. BMC Bioinformatics 2010, 11 (1), 395. 27 Iorio, M.; Tocchetti, A.; Cruz, J. C. S.; Del Gatto, G.; Brunati, C.; Maffioli, S.; Sosio, M.; Donadio, S. Novel Polyethers from Screening Actinoallomurus Spp. Antibiotics 2018, 7 (2), 47. 28 Zdouc, M. M.; Iorio, M.; Maffioli, S. I.; Crüsemann, M.; Donadio, S.; Sosio, M. Planomonospora: A Metabolomics Perspective on an Underexplored Actinobacteria Genus. J. Nat. Prod. 2021, 82 (2), 204-219.

44 Chapter 2

29 Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. ProteoWizard: Open Source Software for Rapid Proteomics Tools Development. Bioinformatics 2008, 24 (21), 2534–2536. 30 Bittremieux, W. Spectrum_utils: A Python Package for Mass Spectrometry Data Processing and Visualization. Anal. Chem. 2019, 92 (1), 659–661.

2.8 Supplemental Information

Fig. S1. Example entires from the Antibiotic Literature (ABL) database.

45 Chapter 2

Fig. S2. Comparison of the experimental MS2 fragmentation spectra of thiostrepton A (top) with the in silico-predicted one (bottom).

Fig. S3. Comparison of the experimental MS2 fragmentation spectra of rifampicin (top) with the in silico-predicted one (bottom).

46 Chapter 3

3

Planomonospora: a Metabolomics Perspective on an Underexplored Actinobacteria Genus

Mitja M. Zdouc and Marianna Iorio and Sonia I. Maffioli and Max Crüsemann and Stefano Donadio and Margherita Sosio

Parts of this chapter have been published in the Journal of Natural Products, 84 (2), 204-219 (2021).

47 Chapter 3

3.1 Abstract espite an excellent track record, microbial drug discovery suffers D from high rates of rediscovery. Better workflows for the rapid investigation of complex extracts are needed to increase throughput and to allow early prioritization of samples. In addition, systematic characterization of poorly explored strains is seldomly performed. Here, we report a metabolomic study of 72 isolates belonging to the rare actinomycete genus Planomonospora, using a workflow of commonly used open access tools to investigate its secondary metabolites. The results reveal a correlation of chemical diversity and strain phylogeny, with classes of metabolites exclusive to certain phylogroups. We were able to identify previously reported Planomonospora metabolites, including the ureylene-containing oligopeptide antipain, the thiopeptide siomycin including new congeners and the ribosomally synthesized peptides sphaericin and lantibiotic 97518. In addition, we found that Planomonospora strains can produce the siderophore desferrioxamine or a salinichelin-like peptide. Analysis of the genomes of three newly sequenced strains led to the detection of 59 gene cluster families, of which three were connected to products found by LC-MS2 profiling. These results demonstrate the value of metabolomic studies to investigate poorly explored taxa and provides a first picture of the biosynthetic capabilities of the genus Planomonospora.

48 Chapter 3

3.2 Introduction

Natural products are excellent sources for bioactive scaffolds. Over the last four decades, 66% of approved small-molecule drugs were actual natural products or at least inspired from such. [1] This is an impressive track record, considering the general withdrawal of industrial activity from the field. [2] One reason for this disinterest has been the frequent rediscovery of known molecules in activity-based screenings, especially in microbe-derived extracts. [3] However, the focus on bioactivity as selection criterion provides a biased perspective on a small portion of the chemical diversity microbes are capable to produce. Despite decades of research, the majority of secondary metabolites remain “metabolomic dark matter”, [4] with high probability of structural novelty [5] and novel bioactive scaffolds. [6] Streamlined approaches for strain prioritization and workflow optimization are needed to render drug discovery from microbial sources a cost- effective endeavor. [5] Recent advances in genome mining have enabled researchers to investigate the biosynthetic potential of bacteria in silico, with minimal wet lab work. [7] Tools such as antiSMASH [8] mine genomes for BGCs, while repositories such as MIBiG [9] aid in the evaluation of BGC novelty. In addition, advances in (tandem) mass spectrometry and the introduction of molecular networking, [10] the MS2 based grouping of molecules by structural relatedness, has made untargeted metabolomics broadly available, [11] while public databases such as GNPS [12] and the Natural Product Atlas [13] facilitate metabolite annotation. These methods allow researchers to rationalize resources and quickly prioritize strains or metabolites for further investigations. Earlier studies on bacterial genera, such as the actinobacteria Salinispora [14-16] and Nocardia, [17] the myxobacterium Myxococcus [18] and the gamma-proteobacterium Pseudoalteromonas [19] have demonstrated distinct chemical profiles and shown correlations between taxonomic and metabolomic diversity. Our group is particularly interested in exploring the metabolic capabilities of rare genera of actinomycetes present within the Naicons collection, which comprises approximately 45 000 actinomycete strains of diverse origin, isolated

49 Chapter 3 between 1960 and 2005. [20] One such genus is Planomonospora. Originally described by researchers from Lepetit (the predecessor company of Naicons) in 1967, [21] six species, two subspecies and four unclassified strains can be found in public collections or databases. Few molecules have been described as produced by this genus: the thiopeptides thiostrepton [22] and siomycin, also known as sporangiomycin; [23,24] lantibiotic 97518, also known as planosporicin, [25,26] a member of a family of class I lantipeptides produced by many actinobacterial genera; [27] the lassopeptide sphaericin; [28] and the ureylene- containing oligopeptide antipain, [29] which is also produced by Streptomyces. [30] Here, the capability of Planomonospora strains to produce secondary metabolites is accessed, using a pipeline of freely available tools for metabolome and genome mining for the prioritization of promising strains and/or metabolites for further investigation (see Figure 1). Similar workflows have been reported by other groups. [11,31-36] The investigation was carried out on 72 strains from the Naicons collection and was complemented by genomic analyses of selected strains. This study gives unprecedented insight into the rare genus Planomonospora, correlating metabolites to their putative BGCs and making way for targeted isolation efforts.

Fig. 1. Visualization of workflow: Strains are cultivated, extracts are prepared and analyzed. (A) Data acquisition on an LC-ESI-HR-MS2-instrument. Data are preprocessed with the MZmine 2 software, yielding a list of features, which are then analyzed by GNPS feature-based molecular networking, with consecutive MS2LDA curation. (B) Taxonomy of the strains is established by 16S rRNA- sequencing. Strains/features are prioritized for consecutive targeted isolation.

50 Chapter 3

3.3 Results & Discussion

Determination of Phylogenetic Affiliation. From the approximately 350 Planomonospora entries listed in the Naicons collection, 72 strains confirmed by 16S rRNA gene sequencing as belonging to the genus Planomonospora were selected. The majority were isolated from soil originating from central Africa and the Mediterranean region. The 72 strains yielded 35 unique 16S rRNA sequences, 31 of which had not been previously reported (see Figures S1 and S2). The resulting phylogenetic tree (see Figure 2) was found to be in agreement with previous studies [37,38] and showed three phylogroups with a relevant number of representatives: Phylogroup C includes 13 strains (9 of them from Naicons collection) and 9 distinct 16S rRNA sequences; phylogroup A2 includes 12 strains (11 of them from Naicons collection) and 8 distinct 16S rRNA sequences. The most populated phylogroup S includes 46 strains (43 of them from Naicons collection) and 13 distinct 16S rRNA sequences. In addition, the phylogenetic analysis yielded three poorly represented phylogroups: V1, which includes Planomonospora venezuelensis JCM3167 and Naicons strain ID43178, with identical 16S rRNA sequences; the somehow related V2 group, with six distinct Naicons isolates with identical sequences; and A1, with just two Naicons strains with identical sequences. All phylogroups contained sequences of previously described Planomonospora species, except V2 and A1. Given the extent of sequence distance from validly described species (see Figures S1 and S2), many of the Naicons strains likely represent new species within this genus. In the following analyses, we consider V1 as a phylogroup, even though it contains only one strain. Cultivation, Extraction and Molecular Network Analysis. To find appropriate cultivation conditions for the Planomonospora strains, the behavior of a selected number of isolates under a variety of different conditions, including three solid and six liquid media, was investigated. Four liquid media (AF, R3, MC1 and AF2, see Experimental Procedures) afforded the highest metabolic diversity (data not shown) and were used in the following analyses.

51 Chapter 3

Fig. 2. 16S rRNA-based phylogenetic tree of Planomonospora (P.) strains. Naicons strains are represented by their ID numbers in boldface. Type strains and two unclassified strains with complete 16S sequences are indicated in italics. Naicons strains with identical sequences are represented by a single ID number, with the number of additional strains in parentheses (details in Figures S1 and S2). Bootstrap values (1000 times resampled) higher than 60% are indicated in bold type. Planobispora rosea ATCC53733 was used as outgroup.

52 Chapter 3

Cultivation of the 72 strains in the four media and preparation of two extracts per culture yielded 576 samples. To expedite analysis, the two extracts from each culture were combined, yielding 288 samples, two of which were removed due to cross-contamination. The remaining 286 samples were analyzed by LC-ESI-HR-MS2 in data-dependent acquisition mode. The LC-MS2 data were subjected to a workflow consisting of several steps (see Figure 1): Files were (i) preprocessed with the feature finding tool MZmine 2 to correct for m/z and retention time drift, to differentiate between structural isomers that had been separated by chromatography and to reduce redundancy of data by merging duplicates; [31,39] (ii) analyzed using the feature-based molecular networking workflow of GNPS; [12,40] and (iii) visualized using the program Cytoscape [41] (see Figure S3). This resulted in a feature-based molecular network, a visual, topological representation of the chemistry detected by mass spectrometry. In such a network, the features (each with corresponding m/z, retention time and MS2-spectrum) detected during the preceding feature finding step are organized into subnetworks, also called molecular families or clusters, based on the similarity of their associated MS2 spectra (see Figure S3 for more details). This is based on the observation that similar molecules generally show similar MS2 fragmentation. [10] Features can also remain singletons, if they have sufficiently unique MS2 spectra not to cluster with any other feature. Sample metadata, such as producing strain, cultivation medium and phylogroup affiliation, can be mapped onto the molecular network, which supports intuitive assessment and helps in data organization. In the resulting molecular network, media components, background impurities from the extraction process and features with an m/z less or equal to 300 were removed beforehand. The last step excluded approximately 650 features which are likely to include metabolites that would be hard to dereplicate due to limited MS2 information and lack of an appropriate database for the genus. Indeed, a preliminary analysis utilizing the GNPS libraries resulted in zero dereplications, with 90% of features being singletons (data not shown). Despite

53 Chapter 3 the opportunity to discover novel metabolites in this mass range, we chose to focus on higher molecular weight molecules. Of the remaining 1492 features in the molecular network in Figure 3, 447 (30%) were organized in 60 different clusters, while 1045 features remained singletons. The number of features is not equivalent to the number of metabolites, because the same metabolite can be detected as different adducts (and thus features) by ESI mass spectrometry (e.g. [M+H]+, [M+2H]2+, [M+2H+Fe]+, [M+Na]+). A recent study on myxobacteria demonstrated a strong correlation between taxonomic and secondary metabolite diversity, i.e., metabolite profiles showed high taxonomic specificity. [42] This raised the question whether this applied to Planomonospora. Only 1% of features were detected in members of all phylogroups, as shown by the Venn diagram in Figure 4A. The vast majority (74%) of the 1492 features were phylogroup-specific, meaning that they were not detected in samples derived from strains of a different phylogroup. The number of specific features was especially high for phylogroup C; 31% of all detected features were exclusive to its 9 members. This is consistent with the phylogenetic tree of Figure 2, which indicates that phylogroup C is more divergent from the other well-represented phylogroups A and S. In contrast, the number of phylogroup-specific features was relatively low in groups A1 and A2, suggesting that the separation into two phylogroups might be an artifact due to the existence of only one 16S rRNA sequence in phylogroup A1. Overall, the results suggest that secondary metabolite production in Planomonospora ssp. is a phylogroup- defining trait. Each bar in Figure 4B represents a single strain, with the number of detected features indicated on the y-axis. This number varies greatly among strains, with some talented strains standing out in terms of both total features and strain-specific features. Phylogroup C, and, to a lesser extent, V2, were enriched in such strains. This is particularly relevant for phylogroup V2, for which all six strains shared an identical 16S rRNA sequence. In total, 36% of features were

54 Chapter 3 strain-specific, indicating Planomonospora secondary metabolites tend to be strain-specific.

Fig. 3. Complete molecular network of 286 Planomonospora extracts, encompassing 1492 features (=nodes). Features (447) were organized in 60 clusters. Node size correlates to the number of contributing strains, while the colors give the contributing phylogroup(s).

55 Chapter 3

Fig. 4. Distribution of 1492 features according to (A) phylogroups and (B) strains. In panel A, overlaps amounting to less than 1% are not labeled, while features detected in phylogroup V1 were omitted from the analysis. In panel B, each bar represents a different strain. Bars are separated into strain-specific (red), phylogroup-specific (yellow, detected in at least one additional strain from the same phylogroup) and shared features (blue, detected in at least one additional strain from a different phylogroup).

56 Chapter 3

Because all 72 strains were cultivated in the same four media, feature distributions could be evaluated (see Figure 5). Overall, 57% of features were specific for a single medium. Media MC1 and R3, with 26 and 18% of exclusive features, respectively, were the biggest contributors. Furthermore, these two media covered 85% of all detected features. In contrast, just 9% of features were found in all media. Similar observations were made in a study of 26 marine Streptomyces strains, in which 71% of detected ions were medium-specific and just 7% common to all three conditions. [14] Even though complete metabolite coverage remains elusive, our results suggest that two media should cover most of the metabolites produced by different Planomonospora strains belonging to different phylogroups.

Fig. 5. Visualization of the distribution of features in extracts with regard to cultivation medium. Media are MC1, AF2, R3 and AF. Overlaps amounting to less than 1% are not labeled.

Metabolite Annotation. The metabolites present in the Planomonospora metabolome were examined by first identifying known metabolites by dereplication. [43,44] In addition to providing insights into the biosynthetic potential of this poorly studied genus, establishing the known metabolites can highlight features likely to be associated with novel chemistry, thus evading the pitfalls of reinvestigating reported compounds. As described below, representatives of 11 clusters were dereplicated, some of which are visualized in Figure 6. For

57 Chapter 3 annotation, both spectral matching (comparison with identified spectra in curated databases, such as GNPS) and literature search were used. To increase confidence in the annotations, Chemical Analysis Working Group (CAWG) criteria [45] were applied, leading to a number of class 1 (comparison against an authentic standard) and class 2 (putatively identified molecule) annotations (for a full list, see Figure S4). Metabolites previously reported from Planomonospora were investigated. The lantibiotic 97518/planosporicin was identified in an extract from strain ID50037, after observing a signal with m/z 1096.89, which matched the [M+2H]2+ ion of the compound. Comparison of the MS2 fragmentation spectrum to the one reported in the literature supported this assumption (see Figures S4 and S31). [25] The lassopeptide sphaericin, identified by the signal m/z 2156.1, corresponding to the [M+H]+ ion, was found in the extracts of six strains. [28] The MS2 fragmentation spectrum of the compound matched the one reported in the literature (see Figures S4 and S32). Further, the thiopeptide siomycin A ([M+H]+ 1648.46 m/z), along with its congeners, siomycin B ([M+H]+ 1510.42 m/z), siomycin C ([M+2H]2+ 832.73 m/z) and siomycin D1 ([M+2H]2+ 817.73 m/z) was detected in extracts of up to 26 strains and are described in detail below. Finally, a signal with m/z 303.18, corresponding to the [M+2H]2+ ion of ureylene-containing oligopeptide antipain, was detected in extracts of 11 strains and annotated by comparison to an authentic standard (see Figures S4 and S20). Several more signals belonging to ureylene-containing oligopeptides were identified: the antipain-like molecule KF 77AG6 ([M+H]+ 366.18 m/z, Figures S4 and S29) as well as chymostatin A/C ([M+H]+ 608.31 m/z, Figures S4 and S24) along with its congeners chymostatin B ([M+H]+ 594.30 m/z, Figures S4 and S25), chymostatinol A ([M+H]+ 596.32 m/z, Figures S4 and S26), chymostatinol B ([M+H]+ 610.33 m/z, Figures S4 and S27) and GE-20372 A/B ([M+H]+ 612.31 m/z, Figures S4 and S28). Except for thiostrepton, all previously reported Planomonospora metabolites were identified in the dataset. In addition, several members of the desferrioxamine family, an iron chelating siderophore commonly produced by Streptomyces, were detected. [46] The signal corresponding to

58 Chapter 3 desferrioxamine B [M+H]+ ion (561.36 m/z), detected in extracts of 8 strains, was annotated by comparison to a commercial standard (see Figures S4 and S30). Further, signals matching acyl-desferrioxamine C13, C15 and C16 ([M+H]+ 729.54 m/z, [M+H]+ 757.58 m/z and [M+H]+ 771.59 m/z, Figures S4 and S21-23, respectively) were identified by literature search. Some other putative siderophores were identified by spectral matching against the GNPS spectral library (see Figure S4): Desferrioxamine E ([M+H]+ 601.36 m/z), as well as partially described metabolites deposited as amphiphilic ferrioxamine 7 ([M+H]+ 768.44 m/z), Bisu-05 ([M+H]+ 345.35 m/z) and Desf-05 ([M+H]+ 575.37 m/z). Several of the de-replicated features showed strong phylogroup- specificity: the desferrioxamines were almost exclusively detected in samples from strains of phylogroup C (see Figure 6A), while siomycins and sphaericin were only detected in extracts derived from the S phylogroup strains (Figure 6C). Other metabolites were less phylogroup-specific: chymostatinol A was produced by 31 strains from phylogroups C, S, V2 and A2, as were antipain (11 strains, phylogroups S, V2 and A2) or GE-20372 A/B (17 strains, phylogroups C, S, V2 and A2). Other identified ureylene-containing oligopeptides showed a similar broad distribution. In total, 28 metabolites were annotated as CAWG classes 1 or 2 (see Figure S4). A summary of the presence/absence of all de-replicated features can be seen in Figure S5. Many more features in the molecular network are neighbors to annotated ones (and thus, structurally related). While a systematic investigation of all features would exceed the scope of this study, examples of special relevance will be discussed below. To get a better picture of the biosynthetic capacities of Planomonospora and to further explore the annotated metabolites, genomic analysis was used.

59 Chapter 3

Fig. 6. Visualization of selected annotated clusters, with features annotated manually or by GNPS spectral library search (e.g. amphiphilic ferrioxamine 7). (A) Desferrioxamines (DFO) are mostly occurring in strains from phylogroup C. (B) Chymostatin-like metabolites show no phylogroup-specificity. (C) Siomycin and sphaericin are exclusive to phylogroup S.

Genome Analysis: Public databases report two Planomonospora genome sequences: Planomonospora sphaerica JCM9374 [47] and Planomonospora venezuelensis CECT3303, the latter not formally published yet. Three representative strains for full genome sequencing were selected: strain ID67723, as a representative of the divergent phylogroup V2 and for its capability to produce chymostatin; strain ID82291 as the representative of a subgroup of strains in phylogroup A2 that produced the biarylitides, as reported in detail

60 Chapter 3 elsewhere; [48] and strain ID91781, with its 16S rRNA sequence identical to that of P. sphaerica JCM9374 and a producer of the thiopeptide siomycin. Genomes were sequenced with both Illumina HiSeq and PacBio technologies to allow hybrid assembly, providing good quality sequences with a substantially lower number of contigs than that of the reference genomes of P. sphaerica JCM9374, P. venezuelensis CECT3303 and Planobispora rosea ATCC53733. [49] The three genomes were similar to that of P. sphaerica JCM9374, P. venezuelensis CECT330 and to each other in terms of GC content (from 71.5 to 72.8%, see Figure 7B). Interestingly, the genome of strain ID82291 was about 9% smaller than the other Planomonospora genomes and thus harbored a smaller number of predicted genes. antiSMASH analysis identified between 23 and 28 biosynthetic gene clusters (BGCs) in the three genomes (see Figure 7A). A multilocus sequence analysis of the five strains of Figure 7A, along with other publicly available genome sequences of members of the Streptosporangiaceae, was performed with the web-based program autoMLST. [50] Based on a concatenated alignment of 89 identified housekeeping genes (see Figure S38), a tree was constructed that was consistent with the one of Figure 2, except that strain ID91781 is now clearly distinct from Planomonospora sphaerica JCM9374 (see Figure 7B). Oddly, strain ID67723 and Planomonospora venezuelensis CECT3303 show closer relationship to Planobispora rosea ATCC53733 than to the other Planomonospora strains, in contrast to the results from the 16S rRNA-based phylogeny (see Figure 2). Therefore, we calculated the average nucleotide identity (ANI) between the six genomes under investigation (see Figure 7C). [51,52] An ANI of 95–96% is generally considered as species boundary cut-off. [53,54] Only strain ID91781 and Planomonospora sphaerica JCM9374 showed a high enough ANI to fall in this category. Strain ID67723, Planomonospora venezuelensis CECT3303 and Planobispora rosea ATCC53733 appear to be slightly more similar to each other than to the other Planomonospora. While exceeding the scope of this study, these results warrant further studies in the taxonomy of Planomonospora and Planobispora ssp.

61 Chapter 3

Fig. 7. Overview of sequenced genomes, with comparison to reported ones. (A) Summary of metadata of newly sequenced genomes (bold type, P=Planomonospora). (B) Segment of autoMLST-generated phylogenetic tree. (C) Pairwise comparison of average nucleotide identity between the genomes, using the tool OrthoANIu.

62 Chapter 3

To investigate the similarity among the Planomonospora BGCs, the output of antiSMASH v5.0.0 [8] was processed with the program BiG- SCAPE/CORASON v1.0. Based on Pfam composition, this tool calculates the similarity between BGCs and clusters related ones into gene cluster families (GCFs). BGCs that show low similarity to any other BGC are displayed as singletons. Therefore, BiG-SCAPE/CORASON allows for the quick assessment of phylogenetic relationships between BGCs. It further enables automatic annotation of BGCs using the MIBiG repository of experimentally established BGCs. [55] The 155 BGCs from the genomes in Figure 7A could be grouped into 59 GCFs, of which 25 were singletons, as illustrated in Figure 8. Interestingly, the analysis demonstrated the existence of seven GCFs that are common among the five Planomonospora strains as well as Planobispora rosea ATCC53733. An additional GCF is present in all strains, except for ID82291, the strain with a slightly reduced genome, and one more in all strains except Planomonospora venezuelensis CECT3303. One GCF is present in the five Planomonospora genomes but not in Planobispora rosea. Only two of the seven core GCFs are highly related to experimentally established BGCs, namely those for the lantipeptide catenulipeptin and for the polyketide alkylresorcinol. Some of these GCFs are highly conserved in other genomes of members of the Streptosporangiaceae (Figure 8). In addition to the core GCFs, strain ID67723, Planomonospora venezuelensis CECT3303 and Planobispora rosea ATCC53733 share one more GCF. Strain ID67723 and Planobispora rosea ATCC53733 share further six GFCs. This overlap includes an erythrochelin-like BGC, as discussed later. Further, strain ID91781 and Planomonospora sphaerica JCM9374 are remarkably similar in terms of GCFs: ID91781 lacks the sphaericin BGC and another cluster of unknown function, present in Planomonospora sphaerica JCM9374, but instead contains a type I PKS BGC. When compared to the MIBiG-repository, only 17 (31%) of all GCFs match an experimentally determined BGC. Investigation with the program

63 Chapter 3

ClusterBlast showed related BGCs in non-Planomonospora genomes. Similarities were low in most cases, except for some of the core GCFs (see Figure 8). In summary, the three analyzed genomes contain a considerable number of BGCs, with a core of BGCs present in several representatives of the Streptosporangiaceae family. In terms of BGCs, strain ID67723 appears to be very similar to Planobispora rosea ATCC53733, even though their ANI is relatively low. However, a larger number of genomes is needed to better understand the distribution of GCFs in Planomonospora, as demonstrated in studies on Planctomycetes [56] or Salinispora. [15,16] Many of the BGCs in Planomonospora remain unannotated and may encode for novel metabolites. Aiming for a better understanding of the biosynthetic capabilities, we connected genomic and metabolomic data creating a “paired-omics” dataset, as illustrated below. Paired Omics. Siomycin, first reported from Planomonospora in 1968 as sporangiomycin, [24] is a thiostrepton-like thiopeptide with an established biosynthetic route. [57] While the siomycin BGC was detected in strain ID91781 (RiPP4 in Figure 8), during metabolite annotation, it became evident that the features matching siomycin A and congeners B, C and D1 were not clustered in a single cluster in the molecular network, as expected for structurally related molecules. Instead, they were mostly singletons, with MS2 spectra sufficiently different not to be clustered by the networking algorithm. However, inspection of the corresponding MS2 spectra revealed that siomycin A and congeners shared low molecular weight fragments, presumably corresponding to the quinalidic acid (QA) moiety of class b thiopeptides (see Figure S6). Hence, the program MS2LDA [58a-b] was used to mine for QA-related motifs in the MS2 fragmentation spectra of features. Apart from the known siomycins, the program detected an additional 9 siomycin-like thiopeptides (see Figure S7). One, which we named siomycin E, is hypothesized to correspond to siomycin B with an additional dehydroalanine (Dha)-residue at its C-terminal end, instead of two, as in siomycin A. MS2- fragmentation spectra of the [M+H]+ ions showed a particular fragment corresponding to a break between Ala2- and Dha3 as well as Thr12 and QA, with an m/z-value diagnostic for each congener (see Figures 9 and S8).

64 Chapter 3

Fig. 8. Distribution of biosynthetic gene clusters (BGCs) Planomonospora and Planobispora strains. Newly sequenced strains are in bold. BGCs with a similarity of more than 40% to a MIBiG-deposited gene cluster were annotated. For clusters without a MIBiG- annotation, the next most similar BGC found by ClusterBlast is given. Eleven Planobispora rosea singleton BGCs were omitted, as were one BGC of each strain ID67723, P. sphaerica and strain ID91781, fragmented due to their position on contig edges.

65 Chapter 3

Literature provides a precedent for a thiopeptide with a similar intermediate molecule: Thiopeptin A3a, A4a and Ba have zero, one and two Dha residues, respectively, at the C-terminal end of the molecule. [59] The precursor peptide for thiopeptin, TpnA, indeed shows two serine moieties at the C-terminal end. [60] Consistently, the precursor peptide encoded by the BGC RiPP4 contains two additional serine residues at the C-terminus (see Figure S9). Promiscuous processing of the C-terminal end of precursor peptides has been observed in several thiopeptides and can be also assumed here. [57,59,61,62] For some of the other putative thiopeptides, the differences in exact mass with regard to siomycin A point towards the presence of one or two N-acetylcysteinyl-moieties with additional modifications, such as deacetylation or hydroxylation (see Figure S7). Again, literature provides precedence for likewise modified antibiotics from Streptomyces sp.: N-Acetylcysteine derivates have been isolated for the macrolide piceamycin [63] and the phenazine SB 212021. [64] These results show how the use of additional tools such as MS2LDA can overcome the limitations of any one tool and help to organize data, identify analogues and support annotation (for a full list of annotated Mass2Motifs, see Figure S10).

Fig. 9. Identification and annotation of siomycin congeners. (A) Shows the putative MS2 fragmentation pathway of siomycins, leading to the (B) diagnostic fragments. A (blue) has two Dha-moieties at its C-terminal end, while siomycin B (black) has none. Siomycin E (green) is hypothesized to be an intermediate congener with one Dha-group at its C-terminal end.

66 Chapter 3

As mentioned above, several features were found to match desferrioxamines, iron-chelating siderophores involved in iron-uptake in bacteria, [65,66] mostly in samples derived from strains of phylogroup C, raising the question of other Planomonospora strains producing distinctive iron-chelating molecules. In a study on 118 Salinispora genomes, Bruns et al. reported a mutually exclusive presence of either the des or slc BGC, responsible for the production of desferrioxamine or the structurally unrelated salinichelins, respectively. [67] In order to rapidly identify iron-binding metabolites, we added

2 FeCl3 to the Planomonospora extracts and reanalyzed them by LC-MS , looking for mass shifts from the disappearance of the iron-free form and stabilization of the iron-bound molecule, with associated Fe-characteristic isotopic pattern. Apart from the already identified desferrioxamines, 18 features of four clusters were also affected by the addition of iron (see Figures 10A, S12 and S14-S19). Calculation of molecular formulas and inspection of MS2-fragmentation spectra indicated relatedness between these four clusters, but not to desferrioxamine E (see Figures S12 and S13). These four clusters were exclusive to seven strains belonging to phylogroups V2 and S. At the same time, no desferrioxamines could be detected in extracts from these strains. Analysis with BiG-SCAPE/CORASON suggested a candidate BGC (NRPS17 in Figure 8) in one of the producer strains, ID67723. This cluster showed high similarity to the experimentally validated BGC for erythrochelin, a siderophore produced by Saccharopolyspora erythraea, as well as to a BGC with unknown function from Planobispora rosea ATCC53733, as indicated in Figure 10B. Consistently, extracts from both strain ID67723 and Planobispora rosea contained a feature with m/z 631.3408 [M+H]+ with the calculated molecular formula C26H46N8O10 (see Figure 10A). Characterization of this metabolite confirmed a salinichelin-like structure, with a lysine instead of an arginine in position two (manuscript in preparation). For strain ID67723, 11 additional iron-shifted features were identified. The similarity of their MS2 fragmentation spectra to the features with m/z 631.3408 [M+H]+ suggests that they were also synthesized by BGC NRPS17. While many were hypothesized to

67 Chapter 3 be congeners differing in one or more methylene groups, feature ID314 (m/z 1067.5659) appears to be a glycosylated version of feature ID289 (m/z 905.5136).

Fig. 10. Investigation of some iron-binding metabolites: (A) Features in several clusters, such as m/z 631.3 or 843.4, showed iron complexion upon treatment with FeCl3. LCMS-analysis of Fe-treated samples shows disappearance of the unbound form (black trace) and appearance of the Fe-bound form (red trace; for original traces, see Figure S11). (B) Comparison of the erythrochelin BGC with GCF NRPS17, shared by Planomonospora strain ID67723 and Planobispora rosea ATCC53733.

Also for strain ID67723, a BGC similar to the experimentally validated BGC for deimino-antipain [68] was detected (see NRPS24 in Figure 8 and Figure 11A). The compound antipain and several related ureylene-containing peptides, such as chymostatin A, were detected in the extracts from strain ID67723. The

68 Chapter 3 genomes of the other sequenced Planomonospora strains lacked this BGC and indeed, antipain and related compounds could not be detected in their extracts. In the molecular network, the ureylene-containing compounds clustered together, as expected for structurally similar molecules (see Figure 6B). In addition to the annotated compounds, further 32 related but unknown features were found for strain ID67723 (see Figure 11B). We hypothesize that all these features originate from NRPS24, because for the deimino-antipain BGC, promiscuity in amino acid incorporation has been reported before. [68] Related studies have shown that some single BGCs are capable to produce a variety of different molecules. [69,70] While a detailed characterization of these molecules exceeds the scope of this study, the abundance of products resulting from a single BGC is worth noting.

Fig. 11. (A) Comparison between the BGCs of deimino-antipain and NRPS24 from strain ID67723, performed with BiG-SCAPE/CORASON (cutoff-value: 1). (B) Cluster containing antipain-related ureylene-containing compounds. Features detected for strain ID67723, likely to result from NRPS24, are indicated in red.

69 Chapter 3

Prioritization. Despite thorough dereplication of the metabolome of Planomonospora, the majority of features remained unannotated, constituting an interesting source for the discovery of novel specialized metabolites. We prioritized candidates for isolation based on their apparent rarity, to circumvent the frequent pitfall of metabolite rediscovery. First, it was assumed that structural novelty would be an uncommon attribute even on a genus level, which led us to exclude features that were present in more than one phylogroup or detected in extracts of more than five strains of one phylogroup. A similar phylogeny-based strategy was adopted in a recent study by Olivon et al. [36] We further estimated the rarity of the candidate features by matching them against an in-house MS2 spectral library, containing MS2 spectra of approximately 10,000 extracts of actinomycetes. Only features with no matches were considered further. Eventually, we focused on a cluster of four features, produced by four strains of phylogroup A2. While two features were not investigated further due to their low intensity, the other two features, one with m/z 522.198 [M+H]+, the other with an m/z 506.203 [M+H]+ showed UV absorption maxima at 270 and 315 nm, the latter indicating an unusual chromophore. Indeed, isolation of the compound with m/z 522.198 yielded a novel acetylated tripeptide as described in chapter 4 and elsewhere. [48]

3.4 Conclusion

The work presented here provided unprecedented insight into the poorly explored genus Planomonospora. Among the 72 investigated strains, several were found to have yet unreported 16S rRNA sequences, phylogenetically distinct from previously known species. Using feature-based molecular networking, the majority of previously reported Planomonospora metabolites were found, demonstrating phylogroup-specificity for many of them. It was shown that Planomonospora can produce desferrioxamines as iron-chelating molecules, or, alternatively, members of an unknown siderophore family. With the help of different tools to organize MS2 data, we described a new congener of the siomycin

70 Chapter 3 family, siomycin E, and detected several other siomycin-like molecules that await further characterization. Furthermore, the recently described biarylitides, cyclic tripeptides with an unusual carbon-carbon biaryl bond, were prioritized from the features detected in the present study and are described in detail elsewhere. [48] Still, the majority of features detected in the Planomonospora extracts remain unknown. Also, only 3 of the detected BGC could be linked to known products, which indicates great potential for the discovery of new specialized molecules. We conclude that investigations in this interesting genus have only just begun.

3.5 Author Contributions & Notes

M.M.Z., M.S. and S.D. designed work, analyzed data and wrote the paper; M.M.Z. performed all experimental work, including high resolution mass spectrometry analysis, and identified metabolites; M.I. and S.M. analyzed data and identified metabolites; M.C. performed high resolution mass spectrometry analysis and analyzed data. M.I., S.I.M., M.S. and S.D. are employees and/or shareholders of Naicons Srl. M.M.Z. is a former employee of Naicons Srl. M.C. declares that no competing interests exist.

3.6 Acknowledgements

The authors thank Christian Milani and Marco Ventura for genome reassembly and data curation, Paolo Monciardini for suggestions regarding the phylogenetic analysis and the whole Naicons team for helpful comments and discussion. M.M.Z. thanks Prof. Tanneke den Blaauwen for guidance and advice. This work has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No.721484 (Train2Target). M.C. received funding from the German Research Foundation (DFG) grant nr. CR464/7-1.

71 Chapter 3

3.7 Experimental Section

General Experimental Procedures. LR-ESIMS data were acquired using a Dionex UltiMate 3000 HPLC system (Thermo Scientific) coupled to an LCQ Fleet (Thermo Scientific) mass spectrometer. Separation was achieved on - an Atlantis T3 C18 column (5 µm, 4.6 mm x 50 mm) at a flow rate of 0.8 mL x min 1 and maintained at 40 °C. The injection volume was set at 10 µL. Elution was conducted with a 0.05% (v/v) TFA in H2O-MeCN gradient as follows: 0-1 min (10% MeCN), 1-7 min (10-95% MeCN), 7-12 min (95% MeCN). The ionization was carried out using an electrospray ionization source in the positive mode (range m/z 110-2000). HR-ESIMS spectra were recorded using a Dionex UltiMate 3000 HPLC system (Thermo Scientific) coupled to a micrOTOF III (Bruker) mass spectrometer. Separation was achieved on an Atlantis T3 C18 column (5 µm, 4.6 mm x 50 mm) at a flow rate of 0.3 mL x min-1 and maintained at 25 °C. The injection volume was set at 5 µL. Elution was conducted with a 0.1% (v/v) acetic acid in H2O-0.1% (v/v) acetic acid in MeCN gradient as follows: 0-1 min (10% MeCN), 1- 20 min (10-100% MeCN), 20-26 min (100% MeCN). The ionization was carried out using an electrospray ionization source in the positive mode (range m/z 100- 3000). All solvents and reagents were purchased from Sigma-Aldrich. Culturing, Extraction and Sample Preparation: Planomonospora strains were cultivated from frozen stocks (-80 °C) on S1 plates [71] at 28 °C for 2-3 weeks. The grown mycelium was then homogenized with a sterile pestle and used to inoculate 15 mL of AF (=AF/MS) [71] medium in a 50 mL baffled flask. After cultivation on a rotary shaker (200 rpm) at 30 °C for 72 h, 1.5 mL of the exponentially growing culture was used to inoculate each 15 mL of MC1 (35 g/L soluble starch, 10 g/L glucose, 2 g/L hydrolyzed casein, 3.5 g/L meat extract, 20 g/L yeast extract, 10 g/L soybean meal, 2 g/L CaCO3, adjusted to pH 7.2), AF2 (8g/L yeast extract, 30 g/L soybean meal, 11 g/L glucose, 25 g/L malt extract, 1 g/L L-valine, 0.5 g/L Biospumex (Cognis, France), adjusted to pH 7.4), R3 (=RARE3) [72] and AF media in a 50 mL baffled flask. After 7 days of cultivation as before, 10 mL of each culture was centrifuged at 16 000 rcf for 10 min and the resulting pellet separated from the supernatant. For each culture, two different extracts were prepared: one by solvent extraction of the mycelium, and one by solid-phase adsorption of the cleared broth. The mycelium was resuspended in 4 mL of EtOH and incubated under agitation at room temperature for 1 h. After centrifugation (16 000 rcf for 10 min), the pellet was discarded, and the mycelium extract was processed as described below. The cleared broth was stirred with 1 mL of HP20 resin suspension (Diaion) for 1 h at room temperature. The exhausted broth was discarded and the loaded HP20 resin was stirred with 4 mL of purified water (Milli- Q, Merck); then, it was recovered by centrifugation and eluted with 5 mL of a 80:20 (v/v) mixture of MeOH and H2O under agitation at room temperature for 10 min. Both extracts were transferred to 96-well plates (conic bottom). For mycelium extract, 100 µL of extract were deposited, while for supernatant extract, 125 µL were deposited. Extracts were dried under vacuum at 40 °C. Using the same protocols, solvent blanks were generated. Before LC-MS analysis, extracts were

72 Chapter 3 rehydrated: for mycelium extracts, 100 µL of a 90:10 (v/v) mixture of EtOH and H2O was used, while for supernatant extracts, 125 µL of a 80:20 (v/v) mixture of MeOH and H2O utilized. The samples were centrifuged for 3 min at 16 000 rcf to remove suspended particles. Then, 50 µL of mycelium extract and 50 µL of supernatant extract, coming from a single culture, were combined, centrifuged for 3 min at 16 000 rcf and transferred to HPLC vials for analysis. To test for iron- -1 binding metabolites, 10 µL of 10 mg x mL FeCl3 in water was added to selected rehydrated extracts. Data-Dependent LC-ESI-MS/MS Analysis. For the 286 Planomonospora extracts investigated in molecular networking, data acquisition was performed on a micrOTOF-Q III (Bruker) instrument equipped with an electrospray interface (ESI), coupled to a Dionex UltiMate 3000 (Thermo Scientific) LC system, as described in the General Experimental Procedures. LC- ESI-HRMS/MS fragmentation was achieved in auto mode, with rising collision energy (35-50 keV over a gradient from 500-2000 m/z) with a frequency of 4 Hz for all ions over a threshold of 100. Calibration solution containing Na-acetate as internal reference mass (C2H3NaO2, m/z 83.0109) was injected at the beginning of each run. For the analysis of commercial standards and selected Planomonospora extracts with added FeCl3, LC-MS analyses were performed on an LCQ Fleet (Thermo scientific) mass spectrometer equipped with an electrospray interface (ESI), coupled to a Dionex UltiMate 3000 (Thermo Scientific) LC system, as described in the General Experimental Procedures and reported elsewhere. [73] LC-ESI-HR-MS/MS Data Calibration and Conversion. LC-ESI-HR- MS/MS files were calibrated with Bruker DataAnalysis using an internal calibrant (Na-acetate) in HPC mode. The calibration was verified by inspection of medium component soyasaponin A, present in all samples (calc. [M+H]+ m/z 943.526 at RT 12.7-12.9 min). Average mass deviation was routinely below 20 ppm. LC-MS files were exported (see Figure S35) to the .mgf and .mzXML format with Bruker DataAnalysis (version 4.2 SR2 Build 365 64bit) and further processed using an ad hoc-written Perl5 script (see Figure S35 and S36). This was necessary for further processing with MZmine 2, because the export to the .mzXML format erroneously resulted in the insertion of the noncalibrated value in the precursor- entry () of each MS2-scan, leading to high mass deviations (> 100 ppm) and disrupting the logical connection between MS1 and MS2 scans in the .mzXML files. The .mgf files were not affected by this and therefore, the script uses the correctly calibrated entries from the .mgf file to replace the non-calibrated entries in the .mzXML file. Pre-Processing of HR-LC-MS/MS Data with MZmine 2. For pre- processing with MZmine 2.51, .mzXML files were imported and subjected to the following workflow: (A) MassDetection = retention time, auto; MS1 noise level, 1E3; MS2 noise level, 2E1. (B) ADAP chromatogram builder [74] = retention time, auto; MS-level, 1; min group size in # of scans, 8; group intensity threshold, 5E2; min highest intensity, 1E3; m/z tolerance, 20 ppm. (C) Chromatogram deconvolution = baseline-cutoff algorithm; min peak height, 1E3; peak duration, 0.1-1.3 min; baseline level, 2.5E2; m/z range MS2 pairing, 0.02; RT range MS2 pairing, 0.4 min. (D) Isotopic peaks grouper = m/z tolerance, 20 ppm; RT

73 Chapter 3 tolerance, 0.2 min; monotonic shape, no; maximum charge, 2; representative isotope, most intense. (E) RANSAC peak alignment = m/z tolerance, 20 ppm; RT tolerance, 0.7 min; RT tolerance after correction, 0.35 min; RANSAC iterations, 100000; minimum number of points, 50%; threshold value, 0.5; linear model, no; require same charge state, no. (F) Duplicate peak filter = filter mode, new average; m/z tolerance, 0.02 m/z; RT tolerance, 0.4 min. Features with no accompanying MS2 data were excluded from the analysis. Features present in both samples and media blanks were excluded because they related to media components. Features with m/z-values of <300 or a retention time <1.5 min or >20 min were excluded. The resulting feature list contained 1492 entries and was exported to the GNPS-compatible format, using the dedicated “Export for GNPS” built-in options. Global Natural Products Social Molecular Networking (GNPS) Feature-Based Molecular MS/MS Network. Using the Feature-Based Molecular Networking (FBMN) workflow (version release_14) [40] on GNPS, [12] a molecular network was created by processing the output of MZmine 2. Parameters were adapted from the GNPS documentation: MS2 spectra were filtered so that all MS/MS fragment ions within ± 17 Da of the precursor m/z were removed and only the top 6 fragment ions in the ± 50 Da window through the spectrum were utilized, with a minimum fragment ions intensity of 50. Both the MS/MS fragment ion tolerance and the precursor ion mass tolerance were set to 0.03 Da. Edges of the created molecular network were filtered to have a cosine score above 0.7 and more than 5 matched peaks between the connected nodes. Furthermore, these were only kept in the network if each of the nodes appeared in each other’s respective top 10 most similar nodes. The maximum size of clusters the network was set to 250 and the lowest scoring edges from each family were removed until member count was below this threshold. The MS2 spectra in the molecular network were searched against GNPS spectral libraries. [12,75] Reported matches between network and library spectra were required to have a score above 0.6 and at least 5 matched peaks. The DEREPLICATOR-program was used to annotate MS/MS spectra. [76] The molecular networks were visualized using Cytoscape 3.7.1. [41] The molecular networking job is accessible by the link gnps.ucsd.edu/ProteoSAFe/status.jsp?task= 92036537c21b44c29e509291e53f6382. HR-ESI-LC-MS/MS data were deposited in MassIVE (MSV000085376) and linked with the genomic data at the iOMEGA Pairing Omics Data Platform (http://pairedomicsdata.bioinformatics.nl/projects). MS2LDA Analysis. The molecular networking job described above was analyzed by MS2LDA (version release_14), accessing the tool directly on the GNPS website. Parameters were set as follows: bin width, 0.01; Nr of LDA iterations: 1000; min MS2 intensity, 100; LDA free motifs, 500. All MotifDBs except “Streptomyces and Salinispora Motif Inclusion” were excluded. Further parameters were left at default (overlap score threshold, 0.3; probability value threshold, 0.1; topX in node, 5). Results were uploaded to the MS2LDA-website, with the width of MS2 bins set to 0.005 Da, as recommended (http://ms2lda.org/). 16S rRNA Gene Amplification and Analysis. For 16S rRNA gene amplification, single colonies were picked from S1 medium plates and lysed at 95 °C in 100 µL of PCR-grade water for 5 min. Centrifuged lysate (5 µL) was added

74 Chapter 3 to the reaction mix, containing 25 µL of DreamTaq Green PCR Master Mix 2X (Thermo Scientific), 3 µL of 10x Denhardt’s reagents, [77] each 500 nM of eubacterial primers R1492 and F27 [78] and 12 µL water, resulting in a final volume of 50 µL. The amplification was performed as reported elsewhere. [78] PCR products were sequenced using Sanger sequencing by an external service provider (Cogentech, Milan, IT), with the primers mentioned above. 16S rRNA gene sequences were inspected and assembled manually using the software AliView, [79] yielding 35 nonredundant sequences with a consensus length of 1377 bp. The sequences were analyzed with programs contained in the PHYLIP package [80] as reported elsewhere, [78] with slight modifications (bootstrapped with 1000 replicates). The resulting consensus tree was visualized using the iTOL- web server (https://itol.embl.de/). [81] Sequences were deposited in GenBank, with accession numbers included in the supplemental information (Figures S1 and S2) of this study. Isolation of gDNA. To isolate genomic DNA (gDNA), mycelium from 5 mL Planomonospora cultures, cultivated in AF medium for 72 h as described above, was extracted with standard protocols for Streptomyces by phenol- chloroform, as described elsewhere. [49] gDNA was sequenced with both Illumina and PacBio technologies by an external service provider (Macrogen, Seoul, KOR) and assembled using the program SPAdes (3.11). Genome sequences were deposited in GenBank under the BioProject ID PRJNA633779, with accession numbers JABTEX000000000 (ID82291), JABTEY000000000 (ID91781) and JABTEZ000000000 (ID67723). Bioinformatics Analyses. For sequence similarity network analysis, BiG-SCAPE/CORASON 1.0 was used in “hybrids” mode, with all settings left to default, except the cutoff-value, which was set to 0.5. The .gbk-files of BGCs detected by the web-based application antiSMASH 5.0 (antismash.secondarymetabolites.org/) were analyzed and compared to MIBiG- deposited BGCs. For phylogenetic multilocus sequence analysis, the web-based application autoMLST (automlst.ziemertlab.com/analyze#) was used in “denovo mode” with default settings. To calculate average nucleotide identity, the web- based application OrthoANIu (www.ezbiocloud.net/tools/ani) was used with default settings.

75 Chapter 3

3.8 References

1 Newman, D. J.; Cragg, G. M. Natural Products as Sources of New Drugs Over The Nearly Four Decades from 01/1981 to 09/2019. J. Nat. Prod. 2020, 83 (3), 770-803. 2 Ortholand, J.-Y.; Ganesan, A. Natural Products and Combinatorial Chemistry: Back to the Future. Curr. Opin. Chem. Biol. 2004, 8 (3), 271–280. 3 Selva, E. Growing the Seeds Sown by Piero Sensi. J. Antibiot. 2014, 67 (9), 613-617. 4 Silva, R. R. da; Dorrestein, P. C.; Quinn, R. A. Illuminating the Dark Matter in Metabolomics. Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (41), 12549–12550. 5 Wolfender, J.-L.; Litaudon, M.; Touboul, D.; Queiroz, E. F. Innovative Omics- Based Approaches for Prioritisation and Targeted Isolation of Natural Products–New Strategies for Drug Discovery. Nat. Prod. Rep. 2019, 36 (6), 855–868. 6 Yera, E. R.; Cleves, A. E.; Jain, A. N. Chemical Structural Novelty: On-Targets and Off-Targets. J. Med. Chem. 2011, 54 (19), 6771–6785. 7 Cimermancic, P.; Medema, M. H.; Claesen, J.; Kurita, K.; Brown, L. C. W.; Mavrommatis, K.; Pati, A.; Godfrey, P. A.; Koehrsen, M.; Clardy, J.; Birren, B. W.; Takano, E.; Sali, A.; Linington, R. G.; Fischbach, M. A. Insights Into Secondary Metabolism from a Global Analysis of Prokaryotic Biosynthetic Gene Clusters. Cell 2014, 158 (2), 412–421. 8 Blin, K.; Wolf, T.; Chevrette, M. G.; Lu, X.; Schwalen, C. J.; Kautsar, S. A.; Suarez Duran, H. G.; Santos, E. L. C. de los; Kim, H. U.; Nave, M.; Dickschat, J. S.; Mitchell, D. A.; Shelest, E.; Breitling, R.; Takano, E.; Lee, S. Y.; Weber, T.; Medema, M. H. antiSMASH 4.0—Improvements in Chemistry Prediction and Gene Cluster Boundary Identification. Nucleic Acids Res. 2017, 45 (W1), W36–W41. 9 Kautsar, S. A.; Blin, K.; Shaw, S.; Navarro-Muñoz, J. C.; Terlouw, B. R.; Hooft, J. J. J. van der; Santen, J. A. van; Tracanna, V.; Suarez Duran, H. G.; Pascal Andreu, V.; Selem-Mojica, N.; Alanjary, M.; Robinson, S. L.; Lund, G.; Epstein, S. C.; Sisto, A. C.; Charkoudian, L. K.; Collemare, J.; Linington, R. G.; Weber, T.; Medema, M. H. MIBiG 2.0: a Repository for Biosynthetic Gene Clusters of Known Function. Nucleic Acids Res. 2020, 48 (D1), D454–D458. 10 Yang, J. Y.; Sanchez, L. M.; Rath, C. M.; Liu, X.; Boudreau, P. D.; Bruns, N.; Glukhov, E.; Wodtke, A.; Felicio, R. de; Fenner, A.; Wong, W. R.; Linington, R. G.; Zhang, L.; Debonsi, H. M.; Gerwick, W. H.; Dorrestein, P. C. Molecular Networking as a Dereplication Strategy. J. Nat. Prod. 2013, 76 (9), 1686– 1699. 11 Ramos, A. E. F.; Evanno, L.; Poupon, E.; Champy, P.; Beniddir, M. A. Natural Products Targeting Strategies Involving Molecular Networking: Different Manners, One Goal. Nat. Prod. Rep. 2019, 36 (7), 960–980. 12 Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; Porto, C.; Bouslimani, A.; Melnik, A. V.; Meehan, M. J.; Liu, W.-T.; Crüsemann, M.; Boudreau, P. D.; Esquenazi, E.; Sandoval-Calderón, M.; Kersten, R. D.; Pace,

76 Chapter 3

L. A.; Quinn, R. A.; Duncan, K. R.; Hsu, C.-C.; Floros, D. J.; Gavilan, R. G.; Kleigrewe, K.; Northen, T.; Dutton, R. J.; Parrot, D.; Carlson, E. E.; Aigle, B.; Michelsen, C. F.; Jelsbak, L.; Sohlenkamp, C.; Pevzner, P.; Edlund, A.; McLean, J.; Piel, J.; Murphy, B. T.; Gerwick, L.; Liaw, C.-C.; Yang, Y.-L.; Humpf, H.-U.; Maansson, M.; Keyzers, R. A.; Sims, A. C.; Johnson, A. R.; Sidebottom, A. M.; Sedio, B. E.; Klitgaard, A.; Larson, C. B.; Boya, C. A. P.; Torres-Mendoza, D.; Gonzalez, D. J.; Silva, D. B.; Marques, L. M.; Demarque, D. P.; Pociute, E.; O’Neill, E. C.; Briand, E.; Helfrich, E. J. N.; Granatosky, E. A.; Glukhov, E.; Ryffel, F.; Houson, H.; Mohimani, H.; Kharbush, J. J.; Zeng, Y.; Vorholt, J. A.; Kurita, K. L.; Charusanti, P.; McPhail, K. L.; Nielsen, K. F.; Vuong, L.; Elfeki, M.; Traxler, M. F.; Engene, N.; Koyama, N.; Vining, O. B.; Baric, R.; Silva, R. R.; Mascuch, S. J.; Tomasi, S.; Jenkins, S.; Macherla, V.; Hoffman, T.; Agarwal, V.; Williams, P. G.; Dai, J.; Neupane, R.; Gurr, J.; Rodríguez, A. M. C.; Lamsa, A.; Zhang, C.; Dorrestein, K.; Duggan, B. M.; Almaliti, J.; Allard, P.-M.; Phapale, P.; Nothias L.-F.; Alexandrov, T.; Litaudon, M.; Wolfender, J.-L.; Kyle, J. E.; Metz, T. O.; Peryea, T.; Nguyen, D.-T.; VanLeer, D.; Shinn, P.; Jadhav, A.; Müller, R.; Waters, K. M.; Shi, W.; Liu, X.; Zhang, L.; Knight, R.; Jensen, P. R.; Palsson, B. Ø.; Pogliano, K.; Linington, R. G.; Gutiérrez, M.; Lopes, N. P.; Gerwick, W. H.; Moore, B. S.; Dorrestein, P. C.; Bandeira, N. Sharing and Community Curation of Mass Spectrometry Data With Global Natural Products Social Molecular Networking. Nat. Biotechnol. 2016, 34 (8), 828–837. 13 Santen, J. A. van; Jacob, G.; Singh, A. L.; Aniebok, V.; Balunas, M. J.; Bunsko, D.; Neto, F. C.; Castaño-Espriu, L.; Chang, C.; Clark, T. N.; Cleary Little, J. L.; Delgadillo, D. A.; Dorrestein, P. C.; Duncan, K. R.; Egan, J. M.; Galey, M. M.; Haeckl, F. J. P.; Hua, A.; Hughes, A. H.; Iskakova, D.; Khadilkar, A.; Lee, J.-H.; Lee, S.; LeGrow, N.; Liu, D. Y.; Macho, J. M.; McCaughey, C. S.; Medema, M. H.; Neupane, R. P.; O’Donnell, T. J.; Paula, J. S.; Sanchez, L. M.; Shaikh, A. F.; Soldatou, S.; Terlouw, B. R.; Tran, T. A.; Valentine, M.; Hooft, J. J. J. van der; Vo, D. A.; Wang, M.; Wilson, D.; Zink, K. E.; Linington, R. G. The Natural Products Atlas: an Open Access Knowledge Base for Microbial Natural Products Discovery. ACS Cent. Sci. 2019, 5 (11), 1824– 1833. 14 Crüsemann, M.; O’Neill, E. C.; Larson, C. B.; Melnik, A. V.; Floros, D. J.; Silva, R. R. da; Jensen, P. R.; Dorrestein, P. C.; Moore, B. S. Prioritizing Natural Product Diversity in a Collection Of 146 Bacterial Strains Based on Growth and Extraction Protocols. J. Nat. Prod. 2017, 80 (3), 588–597. 15 Duncan, K. R.; Crüsemann, M.; Lechner, A.; Sarkar, A.; Li, J.; Ziemert, N.; Wang, M.; Bandeira, N.; Moore, B. S.; Dorrestein, P. C.; Jensen, P. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species. Chem. Biol. 2015, 22 (4), 460–471. 16 Ziemert, N.; Lechner, A.; Wietz, M.; Millán-Aguiñaga, N.; Chavarria, K. L.; Jensen, P. R. Diversity and Evolution of Secondary Metabolism in the Marine Actinomycete Genus Salinispora. Proc. Natl. Acad. Sci. U. S. A. 2014, 111 (12), E1130–E1139.

77 Chapter 3

17 Männle, D.; McKinnie, S. M. K.; Mantri, S. S.; Steinke, K.; Lu, Z.; Moore, B. S.; Ziemert, N.; Kaysser, L. Comparative Genomics and Metabolomics in the Genus Nocardia. Msystems 2020, 5 (3), e00125-20. 18 Krug, D.; Zurek, G.; Revermann, O.; Vos, M.; Velicer, G. J.; Müller, R. Discovering the Hidden Secondary Metabolome of Myxococcus xanthus: a Study of Intraspecific Diversity. Appl. Environ. Microbiol. 2008, 74 (10), 3058– 3068. 19 Maansson, M.; Vynne, N. G.; Klitgaard, A.; Nybo, J. L.; Melchiorsen, J.; Nguyen, D. D.; Sanchez, L. M.; Ziemert, N.; Dorrestein, P. C.; Andersen, M. R.; Gram, L. An Integrated Metabolomic and Genomic Mining Workflow to Uncover the Biosynthetic Potential of Bacteria. Msystems 2016, 1 (3) e00028- 15. 20 Monciardini, P.; Iorio, M.; Maffioli, S.; Sosio, M.; Donadio, S. Discovering New Bioactive Molecules from Microbial Sources. Microb. Biotechnol. 2014, 7 (3), 209–220. 21 Thiemann, J. E. A New Genus of the Actinoplanaceae: Planomonospora Gen. Nov. Microbiol. 1967, 15, 27–38. 22 Mertz, F. P. Planomonospora alba Sp. Nov. and Planomonospora sphaerica Sp. Nov., Two New Species Isolated from Soil by Baiting Techniques. Int. J. Syst. Evol. Microbiol. 1994, 44 (2), 274–281. 23 Ebata, M.; Miyazaki, K.; Otsuka, H. Studies on Siomycin. J. Antibiot. 1969, 22 (8), 364–368. 24 Thiemann, J.; Coronelli, C.; Pagani, H.; Beretta, G.; Tamoni, G.; Arioli, V. Sporangiomycin, an Antibacterial Agent Isolated from Planomonospora parontospora Var. antibiotica Var. Nov. Antibiot. 1968, 21, 525–531. 25 Maffioli, S. I.; Potenza, D.; Vasile, F.; De Matteo, M.; Sosio, M.; Marsiglia, B.; Rizzo, V.; Scolastico, C.; Donadio, S. Structure Revision of the Lantibiotic 97518. J. Nat. Prod. 2009, 72 (4), 605–607. 26 Castiglione, F.; Cavaletti, L.; Losi, D.; Lazzarini, A.; Carrano, L.; Feroggio, M.; Ciciliato, I.; Corti, E.; Candiani, G.; Marinelli, F.; Selva, E. A Novel Lantibiotic Acting on Bacterial Cell Wall Synthesis Produced by the Uncommon Actinomycete Planomonospora sp. Biochemistry 2007, 46 (20), 5884–5895. 27 Maffioli, S. I.; Monciardini, P.; Catacchio, B.; Mazzetti, C.; Münch, D.; Brunati, C.; Sahl, H.-G.; Donadio, S. Family of Class I Lantibiotics from Actinomycetes and Improvement of their Antibacterial Activities. ACS Chem. Biol. 2015, 10 (4), 1034–1042. 28 Kodani, S.; Inoue, Y.; Suzuki, M.; Dohra, H.; Suzuki, T.; Hemmi, H.; Ohnishi- Kameyama, M. Sphaericin, a Lasso Peptide from the Rare Actinomycete Planomonospora sphaerica. Eur. J. Org. Chem. 2017, 8, 1177–1183. 29 Wingender, W.; Hugo, H. von; Frommer, W.; Schäfer, D. A Protease Inhibitor Isolated from Planomonospora parontospora. J. Antibiot. 1975, 28 (8), 611– 612. 30 Suda, H.; Aoyagi, T.; Hamada, M.; Takeuchi, T.; Umezawa, H. Antipain, a New Protease Inhibitor Isolated from Actinomycetes. J. Antibiot. 1972, 25 (4), 263–265.

78 Chapter 3

31 Olivon, F.; Grelier, G.; Roussi, F.; Litaudon, M.; Touboul, D. Mzmine 2 Data- Preprocessing to Enhance Molecular Networking Reliability. Anal. Chem. 2017, 89 (15), 7836−7840. 32 Nothias, L.-F.; Nothias-Esposito, M.; Silva, R. da; Wang, M.; Protsyuk, I.; Zhang, Z.; Sarvepalli, A.; Leyssen, P.; Touboul, D.; Costa, J.; Paolini, J.; Alexandrov, T.; Litaudon, M.; Dorrestein, P. C. Bioactivity-Based Molecular Networking for the Discovery of Drug Leads in Natural Product Bioassay- Guided Fractionation. J. Nat. Prod. 2018, 81 (4), 758–767. 33 Kang, K. B.; Park, E. J.; Silva, R. R. da; Kim, H. W.; Dorrestein, P. C.; Sung, S. H. Targeted Isolation of Neuroprotective Dicoumaroyl Neolignans and Lignans from theezans Using In Silico Molecular Network Annotation Propagation-Based Dereplication. J. Nat. Prod. 2018, 81 (8), 1819–1828. 34 Kang, K. B.; Woo, S.; Ernst, M.; Hooft, J. J. J. van der; Nothias, L.-F.; Silva, R. R. da; Dorrestein, P. C.; Sung, S. H.; Lee, M. Assessing Specialized Metabolite Diversity of Alnus Species by a Digitized LC–MS/MS Data Analysis Workflow. Phytochemistry 2020, 173, 112292. 35 Olivon, F.; Remy, S.; Grelier, G.; Apel, C.; Eydoux, C.; Guillemot, J.-C.; Neyts, J.; Delang, L.; Touboul, D.; Roussi, F.; Litaudon, M. Antiviral Compounds from Codiaeum peltatum Targeted by a Multi-informative Molecular Networks Approach. J. Nat. Prod. 2019, 82 (2), 330–340. 36 Olivon, F.; Retailleau, P.; Desrat, S.; Touboul, D.; Roussi, F.; Apel, C.; Litaudon, M. Isolation of Picrotoxanes from Austrobuxus carunculatus Using Taxonomy-Based Molecular Networking. J. Nat. Prod. 2020, 83 (10), 3069– 3079. 37 Suriyachadkun, C.; Ngaemthao, W.; Chunhametha, S. Planomonospora corallina sp. nov., Isolated from Soil. Int. J. Syst. Evol. Microbiol. 2016, 66 (8), 3224–3229. 38 Chaouch, F. C.; Bouras, N.; Mokrane, S.; Bouznada, K.; Zitouni, A.; Pötter, G.; Spröer, C.; Klenk, H.-P.; Sabaou, N. Planomonospora algeriensis sp. nov., an Actinobacterium Isolated from a Saharan Soil Of Algeria. Antonie van leeuwenhoek 2017, 110 (2), 245–252. 39 Pluskal, T.; Castillo, S.; Villar-Briones, A.; Orešič, M. MZmine 2: Modular Framework for Processing, Visualizing, and Analyzing Mass Spectrometry- Based Molecular Profile Data. BMC Bioinf. 2010, 11 (1), 395. 40 Nothias, L. F.; Petras, D.; Schmid, R.; Dührkop, K.; Rainer, J.; Sarvepalli, A.; Protsyuk, I.; Ernst, M.; Tsugawa, H.; Fleischauer, M.; Aicheler, F.; Aksenov, A.; Alka, O.; Allard, P.-M.; Barsch, A.; Cachet, X.; Caraballo, M.; Da Silva, R. R.; Dang, T.; Garg, N.; Gauglitz, J. M.; Gurevich, A.; Isaac, G.; Jarmusch, A. K.; Kameník, Z.; Kang, K. B.; Kessler, N.; Koester, I.; Korf, A.; Gouellec, A. L.; Ludwig, M.; Christian, M. H.; McCall, L.-I.; McSayles, J.; Meyer, S. W.; Mohimani, H.; Morsy, M.; Moyne, O.; Neumann, S.; Neuweger, H.; Nguyen, N. H.; Nothias-Esposito, M.; Paolini, J.; Phelan, V. V.; Pluskal, T.; Quinn, R. A.; Rogers, S.; Shrestha, B.; Tripathi, A.; Hooft, J. J. J. van der; Vargas, F.; Weldon, K. C.; Witting, M.; Yang, H.; Zhang, Z.; Zubeil, F.; Kohlbacher, O.; Böcker, S.; Alexandrov, T.; Bandeira, N.; Wang, M.; Dorrestein, P. C. Feature-

79 Chapter 3

Based Molecular Networking in the GNPS Analysis Environment. Nature Methods 2020, 17 (9), 905–908. 41 Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: a Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 2003, 13 (11), 2498–2504. 42 Hoffmann, T.; Krug, D.; Bozkurt, N.; Duddela, S.; Jansen, R.; Garcia, R.; Gerth, K.; Steinmetz, H.; Müller, R. Correlating Chemical Diversity with Taxonomic Distance for Discovery Of Natural Products in Myxobacteria. Nat. Commun. 2018, 9, 803. 43 Hubert, J.; Nuzillard, J.-M.; Renault, J.-H. Dereplication Strategies in Natural Product Research: How Many Tools and Methodologies Behind the Same Concept? Phytochem. Rev. 2017, 16 (1), 55–95. 44 Beutler, J. A.; Alvarado, A. B.; Schaufelberger, D. E.; Andrews, P.; McCloud, T. G. Dereplication of Phorbol Bioactives: Lyngbya majuscula and Croton cuneatus. J. Nat. Prod. 1990, 53 (4), 867–874. 45 Sumner, L. W.; Amberg, A.; Barrett, D.; Beale, M. H.; Beger, R.; Daykin, C. A.; Fan, T. W.-M.; Fiehn, O.; Goodacre, R.; Griffin, J. L.; Hankemeier, T.; Hardy, N.; Harnly, J.; Higashi, R.; Kopka, J.; Lane, A. N.; Lindon, J. C.; Marriott, P.; Nicholls, A. W.; Reily, M. D.; Thaden, J. J.; Viant, M. R. Proposed Minimum Reporting Standards for Chemical Analysis. Metabolomics 2007, 3 (3), 211–221. 46 Imbert, M.; Béchet, M.; Blondeau, R. Comparison of the Main Siderophores Produced by some Species of Streptomyces. Curr. Microbiol. 1995, 31 (2), 129–133. 47 Dohra, H.; Suzuki, T.; Inoue, Y.; Kodani, S. Draft Genome Sequence of Planomonospora sphaerica JCM9374, A Rare Actinomycete. Genome Announc. 2016, 4 (4), e00779–16. 48 Zdouc, M. M.; Alanjary, M. M.; Maffioli, S. I.; Crüsemann, M.; Medema, M. M.; Donadio, S.; Sosio, M. A Biaryl-Linked Tripeptide from Planomonospora Reveals a Widespread Class of Minimal RiPP Gene Clusters. Cell Chem. Biol. 2021, 28, 1–7. 49 Tocchetti, A.; Bordoni, R.; Gallo, G.; Petiti, L.; Corti, G.; Alt, S.; Cruz, J. C.; Salzano, A. M.; Scaloni, A.; Puglia, A. M.; De Bellis, G.; Peano, C.; Donadio, S.; Sosio, M. A Genomic, Transcriptomic and Proteomic Look at the GE2270 Producer Planobispora rosea, an Uncommon Actinomycete. PLOS ONE 2015, 10 (7) e0133705. 50 Alanjary, M.; Steinke, K.; Ziemert, N. AutoMLST: an Automated Web Server for Generating Multi-Locus Species Trees Highlighting Natural Product Potential. Nucleic Acids Res. 2019, 47 (W1), W276–W282. 51 Lee, I.; Kim, Y. O.; Park, S.-C.; Chun, J. OrthoANI: an Improved Algorithm and Software for Calculating Average Nucleotide Identity. Int. J. Syst. Evol. Microbiol. 2016, 66 (2), 1100–1103. 52 Yoon, S.-H.; Ha, S.-m.; Lim, J.; Kwon, S.; Chun, J. Introducing EzBioCloud: a Taxonomically United Database of 16S rRNA Gene Sequences and Whole- Genome Assemblies. Antonie van Leeuwenhoek 2017, 110 (10), 1281–1286.

80 Chapter 3

53 Goris, J.; Konstantinidis, K. T.; Klappenbach, J. A.; Coenye, T.; Vandamme, P.; Tiedje, J. M. DNA–DNA Hybridization Values and their Relationship to Whole-Genome Sequence Similarities. Int. J. Syst. Evol. Microbiol. 2007, 57 (1), 81–91. 54 Richter, M.; Rosselló-Móra, R. Shifting the Genomic Gold Standard for the Prokaryotic Species Definition. Proc. Natl. Acad. Sci. U. S. A. 2009, 106 (45), 19126–19131. 55 Navarro-Muñoz, J. C.; Selem-Mojica, N.; Mullowney, M. W.; Kautsar, S. A.; Tryon, J. H.; Parkinson, E. I.; De Los Santos, E. L. C.; Yeong, M.; Cruz- Morales, P.; Abubucker, S.; Roeters, A.; Lokhorst, W.; Fernandez-Guerra, A.; Cappelini, L. T. D.; Goering, A. W.; Thomson, R. J.; Metcalf, W. W.; Kelleher, N. L.; Barona-Gomez, F.; Medema, M. H. A Computational Framework to Explore Large-Scale Biosynthetic Diversity. Nat. Chem. Biol. 2020, 16 (1), 60– 68. 56 Wiegand, S.; Jogler, M.; Boedeker, C.; Pinto, D.; Vollmers, J.; Rivas-Marín, E.; Kohn, T.; Peeters, S. H.; Heuer, A.; Rast, P.; Oberbeckmann, S.; Bunk, B.; Jeske, O.; Meyerdierks, A.; Storesund, J. E.; Kallscheuer, N.; Lücker, S.; Lage, O. M.; Pohl, T.; Merkel, B. J.; Hornburger, P.; Müller, R.-W.; Brümmer, F.; Labrenz, M.; Spormann, A. M.; Op den Camp, H. J. M.; Overmann, J.; Amann, R.; Jetten, M. S. M.; Mascher, T.; Medema, M. H.; Devos, D. P.; Kaster, A.-K.; Ovreas, L.; Rohde, M.; Galperin, M. Y.; Jogler, C. Cultivation and Functional Characterization of 79 Planctomycetes Uncovers their Unique Biology. Nat. Microbiol. 2020, 5 (1), 126–140. 57 Liao, R.; Duan, L.; Lei, C.; Pan, H.; Ding, Y.; Zhang, Q.; Chen, D.; Shen, B.; Yu, Y.; Liu, W. Thiopeptide Biosynthesis Featuring Ribosomally Synthesized Precursor Peptides and Conserved Posttranslational Modifications. Chem. Biol. 2009, 16 (2), 141–147. 58 a. Wandy, J.; Zhu, Y.; van der Hooft, J. J. J.; Daly, R.; Barrett, M. P.; Rogers, S. Ms2lda.org: Web-Based Topic Modelling for Substructure Discovery in Mass Spectrometry. Bioinformatics 2017, 34 (2), 317–318. b. Van Der Hooft, J. J. J.; Wandy, J.; Barrett, M. P.; Burgess, K. E.; Rogers, S. Topic Modeling for Untargeted Substructure Exploration in Metabolomics. Proc. Natl. Acad. Sci. U.S.A. 2016, 113 (48), 13738- 13743. 59 Duan, L.; Wang, S.; Liao, R.; Liu, W. Insights into Quinaldic Acid Moiety Formation in Thiostrepton Biosynthesis Facilitating Fluorinated Thiopeptide Generation. Chem. Biol. 2012, 19 (4), 443–448. 60 Ichikawa, H.; Bashiri, G.; Kelly, W. L. Biosynthesis of the Thiopeptins and Identification of an F420H2-Dependent Dehydropiperidine Reductase. J. Am. Chem. Soc. 2018, 140 (34), 10749–10756. 61 Just-Baringo, X.; Albericio, F.; Alvarez, M. Thiopeptide Antibiotics: Retrospective and Recent Advances. Angew. Chem., Int. Ed. 2014, 53 (26), 6602–6616. 62 Zhang, Q.; Liu, W. Expansion of Chemical Space for Natural Products by Uncommon P450 Reactions. Nat. Prod. Rep. 2013, 30 (2), 218–226.

81 Chapter 3

63 Schulz, D.; Nachtigall, J.; Riedlinger, J.; Schneider, K.; Poralla, K.; Imhoff, J. F.; Beil, W.; Nicholson, G.; Fiedler, H.-P.; Süssmuth, R. D. Piceamycin and its N-Acetylcysteine Adduct is Produced by Streptomyces sp. GB 4-2. J. Antibiot. 2009, 62 (9), 513–518. 64 Gilpin, M. L.; Fulston, M.; Payne, D.; Cramp, R.; Hood, I. Isolation and Structure Determination of two Novel Phenazines from a Streptomyces with Inhibitory Activity Against Metallo-Enzymes, Including Metallo-Β-Lactamase. J. Antibiot. 1995, 48 (10), 1081–1085. 65 Keberle, H. The Biochemistry of Desferrioxamine and its Relation to Iron Metabolism. Ann. N. Y. Acad. Sci. 1964, 119 (2), 758–768. 66 Winkelmann, G. Microbial Siderophore-Mediated Transport. Biochem. Soc. Trans. 2002, 30 (4), 691–696. 67 Bruns, H.; Crüsemann, M.; Letzel, A.-C.; Alanjary, M.; McInerney, J. O.; Jensen, P. R.; Schulz, S.; Moore, B. S.; Ziemert, N. Function-Related Replacement of Bacterial Siderophore Pathways. ISME J. 2018, 12 (2), 320– 329. 68 Maxson, T.; Tietz, J. I.; Hudson, G. A.; Guo, X. R.; Tai, H.-C.; Mitchell, D. A. Targeting Reactive Carbonyls for Identifying Natural Products and their Biosynthetic Origins. J. Am. Chem. Soc. 2016, 138 (46), 15157–15166. 69 Martinet, L.; Naômé, A.; Deflandre, B.; Maciejewska, M.; Tellatin, D.; Tenconi, E.; Smargiasso, N.; De Pauw, E.; Wezel, G. P. van; Rigali, S. A Single Biosynthetic Gene Cluster is Responsible for the Production of Bagremycin Antibiotics and Ferroverdin Iron Chelators. MBio 2019, 10 (4), e01230–19. 70 Hegemann, J. D.; Zimmermann, M.; Xie, X.; Marahiel, M. A. Caulosegnins I– III: a Highly Diverse Group of Lasso Peptides Derived from a Single Biosynthetic Gene Cluster. J. Am. Chem. Soc. 2013, 135 (1), 210–222. 71 Donadio, S.; Monciardini, P.; Sosio, M. Approaches to Discovering Novel Antibacterial and Antifungal Agents. Methods Enzymol. 2009, 458, 3–28. 72 Sosio, M.; Stinchi, S.; Beltrametti, F.; Lazzarini, A.; Donadio, S. The Gene Cluster for the Biosynthesis of the Glycopeptide Antibiotic A40926 by Nonomuraea Species. Chem. Biol. 2003, 10 (6), 541–549. 73 Iorio, M.; Tocchetti, A.; Cruz, J. C. S.; Del Gatto, G.; Brunati, C.; Maffioli, S. I.; Sosio, M.; Donadio, S. Novel Polyethers from Screening Actinoallomurus spp. Antibiotics 2018, 7 (2), 47. 74 Myers, O. D.; Sumner, S. J.; Li, S.; Barnes, S.; Du, X. One Step Forward for Reducing False Positive and False Negative Compound Identifications from Mass Spectrometry Metabolomics Data: New Algorithms for Constructing Extracted Ion Chromatograms and Detecting Chromatographic Peaks. Anal. Chem. 2017, 89 (17), 8696–8703. 75 Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; Oda, Y.; Kakazu, Y.; Kusano, M.; Tohge, T.; Matsuda, F.; Sawada, Y.; Hirai, M. Y.; Nakanishi, H.; Ikeda, K.; Akimoto, N.; Maoka, T.; Takahashi, H.; Ara, T.; Sakurai, N.; Suzuki, H.; Shibata, D.; Neumann, S.; Iida, T.; Tanaka, K.; Funatsu, K.; Matsuura, F.; Soga, T.; Taguchi, R.; Saito, K.; Nishioka, T. Massbank: a Public Repository for Sharing Mass Spectral Data for Life Sciences. J. Mass Spectrom. 2010, 45 (7), 703–714.

82 Chapter 3

76 Mohimani, H.; Gurevich, A.; Shlemov, A.; Mikheenko, A.; Korobeynikov, A.; Cao, L.; Shcherbin, E.; Nothias, L.-F.; Dorrestein, P. C.; Pevzner, P. A. Dereplication Of Microbial Metabolites Through Database Search of Mass Spectra. Nat. Commun. 2018, 9 (1), 4035. 77 Volossiouk, T.; Robb, E. J.; Nazar, R. N. Direct DNA Extraction For PCR- Mediated Assays of Soil Organisms. Appl. Environ. Microbiol. 1995, 61 (11), 3972–3976. 78 Monciardini, P.; Sosio, M.; Cavaletti, L.; Chiocchini, C.; Donadio, S. New PCR Primers for the Selective Amplification of 16S rDNA from Different Groups of Actinomycetes. FEMS Microbiol. Ecol. 2002, 42 (3), 419–429. 79 Larsson, A. Aliview: A Fast and Lightweight Alignment Viewer and Editor for Large Datasets. Bioinformatics 2014, 30 (22), 3276–3278. 80 Felsenstein, J. PHYLIP (Phylogeny Inference Package), Version 3.5c. Joseph Felsenstein, 1993. 81 Letunic, I.; Bork, P. Interactive Tree Of Life (iTOL) v4: Recent Updates and New Developments. Nucleic Acids Res. 2019, 47 (W1), W256–W259.

83 Chapter 3

3.9 Supplemental Information

Figures S1-S38.

Fig. S1. List of strains investigated in this study (1/2).

84 Chapter 3

Fig. S2. List of strains investigated in this study (2/2).

85 Chapter 3

Fig. S3. Schematic visualization of feature finding and molecular networking. (A) Extracted ion chromatograms from LC-MS samples are aligned by MZmine 2 and peaks with common retention time and m/z are detected. MS2 fragmentation spectra are associated to the newly created, non-redundant features. (B) MS2 fragmentation spectra of features are compared with each other and a molecular network is created, with features sporting similar MS2 fragmentation pattern connected by edges.

86 Chapter 3

Fig. S4. Summary of features identified in this study.

87 Chapter 3

Fig. S5. Table indicating presence/absence of identified features. Colors indicate the corresponding phylogroups, used to classify the strains cultivated in this study.

88 Chapter 3

Fig. S6. The quinalidic acid moiety of siomycin supposedly leads to characteristic fragmentation patterns. (A) Shows a break between the Ala2- and Dha3 as well as Thr12 and QA that leads to diagnostic peaks in [M+2H]2+-ions. (B) Shown are example spectra, observed in the present study.

89 Chapter 3

Fig. S7. (A) Putative fragmentation pathway of siomycins. (B) List of identified putative siomycins, congeners and related features, showing similar tandem mass fragmentation pattern.

90 Chapter 3

Fig. S8. MS2-fragmentation spectra of the [M+H]+ (A) and [M+2H]2+ (B) ions of siomycin E.

91 Chapter 3

Fig. S9. Comparison of thiopeptide precursor peptides. TsrH = thiostrepton precursor peptide; SioH = siomycin precursor peptide; TpnA = thiopeptin precursor peptide; RiPP4 = precursor peptide of siomycin in the genome of strain ID91781, detected in this study. All thiopeptide core peptides have two serines at their C-terminal end, which are eventually modified to dehydroalanines.

92 Chapter 3

Fig. S10. Annotated MS2LDA-motifs identified in this study.

93 Chapter 3

Fig. S11. Iron-complexion of metabolite m/z 631.4 and its related metabolite m/z 843.47 upon addition of FeCl3 (original LCMS-traces of examples shown in Figure 10 of the main text).

94 Chapter 3

Fig. S12. Overview of putative siderophores identified in this study.

95 Chapter 3

Fig. S13. Examples for similarities in MS2 fragmentation between iron-complexing metabolites from different molecular families, both in common fragments and neutral losses. (A) Feature ID279 (m/z 631.3), representative of molecular family I. (B) Feature ID267 (m/z 843.5), representative of molecular family II. (C) Feature ID1549 (m/z 742.4), representative of molecular family III. (D) Feature ID289 (m/z 905.5), representative of molecular family IV. (E) Desferrioxamine E, which shows no similarities to features A-D.

96 Chapter 3

Fig. S14. Iron-complexion of feature ID279 (631.3 m/z) and its related metabolites in molecular family I.

97 Chapter 3

Fig. S15. Iron-complexion of putative siderophore metabolites in molecular family II (1/2).

98 Chapter 3

Fig. S16. Iron-complexion of putative siderophore metabolites in molecular family II (2/2).

99 Chapter 3

Fig. S17. Iron-complexion of putative siderophore metabolites in molecular family III.

100 Chapter 3

Fig. S18. Iron-complexion of putative siderophore metabolites in molecular family IV (1/2).

101 Chapter 3

Fig. S19. Iron-complexion of putative siderophore metabolites in molecular family IV (1/2).

102 Chapter 3

Fig. S20. Evidence for the annotation of the metabolite antipain.

103 Chapter 3

Fig. S21. Evidence for the annotation of metabolite acyl-desferrioxamine C13.

Fig. S22. Evidence for the annotation of metabolite acyl-desferrioxamine C15.

104 Chapter 3

Fig. S23. Evidence for the annotation of metabolite acyl-desferrioxamine C16.

105 Chapter 3

Fig. S24. Evidence for the annotation of metabolite chymostatin A or C (Leu/Ile)

106 Chapter 3

Fig. S25. Evidence for the annotation of metabolite chymostatin B.

107 Chapter 3

Fig. S26. Evidence for the annotation of metabolite chymostatinol A.

108 Chapter 3

Fig. S27. Evidence for the annotation of metabolite chymostatinol B.

109 Chapter 3

Fig. S28. Evidence for the annotation of metabolite GE20372 A or B.

Fig. S29. Evidence for the annotation of metabolite KF-77-AG6.

110 Chapter 3

Fig. S30. Evidence for the annotation of metabolite desferrioxamine B.

111 Chapter 3

Fig. S31. Evidence for the annotation of metabolite lantibiotic 97518, also known as planosporicin.

112 Chapter 3

Fig. S32. Evidence for the annotation of metabolite sphaericin.

Fig. S33. Evidence for the annotation of metabolite riboflavin-glucoside.

113 Chapter 3

Fig. S34. Evidence for the annotation of metabolite riboflavol.

Fig. S35. Transcript of Bruker method-file for the bulk export of .mzXML and .mgf- files.

114 Chapter 3

Fig.S36. Precursor correction workflow. Both .mzXML and .mgf-files are exported from the Bruker .raw-format using Bruker Data Analysis software. The Perl5 “precursor modification script” uses the correct precursor values from the .mgf-file to modify the un-calbrated precursor values from the .mzXML. A new file with the suffix “_mod.mzXML” is created and used in the consequent data processing.

Fig. S37. Script for replacing un-calibrated precursor values in .mzXML-files with the calibrated ones from the .mgf-file.

115 Chapter 3

Fig. S38. Genes considered for the multilocus sequence analysis (autoMLST).

116 Chapter 4

4

A Biaryl-Linked Tripeptide from Planomonospora Reveals a Widespread Class of Minimal RiPP Gene Clusters

Mitja M. Zdouc and Mohammad M. Alanjary and Guadalupe S. Zarazúa and Sonia I. Maffioli and Max Crüsemann and Marnix M. Medema and Stefano Donadio and Margherita Sosio

Parts of this chapter have been published in Cell Chemical Biology, 28, 1-7 (2021).

117 Chapter 4

4.1 Abstract icrobial natural products impress by their bioactivity, structural M diversity and ingenious biosynthesis. While screening the less exploited actinobacterial genus Planomonospora, two cyclopeptides were discovered, featuring an unusual Tyr-His biaryl-bridging across a tripeptide scaffold, with the sequences N-acetyl-Tyr-Tyr-His and N- acetyl-Tyr-Phe-His. Planomonospora genomes pointed towards a ribosomal synthesis of the cyclopeptide from a pentapeptide precursor encoded by the 18- bp bytA gene, to our knowledge the smallest coding gene ever reported. Closely linked to bytA is bytO, encoding a cytochrome P450 monooxygenase likely to be responsible for biaryl installment. In Streptomyces, the bytAO segment was sufficient to direct production of the cross-linked N-acetylated Tyr-Tyr-His tripeptide. Bioinformatic analysis of related cytochrome P450 monooxygenases indicated that they constitute a widespread family of enzymes, and the corresponding genes are closely linked to 5-aa coding sequences in approximately 200 (actino)bacterial genomes, all with potential for biaryl linkage between amino acids 1 and 3. We propose the name biarylitides for this family of RiPPs.

118 Chapter 4

4.2 Introduction

Microbial natural products (NPs) impress not only for their potent bioactivity, but also for their chemical diversity and originality. Structurally novel compounds have a high chance of interacting with novel targets or with mechanisms of action different from known compounds. [1-3] In recent years, ribosomally synthesized and post-translationally modified peptides (RiPPs) have drawn attention due to their intricate, three-dimensional structures and wide distribution in bacteria. RiPP biosynthetic gene clusters (BGCs) are compact, encompassing only a precursor peptide, separated in a N-terminal leader and a C-terminal core peptide. After ribosomal translation, the precursor peptide is modified by dedicated enzymes, with consecutive cleavage of the leader peptide and release of the mature core peptide. [4,5] RiPPs are grouped into families based on their structural features and post-translational modifications. [5] Several bioinformatic tools have been developed for the automatic recognition of RiPP BGCs in microbial genomes, which are usually based on recognizing elements such as leader peptides or diagnostic RiPP-processing enzymes. [6] In this study, we identified two cross-linked, N-acetylated tripeptides featuring an unusual Tyr1- His3 biaryl-bridging. Genome analysis of the producing strains led to the identification of the shortest RiPP BGC so far reported, encoding just a pentapeptide precursor and a closely linked cytochrome P450 monooxygenase. Phylogenomic analysis of the latter sequence that cytochrome P450 monooxygenase genes closely linked to a 5-aa coding sequence represent a widespread family of minimal RiPP BGCs present especially in actinobacterial genomes.

119 Chapter 4

Fig. 1. (A) Chemical structures of 1 and 2. (B) Key 2D NMR correlations for 1.

Fig. 2. Structures of relevant metabolites mentioned in the text. K-13 (3), OF-4949 (4), pseudosporamide (5), tryptorubin (6) and cittilin A (7). Note that four variants of OF-4949 have been reported, with either H or Me on Tyr1-OH and H or OH on the Asn-carbon, but only one is shown for simplicity.

4.3 Results & Discussion

Identification of cross-linked N-acetylated Tyr-Tyr-His. In the search for microbial NPs, we investigated 72 strains belonging to the poorly characterized actinomycete genus Planomonospora [7] by extensive, LC-HR-ESI-MS2 based

120 Chapter 4 metabolome mining of extracts. [8] Molecular networking analysis revealed a group of related features (see Figure S1), observed in samples from four phylogenetically related strains, with 16S rRNA gene sequences having 99.5 to 99.6% identity with Planomonospora algeriensis PM3. [8] One strain, Planomonospora sp. ID82291, produced a compound - 1 - with m/z 522.198

+ [M+H] (calculated molecular formula C26H28N5O7), while the other three strains produced a compound - 2 - with m/z 506.203 [M+H]+ (calculated molecular formula

C26H28N5O6). Additional variants were found only in extracts from strain ID46116 (Figure S1A) and were not further investigated. 1 and 2 showed UV absorption maxima at 270 and 315 nm, the latter indicating an unusual chromophore. Planomonospora sp. ID82291 was selected for further investigation. From a 15 L culture, 5 mg pure compound 1 were isolated. Extensive 1D- and 2D NMR analyses indicated an acetylated short peptide. A first set of experiments, performed in DMSO-d6, allowed the observation of three aromatic spin systems containing CH and CH2 consistent with two tyrosine and one histidine residues. The correlations observed in a NOESY-experiment suggested the sequence N- acetyl-Tyr-Tyr-His, but with an additional unsaturation, which could not be assigned due to overlapping aromatic signals. A second set of analyses in CD3OD allowed the complete assignment of the non-exchangeable protons of compound 1. TOCSY and HSQC experiments showed that the histidine residue lacked its characteristic aromatic proton at position 5’’ but had a quaternary carbon at 142 ppm. In an HMBC experiment, a diagnostic cross-correlation was observed between both CH-5 and CH-8 and C5”, consistent with a C-C bond between C6 and C5" (Figure 1B). The proton and carbon assignments and the 1D- and 2D spectra are reported in Figures S2 and S8 through S18. The existence of the cross-link was confirmed after hydrolysis of 1, which lead to the release of a compound with m/z 335 [M+H]+, corresponding to the expected carbon-carbon linked tyrosine and histidine, and a compound with m/z 480 [M+H]+, consistent with the loss of the acetyl-group from the intact tripeptide. The m/z 335 [M+H]+ compound retained the UV maximum at 315 nm, as expected for the biaryl chromophore (see Figure S3).

121 Chapter 4

Compound 2, produced by the other three Planomonospora isolates, was deduced to be N-acetyl-Tyr-Phe-His, based on the similarity of the tandem mass fragmentation of compounds 1 and 2, with a neutral loss corresponding to phenylalanine (-147.07 Da) being the only difference between the fragmentation spectra (see Figure S4). We could only identify one literature precedent for NPs with a tyrosine- histidine biaryl bridge: aciculitins, cyclic nonapeptides decorated with glycolipids, from the marine sponge Aciculites orientalis. [9] They contain a carbon-carbon bond between tyrosine and histidine but with a connectivity different than in 1. Interestingly, aciculitins, which lack additional aromatic residues, show UV- absorption maxima at 270 and 310 nm, matching those of 1 and 2. [9] A few metabolites have been described that consist of a cross-linked (N- acetylated) tripeptide: K-13 (3) from Micromonospora halophytica ssp. exilisia with a Tyr-Tyr-Tyr sequence and an ether bond between the oxygen of Tyr-3 and C6 of Tyr-1; [10,11] OF-4949 (4) from Penicillinum rugulosum OF4949, which is identical to compound 3 except for asparagine in place of the central tyrosine and lack of the N-terminal acetyl group; [12] and pseudosporamide (5) from Pseudosporangium sp. RD062863, reported while writing this manuscript, with a Tyr-Pro-Trp sequence and a carbon-carbon bond between C6 of tyrosine and C- 9 of tryptophan. [13] Of note, compounds 3, 4 and 5 lack an absorption maximum at 315 nm. It should be noted that compounds 3 to 5 have been shown to consist of L-amino acids only, consistent with a ribosomal synthesis. Among the compound 1-related metabolites, 3 and 4 have been reported to inhibit the proteases angiotensin-converting enzyme (ACE) and aminopeptidase B, respectively, with apparent strict specificity, since 4 is not an ACE inhibitor. We were unable to observe inhibition of ACE or the peptidase chymotrypsin by 1 (see Figures S19 and S20), suggesting that the amino acid sequence and/or macrocycle size can strongly influence activity. Compound 5 was reported to have weak cytotoxic activity. [13] However, we could not observe growth inhibitory activity of compound 1 against a panel of bacterial and fungal strains.

122 Chapter 4

When the strain was grown in D2O-supplemented medium, compound 1 was found to be extensively labelled with deuterium (see Figure S5), indicating that its formation requires de novo amino acid synthesis. We are unaware of any biosynthetic studies on 3, 4 or 5. To get insights into the biosynthesis of compound 1, we analyzed the 7.58 Mb genome of Planomonospora sp. ID82291 [8] for the presence of antiSMASH-predicted NRPS or RiPP BGCs, [14] likely to specify a Tyr-Tyr-His peptide. However, this search proved unsuccessful, suggesting that compound 1 was formed by a BGC not recognized by the antiSMASH search tools. Searching the 6-frame translation of the ID82291 genome led to a single plausible candidate: a short open reading frame encoding the pentapeptide MRYYH, preceded by a ribosomal binding site and followed by PLM4_2056, encoding a cytochrome P450 monooxygenase (see Figure 3A). The proposed BGC for compound 1, encoding the two genes designated bytA and bytO (see below), is similar to the recently reported BGC for tryptorubin (6), a hexapeptide containing carbon-carbon and carbon-nitrogen bonds between aromatic amino acid residues. [15,16] Also for compound 6, the BGC encodes just one cytochrome P450 enzyme and a precursor peptide, the latter however containing a 22-aa leader peptide. Similarly, the recently reported BGC for cittilin A (7), a cross-linked YIYY tetrapeptide from Myxococcus xanthus, encodes a 27-aa precursor peptide, an adjacent cytochrome P450 enzyme, a methyltransferase and a distant endopeptidase for cleavage. [17] Cytochrome P450 monooxygenases are known to install cross-links between aromatic residues of peptides, as e.g. in the NRPS-generated glycopeptides. [18] These precedents made the BGC depicted in Figure 3A as the likely candidate for compound 1 biosynthesis. Apart from a possible regulator, the five genes upstream and downstream bytAO do not partake in compound 1 biosynthesis. A similar two-gene organization was found in the genome of Planomonospora sp. ID107089, one of the producers of compound 2 (see Figure 3A). Of note, the two bytAO regions are almost identical: there is only one nucleotide (A-T) difference between the bytA genes, responsible for the Y-F difference in the peptides; both genes are preceded by equally spaced, identical

123 Chapter 4 ribosomal binding sites; the bytO sequences share 97% identity, while the bytAO intergenic regions (120 bp) are 95% identical.

Fig. 3. (A) Overview of the bytAO region for 1 and 2 biosynthesis. bytA encodes the 5-aa precursor peptide, while bytO encodes the cytochrome P450 monooxygenase PLM4_2056. A lysR family transcriptional regulator is in vicinity of bytAO and the two empty arrows represent genes outside the 1 biosynthetic cluster. The corresponding segment from the genome of Planomonospora sp. ID107089 is shown at the bottom. Note amino acid different in bytA. (B) Extracted ion chromatograms (EIC) from heterologous expression of bytAO in Streptomyces coelicolor M1152 (top) and an authentic standard of 1 (bottom). See also Figure S6.

Heterologous expression of bytAO. To prove that production of compound 1 is indeed governed by bytAO, we cloned a 1303 base pair sequence containing bytA and bytO into pSET152 and conjugated the plasmid into the expression host Streptomyces coelicolor M1152. Cultivation of S. coelicolor pSET bytAO, followed by HP20 extraction and LC-MS2 analysis, resulted in the detection of a peak with a m/z 522.199, having the same UV spectrum, retention time and MS2 spectrum as compound 1 (Figures 3B and S6). This unambiguously confirms the heterologous production of mature compound 1 by the minimal gene cluster bytAO, making it the shortest RiPP BGC so far reported. To our knowledge, the shortest reported peptide-encoding gene is the 24-bp mccA, which encodes the precursor peptide of microcin C7 in Escherichia coli, a RiPP also devoid of leader peptide. [19,20] With just 18 bp (including the stop-codon), bytA is two codons smaller than mccA and thus the smallest peptide-coding gene ever reported across the tree of life. Since heterologous expression of bytAO in Streptomyces coelicolor resulted in the mature product, we infer that compound 1

124 Chapter 4 biosynthesis recruits one or more generic enzymes, involved in trimming the additional amino acid(s) at the N-terminus and in N-acetylation. Of note, the existence of cluster-encoded proteases in RiPP BGCs from Actinobacteria seems to be the exception. [21] As an additional observation, the N-acetylated compounds depicted in Figure 2 are all produced by actinobacteria, while those with a free N-terminus were observed in fungi or myxobacteria. Biarylitides: A widespread family of RiPPs. We reasoned that the cytochrome P450 monooxygenase PLM4_2056 would be selective in installing a biaryl cross-link on short peptides, and that it could belong to a distinct phylogenetic branch of this enzyme family. If the cognate pentapeptide-encoding gene was closely linked to the cytochrome monooxygenase gene, then one could easily identify additional BGCs related to those depicted in Figure 3B. We thus conducted a search for analogous small CDSs in the flanking regions of genes related to PLM4_2056. Assuming structural conservation of the carbon-carbon bond, we restricted our search to pentapeptides with tyrosine at position 3 and tyrosine/histidine at position 5. From approximately 3 300 PLM_2056-related sequences, CDSs encoding the specified peptides were identified in approximately 200 genomes, the majority from Actinobacteria. To increase our confidence in the hits, we filtered the detected CDSs for a Shine-Dalgarno motif 6-8 bp upstream of the peptide sequence. Interestingly, such a ribosomal binding site was only detected in sequences from Actinobacteria, where it occurred in vicinity of approximately half of the detected peptides (indicated by filled circles in Figures 4 and 5). Furthermore, this clade was seen to have the only examples of replicate motifs (denoted “2x” in Figure 5). To get insight into the amino acid distribution, detected peptides were aligned using the program MUSCLE. Peptides with a preceding ribosomal binding site showed less variation, with a strong preference for a basic amino acid at position 2, in comparison to the global alignment (see Figures 4 and S7). Intriguingly, the diversity of genomic loci in which PLM_2056 homologues were found together with bytA-like motifs, suggests that these BGCs encode the production of a much wider diversity of molecules with distinct chemical modifications, as suggested by the colocalization of

125 Chapter 4 methyltransferases, sulfotransferases, ATP-grasp enzymes and other tailoring enzymes in these loci (Figure 5). Together, this represents a treasure trove for further discovery efforts in this new family. As a side observation, sequence identity between bytO and actinobacterial P450 monooxygenases from the biarylitides clade ranges between 40 and 50%. In contrast, bytO shows 20-25% identity with P450 monooxygenases involved in glycopeptide cross-links. We propose the name biarylitides for this family of N-acetylated tripeptides, carrying a cross-link between the aromatic side chains of amino acids 1 and 3. Accordingly, compounds 1, 2, K-13 and pseudosporamide would become biarylitides YYH, YFH, YYY and YPW, respectively. Since the first comprehensive literature overview of just a few years ago, [4] several new RiPP families have been uncovered, mostly through searching precursor peptide-related features in genome sequences. [22,23] RiPPs such as the biarylitides can however not be detected by current bioinformatic search strategies because of the extremely small size of the bytA gene. This highlights the importance of metabolomics approaches to unveil novel chemistry and unexpected BGCs. As shown herein and recently described in Russell and Truman, 2020, with hindsight from genomic data, these minimal RIPPs can be identified via cytochome P450 homology and screening for the short peptide sequences in many bacterial genomes. While occurring mostly in Actinobacteria, one such BGC was also detected in a Pyxidicoccus (myxobacteria) strain. While we restricted our searches to a pentapeptide MxYxY/H-encoding gene within +/- 500 bp from a cytochrome P450 monooxygenase gene, the similar architecture of BGCs for tryptorubin and cittilin A suggests that many further variations in short, cross-linked peptides exists in microbial genomes. Additional search strategies are thus likely to further expand these RiPP families.

126 Chapter 4

Fig. 4. Overview phylogeny of cytochrome P450s from a variety of bacterial genera, used in this study. KEGG orthology IDs in grey boxes highlight selected functional groupings (K20497: methyl-branched lipid omega-hydroxylase, K23138: Family CYP109, K23139: Family CYP110, K05917: Steroid biosynthesis, K15468: Polyketide biosynthesis, K14338: Fatty acid degradation). Positive matches for pentapeptide motif hits are indicated as circles, where filled ones visualize a fully validated Shine-Dalgarno (SD) ribosomal binding site (RBS), while for empty ones, a SD sequence was not detected. Putative singleton hits are marked in light or dark cyan if the same motif is not seen in other P450s, with <95% amino acid identity, or if the hit overlaps with the P450 coding region, respectively. Identical motifs correspond to the same color shown in the legend. P450s likely associated with biarylitides are highlighted in orange. This clade also contains the bytAO sequence marked with a filled star. The related bytAO sequence from strain ID107089 has not been included for simplicity. Consensus sequences of all motifs with validated RBS sites are illustrated with the sequence logo at the bottom right. More annotations and interactive tree can be accessed at:https://itol.embl.de/tree/21312710611256691590408100

127 Chapter 4

Fig. 5. Summary phylogeny of cytochrome P450s, likely associated with biarylitide biosynthesis. Identical pentapeptide motif hits are shown as colored circles corresponding to the motif printed to the right of the tree. Filled and unfilled circles indicate those with and without a detected Shine-Dalgarno RBS, respectively. Selected regions of byt-like clusters are shown to the right of motifs, illustrating a variety of contexts and shared coding sequences potentially linked to the minimal P450 cluster (highlighted in blue). Expanded clades and detailed annotations are accessible at: https://itol.embl.de/tree/2131271061176991591098016.

128 Chapter 4

While the byt BGC is extremely small, it poses nonetheless some biochemical challenges for the specificity of the biaryl cross-link. The basic amino acid, often found at position -1, might act as a signal for processing. Further studies will be needed to understand the timing and sequence requirements for installing a cross-link and trimming the additional amino acid(s) at the N-terminus. Since the biaryl cross-link imparts proteolytic stability, understanding its specificity might have important applications in stabilizing peptides for different applications. Our understanding of the biological activities of biarylitides is extremely limited. The different protease inhibitory activities seen with compounds 1, 3 and 4 suggest that the amino acid sequence and/or ring size can strongly influence bioactivity. Thus, their biological properties and their role in the producing organisms await discovery.

4.4 Conclusion

We have identified cross-linked, cyclic tripeptides produced by different strains of Planomonospora and established that its formation in a heterologous host requires a minimal biosynthetic gene clusters consisting of just two genes: one encoding a pentapeptide, representing the smallest peptide-coding gene ever reported, and the other encoding a cytochrome P450 monooxygenases. Phylogenomic analysis of related cytochrome P450 sequences revealed that this class of minimal RiPP gene clusters is widespread in actinobacterial genomes.

4.5 Author Contributions

M.M.Z., S.D. and M.S. designed work, identified metabolites, analyzed data and wrote the paper; M.M.A. and M.H.M. performed bioinformatics analyses; G.S.Z and M.C. performed heterologous expression and analyzed data; S.I.M identified metabolites and analyzed data.

129 Chapter 4

4.6 Acknowledgements

The authors thank Stefan Kehraus for measuring optical rotation of compound 1 and Arianna Tocchetti for helping with protease assays. This work has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No.721484 (Train2Target). M.G.S.Z. acknowledges the DAAD for a Ph.D scholarship. M.C. received funding from the German Research Foundation (DFG) grant nr. FOR2372. M.A. and M.H.M. were supported by an ERA NET CoBiotech (Bestbiosurf) grant through the Netherlands Organization for Scientific Research (NWO) [053.80.739].

4.7 Experimental Section

General Experimental Procedures. Optical rotation (H2O, pH10) was measured at 589 nm on a P2000 Jasco (Dunmow, UK) polarimeter. Mono- and bidimensional NMR spectra were measured in DMSO-d6 or CD3OD at 298K using a Bruker Avance 300 MHz spectrometer. LR-ESIMS data were recorded on an LCQ Fleet (Thermo Scientific) mass spectrometer, coupled to a Dionex UltiMate 3000 HPLC system (Thermo Scientific). Separation was achieved on an Atlantis -1 T3 C18 column (5 µm, 4.6 mm x 50 mm) at a flow rate of 0.8 mL x min and maintained at 40 °C. The injection volume was set at 10 µL. Elution was conducted with a 0.05% (v/v) trifluoroacetic acid in H2O-MeCN gradient as follows: 0-1 min (10% MeCN), 1-7 min (10-95% MeCN), 7-12 min (95% MeCN). The ionization was carried out using an electrospray ionization source in the positive mode (range m/z 110-2000), as described elsewhere. [24] HR-ESIMS spectra were recorded using a Dionex UltiMate 3000 HPLC system (Thermo Scientific) coupled to a micrOTOF III (Bruker) mass spectrometer. Separation was achieved on an -1 Atlantis T3 C18 column (5 µm, 4.6 mm x 50 mm) at a flow rate of 0.3 mL x min and maintained at 25 °C. The injection volume was set at 5 µL. Elution was conducted with a 0.1% (v/v) acetic acid in H2O-0.1% (v/v) acetic acid in MeCN gradient as follows: 0-1 min (10% MeCN), 1-20 min (10-100% MeCN), 20-26 min (100% MeCN). The ionization was carried out using an electrospray ionization source in the positive mode (range m/z 100-3000), as described elsewhere. [8] Pre-fractionation was performed on a CombiFlash system (Teledyne ISCO), using -1 a Biotage SNAP Ultra HP-Sphere C18 25 µm 12 g at 15 mL x min . Elution was conducted with a H2O-MeCN gradient as follows: 0-5 min (5% MeCN), 5-20 min (5-30% MeCN), 20-25 min (30-100% MeCN) and 25-30 min (100% MeCN). Semi- preparative RP-HPLC-UV separations were performed on a Shimadzu LC-2010 CHT instrument, equipped with a 250/10 Nucleodur C18 Pyramid 5M column (Macherey-Nagel) and a C18 pre-column, using the built-in UV-vis detector at 270

130 Chapter 4 and 315 nm. Elution was conducted using an isocratic method with 90% 0.05% trifluoroacetic acid in water (pH 2.5)-10% MeCN at 3 mL x min-1, maintaining a column temperature of 40 °C. All chromatographic steps were performed using purified water (MilliQ, Merck). All solvents and reagents were purchased from Sigma-Aldrich. Bacterial Strains and Growth Conditions. Planomonospora strains from frozen stocks (-80 °C) were propagated on S1 [25] plates at 28 °C for 2-3 weeks. The grown mycelium was then homogenized with a sterile pestle and used to inoculate 15 mL of AF medium (20 g/L dextrose, 8 g/L soybean meal, 2 g/L yeast extract, 4 g/L CaCO3, 1 g/L NaCl, pH 7.2) in a 50 mL baffled flask. After cultivation on a rotary shaker (200 rpm) at 30 °C for 72 h, 1.5 ml of the exponentially growing culture was used to inoculate each 15 mL of AF medium, as described. [8,25] After 7 days of cultivation as before, cultures were harvested and extracted by centrifugation and the supernatant, after adsorption on 200 µL of HP20 resin (Diaion) for 1 h, was eluted with 200 µL of methanol (Sigma). Samples were analyzed by LC-MS as described below. For isolation and purification of 1, 1.5 mL of a frozen stock of strain ID82291 was used to inoculate 15 mL AF medium in a 50 mL baffled flask. After 72 h, 10 mL were used to inoculate 100 mL fresh AF medium in a 500 mL baffled flask. After further 72 h, 100 mL was used to inoculate 900 mL AF medium in a 3 L flask, containing 50 g 5-mm glass beads, and cultivated for 7 days as described. [8,25] Cultivation in D2O. 1.5 mL of a frozen stock culture of strain ID82291 was used to inoculate 15 mL of AF medium in a 50 mL baffled flask. After 72 h, 1.5 mL were used to inoculate each 15 mL of AF medium and 15 mL of AF medium prepared with 30% D2O (v/v), replacing H2O, in 50 mL baffled flasks and further cultivated under identical conditions. Production of 1 was monitored by sampling 1 mL of broth every 24 h. Samples were centrifuged and the supernatant, after adsorption on 200 µL of HP20 resin (Diaion) for 1 h, was eluted with 200 µL of methanol (Sigma). Samples were analyzed by LR-LC-ESIMS as described in the General Experimental Procedures. Isolation and Purification of Compound 1. After 7 days, the culture (prepared as described above) was first centrifuged and then filtered, and the resulting cleared broth was adsorbed on 200 mL HP20 resin (Diaion). The loaded resin was washed with demineralized water and eluted with aqueous 50% MeOH (v/v). The eluate was brought to dryness. The dried extract was taken up in 10% MeCN/90% water (v/v) and pre-fractionated on a Biotage SNAP Ultra HP-Sphere C18 25 µm 12 g column as described in the General Experimental Procedures. After screening fractions by LR-LC-ESIMS as described in the General Experimental Procedures, 1-containing fractions were dried, taken up in 10% MeCN/90% 0.05% trifluoroacetic acid in water (v/v) and processed further by semi-preparative HPLC on a 250/10 Nucleodur C18 Pyramid 5M column (Macherey-Nagel) and a C18 pre-column, as described above. Due to co-eluting contaminants, the semi-purified substance was purified for a second time under the same conditions, except for the change of phase A to water. Drying of the eluent yielded 5 mg of 1. Biarylitide YYH (1). white crystalline powder; [α]D = -165 (c 0.11, H2O pH 10, 25 °C); 1H and 13C NMR, Table S2; HR-ESI-MS m/z found 522.198 [M+H]+

131 Chapter 4

+ (calcd for C26H27N5O7, 522.1988); HR-ESI-MS/MS ([M+H] ) m/z (%) 269.139 (100), 359.134 (84), 315.143 (69), 494.201 (66), 289.128 (61), 285.133 (56), 331.140 (54), 243.122 (51), 313.129 (37), 448.198 (27), 476.191 (17), 228.111 (10), 406.181 (9), 214.094 (9), 199.081 (7). Hydrolysis of 1. A small amount of purified compound 1 (< 1 mg) was dissolved in 250 µL 0.1 M sodium borate buffer (pH 9), followed by addition of 250 µL fresh 12M HCl (Sigma-Aldrich). The solution was heated to 120 °C and sampled at 24, 48 and 96 h. LR-LC-ESIMS measurements for analysis of 1 were performed as described in the General Experimental Procedures, with alterations in elution gradient and mass range. Gradient: 0-1 min (0% MeCN), 1-7 min (0- 25% MeCN), 7-8 min (25-95% MeCN), 8-12 min (95% MeCN). Mass range: m/z 110-1000. Cloning and Heterologous Expression of bytAO. A DNA stretch containing bytAO (1303 bp) was cloned into pSET152_ermE* via Gibson assembly. Linear DNA fragments were generated by PCR using primers designed to include 20 bp overlap sequences (pSET_fwd: CCTCTCTAGAGTCGACCTGCAGC, pSET_rev: CCTTCCGTACCTCCGTTGCT, bytAO_fwd: AGCAACGGAGGTACGGAAGGAGGAGGTGTGCGATGCGC and bytAO_rev: GCAGGTCGACTCTAGAGAGGCTAGCGGGGGAGAAGGAC). Purified linear vector and insert were mixed (ratio 1:3) with 15 µL of Gibson assembly master mixture (1X ISO buffer, 10 U/L T5 exonuclease, 2 U/L Phusion polymerase, and 40 U/µL Taq ligase) and incubated at 50 °C for 1 h. Then, 5 µL of reaction mix were used to transform chemically competent E. coli DH5 cells, which were plated onto LB agar supplemented with 50 µg/mL apramycin. Positive clones containing the plasmid pSET-bytAO were selected by colony PCR (primers: pSET-bytAO_fwd: GCGTCGATTTTTGTGATGCTCG and pSET- bytAO_rev: CAGCGAATTCGGAAAACGGC) and confirmed using restriction digest and sequencing. pSET-bytAO was conjugated into Streptomyces coelicolor M1152 using a standard intergeneric conjugation protocol [26] with the methylation-deficient E. coli ET12567/pUZ8002/pSET-bytAO as the donor. Cells were spread onto MS agar (20 g/L soya flour, 20 g/L mannitol, 20 g/L agar, 50 mM MgCl2) and incubated at 30 °C. After 20 h, 0.5 mg nalidixic acid and 1 mg apramycin were overlaid on the plates and incubation continued until ex- conjugants showed up. Single colonies were spread on fresh plates containing 25 mg/L nalidixic acid and 50 mg/L apramycin to confirm the ex-conjugants. Colonies containing bytAO were validated by colony PCR and cultivated in TSB broth for 12 days at 30 °C. Cultures were extracted as described above and analyzed by HR-LCMS, as described in the General Experimental Procedures. Protease Activity Assay. Angiotensin I Converting Enzyme (ACE) inhibition was measured using the Sigma-Aldrich Angiotensin I Converting Enzyme (ACE) Activity Fluorescence Assay Kit (CS0002), following the product information bulletin. Briefly, a serial dilution of compound 1, starting at 100 µM, was tested for inhibitory activity against ACE. As a positive control, teptrotide (Sigma-Aldrich A0773) at 100 µM was used. Reactions were incubated at 37 °C and measured every 5 min, using a PerkinElmer Enspire Multimode Plate Reader. All reactions were performed in duplicates and the mean calculated. α- chymotrypsin inhibition was measured by a fluorescence-based assay. [27]

132 Chapter 4

Briefly, a serial dilution of compound 1, starting at 100 µM, was tested for inhibitory activity against α-chymotrypsin from bovine pancreas (Sigma-Adrich C4129). As a positive control, chymostatin from microbial origin (Sigma-Aldrich C7268) at 40 µg/mL was used. Reactions were incubated at 25°C and measured every 5 min, using a PerkinElmer Enspire Multimode Plate Reader. All reactions were performed in duplicates and the mean calculated. Antibacterial and Antifungal Assays. For antibacterial testing, agar diffusion assays were prepared as follows: In 15 mL liquified Mueller-Hinton Agar at ambient temperature, 105 CFU/mL of the testing strain were inoculated. 5 µL of partially purified compound 1 with an estimated concentration of 0.1 mg/mL (dissolved in 10% DMSO/Water) were spotted on the solidified agar plate, alongside with 5 µL of a 5 mg/mL apramycin solution (dissolved in 10% DMSO/Water) as positive control and 5 µL of 10% DMSO/Water as negative control. Plates were inoculated at 30 °C for 18-20 h. Test strains were Acinetobacter baumannii L3030, Escherichia coli G1655, Escherichia coli ∆tolC L4242, Moraxella catarrhalis L3292, Klebsiella pneumoniae L3392, Pseudomonas aeruginosa ATCC27853 and Staphylococcus aureus L100. For antifungal testing, agar diffusion assays were prepared as follows: in 15 mL liquified Sabouraud agar at ambient temperature, 104 CFU/mL of the testing strain were inoculated. Compound 1 and negative control were spotted as described above. For the positive control, 5 µL of a 5 mg/mL amphotericin B solution (dissolved in 10% DMSO/Water) were used. The plate was inoculated at 30 °C for 18-20 h. Test strain was Candida albicans L145. Strains with L-prefixes were taken from the Naicons library. Phylogenetic Analyses. A set of complete genomes from the antiSMASH database were used along with top blast results to PLM4_2056 from Planomonospora sp. ID82291 (NCBI reference JABTEX000000000). Cytochrome P450 enzymes were identified via a hidden Markov model search [28] using the Pfam (PF00067) P450 model and trusted bit-score cutoffs. After ensuring model and query alignment coverage over 70%, similar sequences (>95% amino acid ID) were clustered via an all vs. all blast and one representative was used for the final dataset, which yielded over 3,300 sequences. Any hit to the motif search analysis (see below) were also included in this dataset. All sequences were aligned via the tool MAFFT [29] using the local iterative (-linsi) option followed by alignment trimming using the trimal application with automated option to improve maximum-likelihood tree inference. Due to the large number of sequences the tool FastTree [30] was used to infer the final phylogenetic gene tree. Peptide Motif Search. A custom python script (https://git.wur.nl/snippets/65) was used to examine 6-frame translations of corresponding genomic regions +/- 500 bp of identified cytochrome P450 enzymes. This reported any motif matching MxYx[Y/H]*, where “x” is any amino acid and * is a stop codon. Reported positions were then cross-referenced to highlight motifs occurring up/downstream of the P450 CDS. An additional screen of these results for the Shine-Dalgarno sequence “AGGAGG” was preformed upstream of all motif positions to annotate RBS sites. Identified peptides were mapped onto the generated, P450-based phylogenetic tree and visualized using

133 Chapter 4 the iTOL-webserver (https://itol.embl.de/). [31] The sequences of the biarylitide- YYH and biarylitide-YFH gene clusters have been deposited in the NCBI under the accession number GenBank MW201788 and MW201789, respectively.

4.8 References

1 Yera, E. R.; Cleves, A. E.; Jain, A. N. Chemical Structural Novelty: On-Targets and Off-Targets. J. Med. Chem. 2011, 54 (19), 6771–6785. 2 Gfeller, D.; Michielin, O.; Zoete, V. Shaping the Interaction Landscape of Bioactive Molecules. Bioinformatics 2013, 29 (23), 3073–3079. 3 Chen, Y.; Mathai, N.; Kirchmair, J. Scope of 3D Shape-Based Approaches in Predicting the Macromolecular Targets of Structurally Complex Small Molecules Including Natural Products and Macrocyclic Ligands. J. Chem. Inf. Model. 2020, 60 (6), 2858–2875. 4 Arnison, P. G.; Bibb, M. J.; Bierbaum, G.; Bowers, A. A.; Bugni, T. S.; Bulaj, G.; Camarero, J. A.; Campopiano, D. J.; Challis, G. L.; Clardy, J.; Cotter, P. D.; Craik, D. J.; Dawson, M.; Dittmann, E.; Donadio, S.; Dorrestein, P. C.; Entian, K.-D.; Fischbach, M. A.; Garavelli, J. S.; Göransson, U.; Gruber, C. W.; Haft, D. H.; Hemscheidt, T. K.; Hertweck, C.; Hill, C.; Horswill, A. R.; Jaspars, M.; Kelly, W. L.; Klinman, J. P.; Kuipers, O. P.; Link, A. J.; Liu, W.; Marahiel, M. A.; Mitchell, D. A.; Moll, G. N.; Moore, B. S.; Müller, R.; Nair, S. K.; Nes, I. F.; Norris, G. E.; Olivera, B. M.; Onaka, H.; Patchett, M. L.; Piel, J.; Reaney, M. J. T.; Rebuffat, S.; Ross, R. P.; Sahl, H.-G.; Schmidt, E. W.; Selsted, M. E.; Severinov, K.; Shen, B.; Sivonen, K.; Smith, L.; Stein, T.; Süssmuth, R. D.; Tagg, J. R.; Tang, G.-L.; Truman, A. W.; Vederas, J. C.; Walsh, C. T.; Walton, J. D.; Wenzel, S. C.; Willey, J. M.; Van Der Donk, W. A. Ribosomally Synthesized and Post-Translationally Modified Peptide Natural Products: Overview and Recommendations for a Universal Nomenclature. Nat. Prod. Rep. 2013, 30 (1), 108–160. 5 Funk, M. A.; Van Der Donk, W. A. Ribosomal Natural Products, Tailored to Fit. Acc. Chem. Res. 2017, 50 (7), 1577–1586. 6 Russell, A. H.; Truman, A. W. Genome Mining Strategies for Ribosomally Synthesised and Post-Translationally Modified Peptides. Comput. Struct. Biotechnol. J. 2020, 18, 1838–1851. 7 Monciardini, P.; Iorio, M.; Maffioli, S.; Sosio, M.; Donadio, S. Discovering New Bioactive Molecules from Microbial Sources. Microb. Biotechnol. 2014, 7 (3), 209–220. 8 Zdouc, M. M.; Iorio, M.; Maffioli, S. I.; Crüsemann, M.; Donadio, S.; Sosio, M. Planomonospora: A Metabolomics Perspective on an Underexplored Actinobacteria Genus. J. Nat. Prod. 2021, 84 (2), 204-219. 9 Bewley, C. A.; He, H.; Williams, D. H.; Faulkner, D. J. Aciculitins A-C: Cytotoxic and Antifungal Cyclic Peptides from the Lithistid Sponge Aciculites orientalis. J. Am. Chem. Soc. 1996, 118 (18), 4314–4321.

134 Chapter 4

10 Kase, H.; Kaneko, M.; Yamada, K. K-13, a Novel Inhibitor of Angiotensin I Converting Enzyme Produced by Micromonospora halophytica subsp. exilisia. J. Antibiot. 1987, 40 (4), 450–454. 11 Yasuzawa, T.; Shirahata, K.; Sano, H. K-13, a Novel Inhibitor of Angiotensin I Converting Enzyme Produced by Micromonospora halophytica subsp. exilisia. J. Antibiot. 1987, 40 (4), 455–458. 12 Sano, S.; Ikai, K.; Katayama, K.; Takesako, K.; Nakamura, T.; Obayashi, A.; Ezure, Y.; Enomoto, H. OF4949, New Inhibitors of Aminopeptidase B. J. Antibiot. 1986, 39 (12), 1685–1696. 13 Saito, S.; Atsumi, K.; Zhou, T.; Fukaya, K.; Urabe, D.; Oku, N.; Karim, M. R. U.; Komaki, H.; Igarashi, Y. A Cyclopeptide and Three Oligomycin-Class Polyketides Produced by an Underexplored Actinomycete of the Genus Pseudosporangium. Beilstein J. Org. Chem. 2020, 16, 1100–1110. 14 Blin, K.; Wolf, T.; Chevrette, M. G.; Lu, X.; Schwalen, C. J.; Kautsar, S. A.; Suarez Duran, H. G.; De Los Santos, E. L. C.; Kim, H. U.; Nave, M.; Dickschat, J. S.; Mitchell, D. A.; Shelest, E.; Breitling, R.; Takano, E.; Lee, S. Y.; Weber, T.; Medema, M. H. AntiSMASH 4.0 - Improvements in Chemistry Prediction and Gene Cluster Boundary Identification. Nucleic Acids Res. 2017, 45 (W1), W36–W41. 15 Wyche, T. P.; Ruzzini, A. C.; Schwab, L.; Currie, C. R.; Clardy, J. Tryptorubin A: A Polycyclic Peptide from a Fungus-Derived Streptomycete. J. Am. Chem. Soc. 2017, 139 (37), 12899–12902. 16 Reisberg, S. H.; Gao, Y.; Walker, A. S.; Helfrich, E. J.; Clardy, J.; Baran, P. S. Total Synthesis Reveals Atypical Atropisomerism in a Small-Molecule Natural Product, Tryptorubin A. Science 2020 367 (6476) 458-463. 17 Hug, J. J.; Dastbaz, J.; Adam, S.; Revermann, O.; Koehnke, J.; Krug, D.; Müller, R. Biosynthesis of Cittilins, Unusual Ribosomally Synthesized and Post-Translationally Modified Peptides from Myxococcus xanthus. ACS Chem. Biol. 2020, 15 (8), 2221–2231. 18 Greule, A.; Stok, J. E.; De Voss, J. J.; Cryle, M. J. Unrivalled Diversity: The Many Roles and Reactions of Bacterial Cytochromes P450 in Secondary Metabolism. Nat. Prod. Rep. 2018, 35 (8), 757–791. 19 González-Pastor, J. E.; San Millán, J. L.; Moreno, F. The Smallest Known Gene. Nature 1994, 369 (6478), 281. 20 Guijarro, J. I.; González-Pastor, J. E.; Baleux, F.; San Millán, J. L.; Castilla, M. A.; Rico, M.; Moreno, F.; Delepierre, M. Chemical Structure and Translation Inhibition Studies of the Antibiotic Microcin C7. J. Biol. Chem. 1995, 270 (40), 23520–23532. 21 Chen, S.; Xu, B.; Chen, E.; Wang, J.; Lu, J.; Donadio, S.; Ge, H.; Wang, H. Zn-Dependent Bifunctional Proteases Are Responsible for Leader Peptide Processing of Class III Lanthipeptides. Proc. Natl. Acad. Sci. U.S.A. 2019, 116 (7), 2533–2538. 22 Kloosterman, A. M.; Shelton, K. E.; Van Wezel, G.; Medema, M. H.; Mitchell, D. A. RRE-Finder: A Genome-Mining Tool for Class-Independent RiPP Discovery. mSystems 2020, 5 (5), e00267-20. 23 Montalban-Lopez, M.; Scott, T. A.; Ramesh, S.; Rahman, I. R.; Van Heel, A. J.; Viel, J. H.; Bandarian, V.; Dittmann, E.; Genilloud, O.; Goto, Y.; Grande

135 Chapter 4

Burgos, M. J.; Hill, C.; Kim, S.; Koehnke, J.; Latham, J. A.; Link, A. J.; Martínez, B.; Nair, S. K.; Nicolet, Y.; Rebuffat, S.; Sahl, H.-G.; Sareen, D.; Schmidt, E. W.; Schmitt, L.; Severinov, K.; Süssmuth, R. D.; Truman, A. W.; Wang, H.; Weng, J.-K.; Van Wezel, G. P.; Zhang, Q.; Zhong, J.; Piel, J.; Mitchell, D. A.; Kuipers, O. P.; Van der Donk, W. A. New Developments in RiPP Discovery, Enzymology and Engineering. Nat. Prod. Rep. 2021, Advance Article. 24 Iorio, M.; Tocchetti, A.; Cruz, J. C. S.; Del Gatto, G.; Brunati, C.; Maffioli, S.; Sosio, M.; Donadio, S. Novel Polyethers from Screening Actinoallomurus Spp. Antibiotics 2018, 7 (2), 47. 25 Donadio, S.; Monciardini, P.; Sosio, M. Approaches to Discovering Novel Antibacterial and Antifungal Agents. Methods Enzymol. 2009, 458, 3–28. 26 Du, L.; Liu, R.-H.; Ying, L.; Zhao, G.-R. An Efficient Intergeneric Conjugation of DNA from Escherichia coli to Mycelia of the Lincomycin-Producer Streptomyces lincolnensis. Int. J. Mol. Sci. 2012, 13 (4), 4797–4806. 27 Pastor, I.; Vilaseca, E.; Madurga, S.; Garcés, J. L.; Cascante, M.; Mas, F. Effect of Crowding by Dextrans on the Hydrolysis of N-Succinyl-L-Phenyl-Ala- p-Nitroanilide Catalyzed by 훼-Chymotrypsin. J. Phys. Chem. B 2011, 115 (5), 1115–1121. 28 Finn, R. D.; Clements, J.; Eddy, S. R. HMMER Web Server: Interactive Sequence Similarity Searching. Nucleic Acids Res. 2011, 39, W29–W37. 29 Katoh, K.; Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 2013, 30 (4), 772–780. 30 Price, M. N.; Dehal, P. S.; Arkin, A. P. FastTree 2–approximately Maximum- Likelihood Trees for Large Alignments. PloS one 2010, 5 (3), e9490. 31 Letunic, I.; Bork, P. Interactive Tree of Life (ITOL) V4: Recent Updates and New Developments. Nucleic Acids Res. 2019, 47 (W1), W256–W259.

136 Chapter 4

4.9 Supplemental Information

Fig. S1. (A) Molecular cluster showing the occurrence of compound 1 (m/z 522.19) and compound 2 (m/z 506.2), as well as two minor, uncharacterized congeners detected only in strain ID46116. (B) LC-LR-ESIMS traces of compounds 1 and 2. The slight difference in retention time for strain ID46116 can be explained by retention time drift.

137 Chapter 4

Fig. S2. NMR spectroscopic data for 1, recorded in CD3OD and DMSO-d6.

138 Chapter 4

Fig. S3. LC-MS-traces of hydrolysis of compound 1 (m/z 522). After 24 h, deacetylation (m/z 480) can be observed, while at 72 h, hydrolysis and loss of the central tyrosine (m/z 335) has occurred.

139 Chapter 4

Fig. S4. HR-MS2 fragmentation patterns of biarylitides YYH (compound 1) and YFH (compound 2), with the only difference between them indicating a phenylalanine instead of a tyrosine loss.

Fig. S5. (A) Cultivation of strain ID82291 in D2O-supplemented medium leads to incorporation of deuterium into compound 1 (m/z 522.2), resulting in Gaussian isotope pattern in ESIMS. (B) In the D2O-free control, a monotonic isotope pattern can be seen.

140 Chapter 4

Fig. S6. Comparison of HR-MS2 fragmentation spectra of (A) compound 1 from heterologous expression in Streptomyces coelicolor M1152 and (B) compound 1 isolated from Planomonospora sp. ID82291.

141 Chapter 4

Fig. S7. Amino acid distribution in pentapeptides, detected in bioinformatics search. (A) Distribution of amino acids in pentapeptides, preceded by a Shine- Dalgarno ribosomal binding site. (B) Distribution in all pentapeptides. Note that variance in amino acids in (B) is higher than in (A).

142 Chapter 4

1 Fig. S8. H of 1 in CD3OD.

143 Chapter 4

Fig. S9. 1H of 1 in DMSO-d6.

144 Chapter 4

Fig. S10. COSY of 1 in CD3OD.

Fig. S11. TOCSY of 1 in CD3OD.

145 Chapter 4

Fig. S12. HSQC of 1 in CD3OD.

Fig. S13. HMBC of 1 in CD3OD.

146 Chapter 4

Fig. S14. Selective band HSQC of 1 in CD3OD.

Fig. S15. Selective band HMBC of 1 in CD3OD

147 Chapter 4

Fig. S16. COSY of 1 in DMSO-d6.

Fig. S17. TOCSY of 1 in DMSO-d6.

148 Chapter 4

Fig. S18. NOESY of 1 in DMSO-d6.

Fig. S19. Zoom into the amidic region in a combined NOESY-TOCSY experiment of 1 in DMSO-d6.

149 Chapter 4

Fig. S20. ACE-inhibition assay for biarylitide YYH (compound 1). MAX Fluorescence denotes the negative control, while the inhibitor teprotide was used for positive control. No inhibitory activity by compound 1 could be observed.

Fig. S21. α-chymotrypsin-inhibition assay for biarylitide YYH (compound 1). MAX Fluorescence denotes the negative control, while the inhibitor chymostatin was used for positive control. No inhibitory activity by compound 1 could be observed.

150 General Discussion

5

General Discussion

151 General Discussion

n this thesis, we investigated the less explored actinomycetal genus I Planomonospora to identify novel natural products. Utilizing publicly available and ad hoc developed genomic and metabolomic tools, we examined 72 strains of this genus for their phylogenetic status as well as metabolomic fingerprints. Applying a dereplication workflow that integrated an in house in silico fragmentation database alongside experimental databases, we identified several metabolites that were not previously reported for this genus. Combining phylogenetic data with the mining of publicly available mass spectrometry datasets allowed us to prioritize metabolites by rareness and uniqueness, thus independent from detected bioactivity. This led to the discovery of the biarylitides, a group of previously unknown cyclic peptides. We demonstrated that their biosynthesis is governed by minimal BGCs, containing only two genes. Interestingly, the precursor peptide is encoded by the smallest coding gene so far reported, comprising only 18 base pairs. Using bioinformatics, we were able to identify similar two-gene BGC in other bacterial genomes, indicating a widespread yet unknown gene cluster family. [1] Chapter 2. In this chapter, we describe our efforts in amending the proprietary ABL database with in silico-calculated tandem mass fragmentation spectra, to allow its application in automated dereplication workflows (described in chapter 3). We predicted the fragmentation of approximately 13 000 actinobacterial metabolites, using the publicly available tool CFM-ID. [2] We accessed prediction fidelity of the algorithm by comparison of several in silico fragmented compounds with their experimentally determined fragmentation. In our sample, we observed that the fragmentation of compounds with macrocycles was not predicted satisfactory. We suspect that the program could not determine the initial point of fragmentation of the macrocycle. This was exemplified by daptomycin, where the fragmentation of the exocyclic tail of the molecules was predicted correctly, while the cyclic part was not. However, the observed trends might have occurred only in the metabolites in our sample, and a larger, randomly selected sample would have been necessary to access prediction fidelity quantitatively. Future work might include a comparison of the

152 General Discussion performance of multiple publicly available tools, with the selection of the most suitable one for fragmentation prediction of natural products. Based on the observed trends, we conclude that the calculated fragmentation spectra can be a useful addition to dereplication workflows, even though the prediction fidelity of some classes of metabolites might be questionable. Chapter 3. Here, we characterized Planomonospora strains using a “paired-omics” approach, combining genomic and metabolomic information. Using a dereplication workflow, we annotated several known metabolites and determined that they were specific to certain phylogroups. Applying subsequent exclusion criteria to metabolites detected by mass spectrometry, we prioritized phylogenetically specific but rarely occurring signals for further isolation and characterization, as described in chapter 4. For our investigation, we have chosen the underexplored genus Planomonospora, with only few isolates reported in public strain libraries. Favorably, we found approximately 300 strains, categorized as Planomonospora, in the Naicons strain library. This promised a great opportunity not only for the discovery of natural products, but also for the characterization of this underexplored actinomycete genus. Mass spectrometry-based, metabolomic fingerprinting of strains supported our hypothesis that in Planomonospora, phylogenetic diversity correlated with metabolic diversity. Most metabolites were phylogroup-specific. For example, the thiopeptide siomycin A could only be detected in strains of phylogroup S, while desferrioxamine E was specific to phylogroup C. A notable exception was phylogroup A2, where strains shared only few phylogroup-specific metabolites, despite close phylogenetic relationships. Also, the average number of features per strain in A2 was much lower than in phylogroups C, S or V2. We assume that specific nutritional requirements of strains of phylogroup A2 were not catered, leading to sub-optimal growth and low numbers of metabolites. Regarding strain-specific metabolites, specimen in phylogroups V2 and C were especially productive. Interestingly, in phylogroup V2, all strains had a considerable number of unique features associated to them, despite identical 16S

153 General Discussion rRNA sequence. This was also observed for strains ID1299671 and ID1299672, originating from the same soil sample. Similarly, in phylogroup S, several strains with identical 16S rRNA sequences had unique features associated to them. We conclude that in Planomonospora, 16S rRNA can be used as a marker to predict metabolomic diversity, since different phylogroups show remarkably different metabolomic fingerprints. Still, genomic or metabolomic investigations are required to provide information about the chemical diversity present. While the phylogenetic distance between most of the isolates and the nearest type strain was small, phylogroup V2 diverged. When we analyzed the genome of representative strain ID67723, average nucleotide identity and multilocus-analysis of the genome placed it closer Planobispora rosea than to Planomonospora sphaerica. The results suggest that taxonomic relationships between the two genera Planomonospora and Planobispora, both belonging to the family Streptosporangiaceae, require re-evaluation. In this regard, it would have been interesting to characterize strain ID67723 taxonomically and to formally confirm it as an independent species. Also, while 16S rRNA was sufficient to determine diversity for the purpose of this study, full genome sequences of additional Planomonospora strains would have also been beneficial to clarify phylogenetic relationships. Our efforts in metabolite annotation provided considerable insight into the pan-metabolome of this under-explored genus. Apart from traditional dereplication methods, the mining and pairing of metabolomic and genomic data proved to be especially beneficial. Using the GNPS platform, we were able to graphically display the chemistry detected and to investigate metabolites based on metadata such as phylogeny. We identified several undescribed congeners of siomycin by mining for specific MS2 fragments, using the tool MS2LDA. Applying the program BiG-SCAPE, we showed that a single BGC was responsible for the production of several ureylene-containing peptides, many of which were yet unreported. Further, we described several features that belonged to a new family of erythrochelin-like siderophores and associated it with a BGC. Although most

154 General Discussion annotations remained putative and require additional characterization efforts, they may represent interesting starting points for future investigations. In the prioritization workflow described at the end of chapter 3, we focused on phylogroup-specific Planomonospora metabolites that could not be detected in any other sample the Naicons extracts library. This strategy led to the discovery of the biarylitides described in chapter 4. While the approach is useful for the discovery of novel metabolites, its disadvantage is the lack of information about the abundance of a metabolite. In mass spectrometry, signal intensity is not necessarily proportional to quantity and the risk of selecting a minute compound should be kept in mind when applying this strategy. [3] Chapter 4. In this chapter, we describe the discovery of the biarylitides, metabolites specific to a distinct branch of phylogroup “A2”. We proved that a minimal RiPP BGC, containing a tiny precursor peptide (bytA) and a cytochrome P450 monooxygenase (bytO), sufficed to produce the mature product in a heterologous host. To our knowledge, the bytA gene constitutes the shortest coding gene yet reported. Bioinformatics investigation indicated that similar BGCs are relatively widespread, appearing in approximately 200 bacterial genomes. While we characterized the chemical structure of biarylitide YYH by mass spectrometry and NMR, its absolute conformation remains unclear. Even though the ribosomal origin of the compound suggests L-conformation of all amino acids,

RiPPs containing D-amino acids have been reported as well. [4] First attempts to characterize the metabolite by chemical derivatization and X-ray crystallography were halted due to the low amount of compound at hand. Improvements in compound availability would be of special importance for further investigations. This could be achieved by optimization of both medium and isolation protocol or over-expression in a heterologous host. Another approach, addressing both availability and characterization, would be total synthesis of the compound. The small size of the compound, alongside commercially available building blocks, should allow for a time-efficient synthesis strategy. Traditionally, activity finding is a lengthy and resource-intense process that relies on the brute-force testing of the compound against different assays until

155 General Discussion activity is found. In recent years, various new methodologies for the prediction of the molecular target of a compound have been developed, ranging from computational tools [5,6] to innovative phenotype-based assays that integrate multiple layers of data. [7] Even though predictions must be verified experimentally, such tools can guide researchers and reduce expenses. Therefore, a bioactivity finding study for biarylitide YYH can be envisioned that applies state-of-the-art tools for the prediction of activity, with consecutive experimental verification. Even though heterologous expression of the bytAO BGC has led to the formation of the mature product, many aspects of its biosynthesis are still unclear. N-acetylation and cleavage of the leading amino acid(s) suggest that the biarylitide-precursor peptide recruits a N-acetyl transferase and a peptidase, respectively, outside of its BGC. Several other RiPPs that employ likewise enzymes outside their BGC have been reported. [4,8–10] The formation of the mature compound in the heterologous host Streptomyces coelicolor implies the presence of likewise enzymes in this species. For sake of complete characterization of the biosynthetic pathway, identification of both the N-acetyl transferase and the peptidase would be beneficial. An attempt to express the product in a Gram-negative heterologous host, such as E. coli, would shed further light on the biosynthesis, with possible improvements of yield of compound. Furthermore, characterization of the newly described cytochrome P450 monooxygenase BytO would illuminate the formation of the carbon-carbon bond in biarylitide YYH, which could have implications in biotechnological applications. Site-directed mutagenesis of the biarylitide YYH-precursor would allow to investigate substrate promiscuity of the enzyme. We have also identified and described biarylitide YFH, in which Tyr2 is replaced with a phenylalanine. Here, it would be interesting to see if a change to any other amino acids would also yield a product. Further, amino acids 1 or 3 could be swapped to see if BytO could deal with such a directional change. The recently reported metabolite pseudosporamide contains a carbon-carbon bond between Tyr1 and Trp3, and even though its biosynthetic pathway is yet unknown, it can be assumed to be

156 General Discussion similar to the one of biarylitides. [11] Eventually, these experiments would allow to create a library of different biarylitides, which could be further tested for bioactivity. Bioinformatics analysis have identified similar BGCs in other bacterial strains. Search for these putative products by cultivation of the strains would represent another step in understanding the ecological function of this apparently widespread, underexplored family of compounds. Alternatively, the identified BGC could be expressed in a heterologous host, which could expedite the identification of the resulting metabolite. [1] Conclusion. We described the exploration of the actinomycete genus Planomonospora, which increased our knowledge about its phylogeny and biosynthetic capacity significantly. Using state-of-the-art genomic and metabolomic tools, we have identified and connected several metabolites with their biosynthetic pathways. One of these are the biarylitides, codified by a minimal RiPP BGCs containing the shortest peptide-coding gene so far reported. This work represents a substantial contribution to the field of natural product discovery and provides multiple starting points for further studies into Planomonospora and the biarylitides.

157 General Discussion

5.1 References

1 Zdouc, M. M.; Alanjary, M. M.; Maffioli, S. I.; Crüsemann, M.; Medema, M. M.; Donadio, S.; Sosio, M. A Biaryl-Linked Tripeptide from Planomonospora Reveals a Widespread Class of Minimal RiPP Gene Clusters. Cell Chem. Biol. 2021, 28, 1–7. 2 Allen, F.; Greiner, R.; Wishart, D. Competitive Fragmentation Modeling of ESI- MS/MS Spectra for Putative Metabolite Identification. Metabolomics 2015, 11 (1), 98–110. 3 Zdouc, M. M.; Iorio, M.; Maffioli, S. I.; Crüsemann, M.; Donadio, S.; Sosio, M. Planomonospora: A Metabolomics Perspective on an Underexplored Actinobacteria Genus. J. Nat. Prod. 2021, 84 (2), 204-219. 4 Manuel; Scott, T. A.; Ramesh, S.; Rahman, I. R.; Heel, A. J. van; Viel, J. H.; Bandarian, V.; Dittmann, E.; Genilloud, O.; Goto, Y.; Grande Burgos, M. J.; Hill, C.; Kim, S.; Koehnke, J.; Latham, J. A.; Link, A. J.; Martínez, B.; Nair, S. K.; Nicolet, Y.; Rebuffat, S.; Sahl, H.-G.; Sareen, D.; Schmidt, E. W.; Schmitt, L.; Severinov, K.; Süssmuth, R. D.; Truman, A. W.; Wang, H.; Weng, J.-K.; Wezel, G. P. van; Zhang, Q.; Zhong, J.; Piel, J.; Mitchell, D. A.; Kuipers, O. P.; Donk, W. A. van der. New Developments in RiPP Discovery, Enzymology and Engineering. Nat. Prod. Rep. 2021. 5 Verma, N.; Rai, A. K.; Kaushik, V.; Brünnert, D.; Chahar, K. R.; Pandey, J.; Goyal, P. Identification of Gefitinib Off-Targets Using a Structure-Based Systems Biology Approach; Their Validation with Reverse Docking and Retrospective Data Mining. Sci. Rep. 2016, 6 (1), 1–12. 6 Hodos, R. A.; Kidd, B. A.; Shameer, K.; Readhead, B. P.; Dudley, J. T. In Silico Methods for Drug Repurposing and Pharmacology. Wiley Interdiscip. Rev. Syst. Biol. Med. 2016, 8 (3), 186–210. 7 Yoshida, M. Recent Advances in Target Identification of Bioactive Natural Products. Biosci. Biotechnol. Biochem. 2019, 83 (1), 1–9. 8 Hug, J. J.; Dastbaz, J.; Adam, S.; Revermann, O.; Koehnke, J.; Krug, D.; Müller, R. Biosynthesis of Cittilins, Unusual Ribosomally Synthesized and Post-Translationally Modified Peptides from Myxococcus xanthus. ACS Chem. Biol. 2020, 15 (8), 2221–2231. 9 Ziemert, N.; Ishida, K.; Liaimer, A.; Hertweck, C.; Dittmann, E. Ribosomal Synthesis of Tricyclic Depsipeptides in Bloom-Forming Cyanobacteria. Angew. Chem. 2008, 120 (40), 7870–7873. 10 Philmus, B.; Christiansen, G.; Yoshida, W. Y.; Hemscheidt, T. K. Post- Translational Modification in Microviridin Biosynthesis. ChemBioChem 2008, 9 (18), 3066–3073. 11 Saito, S.; Atsumi, K.; Zhou, T.; Fukaya, K.; Urabe, D.; Oku, N.; Karim, M. R. U.; Komaki, H.; Igarashi, Y. A Cyclopeptide and Three Oligomycin-Class Polyketides Produced by an Underexplored Actinomycete of the Genus Pseudosporangium. Beilstein J. Org. Chem. 2020, 16, 1100–1110.

158 Summary

6

Summary

159 Summary

acteria belonging to the order of Actinomycetales are the source of B many clinically important drugs, especially antibiotics. Due to the large back-catalogue of molecules isolated from these bacteria, it is difficult to find novel compounds by using traditional methods and sources. In this thesis, we used state-of-the-art metabolomics and genomics techniques to identify new molecules from the “rare actinomycete” genus Planomonospora. Our efforts led to the discovery of a new family of metabolites, encoded by the shortest gene so-far reported across the tree of life. While the bioactivity of the compounds remains to be discovered, our work serves as a blueprint for similar investigations, with the aim to find new drugs. In chapter 1, we introduced the biology and genetics of actinomycetes and reviewed the current knowledge about the rare genus Planomonospora. Further, we gave an overview of the current state-of-the-art in metabolome and genome mining. We also introduced the central problem of natural product drug discovery, namely, metabolite re-discovery, and possible approaches to deal with this issue. In chapter 2, we discussed methods for the differentiation of known and unknown molecules, called dereplication, in detail. Further, we described the creation of a computationally predicted tandem mass fragmentation library from a proprietary metabolite database, which could speed up and automate the identification of known compounds. Investigations into the prediction fidelity indicated limitations of the algorithm, which have to be taken into account in automated dereplication. Still, we were able to use the library to identify some metabolites in a real-world dataset. In chapter 3, we discussed our study into the “rare actinomycete” genus Planomonospora. Seventy-two strains were cultivated in different growth conditions and analyzed for the metabolites they produced. We were able to correlate phylogenetic diversity with metabolite diversity, showing that specific groups of strains produced specific molecules. Further, we sequenced the genomes of selected strains to gain insight into their theoretical biosynthetic capacity. Surprisingly, these strains showed only low similarity, despite their close

160 Summary phylogenetic relationship. Eventually, we established criteria to rationally select metabolites with a high probability of novelty, for consecutive isolation. In chapter 4, we describe the characterization of a family of novel metabolites, prioritized in the previous study. Investigation of the biosynthetic pathway suggested that the metabolite was a ribosomally synthesized and post- translationally modified peptide. Indeed, we could prove that a minimal, two-gene biosynthetic gene cluster was responsible for the production of the compound. One of the two genes is, to our knowledge, the shortest peptide-encoding gene ever reported. Bioinformatics investigations detected similar genes in other bacterial genomes, suggesting that similar compounds could be more widespread than expected. In chapter 5, we discuss the overall results of the thesis, their implications and future perspectives. The discovery of a new family of biosynthetic gene clusters, including the smallest peptide-encoding gene so far reported, has far- reaching implications in natural product research. While preliminary, this work can be the steppingstone of many investigations into the under-appreciated genus Planomonospora in particular and “rare actinomycetes” in general.

161 Summary

162 Samenvatting

7

Samenvatting

163 Samenvatting

acteriën van de orde van de Actinomycetales produceren veel B klinisch belangrijke medicijnen, met name antibiotica. Doordat in het verleden al een extreem grote diversiteit aan moleculen uit deze bacteriën geïsoleerd is, is het erg moeilijk om nieuwe soorten moleculen te ontdekken met een traditionele aanpak. In dit proefschrift hebben we de laatste nieuwe methoden op het gebied van metabolomics en genomics gebruikt om nieuwe soorten moleculen, gemaakt door het zeldzame actinomycete genus Planomonospora, te identificeren. We hebben een nieuwe familie metabolieten gevonden, die gecodeerd worden door het korste gen dat tot nu toe gevonden is in de gehele stamboom van het leven. De bioactiviteit van deze nieuw ontdekte groep moleculen is nog onduidelijk, maar ons werk kan als blauwdruk dienen voor soortgelijke onderzoeken, die erop gericht zijn nieuwe medicijnen te ontdekken. In hoofdstuk één introduceren we de actinomyceten, hun biologie en genetica en bespreken we wat tot nu toe bekend is over het zeldzame genus Planomonospora. Daarna volgt een overzicht van de laatste nieuwe methoden om te onderzoeken wat te vinden is in metabolomen en genomen. Daarnaast introduceren we het centrale probleem van het ontdekken van natuurlijke producten als medicijnen: de herontdekking van reeds bekende metabolieten en mogelijke manieren om hiermee om te gaan. Hoofdstuk 2 beschrijft het proces van dereplicatie: dat zijn methoden om onderscheid te maken tussen reeds bekende en nog onbekende moleculen. We laten zien hoe we geprobeerd hebben om een voorspellende database te ontwikkelen. Die database zou het identificeren van reeds bekende moleculen kunnen versnellen en automatiseren. De kwaliteit van de voorspellingen door deze database was echter alleen goed genoeg om moleculen met een simpele structuur te herkennen, maar zou nog steeds een nuttige bijdrage kunnen leveren aan de analyse van data. In hoofdstuk 3 bespreken we onze studie van het zeldzame actinomycete genus Planomonospora. We hebben 72 stammen gecultiveerd in verschillende groeicondities en hebben vervolgens de geproduceerde metabolieten

164 Samenvatting geanalyseerd. We konden de fylogenetische diversiteit met de metabole diversiteit correleren, wat laat zien dat specifieke groepen stammen specifieke moleculen produceren. Daarnaast hebben we de genomen van een aantal geselecteerde stammen gesequenced om inzicht te krijgen in hun biosynthetische capaciteit. Tot onze verrassing hadden deze stammen slechts kleine overeenkomsten, ondanks hun fylogenetische nabijheid. Tenslotte hebben we criteria vastgesteld voor het rationeel selecteren van metabolieten waarvan de kans groot is dat ze nog onbekend zijn, zodat we die moleculen vervolgens konden isoleren. In hoofdstuk 4 wordt de karakterisering van een familie van nieuwe metabolieten beschreven, die in het voorafgaande hoofdstuk geselecteerd waren. Nadere bestudering van de biosynthetische route suggereert dat het metaboliet waarschijnlijk is geproduceerd door een ribosoom. We konden inderdaad bewijzen dat een minimaal biosynthetische genengroep, bestaande uit twee genen, verantwoordelijk is voor de productie van dit nieuwe molecuul. Eén van deze twee genen is, voor zover wij weten, het kortste gen, coderend voor een peptide, dat ooit gerapporteerd is. Met behulp van bioinformatica konden we soortgelijke genen detecteren in andere bacteriële genomen. Dit suggereert dat soortgelijke moleculen meer wijdverspreid zouden kunnen zijn dan we verwacht hadden. In hoofdstuk 5 worden de resultaten van dit proefschrift en hun implicaties besproken en werpen we een blik op de toekomst van dit soort onderzoek. De ontdekking van een nieuwe familie biosynthetische gengroepen, inclusief het kleinste tot nu toe bekende gen dat voor een peptide codeert, zal veel invloed hebben op toekomstig onderzoek naar natuurlijk geproduceerde moleculen. Hoewel dit werk nog met enig voorbehoud is, kan het een startpunt zijn voor meer onderzoek naar het ondergewaardeerde genus Planomonospora en actinomyceten in het algemeen.

165 Samenvatting

166 Zusammenfassung

8

Zusammenfassung

167 Zusammenfassung

akterien der Ordnung Actinomycetales sind der Ursprung B zahlreicher klinisch relevanter Arzneimittel, insbesondere Antibiotika. Die großen Anzahl an Molekülen, die bereits aus diesen Bakterien isoliert worden sind, erschwert es, unbekannte Metabolite mittels traditioneller Strategien zu finden. In dieser Dissertation benutzten wir moderne Methoden der Metabolomik und Genomik um neue Moleküle in Stämmen des Aktinomyces-Genus Planomonospora zu identifiziern. Unsere Bemühungen führten zur Entdeckung einer neuen Familie von Metaboliten, codiert vom bisher kürzesten bekannten Gen. Obwohl die Suche nach der Bioaktivität des Moleküls erfolglos blieb, kann die vorliegende Arbeit als Ispiration für ähnliche Studien dienen. In Kapitel 1 stellen wir die Biologie und Genetik von Aktinomyceten sowie einen Überblick über den momentanen Wissensstand hinsichtlich des Genus Planomonospora vor. Weiters beleuchten wir aktuelle Methoden des metabolome und genome minings. Außerdem besprechen wir das zentrale Problem der Naturstoffsuche, nämlich die neuerliche Beschreibung von bereits bekannten Metaboliten, sowie Methoden zu dessen Umgehung. In Kapitel 2 beschreiben wir analytische Methoden zur Differenzierung von bekannten und unbekannten Molekülen, auch bekannt als dereplication. Wir besprechen die computergestützte Generierung einer Massenspektrometrie- Datenbank zur automatischen Erkennung von bereits bekannten Molekülen. Unsere Untersuchungen in die Vorhersagetreue des Computeralgorithmus zeigten, dass es bestimmte Schwäche gibt, die bei der automatisierten Anwendung der Datenbank in Betracht gezogen werden müssen. Nichtsdestotrotz ermöglichte uns die erzeugte Datenbank die Identifizierung einer Reihe von Metaboliten in unserem Datensatz. In Kapitel 3 stellen wir unsere Studie über den seltenen Genus Planomonospora vor. Wir kultivierten 72 Mikroorganismenstämme unter verschiedenen Bedingungen und analysierten sie mittels Massenspektrometrie. Es gelang uns, einen Zusammenhang zwischen Phylogenie und dem Vorhandensein bestimmter Moleküle herzustellen. Anders ausgedrückt,

168 Zusammenfassung produzierten eng verwandte Stämme spezifische Metaboliten, die in anderen Stämmen nicht zu beobachten waren. Weiters sequenzierten wir die Genome dreier Stämme um Einblick in die theoretische biosynthetische Kapazität von Planomonospora zu gewinnen. Interessanterweise zeigten die Stämme nur geringe Ähnlichkeit zueinander. Letztendlich etablierten wir Selektionskriterien um neue Metaboliten zu prioritisieren. In Kapitel 4 besprechen wir die Charakterisierung einer Famlie von neuartigen Peptiden, die in der vorhergehenden Studie ausgewählt wurden. Die Untersuchung des Biosyntheseweges legte nahe, dass es sich bei den neuen Molekülen um ribosomal synthethisierte und post-translational modifizierte Peptide handelt. Wir konnten beweisen, dass ein minimales biosynthetisches Gencluster mit nur zwei Genen für die Produktion des Metaboliten verantwortlich ist. Bei einem der beiden Gene handelt es sich um das bisher kürzeste bekannte Gen. Bioinformatische Analysen zeigten, dass ähnliche Metabolite wahrscheinlich auch in anderen Bakterien zu finden sind. In Kapitel 5 werden die gesamten Resultate der Dissertation zusammengefasst und besprochen. Die Entdeckung eines derart kurzen Gens könnte breite Auswirkungen auf die Naturstoffforschung haben. Weiters dient die vorliegende Dissertation als Sprungbrett für die weitere Erforschung von Planomonospora.

169 Zusammenfassung

170 Povzetek

9

Povzetek

171 Povzetek

akterije, pripadajoče razredu Aktinomicetales, so vir stevilnih B važnih zdravil, v glavnem antibiotikov. Zaradi številnih bioaktivnih molekul, ki so bile izolirane od teh bakterij, je težko odkrivati nove spojine ko se uporabljajo tradicionalne strategije. V tem delu smo uporabljali sodobne metode metabolomike in genomike, tako da bi našli nove molekule nepogostega roda Planomonospora. Odkrili smo novo družino spojin, ki so kodirane od najkrajšega doslej opisanega gena. Čeprav nismo ugotovili biološke aktivnosti spojine, lahko služi naše delo kot primer za vse ki raziskujejo na področju molekul z narave. V prvem poglavju tega dela, predstavimo biologijo in genetiko aktinomicetov ter povzamemo trenutno znanje o redkem rodu Planomonospora. Poleg tega vpeljemo bralce v sodobno stanje metod na področju metabolome in genome mining. Poglavlje zaključimo z uvodom v glavni problem raziskave kemičnih spojin z narave, namreč: ponovno odkritje že znanih snovi. Predstavimo tudi možnosti, kako se temu izogne. V drugem poglavju obravnavamo analitične metode za razlikovanje znanih in neznanih spojin, tako imenovano de-replikacijo. Opišemo ustanovitev podatkovne baze za masno spektrometrijo z informatiko. Preiskali smo napovedno zvestobo računalniškega programa in dognali omejitev algoritma. Ta se mora upoštevati v avtomatizirani de-replikaciji. Uspelo nam je identificirati število znanih spojin v naših podatkih. V tretjem poglavju obravnavamo raziskovanje roda Planomonospora. Gojili smo 72 zvrst pod različnimi pogoji in analizirali izvlečke glede produciranih metabolitov. Uspelo nam je dokazati soodvisnost med filogenetiko zvrsti ter zalotenih spojin. Ozko sorodne zvrsti producirajo določene spojine, ki jih ne vidimo pri manj sorodnih. Poleg tega smo sekvencirali genome treh zvrsti, kar je nam omogočilo upogled v biosintetsko sposobnost roda Planomonospora. Raziskavo zaključimo z ustanovitvijo kriterij, ki omogočajo izbiro naravnih spojin z visoko verjetnostjo novosti. V četrtemu poglavju opisujemo odkritje doslej neznane družine naravnih kemičnih spojin z roda Planomonospora, ki smo jih poimenovali biarylitides. Z raziskavo biosintetične poti smo dokazali, da je za produkcijo teh spojin

172 Povzetek odgovoren doslej najkrajši opisani gen. Naše računalniške preizkave kažejo, da obstajajo podobne spojine tudi v drugih rodovih bakterij. V petem poglavju obravnavamo skupne rezultate disertacije, posledice ter pomen za nadaljno delo. Odkritje najkrajšega gena ter nove biosintetske poti ima lahko pomen za celotno panogo. Ta discertacija nudi lestvico za poglobitveno delo v rod Planomonospora.

173 Povzetek

174 Acknowledgements

10

Acknowledgements

175 Acknowledgements

his thesis would have not been able without the help and assistance T from many sides, who helped me at different times of my PhD program. While I will thank only some by name, I want to assure that I am grateful to all of you. First, I want to thank my academic supervisor Prof. Tanneke den Blaauwen for her help, commitment and in guiding me through the bureaucratic mazes of the University of Amsterdam. Without your support, I would have been lost hopelessly. Second, I would like to thank my industry supervisor Dr. Margherita Sosio for her continuous support and who allowed me to explore my curiosity but guided me whenever I was about to lose track of my primary goals. Your vast experience motivated me to learn as much as possible about the fascinating world of actinomycetes, which I was mostly ignorant of before my work in your lab. Third, I want to thank Dr. Stefano Donadio for always having both an open ear and good advice for the various issues I came up with. Our discussions were highly valuable to me, since you often provided perspectives, which I would have never thought of on my own. Further, I would like to thank all my colleagues at Naicons who supported and helped me during my stay. Specifically, I want to mention: Sonia for her expertise in chemistry, natural product elucidation and especially her optimism in approaching new challenges; Arianna for instructing me in all the microbiological, molecular and enzymatic methods I was ignorant of, as well as supporting me in my endeavours in the Milanese housing market; Kristiina, for being my “sister in arms” and a good friend that I could always count on; and Marianna and Matteo, for their knowledge of NPs, dereplication and for instructing me in analytical chemistry. Also, I want to thank Dr. Max Crüsemann for the warm reception at the University of Bonn as well as fruitful discussions about all possible -omics methods. To my committee members, thank you for your time and valued feedback. To my parents and my sister: thank you for their unconditional assistance. Draga Zala in Max, brez Vaju in Vajine podpore ter ljubezni tega dela ne bi bilo.

176 Acknowledgements

Hvala, da sta mi pokazala pomenj trdega dela in vseel sem, da sem Vajin sin. To my co-author in life, Julia Schink: Danke für Deine stete emotionale Unterstützung in den vergangenen Jahren. Wann immer ich mich niedergeschlagen gefühlt habe, wusste ich, dass ich auf Dich zählen kann. Ohne Dich hätte ich es nicht geschafft. Further, I want to mention some people and institutions who helped me on my way: Laureen M. for your kindness and help; my dear ESR colleagues from the Train2Target project – Carlos, Tiago, Raquel, Jessica, Elisa, Matthias, Kara, Pilar, Elisabete, Valentina, Arancha, Federico and Carina; also, Google Scholar, GNU/Linux, the TOR project and the Sci-Hub initiative. Finally, I would like to acknowledge the support from the European Union’s Horizon 2020 research and innovation program under grant agreement No.721484 (Project name: Train2Target).

Thank you.

Mitja Maximilian Zdouc, October 2020

177 Acknowledgements

178 List of Publications

11

List of Publications

179 List of Publications

1 Zdouc, M. M.; Iorio, M.; Maffioli, S. I.; Crüsemann, M.; Donadio, S.; Sosio, M. Planomonospora: a Metabolomics Perspective on an Underexplored Actinobacteria Genus. J. Nat. Prod. 2021, 82 (2), 204-219 (doi.org/10.1021/acs.jnatprod.0c00807).

2 Zdouc, M. M.; Alanjary, M. M.; Zarazúa, G. S.; Maffioli, S. I.; Crüsemann, M.; Medema, M. M.; Donadio, S.; Sosio, M. A Biaryl-Linked Tripeptide from Planomonospora Reveals a Widespread Class of Minimal RiPP Gene Clusters. Cell Chem. Biol. 2021, 28, 1–7 (doi.org/10.1016/ j.chembiol.2020.11.009).

3 Zdouc, M. M.; Iorio, M.; Vind, K.; Simone, M.; Serina, S.; Brunati, C.; Monciardini, P.; Tocchetti, A.; Zarazúa, G. S.; Crüsemann, M.; Maffioli, S. I.; Sosio, M.; Donadio, S. J. Effective Approaches to Discover New Microbial Metabolites in a Large Strain Library. J. Ind. Microbiol. Biotechnol. 2021 (doi.org/10.1093/jimb/kuab017).

4 Iorio, M.; Davatgarbenam, S.; Serina, S.; Criscenzo, P.; Zdouc, M. M.; Simone, M.; Maffioli, S. I.; Ebright, R. H.; Donadio, S.; Sosio, M. Metabolomic Investigation of the Pseudouridimycin Producer, a Prolific Streptomycete. Sci. Rep. 2021 (doi.org/10.1101/2020.11.05.369249).

180