Gallagher and Jensen BMC Genomics (2015) 16:960 DOI 10.1186/s12864-015-2110-3

RESEARCH ARTICLE Open Access Genomic insights into the evolution of hybrid isoprenoid biosynthetic gene clusters in the MAR4 marine streptomycete clade Kelley A. Gallagher and Paul R. Jensen*

Abstract Background: Considerable advances have been made in our understanding of the molecular genetics of secondary metabolite biosynthesis. Coupled with increased access to genome sequence data, new insight can be gained into the diversity and distributions of secondary metabolite biosynthetic gene clusters and the evolutionary processes that generate them. Here we examine the distribution of gene clusters predicted to encode the biosynthesis of a structurally diverse class of molecules called hybrid isoprenoids (HIs) in the genus . These compounds are derived from a mixed biosynthetic origin that is characterized by the incorporation of a terpene moiety onto a variety of chemical scaffolds and include many potent antibiotic and cytotoxic agents. Results: One hundred and twenty Streptomyces genomes were searched for HI biosynthetic gene clusters using ABBA prenyltransferases (PTases) as queries. These enzymes are responsible for a key step in HI biosynthesis. The strains included 12 that belong to the ‘MAR4’ clade, a largely marine-derived lineage linked to the production of diverse HI secondary metabolites. We found ABBA PTase homologs in all of the MAR4 genomes, which averaged five copies per strain, compared with 21 % of the non-MAR4 genomes, which averaged one copy per strain. Phylogenetic analyses suggest that MAR4 PTase diversity has arisen by a combination of horizontal gene transfer and gene duplication. Furthermore, there is evidence that HI gene cluster diversity is generated by the horizontal exchange of orthologous PTases among clusters. Many putative HI gene clusters have not been linked to their secondary metabolic products, suggesting that MAR4 strains will yield additional new compounds in this structure class. Finally, we confirm that the mevalonate pathway is not always present in genomes that contain HI gene clusters and thus is not a reliable query for identifying strains with the potential to produce HI secondary metabolites. Conclusions: We found that marine-derived MAR4 streptomycetes possess a relatively high genetic potential for HI biosynthesis. The combination of horizontal gene transfer, duplication, and rearrangement indicate that complex evolutionary processes account for the high level of HI gene cluster diversity in these , the products of which may provide a yet to be defined adaptation to the marine environment. Keywords: Biosynthetic gene clusters, Genome evolution, Streptomyces, ABBA prenyltransferases, Secondary metabolism, Hybrid isoprenoids

* Correspondence: [email protected] Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0204, USA

© 2015 Gallagher and Jensen. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Gallagher and Jensen BMC Genomics (2015) 16:960 Page 2 of 13

Background molecules and thus play a critical role in the biosynthesis In bacteria, the genes responsible for the biosynthesis of of HI secondary metabolites. These enzymes also play secondary metabolites are typically clustered on the important roles in the biosynthesis of primary metabo- chromosome [1]. These biosynthetic gene clusters evolve lites including membrane sterols and lipoquinones [19]. rapidly relative to other genetic elements [2, 3], which One sub-group of PTases specific to secondary metabol- likely contributes to the remarkable structural diversity ism are the ABBA PTases, named in reference to the observed among secondary metabolites. While the se- ααββ structural repeats that form a large β-barrel fold lective pressures driving secondary metabolite diversifi- comprising the active center of the enzyme. ABBA cation remain largely unknown, the availability of large PTases attach isoprene moieties to aromatic substrates numbers of genome sequences has made it possible to and, to date, all characterized ABBA PTases are involved begin to identify the evolutionary mechanisms that gov- in the biosynthesis of HI secondary metabolites [20]. ern the biogenesis of structural novelty [2–4]. ABBA PTases can be further divided into two sub- The genus Streptomyces is well known as the source of groups, both of which are found in fungi and bacteria. structurally diverse secondary metabolites [5, 6]. With The first group is the indole ABBA PTases, which are re- nearly 600 named species, it is the most specious of all sponsible for the prenylation of indole substrates. One bacterial genera and comprises the majority of diversity example is the enzyme CymD, which is responsible for within the family . Streptomycetes are prenylation of the bacterial cyclic peptide cyclomarin typically saprophytic and found in terrestrial soils and [21]. The second group is the ‘Orf2’ PTases, which marine sediments. They also occur as plant endophytes, attach isoprenoid moieties to a variety of aromatic invertebrate mutualists, and human or plant pathogens substrates including phenazines, naphthoquinones, and [7, 8]. Within the genus Streptomyces, the ‘MAR4’ clade aminocoumarins. This group is named for the first has been described as a largely marine-derived lineage characterized member, later renamed ‘NphB’, which is in- [9]. Based on available sequence data, it currently en- volved in naphterpin biosynthesis [22]. Although the in- compasses 57 cultured strains and 180 cloned sequences dole and Orf2 PTases bear little sequence homology, and displays 4.1 % 16S rRNA divergence. Members of they are thought to have arisen from a common ancestor this clade consistently form two sub-clades represented [20]. To the best of our knowledge, the unrelated PTase by the type strains S. aculeolatus and S. synnematofor- CnqPT1 is the only non-ABBA PTase involved in the mans [9], which are the only named species within the production of a HI secondary metabolite. CnqPT1 is MAR4 lineage. membrane-bound and responsible for the O-prenylation MAR4 strains have previously been linked to the pro- of phenazine in the biosynthesis of marinophenazine in duction of secondary metabolites broadly classified as the MAR4 strain CNQ-509 [13]. hybrid isoprenoids (HIs) [10]. These compounds are bio- The isoprene used in bacterial terpenoid biosynthesis synthetic hybrids that derive part of their structures is derived from either the mevalonate (mev) or non- from five-carbon isoprene units. The addition of iso- mevalonate pathways [23]. Most bacteria, including prene (a process called “prenylation”) can occur on a Streptomyces, possess the non-mevalonate pathway, the variety of chemical scaffolds thus creating considerable products of which are used in the biosynthesis of structural diversity. HIs frequently possess biological primary metabolites such as respiratory ubiquinones activity and thus their discovery is of interest to the [24, 25]. In cases where bacteria also possess the mev pharmaceutical industry [11]. Based on literature re- pathway in addition to the non-mevalonate pathway, ports, HI production appears to be scattered throughout it has always been found flanking a HI gene cluster the Streptomyces phylogeny and is a relatively rare part [26] suggesting that the isoprene produced by this of secondary metabolism [10]. In contrast, some mem- biosynthetic route is used for secondary metabolism. bers of the MAR4 clade have been observed to produce Feeding experiments have confirmed that the mev up to three distinct classes of HI secondary metabolites pathwaycanprovidesomeoralloftheisoprenein- [10, 12, 13]. In addition, all MAR4 strains tested produce corporated into HI secondary metabolites [23, 27–30] at least one HI, an ability that has not been reported and led to the hypothesis that the mev pathway is a elsewhere in the Streptomyces genus [9]. To date, HI sec- marker for HI production [26]. The availability of a ondary metabolites produced by MAR4 strains include large number of Streptomyces genome sequences now naphthoquionones in the napyradiomycin [14] and mari- provides the opportunity to further explore the asso- none class [15, 16], the phthalazinone azamerone [17], ciation of the mev pathway with HI biosynthesis. the phenazines lavanducyanin [18] and marinophenazine To date, most evidence for the presence of HI path- [13], and the highly unusual pyrrole nitropyrrolin [12]. ways in MAR4 strains is based the detection and/or iso- Prenyltransferases (PTases) are responsible for the lation of HI secondary metabolites from MAR4 strains attachment of isoprene moieties to a variety of acceptor [9, 12, 13, 15, 16, 31, 32]. Here we use comparative Gallagher and Jensen BMC Genomics (2015) 16:960 Page 3 of 13

genomics to more rigorously test the hypothesis that yet all were assigned to ABBA PTase-specific pfam fam- MAR4 strains are enriched in HI gene clusters relative ilies. The 95 ABBA PTases were distributed among 35 to other streptomycetes. We show that MAR4 PTases strains (29.2 % of the total), some of which contained have undergone gene duplication events and been ex- more than one of these genes (Fig. 1). The 12 MAR4 changed among unrelated gene clusters. We also show strains contained from 3 to 8 ABBA PTases per genome. that most strains possessing HI gene clusters do not With the exception of a single indole PTase in strain contain the mev pathway, suggesting it is not a good CNQ-509, all of these fell within the Orf2 class. Of the marker for the presence of this type of biosynthetic cap- 108 non-MAR4 strains, 23 contained ABBA PTases. Of acity. The evolutionary history of HI pathways in MAR4 these, one contained three, five contained two, and 17 streptomycetes provides insight into how chemical diver- contained one copy of this gene. In contrast to the MAR4 sity is generated in microbial secondary metabolism. ABBA PTases, these could be delineated into 16 indole and 14 Orf2 PTases. Results ABBA prenyltransferases in Streptomyces genomes Distribution of ABBA PTases among Streptomyces spp One hundred and twenty Streptomyces genome se- The phylogenetic relationships of the 35 strains that quences were analyzed including 11 that were acquired contained ABBA PTases were examined in the context from MAR4 strains as part of this study (Additional file 1: of a Streptomyces species tree that included the 120 Table S1). These genomes were initially screened for the strains used in this study (Fig. 2). This phylogeny was presence of HI biosynthetic gene clusters (HIBGCs) using generated from the shared, single copy housekeeping 17 experimentally characterized Orf2 and indole ABBA genes AtpD and RpoB. The MAR4 clade is well delin- PTases as BLAST search queries (Additional file 1: eated within this phylogeny, which supports previous Table S2). The top BLAST matches were incorporated 16S rRNA gene sequence analyses [9]. The results clearly into a phylogeny that included experimentally charac- show that ABBA PTases are sparsely distributed terized enzymes representing both PTase sub-classes throughout the genus with the exception of the MAR4 [20] (data not shown). In total, 95 PTases claded with clade and a distantly related clade designated ‘S. coelico- either the Orf2 or indole ABBA PTase lineages (Additional lor’ in reference to its best-characterized member. These file 1: Table S3). These sequences showed varying levels of two clades are composed entirely of ABBA PTase- identity to the queries (29–100 % AA sequence identity), containing strains, with members of the MAR4 clade

AB9 indole MAR4 8 Orf2 7 Genomes containing 6 PTases (29%) Genomes 5 containing 4 no PTases in genome Tases

(71%) A P 3

2 # ABB 1

0 113 TK24 A3(2) 87.22 u4000 91-03 T 84-104 Tu 4 ZG0656 JCM 4913 TCC 25435 TCC 10970 BKS 13-15 Amel2xC10 DSM 41855 DSM 41686 A A ATCC 27064 ATCC NBRC 13819 sp. TOR3209 sp. TOR3209 sp. CNQ-509 sp. CNQ-865 . sp. CNT-371 . sp. CNS-335 . sp. CNX-435 . sp. CNY-243 . sp. CNB-632 . sp. CNP-082 . sp. CNS-654 . sp. CNH-287 . sp. CNH-099 . sp. CNQ-329 . sp. CNQ-766 . sp. CNQ-525 . sp. CNH-189 . sp. FXJ7.023 S S S S S S S S S S S S. S. S S S S. scabiei S. S. lividans . sp. S S. coelicolor . sp. 303MFCol5.2 S. ipomoeae S S. acidiscabies S. griseoflavus S. coelicoflavus S. gancidicus S. davawensis S. violaceusniger S. clavuligerus S. scabrisporus S. bottropensis S. vitaminophilus S. mobaraensis S. rimosus Fig. 1 PTase distributions among genome sequences. a Percentage of genomes that contain ABBA PTases. b Distribution of indole and Orf2 ABBA PTases among the 35 strains that possessed these genes Gallagher and Jensen BMC Genomics (2015) 16:960 Page 4 of 13

Streptomyces sp. TOR3209 55 Streptomyces sp. FXJ7.023 80 Streptomyces coelicoflavus ZG0656 Streptomyces lividans TK24 S. coelicolor 88 100 Streptomyces coelicolor A3(2) clade Streptomyces griseoflavus Tu4000 Streptomyces gancidicus BKS 13-15 78 Streptomyces sp. CNH-189 99 Streptomyces ghanaensis ATCC 14672 Streptomyces viridosporus T7A Streptomyces sp. e14 93 Streptomyces hygroscopicus jinggangensis TL01 Streptomyces hygroscopicus jinggangensis 5008 97 Streptomyces sp. LCC Streptomyces sp. PsTaAH-130 71 Streptomyces collinus Tu 365 100 Streptomyces afghaniensis 772 91 Streptomyces chartreusis NRRL 12338 94 Streptomyces viridochromogenes DSM 40736 Streptomyces sp. HGB0020 77 Streptomyces ipomoeae 91-03 Streptomyces davawensis JCM 4913 84 Streptomyces chartreusis NRRL 3882 Streptomyces viridochromogenes Tue57 Streptomyces sp. PsTaAH-137 Streptomyces sp. 303MFCol5.2 Streptomyces griseoaurantiacus M045 Streptomyces acidiscabies 84-104 100 Streptomyces bottropensis ATCC 25435 Streptomyces scabiei 87.22 Streptomyces sp. Amel2xC10 Streptomyces zinciresistens K42 Streptomyces sp. 351MFTsu5.1 100 Streptomyces sviceus ATCC 29083 68 Streptomyces canus 299MFChir4.1 75 Streptomyces sp. 142MFCol3.1 Streptomyces avermitilis MA-4680 54 Streptomyces turgidiscabies Car8 Streptomyces aurantiacus JA 4570 56 98 Streptomyces sp. SPB78 85 Streptomyces sp. SA3_actG 100 Streptomyces sp. Tu6071 Streptomyces sp. HrubLS-53 Streptomyces sp. SPB74 77 62 Streptomyces albus J1074 Streptomyces sp. LaPpAH-201 55 Streptomyces sp. CNY-228 100 Streptomyces sp. PP-C42 Streptomyces sp. S4 56 Streptomyces sp. W007 Streptomyces sp. CNB-091 Streptomyces globisporus C-1207 Streptomyces griseus griseus NBRC 13350 Streptomyces sp. XyelbKG-1 1 Streptomyces sp. PgraA7 75 Streptomyces sp. CNS-654 97 Streptomyces roseosporus NRRL 11379 Streptomyces roseosporus NRRL 15998 58 Streptomyces fulvissimus DSM 40593 Streptomyces sp. ScaeMP-e122 Streptomyces sp. DvalAA-21 89 Streptomyces sp. DpondAA-B6 92 Streptomyces sp. SirexAA-E 65 Streptomyces flavidovirens DSM 40150 66 Streptomyces sp. PAMC26508 Streptomyces sp. KhCrAH-43 100 Streptomyces sp. KhCrAH-40 Streptomyces sp. KhCrAH-337 Streptomyces pristinaespiralis ATCC 25486 100 Streptomyces lysosuperificus ATCC 31396 99 Streptomyces sp. Mg1 Streptomyces sp. C Streptomyces purpureus KA281 Streptomyces sp. CNT-372 100 Streptomyces violaceusniger SPC6 Streptomyces somaliensis DSM 40738 85 100 Streptomyces sp. SS Streptomyces venezuelae ATCC 10712 Streptomyces sp. CNT-302 100 Streptomyces sp. CNR-698 Streptomyces sp. CNS-615 98 Streptomyces clavuligerus ATCC 27064 Streptomyces tsukubaensis NRRL 18488 Streptomyces mobaraensis NBRC 13819 51 Streptomyces sp. WMMB 322 Streptomyces sp. TAA-204 Streptomyces sp. TAA-486 100 Streptomyces sp. WMMB 714 71 Streptomyces sp. Amel2xB2 Streptomyces sulphureus DSM 40104 99 100 Streptomyces sp. CNT-318 Streptomyces sp. AA1529 64 Streptomyces sp. CNT-360 63 Streptomyces sp. HPH0547 65 88 Streptomyces sp. CNH-287 60 Streptomyces sp. CNS-606 Streptomyces sp. TAA-040 62 Streptomyces sp. AA0539 Streptomyces xinghaiensis S187 Streptomyces sp. CNQ-865 Streptomyces sp. CNY-243 Streptomyces sp. CNT-371 93 Streptomyces sp. CNQ-766 72 Streptomyces sp. CNS-355 57 Streptomyces sp. CNP-082 MAR4 50Streptomyces sp. CNQ-329 57 Streptomyces sp. CNQ-525 clade 69 Streptomyces sp. CNQ-509 100 Streptomyces sp. CNX-435 Streptomyces sp. CNB-632 0 12345678 91 100 Streptomyces sp. CNH-099 Streptomyces bingchenggensis BCW-1 No. PTases found in genome 60 Streptomyces violaceusniger Tu 4113 Streptomyces hygroscopicus ATCC 53653 91 89 Streptomyces auratus AGR0001 94 Streptomyces rimosus rimosus ATCC 10970 100 Streptomyces albulus CCRC 11814 Streptomyces cattleya ATCC 35852 Streptomyces vitaminophilus DSM 41686 Streptomyces scabrisporus DSM 41855 100 Pseudonocardia dioxanivorans CB1190 Pseudonocardia asaccharolytica DSM 44247

0.4 substitutions per site Fig. 2 (See legend on next page.) Gallagher and Jensen BMC Genomics (2015) 16:960 Page 5 of 13

(See figure on previous page.) Fig. 2 Maximum likelihood phylogeny of the 120 Streptomyces strains used in this study. Phylogeny is based on concatenated AtpD and RpoB amino acid sequences. Bootstrap values >50 % are indicated at their respective nodes (based on 100 replicates). Colors indicate the number of ABBA PTases found in each genome. The MAR4 and S. coelicolor clades are indicated. Sequences derived from two Pseudonocardia genomes were used to root the tree

containing on average five ABBA PTases per genome likelihood analysis was then used to predict the ancestral compared with one per genome for members of the node for each PTase based on their distributions within S. coelicolor clade. The sporadic distribution of ABBA the phylogeny. This analysis predicts that no PTases PTases throughout the tree largely conforms to previ- were present in the MAR4 common ancestor, suggesting ous reports of HI production in this genus [10]. that all were acquired by HGT. In addition, it is pre- dicted that nine of 13 PTases were acquired more than Orf2 ABBA PTase phylogeny once within the MAR4 clade while three were lost in An Orf2 ABBA PTase phylogeny was constructed to as- certain strains following acquisition events. Specifically, sess the evolutionary relationships among the sequences PTC6 and PTC10 are predicted to have been lost in the identified in this study (Fig. 3). The initial bifurcation in CNQ-509 lineage, while PTC1 is predicted to have been the Orf2 phylogeny reveals two clades, the smaller of lost in the CNQ-329 lineage. Five PTases are found in which appears at the bottom of the tree and contains seven or more MAR4 strains (PTCs 1,6,9,10,13) while all the previously characterized CloQ and NovQ PTases, others (2–5, 7–8, and 11–12) are present in four or which are responsible for the prenylation of the amino- fewer. In all cases, the phylogeny within each PTC coumarin molecules chlorobiocin and novobiocin, re- (Fig. 3) is largely congruent with that of the MAR4 spe- spectively [33]. None of the MAR4 sequences fall in this cies phylogeny (Fig. 4) suggesting that, once acquired, clade. The larger, sister clade includes PTases known to these genes follow a vertical model of inheritance. prenylate naphthoquinone (e.g. NphB, Fur7, and Fnq26) and phenazine (EpzP and PpzP) scaffolds [20]. All of the Hybrid isoprenoid biosynthetic gene clusters (HIBGCs) PTases found in the MAR4 genomes fell within this lar- We next analyzed the gene neighborhoods surrounding ger clade. The MAR4 PTases could be further delineated each MAR4 PTase and sought to identify gene clusters into thirteen highly supported sub-clades. These sub- that putatively encode for the biosynthesis of HI second- clades were each assigned prenyltransferase clade (PTC) ary metabolites. The HI biosynthetic gene clusters numbers (Fig. 3). PTC5 contains a single prenyltransfer- (HIBGCs) observed in different strains were then ase but was sufficiently separated from the other se- grouped when shared gene content and MultiGeneBlast quences to assign it an independent clade number. The [34] analyses revealed sufficient similarity to predict they only two experimentally characterized Orf2 PTases that encode structurally related secondary metabolites. This clade with MAR4 sequences are both from the napyra- led to the identification of 13 HIBGCs of which only one diomycin (nap) biosynthetic pathway (ABS50462 and could be linked to a previously characterized pathway ABS50461) [14]. (nap, Table 1, Fig. 5) [14, 28]. It is generally difficult to predict the scaffold that will be prenylated based on Orf2 evolution HIBGC gene content or Orf2 phylogeny and thus little By comparing the Orf2 phylogeny (Fig. 3) with the spe- can be inferred about the secondary metabolites encoded cies tree (Fig. 2), it becomes clear that closely related by the remaining 12 gene clusters. Nonetheless, HIBGC9 PTases occur in distantly related strains. For example, contains a type III polyketide synthase, which would be the PTase that is sister to PTC13 occurs in S. griseofla- expected for the production of the naphthoquinone vus Tu4000, which is distantly related to the MAR4 moiety of compounds in the marinone series (Fig. 5). clade in the Streptomyces phylogeny. This suggests that Likewise, HIBGC11 contains a full suite of phenazine these PTases have been exchanged by HGT. In other biosynthesis genes and is therefore predicted to encode cases, there is evidence of PTase duplication followed by the production of the known MAR4 HI lavanducyanin. divergence. This is demonstrated for the sister PTC7 However, these bioinformatic predictions remain to be and PTC8 clades, which share the same phylogenies and experimentally verified. occur in the same taxa (Fig. 3). Two HIBGCs (8 and 9) did not meet the criteria to be To better resolve the relationships among the MAR4 assigned to the same cluster, however their gene content strains and the Orf2 PTases, a species phylogeny was is largely identical except that HIBGC9 contains an add- generated using five single-copy housekeeping genes de- itional set of genes including a second PTase and a type rived from the 12 MAR4 genome sequences (Fig. 4). A III polyketide synthase. Because HIBGC8 occurs on a Gallagher and Jensen BMC Genomics (2015) 16:960 Page 6 of 13

2524596589 Streptomyces sp. CNQ-865 HIBGC11 2518002741 Streptomyces sp. CNQ-766 HIBGC11 100 2518010999 Streptomyces sp. CNS-335 HIBGC11 2518513180 Streptomyces sp. CNY-243 100 HIBGC11 2562118558 Streptomyces sp. CNQ-525 HIBGC11 81 2516108106 Streptomyces sp. CNT-371 HIBGC11 PTC1 100 AA958_30735 Streptomyces sp. CNQ-509 HIBGC11 98 2561470561 Streptomyces sp. CNB-632 HIBGC11 2516097927 Streptomyces sp. CNH-099 HIBGC11 82 HIBGC11` 99 2561478010 Streptomyces sp. CNX-435 100 2561473002 Streptomyces sp. CNB-632 HIBGC4 93 2516101748 Streptomyces sp. CNH-099 HIBGC4 100 92 2561481643 Streptomyces sp. CNX-435 HIBGC12 PTC2 100 AA958_12645 Streptomyces sp. CNQ-509 HIBGC3 2521683528 Streptomyces sp. 303MFCol5.2 100 2527270384 Streptomyces sp. CNP-082 HIBGC7 2528491298 Streptomyces sp. CNQ-329 HIBGC7 PTC3 100 2561473755 Streptomyces sp. CNB-632 HIBGC6 99 2516101115 Streptomyces sp. CNH-099 HIBGC6 100 2561479929 Streptomyces sp. CNX-435 HIBGC6 PTC4 77 AA958_18620 Streptomyces sp. CNQ-509 HIBGC6 100 69 1ZB6_A NphB Streptomyces sp. Cl190 AA958_24325 Streptomyces sp. CNQ-509 HIBGC9 PTC5 2521683684 Streptomyces sp. 303MFCol5.2 2518014051 Streptomyces sp. CNS-335 nap ABS50462 NapT9 S. aculeolatus NRRL 18422 nap 2518518936 Streptomyces sp. CNY-243 nap 2524600746 Streptomyces sp. CNQ-865 nap 100 2518006639 Streptomyces sp. CNQ-766 nap 2562123521 Streptomyces sp. CNQ-525 nap PTC6 88 100 84 2528491790 Streptomyces sp. CNQ-329 nap 2516112035 Streptomyces sp. CNT-371 nap 60 2527270867 Streptomyces sp. CNP-082 nap 60 95 2516104298 Streptomyces sp. CNH-099 HIBGC8 100 2561476030 Streptomyces sp. CNB-632 HIBGC8 HIBGC9 PTC7 51 AA958_24270 Streptomyces sp. CNQ-509 100 2516104522 Streptomyces sp. CNH-099 HIBGC5 duplication 90 100 2561476124 Streptomyces sp. CNB-632 HIBGC5 PTC8 99 AA958_12625 Streptomyces sp. CNQ-509 HIBGC3 2515835839 Streptomyces sp. CNH-189 2524600624 Streptomyces sp. CNQ-865 HIBGC1 2518006200 Streptomyces sp. CNQ-766 HIBGC1 2518013585 Streptomyces sp. CNS-335 HIBGC1 100 2518518703 Streptomyces sp. CNY-243 HIBGC1 PTC9 2528489393 Streptomyces sp. CNQ-329 HIBGC1 2516111586 Streptomyces sp. CNT-371 HIBGC1 2562123443 Streptomyces sp. CNQ-525 HIBGC1 2524586714 Streptomyces sp. CNH-287 2518013945 Streptomyces sp. CNS-335 nap 2518006640 Streptomyces sp. CNQ-766 nap 2524600745 Streptomyces sp. CNQ-865 nap 98 2562123520 Streptomyces sp. CNQ-525 nap 97 2518518937 Streptomyces sp. CNY-243 nap PTC10 75 ABS50461 NapT8 S. aculeolatus NRRL 18422 nap 100 2516112034 Streptomyces sp. CNT-371 nap nap 67 2528491789 Streptomyces sp. CNQ-329 2527270866 Streptomyces sp. CNP-082 nap 100 2561476125 Streptomyces sp. CNB-632 HIBGC5 100 2516104521 Streptomyces sp. CNH-099 HIBGC5 PTC11 AA958_12635 Streptomyces sp. CNQ-509 HIBGC3 100 BAE78975 Fur7 Streptomyces sp. KO-3988 100 2562412110 Streptomyces davawensis JCM 4913 CAL34104 Fnq26 Streptomyces cinnamonensis DSM 1042 2515785691 Streptomyces vitaminophilus DSM 41686 100 2562084974 Streptomyces sp. CNS-654 100 CAX48655 PpzP Streptomyces anulatus 9663 ADQ43372 EpzP Streptomyces cinnamonensis DSM 1042 100 2524588539 Streptomyces sp. CNH-287 100 2561473004 Streptomyces sp. CNB-632 HIBGC4 2516101746 Streptomyces sp. CNH-099 HIBGC4 PTC12 100 2538963887 Streptomyces mobaraensis NBRC 13819 2547553903 Streptomyces acidiscabies 84-104 2524593636 Streptomyces sp. CNQ-865 HIBGC2 2518518536 Streptomyces sp. CNY-243 HIBGC2 86 2518010821 Streptomyces sp. CNS-335 HIBGC2 82 2518005124 Streptomyces sp. CNQ-766 HIBGC2 100 2562121671 Streptomyces sp. CNQ-525 HIBGC2 PTC13 100 2516104881 Streptomyces sp. CNT-371 HIBGC2 96 2561472124 Streptomyces sp. CNB-632 HIBGC2 83 2516099186 Streptomyces sp. CNH-099 HIBGC2 HGT 89 2561476450 Streptomyces sp. CNX-435 HIBGC2 645416841 Streptomyces griseoflavus Tu4000 88 637271461 Streptomyces coelicolor A3(2) (SCO7190) 100 645400867 Streptomyces lividans TK24 2539351921 Streptomyces gancidicus BKS 13-15 100 EDN93598 PtfSs Sclerotinia sclerotiorum 1980 54 EDN25735 PtfBf Botryotinia fuckeliana B05.10 100 EAU39467 PtfAt Aspergillus terreus NIH2624 100 AAN65239 CloQ Streptomyces roseochromogenes oscitans DS 12.976 AAF67510 NovQ Streptomyces niveus DSM 40088

0.5 substitutions per site Fig. 3 Maximum likelihood phylogeny of Orf2 ABBA PTases with mid-point rooting. The phylogeny contains all of the Orf2 ABBA PTases identified in the 120 Streptomyces genomes (identified by IMG gene number) including 12 that are experimentally characterized (identified by accession number). Homologs found in non-MAR4 genomes are in black, while those from MAR4 strains are color-coded based on strain. MAR4 PTases are delineated into 13 prenyltransferase clades (PTCs). Each MAR4 PTase is also assigned a hybrid isoprenoid gene cluster (HIBGC) number, which defines the gene cluster in which it was observed. Bootstrap values >50 % are indicated at their respective nodes (based on 100 replicates). Examples of horizontal gene transfer (HGT) and gene duplication are indicated. Two of the characterized PTases (ABS50462 and ABS50461) are from the MAR4 strain S. aculeolatus NRRL 18422 relatively short contig, further sequencing is required to For example, HIBGC4 contains two PTases that belong determine if these two clusters are in fact the same. Four to PTCs 2 and 12 (Fig. 3) and thus appear to have been of the HIBGCs contain more than one PTase (Figs. 3, 5, independently recruited into the cluster. This is sup- Table 1). In each of these cases, the PTases are distantly ported by the likelihood analysis, which predicts that related to each other (i.e. they occur in different PTCs). PTC2 and PTC12 were acquired at different points in Gallagher and Jensen BMC Genomics (2015) 16:960 Page 7 of 13

ABCMAR4 phylogeny PTase distribution PTC13 phylogeny

123456781213 91011 2524593636 CNQ-865 52/53 CNY-243

50/75 CNQ-766 2518518536 CNY-243

61/100 CNS-335 2518010821 CNS-335 86 58/94 CNQ-865 2518005124 CNQ-766 82 13 CNQ-525 100/100 100 9 52/67 CNT-371 2562121671 CNQ-525

1 3 57/89 CNQ-329 2516104881 CNT-371 6 10 2 4 5 7 8 11 CNQ-509 100 52/79 2561472124 CNB-632 3 CNP-082 96 100/100 2516099186 CNH-099 89 7 8 1112 CNB-632 100/100

65/60 CNH-099 2561476450 CNX-435 1 2 4 13 CNX-435 S. vitaminophilus DSM 41686 S. bottropensis ATCC 25435

0.03 substitutions per site Fig. 4 Evolutionary relationships between MAR4 strains and PTases. a MAR4 phylogeny generated from five single-copy housekeeping genes (atpD, rpoB, trpB, recA,andgyrB). Maximum likelihood (ML) and maximum parsimony (MP) trees showed the same topology. Bootstrap values >50 % are indicated at the respective nodes (MP/ML) based on 100 replicates. Black circles and associated PTC numbers indicate predicted PTase acquisition points with the % fill of the circle indicating the proportional likelihood that the PTase was present at that node (only values ≥50 % are shown). Note: some PTases are predicted to have been acquired at multiple nodes. b Boxes depict PTase distributions among MAR4 strains (black = present, white = absent). Numbers indicate the PTC. c The PTase phylogeny for each PTC was largely congruent with that of the species phylogeny for the strains in which it was observed. As an example, the phylogeny of PTC13 is depicted in comparison with the strain phylogeny. The likelihood analysis predicts two independent acquisition events for PTC13 after which the PTase phylogeny supports a model of vertical inheritance. Bootstrap values >50 % are indicated at the respective nodes (ML).IMGgeneIDandthesourcestrainforeachPTasearegiven the evolutionary history of strains CNB-632 and CNH- however, three notable exceptions (PTCs 2, 8, and 11), 099 (Fig. 4). Overall, the varying gene content among where orthologous PTases are found in different the MAR4 HIBGCs reveals the potential biosynthetic HIBGCs. A closer look reveals that the four PTC2 diversity maintained by these strains. sequences are found in three different HIBGCs, each of which contains from 1 to 3 PTases (Fig. 6). The presence Evidence for PTase rearrangement of orthologous PTases in different gene clusters suggests In most cases, all of the PTases that fall within a single they are actively exchanged and can be involved in PTC occur in the same HIBGC (Fig. 3). There are, the production of distinct secondary metabolites. Thus,

Table 1 Summary of putative HI gene clusters (HIBGCs) identified in MAR4 strains. The PTases present in each HIBGC are indicated along with the annotation of key biosynthetic genes Gene cluster PTC(s) present HIBGC content Product nap PTC10, PTC6 Type III polyketide synthase, three haloperoxidases, mev pathway napyradiomycin HIBGC1 PTC9 FabH-like protein, AvrD-like protein Unknown HIBGC2 PTC13 haloperoxidase Unknown HIBGC3 PTC2, PTC8, PTC11 squalene-hopene cyclase Unknown HIBGC4 PTC2, PTC12 squalene-hopene cyclase Unknown HIBGC5 PTC8, PTC11 Found on very short contig, additional genes in this cluster may exist Unknown HIBGC6 PTC4 isopentenyl pyrophosphate synthase Unknown HIBGC7 PTC3 polyprenyl synthase Unknown HIBGC8 PTC7 three haloperoxidases; found on very short contig, additional genes in this cluster may exist Unknown HIBGC9 PTC7, PTC5 Two type III polyketide synthases; portion of this HIBGC is identical to HIBGC8 marinonea HIBGC10 N/A indole PTase, phytoene synthase Unknown HIBGC11 PTC1 full suite of phenazine biosynthesis genes (phzABCDEFG) lavanducyanina HIBGC12 PTC2 squalene-hopene cyclase Unknown aPredicted Gallagher and Jensen BMC Genomics (2015) 16:960 Page 8 of 13

nap PTC10 PTC6

PTC9 HIBGC1 ABBA prenyltransferase other/unknown HIBGC2 PTC13 regulatory protein squalene cyclase oxidoreductase PTC2 PTC11 PTC8 peroxidase HIBGC3 fatty acid-CoA synthetase isomerase PTC2 PTC12 methyltransferase HIBGC4 transporter FabH-like PTC11 PTC8 aldolase HIBGC5 halogenase polyprenyl synthase PTC4 acyltransferase HIBGC6 phenazine biosynthesis phytoene synthase PTC3 HIBGC7 aminotransferase type III polyketide synthase PTC7 cytochrome P450 HIBGC8 acetyl-CoA carboxylase hydrolase mev pathway genes HIBGC9 PTC5 PTC7

indole HIBGC10 PTase

PTC1 HIBGC11

PTC2 HIBGC12

Fig. 5 Representative putative HI biosynthetic gene clusters (HIBGCs) identified in MAR4 strains. Putative gene functions (predicted by pfam annotations) are indicated by color. The PTC associated with each PTase is indicated

PTase phylogeny is not a good predictor of the second- previous reports that the isoprene incorporated into the ary metabolites produced by HIBGCs. napyradiomycins is derived from mev [28, 30]. However, the mev pathway was often found on short contigs mak- Mevalonate pathway distribution ing it difficult to establish its proximity to the nap path- The 120 Streptomyces genomes were examined for the six way. In all other cases, the mev pathway was found in genes that comprise the mev pathway to determine its close proximity to an ABBA PTase (Additional file 1: association with HIBGCs. Thirteen genomes were found Table S4). Thus, it appears that when present, the mev to contain the complete mev pathway (Additional file 1: pathway is generally associated with genes for secondary Table S4). Of these, seven were MAR4 strains. In contrast, metabolism, however these genes do not appear to be 35 strains contained ABBA PTases. The mev pathway was consistently associated with HIBGCs and are often ab- never found more than once in a single genome. Of the sent in strains with the genetic potential to produce HIs. 13 strains that contained the mev pathway, 11 contained at least one ABBA PTase. In the two strains that contained Discussion the mev pathway but lacked an ABBA PTase (Streptomy- Access to genome sequence data is providing unprece- ces sp. TAA-040 and Streptomyces sp. CNT-372), the mev dented opportunities to explore the evolution of second- pathway was found in close proximity to genes involved in ary metabolite biosynthesis [3, 4, 35]. Studies in this area terpene biosynthesis (i.e. a terpene synthase or non-ABBA have largely focused on modular enzyme systems, such PTases) suggesting it may be involved in the biosynthesis as those encoding non-ribosomal peptides and polyke- of terpenoid secondary metabolites. tides. In contrast, relatively little is known about the di- The seven MAR4 strains that contained the mev path- versity and distributions of gene clusters responsible for way also contained the nap gene cluster, supporting HI biosynthesis and how these systems evolve to create Gallagher and Jensen BMC Genomics (2015) 16:960 Page 9 of 13

PTC2 HIBGC12 Streptomyces sp. CNX-435 ABBA prenyltransferase other/unknown regulatory protein squalene cyclase oxidoreductase peroxidase fatty acid CoA synthetase PTC2 Streptomyces sp. CNB-632 isomerase HIBGC4 Streptomyces sp. CNH-099 methyltransferase hydrolase PTC12

HIBGC3 Streptomyces sp. CNQ-509 PTC2 PTC11 PTC8

Streptomyces sp. CNB-632 HIBGC5 Streptomyces sp. CNH-099 PTC11 PTC8 Fig. 6 PTase rearrangement within MAR4 strains. Four HIBGCs are illustrated with putative gene functions (predicted by pfam annotations) indicated by color. Homologs are indicated by gray bars. ABBA PTases are shown in red, and the PTC associated with each PTase indicated. The strain(s) containing each HIBGC are labeled new secondary metabolite diversity. HIs encompass a class. The consistent occurrence and relatively high wide spectrum of biosynthetic paradigms, however a abundance of PTases in MAR4 strains suggests they pro- consistent feature of the associated gene clusters is the vide a selective advantage to these bacteria. presence of PTases, which facilitate the addition of iso- All of the MAR4 PTases could be assigned to 13 prene moieties to the secondary metabolite. HIBGCs, each of which is predicted to encode the Using a dataset of 120 Streptomyces genomes, we pro- production of a distinct HI molecule. While there was vide additional bioinformatic support for the hypothesis generally a strong correlation between the phylogenetic that MAR4 streptomycetes are enriched in PTases rela- relationships among the PTases and the HIBGCs in tive to other streptomycetes [10]. Interestingly, all but which they occurred, there was also clear evidence that one of the MAR4 PTases falls within the Orf2 class of PTases are exchanged among HIBGCs (e.g., PTases in ABBA PTases. This is unlike non-MAR4 streptomycetes, PTC2 were observed in three different HIBGCs, Fig. 6). which contain approximately equal numbers of indole While it might be assumed that PTase duplication and Orf2 PTases. ABBA PTases were also enriched in followed by divergence would be a primary mechanism the S. coelicolor A3(2) clade, however these strains con- by which HIBGCs evolve to generate new structural tained on average one copy per genome compared to diversity, there were no cases where a single HIBGC five in MAR4 strains. Outside of these two clades, contained paralogous PTases. In fact, in all cases where PTases were sparsely distributed throughout the Strepto- multiple PTases were found in the same gene cluster myces species tree. (HIBGCs 3, 4, 5, and 9) the PTases were distantly related Likelihood analyses predict that the MAR4 PTases to each other (Fig. 3), and in one case (HIBGC4), PTases were acquired via multiple HGT events (Fig. 4). The in the same cluster are predicted to have been acquired alternative and less parsimonious hypothesis is that the at different points during MAR4 evolution (Fig. 3). PTases were acquired via HGT by a common MAR4 ABBA PTases have been shown to prenylate a variety of ancestor and subsequently lost by many of the strains. substrates in vitro [20], suggesting they could be func- Regardless of which scenario is correct, HGT appears to tionally incorporated into new HIBGCs. These results have played a major role in the acquisition of PTases by suggest that a primary mechanism of HIBGC evolution MAR4 strains (Fig. 2). There is also evidence that PTases is the acquisition of new PTases. In the one case where have been lost in some lineages, duplicated in others, there is clear evidence for PTC duplication (PTC 7 and and exchanged among HIBGCs, revealing the dynamic 8), the paralogous PTases in each stain occur in different evolutionary processes that are acting on this enzyme HIBGCs. Given that some orthologous PTases occurred Gallagher and Jensen BMC Genomics (2015) 16:960 Page 10 of 13

in different HIBGCs, which implies they encode the bio- Conclusions synthesis of distinct molecules (Fig. 6), a generalized Overall, the results of this study support the hypothesis conclusion of this study is that PTase phylogeny is not a that MAR4 strains are enriched in the biosynthetic ma- reliable predictor of HIBGC content and, by extension, chinery required to produce HI secondary metabolite secondary metabolite production. Furthermore, PTase relative to other streptomycetes. We also show that gene exchange among gene clusters appears to be a mechan- duplication, HGT, and gene rearrangement are involved ism by which novel HIBGCs evolve. This finding is in in the evolution of HIBGCs and, by extension, the gen- accordance with a recent study of biosynthetic gene eration of HI chemical novelty. Secondary metabolites cluster evolution which showed that certain portions of have previously been linked to functional adaptation in gene clusters can act as independent evolutionary en- [37], therefore the accumulation of HIs in tities, and that the merger of these biosynthetic subunits this clade could be related to the colonization of a par- is a mechanism that nature has used to generate struc- ticular environmental niche. The goal of understanding tural novelty in secondary metabolism [3]. why such a diversity of HI gene clusters have evolved Among the MAR4 Orf2 PTases, only nap has been and been maintained in MAR4 strains will require the formally linked to its secondary metabolic products challenging task of linking these molecules to their eco- [14, 28]. HIBGC11 contains all of the genes required logical roles in the environment. for phenazine biosynthesis (phzA-G), suggesting it may be responsible for lavanducyanin biosynthesis Methods (Table 1). Surprisingly, however, the PTase in this Genome sequences cluster (PTC1) does not clade with the characterized MAR4 strains were cultured in 200 mL of A1 medium sequences EpzP and PpzP, which are known to preny- (10 g soluble starch, 4 g yeast extract, 2 g peptone, late phenazine scaffolds [20]. HIBGC9 contains a type III 750 mL seawater, 250 mL deionized water) as previ- polyketide synthase, suggesting it may encode the produc- ously described [4]. DNA was extracted according to tion of the marinone class of naphthoquinones. This the Joint Genome Institute (JGI) standard protocol HIBGC was observed in CNQ-509, which is known to (http://my.jgi.doe.gov/general/protocols.html). Genome produce compounds in the marinone series [9]. However, sequencing, annotation, and assembly were carried experimental evidence will be required to confirm this hy- out as previously described [4]. The genome sequence pothesis. The strains containing HIBGC8 (CNB-632 and of MAR4 strain CNQ-509 was provided by Prof. Lutz CNH-099) also produce marinones [9], suggesting this Heide (University of Tuebingen, Germany) and will version of the gene cluster may be truncated due to inad- be made public as part of an upcoming publication equate sequencing (Table 1). It is difficult to predict the [38]. An additional 108 publically available Streptomyces types of molecules produced by the remainder of the genome sequences were obtained from IMG (Additional uncharacterized HIBGCs. There are examples where file 1: Table S1; https://img.jgi.doe.gov/cgi-bin/er/main.cgi). PTases are not clustered with the genes required to pro- duce the entire HI scaffold [13, 36], creating additional Identification of ABBA prenyltransferase homologs challenges for structure prediction. Regardless, the large Seventeen characterized ABBA PTases of fungal and number of uncharacterized pathways identified in MAR4 bacterial origin [20] were used as query sequences in a strains suggests they possess the genetic potential to pro- BLASTp search of the 120 Streptomyces genomes using duce a greater diversity of HI secondary metabolites than the BLAST interface at IMG/ER (https://img.jgi.doe.gov/ − formerly appreciated. er/), with an e-value cutoff of 1e 5. The MAR4 strain The mev pathway has previously been associated with CNQ-509, which was not available through JGI, was HI biosynthesis [26]. From our analysis, only 13 of 120 individually searched with the same query set using strains contained this pathway, including 11 of 35 BLAST+ [39]. Multiple sequence alignments of all strains that contain an ABBA PTase (Additional file 1: BLAST hits to characterized indole and Orf2 PTases Table S4). In the two strains that contain the mev were constructed independently using MUSCLE [40]. pathway but lack an ABBA PTase, the mev pathway is For each alignment, a maximum likelihood phylogeny associated with terpenoid biosynthetic genes suggesting was constructed using raxmlGUI [41] and the GTR + G that this isoprene is also incorporated into secondary me- model. For the Orf2 PTase phylogeny, two outgroups tabolism. These observations lend support to the hypoth- were included from the indole PTase family. Likewise, esis that when present in Streptomyces genomes, the mev for the indole family, two Orf2 PTases were used as pathway provides isoprene for secondary metabolism. outgroups. A BLAST hit was identified as an Orf2 or However, the infrequency of the mev pathway in ABBA indole ABBA PTase if it belonged to either of those PTase-containing genomes indicates that it is not a reli- clades and did not display an excessively long branch able marker for HI production. length. Gallagher and Jensen BMC Genomics (2015) 16:960 Page 11 of 13

Streptomyces phylogeny bit scores dropped precipitously in strains that did Five housekeeping genes (recA, atpD, rpoB, gyrB, and not contain the cluster. Uncharacterized pathways were trpB) that have previously been used in Streptomyces assigned “hybrid isoprenoid biosynthetic gene cluster” multi-locus sequence typing [42] were identified in the numbers (HIBGC 1-12). The previously characterized set of 120 Streptomyces genomes using a BLAST search napyradiomycin (nap) pathway was assembled with the for each gene. The amino acid sequences were aligned aid of the published sequence [14]. using MUSCLE and individual maximum likelihood phy- logenies built using raxmlGUI [41]. Gene identities were MAR4 phylogeny confirmed if they formed a monophyletic clade with The five housekeeping genes used in a prior Streptomy- their respective homologs. Housekeeping genes from ces phylogeny [42] were all present in single copy in the two Pseudonocardia strains (P. dioxanivorans CB1190 12 MAR4 genomes and used to build a more robust and P. asaccharolytica DSM44247) were included as MAR4 species phylogeny. Nucleotide multiple sequence outgroups. Of the five housekeeping genes examined, alignments for each of these genes and the 16S rRNA only rpoB and atpD were present in single copy in all of gene were individually built using MUSCLE [40] and the 120 genomes. These two housekeeping genes were manually trimmed to the same length. The six align- thus selected to build a Streptomyces phylogeny. Mul- ments were then concatenated and a maximum likeli- tiple sequence alignments of these two genes were hood phylogeny built using raxmlGUI [41] with 100 manually trimmed to the same length and concatenated. bootstrap replicates using the GTR + G substitution raxmlGUI was used to build a maximum likelihood model. A maximum parsimony tree was built using phylogeny of the resulting multiple sequence alignment. PAUP* [44] with 100 bootstrap replicates. The two LG + I + G was selected as the best-fit model for the strains that were the most closely related to the MAR4 data set based on a ProtTest [43] analysis of the clade in the Streptomyces phylogeny (S. vitaminophilus concatenated alignment. DSM 41686 and S. bottropensis ATCC 25435) were in- cluded as outgroups. Acquisition points of PTCs within Orf2 prenyltransferase phylogeny the MAR4 lineage were predicted using the trace charac- An amino acid phylogeny of the Orf2 ABBA prenyl- ter history function in Mesquite [45] as previously de- transferases was constructed using all homologs identi- scribed [4]. Likelihood scores >50 % were used to predict fied in the Streptomyces genome sequences and a set of the points of PTC acquisition in the MAR4 phylogeny. characterized Orf2 prenyltransferases [20]. The se- quences were aligned using MUSCLE [40] and a max- Mevalonate pathway identification imum likelihood phylogeny built using raxmlGUI [41] as To identify the mevalonate (mev) pathway, the 3-hydroxy- describe above. Based on a ProtTest analysis [43], LG 3-methyl-glutaryl-CoA reductase gene from S. aculeolatus was chosen as the best-fit model for the data. A prelim- NRRL 18422 (ABS50444) was used as a query in a inary tree using two indole prenyltransferases as out- BLASTp search of the Streptomyces genomes. This gene groups (SCO7190, accession number WP_011031680; has previously been used as a marker for the mev pathway CymD, accession number SARE_4565) was built to con- [29, 46]. Once located, the neighboring genes were manu- firm that the Orf2 prenyltransferase sequences were ally examined to confirm the presence and synteny of monophyletic. To improve the alignment quality, the the five remaining genes in the pathway (3-hydroxy-3- outgroups were removed, the sequences re-aligned, and methyl-glutaryl-CoA synthase, isopentenyl pyrophosphate a final tree built using the same methods with midpoint isomerase, phosphomevalonate kinase, mevalonate de- rooting. carboxylase, and mevalonate kinase).

HI gene cluster identification Availability of supporting data Gene clusters associated with all MAR4 PTases were All genome sequences discussed in this study (with analyzed using a combination of BLAST (http://blast. the exception of the sequence derived from strain ncbi.nlm.nih.gov/Blast.cgi) and IMG/ER, (https:// CNQ-509) are available through the Joint Genome img.jgi.doe.gov/cgi-bin/er/main.cgi). Clusters were de- Institute Integrated Microbial Genomes database (https:// fined by shared content and gene annotation. Gene img.jgi.doe.gov/cgi-bin/er/main.cgi) and are accessible clusters in different strains were compared using through the identification numbers listed in Additional MultiGeneBlast [34] and a database of the 12 MAR4 file 1: Table S1. The genome sequence of strain CNQ- genome sequences. Gene clusters were considered to 509 will be made public as a part of an upcoming publica- bethesameiftheyshared>85%genecontentand tion [38]. Primary phylogenetic data has been uploaded had a MultiGeneBlast cumulative BLAST bit score >80 % into TreeBase (http://purl.org/phylo/treebase/phylows/ of the query sequences’ score to itself. Cumulative BLAST study/TB2:S18360). Gallagher and Jensen BMC Genomics (2015) 16:960 Page 12 of 13

Additional file 13. Zeyhle P, Bauer JS, Steimle M, Leipoldt F, Rösch M, Kalinowski J, et al. A membrane-bound prenyltransferase catalyzes the O-prenylation of 1,6-dihydroxyphenazine in the marine bacterium Streptomyces sp. CNQ-509. Additional file 1: Table S1. Genome sequences used in this study. Chembiochem. 2014;15:2385–92. Table S2. ABBA PTases used as query sequences. Table S3. List of ABBA 14. Winter JM, Moffitt MC, Zazopoulos E, McAlpine JB, Dorrestein PC, Moore BS. PTases identified in this study. Table S4. List of strains containing the Molecular basis for chloronium-mediated meroterpene cyclization - Cloning, complete mevalonate pathway. (PDF 215 kb) sequencing, and heterologous expression of the napyradiomycin biosynthetic gene cluster. J Biol Chem. 2007;282:16362–68. Abbreviations 15. Pathirana C, Jensen PR, Fenical W. Marinone and debromomarinone - HI: Hybrid isoprenoid; PTase: Prenyltransferase; mev: Mevalonate antibiotic sesquiterpenoid naphthoquinones of a new structure class from a – biosynthetic pathway; HIBGC: Hybrid isoprenoid biosynthetic gene cluster; marine bacterium. Tetrahedron Lett. 1992;33:7663 6. PTC: Prenyltransferase clade; nap: Napyradiomycin biosynthetic pathway. 16. Hardt IH, Jensen PR, Fenical W. Neomarinone, and new cytotoxic marinone derivatives, produced by a marine filamentous bacterium (actinomycetales). Tetrahedron Lett. 2000;41:2073–6. Competing interests 17. Cho JY, Kwon HC, Williams PG, Jensen PR, Fenical W. Azamerone, a The authors declare that they have no competing interests. terpenoid phthalazinone from a marine-derived bacterium related to the genus Streptomyces (actinomycetales). Org Lett. 2006;8:2471–4. Authors’ contributions 18. Kondratyuk TP, Park E-J, Yu R, van Breemen RB, Asolkar RN, Murphy BT, et al. KAG and PRJ designed the study. KAG performed the bioinformatic analyses. Novel marine phenazines as potential cancer chemopreventive and anti- Both authors contributed towards writing the manuscript and have read and inflammatory agents. Mar Drugs. 2012;10:451–64. approved the final manuscript. 19. Heide L. Prenyl transfer to aromatic substrates: genetics and enzymology. Curr Opin Chem Biol. 2009;13:171–9. Acknowledgements 20. Bonitz T, Alva V, Saleh O, Lupas AN, Heide L. Evolutionary relationships of This work was funded by the NIH under grant RO1GM085770 (to P.R.J.). microbial aromatic prenyltransferases. PLoS One. 2011;6:e27336. K.A.G. was supported by the NIH Training Program in Marine Biotechnology 21. Schultz AW, Lewis CA, Luzung MR, Baran PS, Moore BS. Functional under grant GM067550. Genome sequencing was conducted by the U.S. characterization of the cyclomarin/cyclomarazine prenyltransferase CymD Department of Energy Joint Genome Institute and supported by the Office directs the biosynthesis of unnatural cyclic peptides. J Nat Prod. 2010;73:373–7. of Science of the U.S. Department of Energy under Contract No. DE-AC02- 22. Kuzuyama T, Noel JP, Richard SB. Structural basis for the promiscuous 05CH11231. Thanks to Lutz Heide for providing the genome sequence of biosynthetic prenylation of aromatic natural products. Nature. 2005;435:983–7. strain CNQ-509. We also thank Greg Rouse, Nadine Ziemert, and Leonard 23. Kuzuyama T, Seto H. Diversity of the biosynthesis of the isoprene units. Kaysser for helpful discussions about the data. Anindita Sarkar, Natalie Millán- Nat Prod Rep. 2003;20:171–83. Aguiñaga, Krystle Chavarria, Kevin Penn, Kelle Freel, Nadine Ziemert, and Eun 24. Lombard J, Moreira D. Origins and early evolution of the mevalonate Ju Choi assisted with DNA extractions. pathway of isoprenoid biosynthesis in the three domains of life. Mol Biol Evol. 2011;28:87–99. Received: 2 June 2015 Accepted: 19 October 2015 25. Dairi T, Kuzuyama T, Nishiyama M, Fujii I. Convergent strategies in biosynthesis. Nat Prod Rep. 2011;28:1054–86. 26. Kawasaki T, Kuzuyama T, Furihata K, Itoh N, Seto H, Dairi T. A relationship References between the mevalonate pathway and isoprenoid production in – 1. Osbourn A. Secondary metabolic gene clusters: evolutionary toolkits for actinomycetes. J Antibiot (Tokyo). 2003;56:957 66. chemical innovation. Trends Genet. 2010;26:449–57. 27. Bringmann G, Haagen Y, Gulder TAM, Gulder T, Heide L. Biosynthesis of the 2. Fischbach MA, Walsh CT, Clardy J. The evolution of gene collectives: How isoprenoid moieties of furanonaphthoquinone I and endophenazine A in – natural selection drives chemical innovation. Proc Natl Acad Sci U S A. Streptomyces cinnamonensis DSM 1042. J Org Chem. 2007;72:4198 204. 2008;105:4601–08. 28. Winter JM, Jansma AL, Handel TM, Moore BS. Formation of the pyridazine 3. Medema MH, Cimermancic P, Sali A, Takano E, Fischbach MA. A systematic natural product azamerone by biosynthetic rearrangement of an aryl – computational analysis of biosynthetic gene cluster evolution: lessons for diazoketone. Angew Chemie. 2009;121:781 4. engineering biosynthesis. PLoS Comput Biol. 2014;10:e1004016. 29. Izumikawa M, Khan ST, Takagi M, Shin-Ya K. Sponge-derived Streptomyces 4. Ziemert N, Lechner A, Wietz M, Millán-Aguiñaga N, Chavarria KL, Jensen PR. producing isoprenoids via the mevalonate pathway. J Nat Prod. 2010;73:208–12. Diversity and evolution of secondary metabolism in the marine actinomycete 30. Shiomi K, Iinuma H, Naganawa H, Isshiki K, Takeuchi T, Umezawa H. genus Salinispora. Proc Natl Acad Sci U S A. 2014;111:E1130–9. Biosynthesis of napyradiomycins. J Antibiot (Tokyo). 1987;40:1740–5. 5. Berdy J. Bioactive microbial metabolites - A personal view. J Antibiot 31. Farnaes L, Coufal NG, Kauffman CA, Rheingold AL, DiPasquale AG, (Tokyo). 2005;58:1–26. Jensen PR, et al. Napyradiomycin derivatives, produced by a marine- 6. Watve MG, Tickoo R, Jog MM, Bhole BD. How many antibiotics are derived actinomycete, illustrate cytotoxicity by induction of apoptosis. produced by the genus Streptomyces? Arch Microbiol. 2001;176:386–90. J Nat Prod. 2014;77:15–21. 7. Moran MA, Rutherford LT, Hodson RE. Evidence for indigenous Streptomyces 32. Soria-Mercado IE, Prieto-Davo A, Jensen PR, Fenical W. Antibiotic terpenoid populations in a marine environment determined with a 16S rRNA probe. chloro-dihydroquinones from a new marine actinomycete. J Nat Prod. Appl Environ Microbiol. 1995;61:3695–700. 2005;68:904–10. 8. Seipke RF, Kaltenpoth M, Hutchings MI. Streptomyces as symbionts: an 33. Saleh O, Haagen Y, Seeger K, Heide L. Prenyl transfer to aromatic substrates emerging and widespread theme? FEMS Microbiol Rev. 2012;36:862–76. in the biosynthesis of aminocoumarins, meroterpenoids and phenazines: 9. Gallagher KA, Rauscher K, Pavan Ioca L, Jensen PR. Phylogenetic and The ABBA prenyltransferase family. Phytochemistry. 2009;70:1728–38. chemical diversity of a hybrid isoprenoid-producing streptomycete lineage. 34. Medema MH, Takano E, Breitling R. Detecting sequence homology at the Appl Environ Microbiol. 2013;79:6894–902. gene cluster level with MultiGeneBlast. Mol Biol Evol. 2013;30:1218–23. 10. Gallagher KA, Fenical W, Jensen PR. Hybrid isoprenoid secondary metabolite 35. Doroghazi JR, Albright JC, Goering AW, Ju K-S, Haines RR, Tchalukov KA, et al. production in terrestrial and marine actinomycetes. Curr Opin Biotechnol. A roadmap for natural product discovery based on large-scale genomics and 2010;21:794–800. metabolomics. Nat Chem Biol. 2014;10:963–8. 11. Tello M, Kuzuyama T, Heide L, Noel JP, Richard SB. The ABBA family of 36. Haagen Y, Glueck K, Fay K, Kammerer B, Gust B, Heide L. A gene aromatic prenyltransferases: broadening natural product diversity. Cell Mol cluster for prenylated naphthoquinone and prenylated phenazine Life Sci. 2008;65:1459–63. biosynthesis in Streptomyces cinnamonensis DSM 1042. Chembiochem. 12. Kwon HC, Espindola APDM, Park J-S, Prieto-Davo A, Rose M, Jensen PR, et al. 2006;7:2016–27. Nitropyrrolins A-E, cytotoxic farnesyl-alpha-nitropyrroles from a marine- 37. Penn K, Jenkins C, Nett M, Udwary DW, Gontang EA, McGlinchey RP, et al. derived bacterium within the actinomycete family Streptomycetaceae. J Nat Genomic islands link secondary metabolism to functional adaptation in Prod. 2010;73:2047–52. marine Actinobacteria. ISME J. 2009;3:1193–203. Gallagher and Jensen BMC Genomics (2015) 16:960 Page 13 of 13

38. Rueckert C, Leipoldt F, Zeyhle P, Fenical W, Jensen PR, Kalinowski J, et al. Complete genome sequence of Streptomyces sp. CNQ-509, a prolific producer of meroterpenoid chemistry. J Biotechnol. In press. 39. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. 40. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–97. 41. Silvestro D, Michalak I. raxmlGUI: a graphical front-end for RAxML. 2011. 42. Doroghazi JR, Buckley DH. Widespread homologous recombination within and between Streptomyces species. ISME J. 2010;4:1136–43. 43. Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27:1164–5. 44. Swofford DL. PAUP*: phylogenetic analysis using parsimony, version 4.0b10. 2003. 45. Maddison WP, Maddison DR. Mesquite: a modular system for evolutionary analysis. Version 2.75. 2011. 46. Khan ST, Izumikawa M, Motohashi K, Mukai A, Takagi M, Shin-Ya K. Distribution of the 3-hydroxyl-3-methylglutaryl coenzyme A reductase gene and isoprenoid production in marine-derived Actinobacteria. FEMS Microbiol Lett. 2010;304:89–96.

Submit your next manuscript to BioMed Central and take full advantage of:

• Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit