Evolutionary Implications of Bacterial Polyketide Synthases

Holger Jenke-Kodama,* Axel Sandmann, Rolf Müller, and Elke Dittmann*
*Humboldt University, Institute of Biology, Chausseestrasse, Berlin, Germany; and Pharmaceutical Biotechnology, Saarland University, Saarbrücken, Germany

Polyketide synthases (PKS) perform a stepwise biosynthesis of diverse carbon skeletons from simple activated carboxylic acid units. The products of the complex pathways possess a wide range of pharmaceutical properties, including antibiotic, antitumor, antifungal, and immunosuppressive activities. We have performed a comprehensive phylogenetic analysis of multimodular and iterative PKS of bacteria and fungi and of the distinct types of fatty acid synthases (FAS) from different groups of organisms based on the highly conserved ketoacyl synthase (KS) domains. Apart from enzymes that meet the classification standards, we have included enzymes involved in the biosynthesis of mycolic acids, polyunsaturated fatty acids (PUFA), and glycolipids in bacteria. This study has revealed that PKS and FAS have passed through a long joint evolution process, in which modular PKS have a central position. They appear to have derived from bacterial FAS and primary iterative PKS and, in addition, share a common ancestor with animal FAS and secondary iterative PKS. Furthermore, we have carried out a phylogenomic analysis of all modular PKS that are encoded by the complete eubacterial genomes currently available in the database. The phylogenetic distribution of acyltransferase and KS domain sequences revealed that multiple gene duplications, gene losses, as well as horizontal gene transfer (HGT) have contributed to the evolution of PKS I in bacteria. The impact of these factors seems to vary considerably between the bacterial groups. Whereas in actinobacteria and cyanobacteria the majority of PKS I genes may have evolved from a common ancestor, several lines of evidence indicate that HGT has strongly contributed to the evolution of PKS I in proteobacteria.
Discovery of new evolutionary links between PKS and FAS and between the different PKS pathways in bacteria may help us in understanding the selective advantage that has led to the evolution of multiple secondary metabolite biosyntheses within individual bacteria.

Downloaded from https://academic.oup.com/mbe/article/22/10/2027/1138202 by guest on 24 September 2021

Key words: secondary metabolites, polyketides, multimodular enzymes, fatty acid synthases, Bayesian analysis.

E-mail: [email protected].

Mol. Biol. Evol. 22(10):2027–2039. 2005 doi:10.1093/molbev/msi193
Advance Access publication June 15, 2005

Introduction

The polyketide class of natural products shows a remarkable functional and structural diversity. Apart from being toxic for microorganisms or higher eukaryotes, some of the compounds play a role in metal transport (Crosa and Walsh 2002), others are closely linked to microbial differentiation (Black and Wolk 1994; Ohnishi et al. 1999). Polyketides are classified according to the architecture of their biosynthesis enzymes. Each of the classes of polyketide synthases (PKS) resembles one of the classes of fatty acid synthases (FAS): the type I PKS possess a multidomain architecture similar to the type I FAS of fungi and animals and type II PKS carry each catalytic site on a separate protein, characteristic of FAS II found in bacteria and plants (fig. 1). Whereas fungi usually contain monomodular iterative PKS I, the majority of bacterial PKS I consists of multiple sets of domains, or modules, that normally correspond to the number of acyl units in the product (Staunton and Weissman 2001, fig. 1).

Apart from the clearly defined PKS and FAS types an increasing number of biosynthesis pathways are described in the literature that show hitherto unknown organization forms (Moss, Martin, and Wilkinson 2004). Enzymes involved in the biosynthesis of ω-3-polyunsaturated fatty acids (PUFA) in Shewanella are authentic bacterial iterative PKS I (Metz et al. 2001, fig. 1) as well as enzymes involved in avilamycin (Gaitatzis et al. 2001), neocarzinostatin (Liu et al. 2005), and myxochromide (Wenzel et al. 2005) biosynthesis in streptomycetes and myxobacteria. Furthermore, a number of multimodular PKS I pathways are described in the literature that comprise iteratively acting modules, e.g., the biosynthesis of aureothin (He and Hertweck 2003).

Modular PKS I are predominantly found in actinobacteria, myxobacteria, pseudomonads, and cyanobacteria (Bode and Müller 2005). A minimal module is composed of a ketoacyl synthase (KS) domain, an acyltransferase (AT) domain, and an acyl carrier protein (ACP) domain. Frequently ketoreductase (KR), dehydratase (DH), and enoyl reductase (ER) domains are also embedded in the multifunctional megasynthases (fig. 1). Genetics and biochemistry of bacterial type I polyketide biosynthesis have been well investigated for the biosynthesis of the aglycone of erythromycin in Saccharopolyspora erythrea (Donadio et al. 1991). These findings have subsequently led to the elucidation of many PKS I pathways, in particular those involved in the formation of promising drug leads (for review, see Staunton and Weissman 2001). In bacteria, the type I PKS pathway frequently co-occurs with a second type of natural product pathway, the nonribosomal peptide synthetases (NRPS, Shen et al. 2001). Both types of enzymes can form hybrid biosynthesis complexes, and modules of both enzyme classes can even form hybrid synthetases (Duitman et al. 1999; Paitan et al. 1999; Silakowski et al. 1999).

As striking as the number of PKS gene clusters in some bacteria is the irregular distribution of metabolites and the corresponding genes in single strains and genera in all producing families of bacteria. This has raised the hypothesis of a horizontal gene transfer (HGT) between bacterial strains. Recent phylogenetic studies of PKS I were based on the highly conserved KS domains. Kroken et al. (2003) have found evidence that fungal KS domains cluster according to the reduced or unreduced character of the polyketide products. Furthermore, KS domains from hybrid PKS/NRPS complexes form a distinct branch in phylogenetic trees (Shen et al. 2001). Piel et al. have shown that KS domains fall into a separate group when distinct

© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

acyltransferases (so-called trans-ATs, fig. 1) are associated with PKS I systems that lack internal ATs (Piel et al. 2004). Whereas most of the studies were based on a limited set of data, Kroken et al. (2003) have presented a systematic study on the diversity and genealogy of all available fungal PKS sequences. The authors have concluded that the discontinuous distributions of orthologous PKS among fungal species can be explained by gene duplication, divergence, and gene loss and that HGT among fungi was not necessarily involved in the evolution process.

FIG. 1.—Schematic representation of fatty acid and polyketide biosynthesis. (A) Organization types of FAS and PKS. Distinct proteins are indicated as squares and domains integrated within proteins as circles, respectively. Optional domains of PKS I are designated. Enzymes additionally required for the synthesis of the respective end products are not shown. Example structures are provided next to each scheme. The roman numbers in brackets recur in the phylogenetic tree shown in figure 2. (B) Sequence of reactions performed by FAS and PKS. (C) Possibilities that follow each condensation step to give keto, hydroxyl, enoyl, or alkyl functionality, depending on the enzymatic activities used by a PKS module. Abbreviations: KS, ketosynthase; AT, acyltransferase; DH, dehydratase; ER, enoyl reductase; KR, ketoreductase; ACP, acyl carrier protein; AcT, acetyltransferase; PPT, phosphopantetheinyl transferase.

A systematic study on the evolution of bacterial PKS is still missing. A high number of genomes of eubacteria has been completely sequenced within the last few years (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi). Bioinformatic approaches are now being developed for annotation and specific analyses of the genomes. Yadav, Gokhale, and Mohanty (2003) have developed a platform for the analysis of PKS megasynthases that includes almost all current knowledge about these types of enzymes and that can be applied to dissect the arrangement of domains within these enzymes and to assign hypothetical substrate specificities of single domains (http://www.nii.res.in/nrps-pks.html). This allows a fast analysis of all PKS encoded by the microbial genomes that are completely available including those that currently cannot be assigned to a polyketide metabolite. We have chosen the AT and the KS domains for the phylogenetic study of bacterial PKS to get a conclusive picture of the evolutionary and functional relationships between domains from the different bacterial groups.

The aim of this study is to (1) investigate the evolution of bacterial PKS I and to relate it to the complex evolutionary history of the various types of FAS and PKS in the different groups of organisms, (2) systematically screen for the presence and number of PKS genes in all sequenced bacterial genomes and to test whether the number of PKS modules can be related to the genome size, (3) reveal the phylogenetic relation between PKS sequences from the different groups of bacteria, and thereby (4) assess the impact of gene duplications, gene loss, and HGT on the distribution of bacterial PKS I. A phylogenetic analysis of a complete set of functionally related domains in all groups of organisms can lead to the discovery of new evolutionary links and can help us to relate the evolution of the enzymes with the ecology and physiology of the bacteria.

Materials and Methods

Data Retrieval and Domain Analysis

The amino acid sequences of PKS I were retrieved from the National Center for Biotechnology Information microbial genome platform (http://www.ncbi.nlm.nih.gov/sutils/genome_table.cgi). A BlastP search with the expected value set to the default value of 10 was performed using the protein sequence of DEBS1 from S. erythrea as the query sequence against 138 complete eubacterial, 20 complete archaebacterial, and 3 unfinished genomes, namely, from the cyanobacterial strains Anabaena variabilis, Crocosphaera watsonii, and Nostoc punctiforme, respectively. The latter three genomes were included into the analysis to increase the data set for cyanobacteria that are known to be a rich source of secondary metabolite gene clusters (Bode and Müller 2005). All BlastP search results were inspected by eye to exclude improper sequences from the data collection. The obtained sequences were subsequently analyzed using the SEARCHPKS program (Yadav, Gokhale, and Mohanty 2003) in order to dissect the domain organization, to assign the substrate specificities, and to extract the sequences of AT and KS domains. Regarding the substrate specificity AT domains were grouped into four categories: (1) AT domains with known substrate specificities from biochemically characterized pathways, (2) AT domains with substrate specificities predicted by the SEARCHPKS program, (3) AT domains manually assigned to a substrate by analysis of amino acid residues assumed to be involved in substrate recognition, and (4) AT domains with unclear specificity. RNA sequences of the small ribosomal subunits (SSU RNA) were retrieved from the European ribosomal RNA database (http://www.psb.ugent.be/rRNA/ssu).

The amino acid sequences of FabH and FabF homologues and annotated FAS and PKS were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi).

Alignment

A total of 142 AT domains and 137 KS domains derived from the complete genome survey were subjected to a phylogenetic analysis. We furthermore included sequences of the DEBS proteins and of PKS I involved in the synthesis of the myxobacterial secondary metabolites epothilone (Sorangium cellulosum So ce90), stigmatellin, myxalamid (both Stigmatella aurantiaca Sg a15), soraphen (S. cellulosum So ce26), and pyoluteorin (Pseudomonas fluorescens Pf-5) to increase the data set for the myxobacteria and γ-proteobacteria and also to increase the number of well-characterized protein sequences and the reliability of tree reconstruction.

Amino acid alignments were created using ClustalW (Thompson, Higgins, and Gibson 1994) and adjusted manually using the MacClade program version 4.03 (W. R. Maddison and W. P. Maddison 2000). For the adjustment procedure the secondary structure of selected domains was predicted by means of the PSIPRED server (McGuffin, Bryson, and Jones 2000), and the prediction results were compared to the crystal structure of the FabD (Serre et al. 1995) and the FabF proteins (Moche et al. 1999) from Escherichia coli for the alignment of AT and KS domains, respectively. This was done to ensure correct alignment of secondary structure elements. The FabD and the FabF proteins from several bacterial strains served as outgroups in the analysis of AT and KS domains, respectively. The alignments are provided as supplementary material (Supplementary Material online).

Phylogenetic Analyses

We used different methods to reconstruct phylogenies for the amino acid alignment. For a reconstruction based on Bayesian statistics we used the MrBayes program version 3 (Huelsenbeck 2000). The Bayesian inference method employed the JTT amino acid replacement model (Jones, Taylor, and Thornton 1992) and a gamma distribution to represent among-site rate heterogeneity (JTT + Γ). A discrete gamma distribution with four categories was assumed to approximate the continuous function. In the case of KS domains and proteins taken from FAS and PKS, Metropolis-coupled Markov chain Monte Carlo analysis (MCMC) was performed with 1.5 million generations and four independent chains. The Markov chain was sampled every 100 generations. In the case of AT and KS domains MCMC analysis was performed with four million generations and four independent chains. As before, the Markov chain was sampled every 100 generations. Convergence was judged by plots of maximum likelihood (ML) scores and by using the run statistics. The MCMC analysis was assumed to have reached the convergence state if all acceptance rates for the moves in the "cold" chain were in the range 10%–70% and if the acceptance rates for the swaps between chains were also in the range 10%–70%. All trees sampled before reaching the convergence state were discarded, and the remaining trees were used to construct a consensus tree and to calculate the posterior clade probabilities.

In addition, we conducted ML, neighbor-joining (NJ), and maximum parsimony analysis. Details are given in the supplementary material (Supplementary Material online) together with the corresponding phylogenetic trees.

Estimation of the Number of Duplications, Losses, and HGT Events from Phylogenetic Trees

For the assessment of the number of putative gene duplications, gene losses, and HGT events from the phylogenetic trees we considered two different types of assumptions. Firstly, the sequence clusters in the phylogenetic tree belonging to the same bacterial group and at least partially to the same organism could have been already present in the common ancestor of the respective organisms. Alternatively, they could originate from an originally homologous sequence after speciation. In the first case, gene losses must be considered and the organism showing the highest number of gene copies determines the number of duplications. The calculated value was considered as the minimal number of duplication events explaining the distribution of sequences in the tree. In the second case, the assumption of gene losses is unnecessary. The maximum sum of duplications was therefore calculated from the sum of duplications in each organism. HGT events were deduced from anomalous distribution among bacterial groups and incongruities among the sequences in the phylogenetic trees. The direction of potential HGT was inferred by considering which bacterial groups were outnumbered in the respective clades of the tree.

Results and Discussion

Evolutionary Relationships Between PKS and FAS

Fatty acid synthesis is found ubiquitously across all groups of organisms and, thus, is likely a very ancient biochemical pathway. Because FAS and PKS use the same core of enzymatic activities (fig. 1), it is reasonable to assume an evolutionary connection between these two biosynthesis systems. To test this hypothesis a data set was created containing a selection of KS protein sequences representing all classes of FAS and the major types of PKS found in bacteria and fungi. The Bayesian phylogenetic tree derived from these sequences reflects the long joint evolution process that FAS and PKS have passed during species development (fig. 2). Similar topologies were obtained with NJ and parsimony methods (see Supplementary Material online). Archaebacterial KS sequences of the FabH type were chosen as the outgroup. The first two clades of the reconstructed tree comprise sequences representing the dissociative type II FAS/PKS systems found in eubacteria and plants. The two subtypes of KS involved in fatty acid biosynthesis in the majority of eubacteria, FabH and FabF (fig. 1, I), fall into distinct subclades. The latter clade splits up in a subclade including all eubacterial KS of the FabF type and a second one comprising the Kα and Kβ homologues of iterative PKS II of actinobacteria (fig. 1, I and V). FabF proteins from plastids and mitochondria are located near to their eubacterial counterparts, consistent with the prokaryotic origin of these organelles. The corresponding genes were transferred to the nucleus during eukaryotic evolution. Another branch of this subclade is built up from the mycobacterial FabF homologues KasA and KasB, which are involved in the synthesis of mycolic acids, high molecular weight α-alkyl-β-hydroxy acids unique to the so-called Corynebacterium-Mycobacterium-Nocardia group (CMN group) within actinobacteria (Brennan and Nikaido 1995). Bacteria of the CMN group represent a remarkable exception within the prokaryotes because they use, like fungi and animals, a multidomain FAS I and not the type II enzymes for the de novo synthesis of their long-chain fatty acids (Kikuchi, Rainwater, and Kolattukudy 1992) (fig. 1, II). Remarkably, mycolic acid synthesis involves both enzyme systems. First, the multidomain FAS I produces medium-chain length C12 to C16 fatty acids which are then transferred to the type II system. This synthase subsequently elongates the fatty acids from the first step into the very long meromycolic acids, the precursors of mycolic acids (Schweizer and Hofmann 2004). The mycobacterial FAS I cluster in close proximity of fungal FAS I (fig. 2, II and III). This close relationship coincides with the very similar architecture of both multienzymes (fig. 1, II and III). The genome of Mycobacterium tuberculosis comprises 19 genes of probable eukaryotic origin (Gamieldien, Ptitsyn, and Hide 2002). The FAS proteins, however, were not regarded as having evolved by HGT.

The following clade contains sequences from eubacterial iterative PKS I and eubacterial glycolipid synthases (fig. 1, VI and VII). Photobacterium profundum and Shewanella oneidensis are marine bacteria capable of producing ω-3 PUFAs such as docosahexaenoic acid (22:6ω3, DHA) and eicosapentaenoic acid (20:5ω3, EPA). It was shown that in bacteria PUFAs can be synthesized by an iterative PKS I (Metz et al. 2001; Wallis, Watts, and Browse 2002). Similarly, the lipid moiety of some bacterial glycolipids is produced by iteratively acting PKS (Campbell, Cohen, and Meeks 1997). This group includes the heterocyst glycolipid synthases of nitrogen-fixing cyanobacteria. The unifying characteristic of both multienzymes is a special domain architecture comprising up to five consecutive ACP domains (KS-AT-[ACP]2–5-(KR)). The only exceptions are SgcE and NcsE, iterative PKS proteins involved in the biosynthesis of the enediyne C-1027 (Liu et al. 2002) and neocarzinostatin (Liu et al. 2005) in Streptomyces globisporus and Streptomyces carzinostaticus, respectively (fig. 1, VII).

Regarding its position in the phylogenetic tree, the class of eubacterial iterative PKS I could be the ancestor of the whole range of eubacterial modular PKS I which are combined in the next clade. One subclade contains sequences from "normal" modular PKS possessing integrated cis-AT domains in each module (fig. 1, VIII). Besides sequences from modular enzymes, this subclade includes

FIG. 2.—Phylogeny of KS domains and proteins of FAS and PKS, inferred by Bayesian estimation. Numbers above branches indicate posterior clade probability values. Branch length indicates number of inferred amino acid changes per position. For names of clades and subclades and the roman numbers refer to figure 1. The bars at the margin indicate the enzyme architecture and the mode of operation.

the iterative orsellinic acid synthase AviM of the avilamycin pathway (Weitnauer et al. 2001) and the highly similar enzyme NcsB suggested to form the naphthoic acid moiety of neocarzinostatin (Liu et al. 2005). These two sequences form a subbranch together with the uncharacterized monomodular enzyme Pks4 from Streptomyces avermitilis. The iteratively acting module of the modular aureothin pathway (He and Hertweck 2003) clusters in a neighbor branch together with sequences from modular PKS of streptomycetes. The other subclade comprises modular PKS acting together with trans-AT proteins, namely, those from Bacillus subtilis and from the leinamycin biosynthesis cluster of

S. atroolivaceus (Tang, Cheng, and Shen 2004) and the pederin cluster of the Paederus fuscipes symbiont (Piel 2002) (fig. 1, IX). This group was described recently as a distinct phylogenetic lineage among modular PKS (Piel et al. 2004).

The iterative PKS I from fungi form a side branch of the eubacterial modular PKS I, i.e., they are clearly more closely related to those than to the fungal FAS I (fig. 1, VII). Interestingly, this group of fungal sequences includes the bacterial enzyme MchA that has recently been shown to be responsible for the formation of the aliphatic side chains of myxochromides in Stigmatella aurantiaca (Wenzel et al. 2005). The position of MchA in the tree probably indicates an HGT from fungi. The top of the tree is formed by the FAS I of animals showing a remarkable proximity to modular PKS of eubacteria (fig. 1, IV). This is an important clue that may help to solve the controversially discussed question of how the fusion type FAS found in fungi and animals evolved from the originally distinct proteins. There are fundamental biochemical differences between the fungal and animal FAS regarding the nature of the termination reactions, the cofactors used by the ER activity, and different types of AT domains (McCarthy and Hardie 1984). Additionally, the domain organization and the phylogenetic relationships suggest that independent evolutionary events may have led to the development of fungal and animal FAS systems. Whereas the mycobacterial and fungal FAS I may have evolved by protein fusion from bacterial FAS II systems, animal FAS I shares a common ancestor with PKS I. Moreover, the tree reconstruction indicates that modular PKS may be the evolutionary link between the primary iterativity apparent in the type II systems and early type I FAS and PKS and the secondary iterativity of fungal PKS I and animal FAS I that is also described for an increasing number of bacterial PKS. From the data set analyzed in this study, these conclusions can only be drawn for KS domains. Each domain in an individual PKS or FAS might possess a separate evolutionary history. To generalize the findings obtained for the KS domains it would be necessary to analyze all domain types, interdomain regions, and intron sites in the eukaryotic sequences.

Taken together, the comprehensive phylogenetic analysis of the various types of FAS and PKS from different organismic groups reveals a joint evolution process of these two important biosynthetic pathways. The PKS systems in general and the modular PKS I of bacteria in particular seem to hold a central position in this evolutionary interplay. In the following sections modular PKS I of bacteria will be analyzed more deeply using a phylogenomic approach.

Distribution of PKS I Among Bacteria

We could detect PKS I genes in 27 of the 138 bacterial genomes completely sequenced at the beginning of this survey and in the three unfinished genomes included in our analysis (for details see Materials and Methods), together representing 21% of the total number of genomes. None of the available archaebacterial genomes possess potential PKS sequences. Archaebacteria lack a FabD homologue, though all other FAS II components could be detected (Pereto, Lopez-Garcia, and Moreira 2004). The corresponding AT activity in this lineage presumably has been lost early in evolution and replaced by nonhomologous enzymes. Thus, one can hypothesize that archaebacteria could not develop PKS systems because of the missing AT necessary to "construct" them. The number of genomes containing PKS I genes varied considerably between the different eubacterial groups. Whereas none of the genomes from chlamydia and spirochaetales encoded PKS I, between 5% and 77% of the genomes assigned to the firmicutes, proteobacteria, cyanobacteria, and actinobacteria were found to possess PKS I genes. An overview of the number and distribution of PKS I in completely sequenced bacterial genomes is shown in table 1. The majority of PKS I proteins comprised a single PKS module composed of at least the KS, AT, and ACP domains. Multimodular PKS I genes were abundant in the genomes of S. avermitilis and B. subtilis str. 168. The latter strain is further exceptional among the completely sequenced bacterial genomes as it was found to encode exclusively PKS modules missing an integrated AT domain along with trans-AT proteins. A complete list of the domain arrangement of PKS I analyzed in this study is available as supplementary information (Supplementary Material online). In the majority of cases, those bacterial strains encoding PKS I were also found to encode NRPS. However, whereas in actinobacteria these two types of enzyme classes are mostly encoded on separate gene clusters, in proteobacteria and cyanobacteria hybrid PKS I/NRPS gene clusters were dominant.

From the 13 bacterial genomes possessing three or more PKS I genes, seven can be assigned to the actinobacteria, four to the cyanobacteria, one to Bacillales, and one to the pseudomonads. These results are not representative of the distribution of PKS I genes in bacteria as several bacterial species and genera are underrepresented in the current list of completely sequenced microbial genomes, whereas other bacterial groups are overrepresented. In particular, no myxobacterial genome sequence is currently available in the public database. The results of this survey are nevertheless in agreement with the number of metabolites that have been reported for the individual bacterial groups.

Table 1
Distribution of Modular PKS I Proteins Encoded in Completely Sequenced Genomes of Bacteria

Group                     Number of   Number of   Number of   Number of   Range of
                          Genomes     Positive    PKS I       Modules     Modules/
                                      Genomes     Proteins                Protein
Actinobacteria            13          10          98          132         1–5
  Streptomycetes          2           2           25          55          1–5
  Mycobacteria            5           5           70          74          1–2
  Others                  6           3           3           3           1
Chlamydiae                7           0           —           —           —
Cyanobacteria             11          5           34          35          1–2
Firmicutes                39          2           6           13          1–3
Proteobacteria (total)    60          13          37          60          1–6
  α-Proteobacteria        11          3           3           3           1
  β-Proteobacteria        8           4           4           4           1
  γ-Proteobacteria        33          5           11          12          1–2
  δ-Proteobacteria        2           0           —           —           —
  ε-Proteobacteria        5           0           —           —           —
Spirochaetales            4           0           —           —           —
Others                    8           1           2           2           1
Sum                       141         30          177         242         —
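The percentages quoted in this section can be re-derived from the genome counts in table 1. The following is a minimal sketch of that arithmetic, not the authors' procedure; the variable and function names are illustrative assumptions, and only the four groups named in the text plus the "Sum" row are transcribed.

```python
# Genome counts transcribed from table 1 as (genomes surveyed, genomes with PKS I).
# The dictionary and function names are illustrative, not taken from the article.
TABLE1_GENOMES = {
    "Actinobacteria": (13, 10),
    "Cyanobacteria": (11, 5),
    "Firmicutes": (39, 2),
    "Proteobacteria": (60, 13),
}
TABLE1_SUM = (141, 30)  # "Sum" row of table 1


def percent_positive(surveyed, positive):
    """Percentage of surveyed genomes that encode at least one PKS I."""
    return 100.0 * positive / surveyed


shares = {group: percent_positive(*counts) for group, counts in TABLE1_GENOMES.items()}
overall = percent_positive(*TABLE1_SUM)

# Print the groups from the lowest to the highest share, then the overall value.
for group, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{group:15s} {share:5.1f}%")
print(f"{'All genomes':15s} {overall:5.1f}%")
```

Under these counts the firmicutes and the actinobacteria mark the two ends of the quoted 5% to 77% range, and the "Sum" row gives the 21% overall share.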

Table 2
Names of Species and Modular PKS I Proteins Used in the Analysis

Organism                                      Number of  Protein Names                      Known Secondary
                                              Proteins                                      Metabolites
Actinobacteria
  Corynebacterium diphtheriae                 1          —                                  None
  Corynebacterium efficiens YS-314            1          —                                  None
  Corynebacterium glutamicum                  1          —                                  None
  Mycobacterium bovis AF2122/97               18         PpsA–PpsE, Mas, Pks1–Pks9,         Phenolphthiocerol,
                                                         Pks12, Pks13, Pks17                mycocerosic acid
  Mycobacterium leprae                        9          Mas, Pks1                          Mycocerosic acid
  Mycobacterium tuberculosis CDC1551          16         PpsA–PpsE, Mas, Pks1–Pks9,         Phenolphthiocerol,
                                                         Pks12, Pks13, Pks15, Pks17         mycocerosic acid
  M. tuberculosis H37Rv                       16         PpsA–PpsE, Mas, Pks1–Pks9,         Phenolphthiocerol,
                                                         Pks12, Pks13, Pks15, Pks17         mycocerosic acid
  Streptomyces avermitilis MA-4680            22         AveA1–AveA4, OlmA1–OlmA7,          Avermectin, oligomycin,
                                                         PteA1–PteA5, Pks1–2, Pks1–3,       polyene macrolide
                                                         Pks3-2, Pks4, Pks5, Nrps7
  Streptomyces coelicolor A3(2)               3          —
Cyanobacteria
  Anabaena sp. PCC 7120                       4          —                                  None
  Nostoc punctiforme                          22         NosB                               Nostopeptolide
  Gloeobacter violaceus                       2          —                                  None
  Crocosphaera watsonii                       3          —                                  None
  Anabaena variabilis                         3          —                                  None
Firmicutes
  Clostridium acetobutylicum                  1          PksE                               None
α-Proteobacteria
  Mesorhizobium loti                          1          —                                  None
  Rhodopseudomonas palustris CGA009           1          —                                  None
  Agrobacterium tumefaciens C58               1          —                                  None
β-Proteobacteria
  Bordetella bronchiseptica RB50              1          —                                  None
  Bordetella parapertussis 12822              1          —                                  None
  Ralstonia solanacearum                      2          —                                  None
  Nitrosomonas europaea ATCC 19718            1          —                                  None
γ-Proteobacteria
  Escherichia coli CFT073                     3          —                                  None
  Photorhabdus luminescens subsp. laum. TT01  2          —                                  None
  Yersinia pestis CO92                        1          Irp1                               Yersiniabactin
  Y. pestis KIM                               1          Irp1                               Yersiniabactin
  Coxiella burnetii RSA 493                   1          —                                  None
  Pseudomonas syringae pv. tomato str. DC3000 3          Irp, 2 others                      Yersiniabactin
δ-Proteobacteria
  Stigmatella aurantiaca                      14         StiA–StiJ, MxaB–MxaF               Stigmatellin, myxalamid
  Sorangium cellulosum So ce90                5          EpoA–EpoE                          Epothilone
  S. cellulosum So ce26                       2          SorA, SorB                         Soraphen
Others
  Pirellula sp. 1                             2          —                                  None
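As a cross-check on the S. avermitilis row of table 2, the protein names can be tallied programmatically. This is a rough sketch under two assumptions that are not stated in the table: entries of the form Name1–Name4 are read as numbered ranges, whereas names such as Pks1–2 are read as single proteins, and the Ave, Olm, and Pte proteins are taken to be the ones that belong to the three listed metabolites. All identifiers in the code are illustrative.

```python
import re

# A numbered range looks like "AveA1–AveA4": same prefix, first and last index.
# Entries without letters after the dash (e.g. "Pks1–2") are treated as one name.
RANGE = re.compile(r"^([A-Za-z]+)(\d+)–([A-Za-z]+)(\d+)$")


def count_proteins(entry):
    """Return how many proteins a table entry stands for (assumed convention)."""
    match = RANGE.match(entry)
    if match is None:
        return 1
    prefix_first, low, prefix_last, high = match.groups()
    if prefix_first != prefix_last:
        raise ValueError(f"mixed prefixes in range: {entry}")
    return int(high) - int(low) + 1


# Entries of the S. avermitilis MA-4680 row, split by the assumption above.
assigned_entries = ["AveA1–AveA4", "OlmA1–OlmA7", "PteA1–PteA5"]
other_entries = ["Pks1–2", "Pks1–3", "Pks3-2", "Pks4", "Pks5", "Nrps7"]

assigned = sum(count_proteins(entry) for entry in assigned_entries)
total = assigned + sum(count_proteins(entry) for entry in other_entries)
print(f"{assigned} of {total} proteins fall in the Ave, Olm, and Pte sets")
```

Read this way, the row adds up to the protein count given in the table.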

An overview of the compounds that can be related to modular PKS I encoded by complete bacterial genomes is shown in table 2. In S. avermitilis, 16 of the 22 proteins can be assigned to known polyketide structures, but only 1 out of 22 proteins encoded by the genome of the cyanobacterium N. punctiforme can be related to a secondary metabolite. It is therefore unknown how many of the genes detected in this survey are really functional and how many of the corresponding enzymes are only induced under specific environmental conditions.

PKS I and the Genome Size of Bacteria

We have tested whether there is a correlation between the genome size of the bacteria and the presence of PKS I genes. Figure 3 shows the number of single PKS I modules encoded by the individual bacterial genomes in relation to the genome size. These two values showed a statistically significant correlation. Small bacterial genomes with less than 2 Mbp generally lack these genes. From the eight bacterial genomes exceeding a genome size of 7 Mbp only one strain, namely, Bradyrhizobium japonicum lacks PKS I genes. Thus, a trend toward the maintenance, duplication, and diversification of PKS I genes in bacterial genomes of larger size and the absence of those genes from reduced bacterial genomes is indicated. However, there are a number of medium size genomes (~4 Mbp), in particular those from four mycobacteria that encode a high number (8) of PKS I modules. These pathogenic bacteria have reduced some of their metabolic pathways during coevolution with their host cells while maintaining the secondary metabolite genes (Vissa and Brennan 2001). Part of the mycobacterial PKS I genes are involved in the synthesis of specific cell wall lipids that play an essential role in host cell–pathogen interactions (Brennan and Nikaido 1995). This coincides with the fact that M. tuberculosis possesses about 250 distinct enzymes involved in lipid metabolism compared to only 50 in E. coli (Cole et al. 1998). In the majority of bacteria the biological function and the putative ecological role of polyketides are not well understood. Most of the bacteria producing multiple PKS metabolites have been rarely investigated in the context of their natural ecosystems, and

60 S. avermitilis 50

40

30 N. punctiforme 20 M. bovis M. tuberculosis M. leprae 10 B. subtilis M. avium P. syringae S. coelicolor Anabaena PCC7120 0 Number of PKS modules 024 6810 12 Genome size in Mbp

FIG. 3.—Correlation between genome size and the number of PKS I modules encoded by 141 bacterial genome sequences. Filled diamonds represent

genomes missing PKS I genes. Empty characters represent bacterial strains possessing PKS I genes. Actinobacterial strains are shown as diamonds, Downloaded from https://academic.oup.com/mbe/article/22/10/2027/1138202 by guest on 24 September 2021 cyanobacterial strains as triangles, and all other strains as quadrates. Strain names are shown for strains encoding three or more PKS I. Test for a non- parametric Spearman correlation gave the correlation coefficient of r 5 0.476 (95% confidence interval 0.33–0.60) and a P value of P , 0.0001. no final conclusion can be drawn about the percentage of CoA (A5). The same topology in the phylogenetic tree was metabolites exhibiting a true ‘‘biological function.’’ It is obtained after the extraction of those residues from the however striking that most bacteria encoding three or more alignment that form part of the active center of the domains PKS I proteins show complex morphological differentia- (corresponding to residues Q11, Q63, G90, H91, L93, G94, tion pattern, as known from actinobacteria, myxobacteria, R117, S200, H201, N231, Q250, and V255 of E. coli FabD, and heterocyst-forming cyanobacteria (Meeks et al. 2002; data not shown). Thus, the distinct subclusters in the phy- Gehring et al. 2004). logenetic tree do not only reflect a functional specialization Even though a minority of bacterial strains has main- of the AT domains but also the evolutionary relationships tained and expanded the ability to produce PKS I the list of between the domains. It can be assumed that primary AT individual strains includes members from all major bacte- domains of bacterial PKS I activated malonyl-CoA as a sub- rial groups. This raises the question whether there are differ- strate and evolved from the malonyl-CoA activating ances- ences between these groups in acquiring, retaining, and tor protein involved in fatty acid biosynthesis. Gene expanding their PKS stock. 
We have therefore initiated duplications and subsequent functional specialization to- a phylogenetic analysis of these enzymes. ward novel substrates may have led to the evolution of AT domains clustering in the second clade of the tree Phylogenetic Analysis of AT and KS Domains (A5–A8). The similarity between actinobacterial sequences from the first clade of the tree to those of the second clade of A total of 139 AT domains derived from the complete the tree does not exceed 50%, whereas the actinobacterial genome analysis were subjected to a phylogenetic study (for sequences within both clades show at least 70% similarity. details see Materials and Methods). Furthermore, AT se- Probably, the ‘‘invention’’ of AT domains using substrates quences from four myxobacterial PKS pathways and from different from malonyl-CoA occurred only once in the evo- a pseudomonadal pathway were included as these groups lution of modular PKS systems. of proteobacteria were clearly underrepresented in the Apart from the different substrate specificities, none of complete genome survey considering the high number of the eight subgroups can be related to an obvious functional polyketides that have been described. divergence of the corresponding AT domains. AT domains A Bayesian analysis of bacterial PKS I AT domains were found to cluster independently from the domain com- revealed two major clades (fig. 4). Similar topologies were position of a PKS I module, e.g., the presence or absence of obtained using ML, maximum parsimony, and distance a KR domain. Furthermore, the presence of NRPS modules methods. One distinct clade comprising groups A1–A4 at the donor or acceptor side has no impact on the position contains all AT domains presumably activating malonyl- within the phylogenetic tree. We therefore conclude that the CoA (based on characterization or prediction) and a few subgroups of the tree mostly reflect the evolutionary rela- domains with unpredictable substrate. 
A second clade con- tionships between the AT domains that are not superim- sists of domains presumably activating methylmalonyl- posed by functional differences. The genealogy of AT CoA or rare substrates (groups A6–A8) and of one group domains within and between the major bacterial groups will of domains that are known or predicted to activate malonyl- be discussed below.
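The rank correlation reported in the legend of figure 3 can be illustrated with a short sketch. The Python code below is not the authors' procedure (the statistics software is not named in this section). It computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks, and the genome sizes and module counts in it are hypothetical values chosen for illustration, not data from the survey.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only; NOT the data behind figure 3.
genome_size_mbp = [1.1, 1.8, 4.4, 4.6, 6.4, 8.7, 9.0]
pks_modules = [0, 0, 12, 0, 3, 2, 55]

rho = spearman(genome_size_mbp, pks_modules)
print(f"Spearman r = {rho:.3f}")
```

A rank-based coefficient suits data of this kind because many genomes encode zero modules and a few encode very many, so ties and outliers would distort a linear correlation. The confidence interval and P value quoted in the legend need additional steps that this sketch does not cover.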

FIG. 4.—Phylogeny of AT domains of bacterial type I PKS, inferred by Bayesian estimation. Numbers above branches indicate posterior clade probability values. Branch length indicates number of inferred amino acid changes per position. Branches are colored according to their affiliation to a bacterial group as shown in the color code. AT domains predicted to use malonyl-CoA are highlighted green, those predicted to use methylmalonyl-CoA or rare substrates orange. Tips of the tree give the names of the organisms, proteins (if annotated in the database), module number, and substrate specificities (H, malonyl-CoA; C, methylmalonyl-CoA; MB, methylbutyryl-CoA; X, unclear). Biochemically characterized AT domains are indicated with black dots. AT domains with theoretically predicted substrate specificities are indicated with diamonds. Boxes with names of polyketide compounds relate to subgroups exclusively or predominantly involved in the biosynthesis of that compound. Numbers in the side bar indicate group numbers used in the text. Abbreviations of biochemically characterized PKS I are listed in table 2.

FIG. 5.—Species phylogeny based on the SSU RNA, inferred by the NJ method for bacterial strains that were included in the phylogenetic analysis of PKS I. Numbers above branches indicate bootstrap support values using 1,000 pseudosequence replicates.

The tree reconstruction of KS domains of bacterial PKS I is provided as supplementary material (Supplementary Material online). The phylogenetic relationships are superimposed by two factors, which are related to the functional environment of the domains. Firstly, hybrid NRPS/PKS systems require specialized KS domains capable of using the peptidyl substrate of the NRPS donor site (Shen et al. 2001). This domain type forms a subgroup of its own. Secondly, loading modules contain so-called KSQ domains where the essential cysteine at the active site is replaced by glutamine (Kao et al. 1996). Likewise, this domain type is found in a subgroup of its own regardless of its evolutionary origin.

Effect of Duplications and Gene Transfer on the Evolution of Bacterial PKS I

Detection of gene duplication is usually based on the identification of homologous sequences within a genome. In contrast, detection of HGT is much more cumbersome and prone to uncertainty (e.g., Ragan 2001). The best way to analyze HGT is to use a combination of different methods. In our analysis we used anomalous distribution of genes, phylogenetic tree incongruities, and atypical gene compositions as an indication of HGT.

To detect incongruities among the phylogenetic trees of AT and KS domains we compared the phylogenetic relationships to a bacterial species phylogeny. For this purpose we reconstructed a phylogenetic tree based on SSU RNA for those bacterial strains that were part of the phylogenetic analysis of PKS I domains (fig. 5). This tree clearly shows the monophyletic origin of strains within the major bacterial divisions (cyanobacteria, actinobacteria, and proteobacteria). This tree topology is in agreement with bacterial phylogenies constructed from other SSU RNA data sets (Woese 1987) and from translational apparatus proteins (Brochier et al. 2002).

The phylogenetic tree of AT domains (fig. 4) reveals a varying impact of gene duplications and potential HGT events on the evolution of PKS I in the different groups of bacteria. In order to assess the influence of these different factors more systematically, the numbers of single-gene duplications and possible HGT events were assessed individually for the different bacterial groups (for details see Materials and Methods). The results based on the phylogenetic tree of AT domains are summarized in table 3. Similar results were obtained for the phylogenetic tree of KS domains (see Supplementary Material online). From this analysis bacteria possessing PKS I genes can be classified into three groups: a first group in which most or all PKS I genes stem from common ancestors and have evolved by gene duplication events; a second group including bacteria that have acquired PKS I genes secondarily by HGT without further advancement by gene duplications; and a third group in which PKS I genes may have evolved by a combination of HGT and gene duplication events. Actinobacteria and cyanobacteria fall into the first category. The fact that most or all lineages of PKS I genes have evolved from common ancestors within these bacterial groups does not exclude HGT events between single strains of actinobacteria and cyanobacteria, respectively. However, these intrageneric HGT events were not assessed by the phylogenetic approach used in this study. In cyanobacteria, one sequence shows clear indications for an HGT event. This sequence is further exceptional among cyanobacteria as it represents the only cyanobacterial AT domain predicted to activate methylmalonyl-CoA. The sequence clusters between a number of myxobacterial sequences in the subbranch A6. Nevertheless, cyanobacteria show a much stronger impact of internal sequence duplications and were therefore classified into the first category. The second category of sequences derived from HGT events without further gene duplications was detected in a few genomes of α-, β-, and γ-proteobacteria. The distribution of these sequences in the phylogenetic tree clearly indicates an ancestry from other bacterial groups rather than a common origin. Finally, myxobacteria, which belong to the δ-group of proteobacteria, show many duplication events as well as substantial gene import by HGT and thus fall into the third category.

Table 3
Estimated Numbers of Gene Duplications, Losses, and HGT Events Assessed from the Phylogenetic Tree of AT Domains (fig. 4)

Columns: Duplications (Maximal Number), Duplications (Minimal Number), Losses, HGT

Actinobacteria 0
  Streptomycetes 51 43 4 0
  Mycobacteria 21 15 34 0
  Others 0 0 0 0
Cyanobacteria 21 18 19 1
Proteobacteria (α, β, γ) 115
Myxobacteria (δ) 23 18 15 4

Even though the irregular distribution of sequences can be taken as a first indication of HGT events, additional evidence is required to finally prove this theory. We have therefore analyzed the GC contents of those nucleotide sequences that encode AT domains suspected to be the result of an HGT event. Myxobacterial sequences were found to cluster either with actinobacterial or cyanobacterial sequences. However, no evidence can be obtained for HGT between streptomycetes and myxobacteria, as both bacterial groups are characterized by high GC contents of around 70%. Cyanobacterial genomes have lower GC contents, usually not exceeding 45%. Nevertheless, neither the myxobacterial sequences nor the cyanobacterial sequences clustering in group A3 of the tree show clear deviations from the average GC contents that are characteristic for these bacterial groups (data not shown). This could be attributed to the amelioration process after a successful gene transfer (Lawrence and Ochman 1997). A high impact of HGT in myxobacteria could be related to the saprophytic lifestyle of these bacteria (Bode and Müller 2003). However, no complete genome sequence could be included in the phylogenetic analysis. The conclusions about the occurrence of HGT in myxobacteria are thus somewhat preliminary and will require a more careful investigation in the future.

In α-, β-, and γ-proteobacteria most of the PKS I sequences are located either on pathogenicity islands or plasmids that are generally accepted to be the result of HGT (Dobrindt et al. 2004). Particularly striking are the positions of two AT domains from Pseudomonas syringae (1 and 2, fig. 2) that cluster in close proximity to actinobacterial sequences in groups A2 and A7. Both proteins are part of the coronafacic acid biosynthesis complex that is involved in the biosynthesis of the phytotoxin coronatine (Bender, Alarcon-Chaidez, and Gross 1999). The corresponding nucleotide sequences show a GC content of 68%, a value that is rather similar to the GC content of streptomycetes but deviates significantly from the average GC content of pseudomonads, which is 58%. Thus, in this case an HGT from streptomycetes to P. syringae is very likely. A third AT domain detected in the genome of P. syringae (Irp1) clusters in the direct neighborhood of a gene cluster that corresponds to the Irp11 region from Yersinia pestis that is involved in the biosynthesis of the iron chelator yersiniabactin. Irp1 consists of an NRPS module and a PKS module (Miller et al. 2002). As seen for coronatine biosynthesis genes, the corresponding nucleotide sequences in P. syringae and Y. pestis significantly exceed the average GC content with 65% and 61%, respectively. Thus, the yersiniabactin biosynthesis gene cluster may originate from a host bacterium with a high GC content, e.g., an actinobacterium, and was transferred to Y. pestis and P. syringae via HGT. Altogether, in the different groups of proteobacteria we can find three kinds of evidence for HGT of PKS I genes: anomalous distribution of these genes, incongruities between phylogenetic trees, and deviating GC contents of PKS I genes.

Multiple PKS I Gene Clusters as Evolutionary Traits to Increase Metabolic Diversity

In the previous sections we have discussed that multiple duplication events are the basis of the evolution of modular PKS systems. In this context the question arises to what extent modules are duplicated and whether modification of duplicated units occurs by means of recombinational exchange or the loss of domains. Such recombination processes generally seem to play an important role in creating variability within bacterial genomes (Smith 1991).

When we look at the domain structure of modules belonging to the same biosynthesis cluster it becomes clear that duplication alone is not sufficient to reconstruct their formation. As an example, the protein AveA4, which is part of the avermectin biosynthesis cluster of S. avermitilis, comprises three complete PKS modules showing the domain structures KS-AT-KR-ACP, KS-AT-ACP, and KS-AT-DH-KR-ACP, respectively. When we assume that the protein has evolved by duplications, these domain structures can only be explained by subsequent loss or acquisition of KR and DH domains.

More information about the impact of recombination events on the evolution of modular PKS enzymes comes from the analysis of the corresponding AT and KS domains within the phylogenetic trees. The oligomycin biosynthesis protein OlmA6 from S. avermitilis provides an example of a three-modular PKS I that could have evolved by gene duplication only. The modules show the same domain organization (KS-AT-KR-ACP), they all use the same substrate, and both the KS domains and the AT domains cluster very near to each other in the same subgroup of the respective phylogenetic trees (see fig. 4 and Supplementary Material online). This is supported by the fact that the respective DNA sequences are also very similar, even in the interdomain regions, which normally show a relatively high degree of variability (data not shown). Recombination events could explain the different substrate specificities of AT domains within single biosynthesis proteins or pathways. As discussed above, AT domains using malonyl-CoA and those using methylmalonyl-CoA show extensive amino acid differences, and therefore an independent evolution of a single methylmalonyl-CoA activating AT domain is very unlikely. Instead, it can be assumed that in those proteins containing both types of AT domains the duplication of a KS-AT unit was followed by an exchange of one of the AT domains by means of recombination. The different examples show that intra- and intergenomic recombination may have contributed to the evolution of PKS I in bacteria. To address this question more deeply it would be necessary to analyze the nucleotide sequences of all domain types and interdomain regions within the single bacterial genomes.

The generation of a potent biomolecular activity can be considered as a rare event in evolution, taking into account that such an activity is based on very specific interactions between molecules (Jones and Firn 1991). Likewise, one can infer that the process of developing bioactive secondary metabolites requires a considerably long span of time. Firn and Jones (2000) proposed a unifying model for the evolution of secondary metabolites. In their model they suggest that organisms may have selected specific evolutionary traits to increase the chances of developing a compound with potent biomolecular activity. The appropriate traits should enhance the generation and retention of chemical diversity and concurrently should reduce the fitness costs. Firn and Jones propose that by changing a single enzyme component in a biosynthesis pathway where each of the enzymes exhibits broad substrate specificity an organism can create diverse new products. The diversity of modular PKS I products, however, seems to arise from frequent recombination events between the modules rather than from evolution toward broader substrate specificities. Only modular systems like bacterial PKS I provide such an extraordinary platform for recombination events. This could explain the selective advantage for bacteria possessing multiple PKS gene clusters. At first sight it seems to be a genetic burden and a paradox to keep these extremely large gene clusters in the genome, some of them not even producing a compound with biological activity. However, the valuable evolutionary advantage is their effect as a "gene-saving device" (Cerda-Olmedo 1994) because the organisms have the ability to produce a large chemical diversity using a very limited number of different genes. This applies not only to molecules with an antibiotic activity, as in the case of streptomycetes, but also to compounds that may play a role in cell processes like signaling and communication.

Supplementary Material

Supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Prof. T. Börner (Humboldt University, Berlin) for critical reading of the manuscript and Dr. I. Schmitt (Field Museum, Chicago) for helpful suggestions. This work was supported by grants of the German Research Foundation (DFG-SPP1152) to E.D. and R.M.

Literature Cited

Bender, C. L., F. Alarcon-Chaidez, and D. C. Gross. 1999. Pseudomonas syringae phytotoxins: mode of action, regulation, and biosynthesis by peptide and polyketide synthetases. Microbiol. Mol. Biol. Rev. 63:266–292.
Black, T. A., and C. P. Wolk. 1994. Analysis of a Het- mutation in Anabaena sp. strain PCC 7120 implicates a secondary metabolite in the regulation of heterocyst spacing. J. Bacteriol. 176:2282–2292.
Bode, H. B., and R. Müller. 2003. Possibility of bacterial recruitment of genes associated with the biosynthesis of secondary metabolites. Plant Physiol. 132:1153–1161.
Bode, H. B., and R. Müller. 2005. The impact of bacterial genomics on natural product research. Angew. Chem. Int. Ed. Engl. (in press).
Brennan, P. J., and H. Nikaido. 1995. The envelope of mycobacteria. Annu. Rev. Biochem. 64:29–63.
Brochier, C., E. Bapteste, D. Moreira, and H. Philippe. 2002. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18:1–5.
Campbell, E. L., M. F. Cohen, and J. C. Meeks. 1997. A polyketide-synthase-like gene is involved in the synthesis of heterocyst glycolipids in Nostoc punctiforme strain ATCC 29133. Arch. Microbiol. 167:251–258.
Cerda-Olmedo, E. 1994. The genetics of chemical diversity. Crit. Rev. Microbiol. 20:151–160.
Cole, S. T., R. Brosch, J. Parkhill et al. (42 co-authors). 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537–544.
Crosa, J. H., and C. T. Walsh. 2002. Genetics and assembly line enzymology of siderophore biosynthesis in bacteria. Microbiol. Mol. Biol. Rev. 66:223–249.
Dobrindt, U., B. Hochhut, U. Hentschel, and J. Hacker. 2004. Genomic islands in pathogenic and environmental microorganisms. Nat. Rev. Microbiol. 2:414–424.
Donadio, S., M. J. Staver, J. B. McAlpine, S. J. Swanson, and L. Katz. 1991. Modular organization of genes required for complex polyketide biosynthesis. Science 252:675–679.
Duitman, E. H., L. W. Hamoen, M. Rembold et al. (13 co-authors). 1999. The mycosubtilin synthetase of Bacillus subtilis ATCC6633: a multifunctional hybrid between a peptide synthetase, an amino transferase, and a fatty acid synthase. Proc. Natl. Acad. Sci. USA 96:13294–13299.
Firn, R. D., and C. G. Jones. 2000. The evolution of secondary metabolism – a unifying model. Mol. Microbiol. 37:989–994.
Gaitatzis, N., A. Hans, R. Müller, and S. Beyer. 2001. The mtaA gene of the myxothiazol biosynthetic gene cluster from Stigmatella aurantiaca DW4/3-1 encodes a phosphopantetheinyl transferase that activates polyketide synthases and polypeptide synthetases. J. Biochem. (Tokyo) 129:119–124.
Gamieldien, J., A. Ptitsyn, and W. Hide. 2002. Eukaryotic genes in Mycobacterium tuberculosis could have a role in pathogenesis and immunomodulation. Trends Genet. 18:5–8.
Gehring, A. M., S. T. Wang, D. B. Kearns, N. Y. Storer, and R. Losick. 2004. Novel genes that influence development in Streptomyces coelicolor. J. Bacteriol. 186:3570–3577.
He, J., and C. Hertweck. 2003. Iteration as programmed event during polyketide assembly; molecular analysis of the aureothin biosynthesis gene cluster. Chem. Biol. 10:1225–1232.

Huelsenbeck, J. P. 2000. MrBayes: Bayesian inference of phylogeny. Distributed by the author.
Jones, C. G., and R. D. Firn. 1991. On the evolution of plant secondary metabolite chemical diversity. Phil. Trans. R. Soc. Lond. B Biol. Sci. 333:273–280.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.
Kao, C. M., R. Pieper, D. E. Cane, and C. Khosla. 1996. Evidence for two catalytically independent clusters of active sites in a functional modular polyketide synthase. Biochemistry 35:12363–12368.
Kikuchi, S., D. L. Rainwater, and P. E. Kolattukudy. 1992. Purification and characterization of an unusually large fatty acid synthase from Mycobacterium tuberculosis var. bovis BCG. Arch. Biochem. Biophys. 295:318–326.
Kroken, S., N. L. Glass, J. W. Taylor, O. C. Yoder, and B. G. Turgeon. 2003. Phylogenomic analysis of type I polyketide synthase genes in pathogenic and saprobic ascomycetes. Proc. Natl. Acad. Sci. USA 100:15670–15675.
Lawrence, J. G., and H. Ochman. 1997. Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol. 44:383–397.
Liu, W., S. D. Christenson, S. Standage, and B. Shen. 2002. Biosynthesis of the enediyne antitumor antibiotic C-1027. Science 297:1170–1173.
Liu, W., K. Nonaka, L. Nie et al. (11 co-authors). 2005. The neocarzinostatin biosynthetic gene cluster from Streptomyces carzinostaticus ATCC 15944 involving two iterative type I polyketide synthases. Chem. Biol. 12:293–302.
Maddison, W. R., and W. P. Maddison. 2000. MacClade. Version 4.0. Sinauer Associates, Sunderland, Mass.
McCarthy, A. D., and D. G. Hardie. 1984. Fatty acid synthase – an example of protein fusion by gene fusion. Trends Biochem. Sci. 9:60–63.
McGuffin, L. J., K. Bryson, and D. T. Jones. 2000. The PSIPRED protein structure prediction server. Bioinformatics 16:404–405.
Meeks, J. C., E. L. Campbell, M. L. Summers, and F. C. Wong. 2002. Cellular differentiation in the cyanobacterium Nostoc punctiforme. Arch. Microbiol. 178:395–403.
Metz, J. G., P. Roessler, D. Facciotti et al. (13 co-authors). 2001. Production of polyunsaturated fatty acids by polyketide synthases in both prokaryotes and eukaryotes. Science 293:290–293.
Miller, D. A., L. Luo, N. Hillson, T. A. Keating, and C. T. Walsh. 2002. Yersiniabactin synthetase: a four-protein assembly line producing the nonribosomal peptide/polyketide hybrid siderophore of Yersinia pestis. Chem. Biol. 9:333–344.
Moche, M., G. Schneider, P. Edwards, K. Dehesh, and Y. Lindqvist. 1999. Structure of the complex between the antibiotic cerulenin and its target, beta-ketoacyl-acyl carrier protein synthase. J. Biol. Chem. 274:6031–6034.
Moss, S. J., C. J. Martin, and B. Wilkinson. 2004. Loss of co-linearity by modular polyketide synthases: a mechanism for the evolution of chemical diversity. Nat. Prod. Rep. 21:575–593.
Ohnishi, Y., S. Kameyama, H. Onaka, and S. Horinouchi. 1999. The A-factor regulatory cascade leading to streptomycin biosynthesis in Streptomyces griseus: identification of a target gene of the A-factor receptor. Mol. Microbiol. 34:102–111.
Paitan, Y., G. Alon, E. Orr, E. Z. Ron, and E. Rosenberg. 1999. The first gene in the biosynthesis of the polyketide antibiotic TA of Myxococcus xanthus codes for a unique PKS module coupled to a peptide synthetase. J. Mol. Biol. 286:465–474.
Pereto, J., P. Lopez-Garcia, and D. Moreira. 2004. Ancestral lipid biosynthesis and early membrane evolution. Trends Biochem. Sci. 29:469–477.
Piel, J. 2002. A polyketide synthase-peptide synthetase gene cluster from an uncultured bacterial symbiont of Paederus beetles. Proc. Natl. Acad. Sci. USA 99:14002–14007.
Piel, J., D. Hui, N. Fusetani, and S. Matsunaga. 2004. Targeting modular polyketide synthases with iteratively acting acyltransferases from metagenomes of uncultured bacterial consortia. Environ. Microbiol. 6:921–927.
Ragan, M. A. 2001. Detection of lateral gene transfer among microbial genomes. Curr. Opin. Genet. Dev. 11:620–626.
Schweizer, E., and J. Hofmann. 2004. Microbial type I fatty acid synthases (FAS): major players in a network of cellular FAS systems. Microbiol. Mol. Biol. Rev. 68:501–517.
Serre, L., E. C. Verbree, Z. Dauter, A. R. Stuitje, and Z. S. Derewenda. 1995. The E. coli malonyl-CoA: acyl carrier protein transacylase at 1.5 Å resolution. Crystal structure of a FAS component. J. Biol. Chem. 270:12961–12964.
Shen, B., L. Du, C. Sanchez, D. J. Edwards, M. Chen, and J. M. Murrell. 2001. The biosynthetic gene cluster for the anticancer drug bleomycin from Streptomyces verticillus ATCC15003 as a model for hybrid peptide-polyketide natural product biosynthesis. J. Ind. Microbiol. Biotechnol. 27:378–385.
Silakowski, B., H. U. Schairer, H. Ehret et al. (11 co-authors). 1999. New lessons for combinatorial biosynthesis from myxobacteria. The myxothiazol biosynthetic gene cluster of Stigmatella aurantiaca DW4/3-1. J. Biol. Chem. 274:37391–37399.
Smith, G. R. 1991. Conjugational recombination in E. coli: myths and mechanisms. Cell 64:19–27.
Staunton, J., and K. J. Weissman. 2001. Polyketide biosynthesis: a millennium review. Nat. Prod. Rep. 18:380–416.
Tang, G. L., Y. Q. Cheng, and B. Shen. 2004. Leinamycin biosynthesis revealing unprecedented architectural complexity for a hybrid polyketide synthase and nonribosomal peptide synthetase. Chem. Biol. 11:33–45.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. ClustalW: improving the sensitivity of progressive multiple alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Vissa, V. D., and P. J. Brennan. 2001. The genome of Mycobacterium leprae: a minimal mycobacterial gene set. Genome Biol. 2:REVIEWS1023.
Wallis, J. G., J. L. Watts, and J. Browse. 2002. Polyunsaturated fatty acid synthesis: what will they think of next? Trends Biochem. Sci. 27:467.
Weitnauer, G., A. Mühlenweg, A. Trefzer, D. Hoffmeister, R. D. Süssmuth, G. Jung, K. Welzel, A. Vente, U. Girreser, and A. Bechthold. 2001. Biosynthesis of the orthosomycin antibiotic avilamycin A: deductions from the molecular analysis of the avi biosynthetic gene cluster of Streptomyces viridochromogenes Tü57 and production of new antibiotics. Chem. Biol. 8:569–581.
Wenzel, S. C., B. Kunze, G. Höfle, B. Silakowski, M. Scharfe, H. Blöcker, and R. Müller. 2005. Structure and biosynthesis of myxochromides S1-3 in Stigmatella aurantiaca: evidence for an iterative bacterial type I polyketide synthase and for module skipping in nonribosomal peptide biosynthesis. Chembiochem 6:375–385.
Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221–271.
Yadav, G., R. S. Gokhale, and D. Mohanty. 2003. SEARCHPKS: a program for detection and analysis of polyketide synthase domains. Nucleic Acids Res. 31:3654–3658.

William Martin, Associate Editor

Accepted June 10, 2005