Global Analysis of Adenylate-Forming Enzymes Reveals Β-Lactone Biosynthesis Pathway in Pathogenic Nocardia
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/856955; this version posted March 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Global analysis of adenylate-forming enzymes reveals β-lactone biosynthesis pathway in pathogenic Nocardia Serina L. Robinson,1,2,3* Barbara R. Terlouw,4 Megan D. Smith,1,3 Sacha J. Pidot,5 Tim P. Stinear,5 Marnix H. Medema,4 and Lawrence P. Wackett1,2,3 1BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, Saint Paul, MN, 55108, USA 2Graduate Program in Bioinformatics and Computational Biology, University of Minnesota 111 S. Broadway, Suite 300, Rochester, MN, 55904, USA 3Graduate Program in Microbiology, Immunology, and Cancer Biology, University of Minnesota, 689 23rd Ave SE, Minneapolis, MN, 55455, USA 4Bioinformatics Group, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands 5Department of Microbiology and Immunology at the Doherty Institute, University of Melbourne, 792 Elizabeth St, Melbourne, VIC, 3000 Australia *Corresponding author: Serina Robinson E-mail: [email protected] Running title: Global analysis of adenylate-forming enzymes Keywords: natural product biosynthesis, acetyl-CoA synthetase, enzyme catalysis, bioinformatics, machine learning, coenzyme A, adenylate-forming enzymes, Nocardia, β-lactone synthetases, substrate specificity ABSTRACT based predictive tool and used it to comprehensively map the biochemical diversity Enzymes that cleave ATP to activate carboxylic of adenylate-forming enzymes across >50,000 acids play essential roles in primary and secondary candidate biosynthetic gene clusters in bacterial, metabolism in all domains of life. Class I adenylate- fungal, and plant genomes. Ancestral enzyme forming enzymes share a conserved structural fold reconstruction and sequence similarity but act on a wide range of substrates to catalyze networking revealed a ‘hub’ topology suggesting reactions involved in bioluminescence, radial divergence of the adenylate-forming nonribosomal peptide biosynthesis, fatty acid superfamily from a core enzyme scaffold most activation, and β-lactone formation. Despite their related to contemporary aryl-CoA ligases. Our metabolic importance, the substrates and catalytic classifier also predicted β-lactone synthetases in functions of the vast majority of adenylate-forming novel biosynthetic gene clusters conserved across enzymes are unknown without tools available to >90 different strains of Nocardia. To test our accurately predict them. Given the crucial roles of computational predictions, we purified a adenylate-forming enzymes in biosynthesis, this candidate β-lactone synthetase from Nocardia also severely limits our ability to predict natural brasiliensis and reconstituted the biosynthetic product structures from biosynthetic gene clusters. pathway in vitro to link the gene cluster to the β- Here we used machine learning to predict lactone natural product, nocardiolactone. We adenylate-forming enzyme function and substrate anticipate our machine learning approach will aid specificity from protein sequence. We built a web- in functional classification of enzymes and advance natural product discovery. 1 bioRxiv preprint doi: https://doi.org/10.1101/856955; this version posted March 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. INTRODUCTION longer supported or available, showed that this class of enzymes is amenable to computational Adenylation is a widespread and essential reaction predictions of substrate and function (10). in nature to transform inert carboxylic acid groups Previously, bioinformatics tools were also into high energy acyl-AMP intermediates. Class I developed to predict substrates for NRPS adenylate-forming enzymes catalyze reactions for adenylation (A) domains (11, 12). Genome natural product biosynthesis, firefly mining approaches using NRPS A domain bioluminescence, and the activation of fatty acids prediction tools have proved useful to access the with coenzyme A (1). The conversion of acetate to biosynthetic potential of unculturable organisms acetyl-CoA by a partially purified acetyl-CoA and link ‘orphan’ natural products with their ligase was first described by Lipmann in 1944 (2). biosynthetic gene clusters. For example, NRPS A Since then, enzymes with this conserved structural domain predictions guided the discovery of the fold have been found to activate over 200 different biosynthetic machinery for the leinamycin family substrates including aromatic, aliphatic, and amino of natural products (13) and enabled sequence- acids. To encompass the major functional enzyme based structure prediction of several novel classes, the term ‘ANL’ superfamily was proposed lipopeptides by Zhao et al. (14). However, Zhao based on the Acyl-CoA ligases, Nonribosomal et al. reported limitations in existing tools in that peptide synthetases (NRPS), and Luciferases (3). they could not predict the chain length of lipid tails incorporated into lipopeptides. Lipid tails are Most ANL superfamily enzymes catalyze two-step prevalent in natural products and are incorporated reactions: adenylation followed by by fatty acyl-AMP or acyl-CoA ligase enzymes, thioesterification. During the thioesterification both in the ANL superfamily. These enzymes are step, ANL enzymes undergo a dramatic among the most well-studied subclasses of ANL conformational change involving a 140° domain enzymes. For less-studied subclasses, enzyme rotation of the C-terminal domain (3). Thioester functions and substrates are even more bond formation results from nucleophilic attack, challenging to predict. Hence, a computational typically by a phosphopantetheine thiol group. A tool encompassing all classes of ANL enzymes notable exception to phosphopantetheine is the use would constitute a major step towards more of molecular oxygen by firefly luciferase to convert accurate structural prediction of natural product D-luciferin to a light-emitting oxidized intermediate scaffolds. (3). Other interesting exceptions include functionally-divergent ANL enzymes within the Here, we used machine learning to develop a same pathway that catalyze the adenylation and predictive platform for ANL superfamily thioesterification partial reactions separately (4, 5). enzymes and map their substrate-and-function The ANL superfamily has recently expanded to landscape across 50,064 candidate biosynthetic include several new classes of enzymes including gene clusters in bacterial, fungal, and plant the fatty-acyl AMP ligases (6), aryl polyene genomes. We detected candidate β-lactone adenylation enzymes (7), and β-lactone synthetases synthetases in uncharacterized biosynthetic gene (8). Strikingly, an ANL enzyme in cremeomycin clusters from pathogenic Nocardia spp. and biosynthesis was shown to use nitrite to catalyze experimentally validated the gene cluster in vitro late stage N-N bond formation in diazo-containing to link it to the orphan β-lactone compound, natural products (9). The discovery of such novel nocardiolactone. Overall, this research provides a enzymatic reactions nearly 75 years after the proof-of-principle towards the use of machine Lipmann’s initial report suggests the ANL learning for classification of enzyme substrates to superfamily still has unexplored catalytic potential, guide natural product discovery. particularly in specialized metabolism. No computational tools currently exist for prediction and functional classification of ANL enzymes at the superfamily level. Still, the development of one previous platform, which is no 2 bioRxiv preprint doi: https://doi.org/10.1101/856955; this version posted March 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. RESULTS Previously, 34 amino acids within 8 Å of the Machine learning can accurately predict ANL gramicidin synthetase active site were shown enzyme function and substrate specificity to be critical for accurate prediction of NRPS A domain substrate specificity (11). Due to the high A global analysis of protein family domains level of structural conservation between NRPS A revealed that ANL superfamily enzymes (PF00501; domains and other proteins in the superfamily, we AMP-binding domains) are the third most abundant hypothesized these 34 active site residues would domain in known natural product biosynthetic also be important features for ANL substrate pathways (Table S1). Despite the essential and prediction. Using an AMP-binding profile varied roles of ANL enzymes, there is no database Hidden Markov Model (pHMM), we extracted 34 that catalogs their biosynthetic diversity. Therefore, active site residues from our >1,700 training set we mined the literature, MIBiG (15), and sequences. Further inspection of the superfamily- UniProtKB (16) for ANL enzymes with known wide pHMM alignment revealed the presence of substrate specificities. We then constructed a a fatty acyl-AMP ligase-specific insertion (FSI) training set of >1,700 ANL protein sequences of 20 amino acids not present in other paired with their functional class, substrate(s), superfamily members (Fig. S1). The FSI has been kinetic data, and crystal structures if solved. As suggested to inhibit the 140° domain rotation of reported previously by Gulick