Prediction of Function for the Polyprenyl Transferase Subgroup in the Isoprenoid Synthase Superfamily
Total Page:16
File Type:pdf, Size:1020Kb
Prediction of function for the polyprenyl transferase subgroup in the isoprenoid synthase superfamily Frank H. Wallrappa,b, Jian-Jung Panc, Gurusankar Ramamoorthyc, Daniel E. Almonacidb,d, Brandan S. Hilleriche, Ronald Seidele, Yury Patskovskye, Patricia C. Babbittb,d, Steven C. Almoe, Matthew P. Jacobsona,b,1, and C. Dale Poulterc,1 aDepartment of Pharmaceutical Chemistry, School of Pharmacy and bCalifornia Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158; cDepartment of Chemistry, University of Utah, Salt Lake City, UT 84112; dDepartment of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158-2330; and eDepartment of Biochemistry, Albert Einstein College of Medicine, Bronx, NY 10461 Contributed by C. Dale Poulter, January 23, 2013 (sent for review November 20, 2012) The number of available protein sequences has increased expo- dominant products under nonforced conditions. As revealed by nentially with the advent of high-throughput genomic sequencing, a variety of high-quality crystal structures, the enzymes typically creating a significant challenge for functional annotation. Here, are homodimers (5–7). In each monomer, the allylic binding we describe a large-scale study on assigning function to unknown region (S1) in the active site contains three Mg2+ ions ligated trans members of the -polyprenyl transferase (E-PTS) subgroup in by two pairs of aspartates from both Asp-rich regions, which in the isoprenoid synthase superfamily, which provides substrates turn holds the diphosphate of the allylic substrate in place, as for the biosynthesis of the more than 55,000 isoprenoid metabo- illustrated with DMAPP in Fig. 1. The hydrocarbon tail of DMAPP lites. Although the mechanism for determining the product chain length for these enzymes is known, there is no simple relation- extends into a pocket that accommodates the growing poly- ship between function and primary sequence, so that assigning isoprenoid chain. The IPP binding region (S2) positions C4 of function is challenging. We addressed this challenge through IPP near C1 of the allylic substrate. The isoprenoid moieties of large-scale bioinformatics analysis of >5,000 putative polyprenyl the substrates are joined by a dissociative electrophilic alkylation transferases; experimental characterization of the chain-length initiated by cleavage of the carbon–oxygen bond of the allylic specificity of 79 diverse members of this group; determination of diphosphate to form a carbocation, which then attacks the C3– 27 structures of 19 of these enzymes, including seven cocrystal- C4 double bond of allylic IPP to form a tertiary carbocationic lized with substrate analogs or products; and the development intermediate that loses a proton from C2 to give the allylic and successful application of a computational approach to predict product (8). The sequential addition of IPP to the growing chain function that leverages available structural data through homol- proceeds through release of diphosphate from S1, rearrange- ogy modeling and docking of possible products into the active site. ment of the product from S2 to S1, and binding of another IPP The crystallographic structures and computational structural models A – in S2, followed by the same reaction as before (Fig. 1 ). The of the enzyme ligand complexes elucidate the structural basis of fl specificity. As a result of this study, the percentage of E-PTS polyisoprenoid chain grows into the elongation cavity anked by sequences similar to functionally annotated ones (BLAST e-value helices D, F, G, and H of the protein near the dimer interface ≤ 1e−70) increased from 40.6 to 68.8%, and the percentage of (Fig. 1 B and C) (7, 9). Following this elongation mechanism, sequences similar to available crystal structures increased from these enzymes are named either according to their final product 28.9 to 47.4%. The high accuracy of our blind prediction of newly (e.g., farnesyl diphosphate synthase) or according to the longest characterized enzymes indicates the potential to predict function to the complete polyprenyl transferase subgroup of the isoprenoid Significance synthase superfamily computationally. This paper reports a large-scale collaborative study of an ap- chain-elongation | prenyltransferase proach for predicting the function of chain elongation prenyl- transferases from genetic data. A diverse set of genes for he five-carbon molecules isopentenyl diphosphate (IPP) and enzymes in the isoprenoid synthase superfamily was identified Tdimethylallyl diphosphate (DMAPP) are the fundamental for cloning, expression, X-ray structural analysis, and prediction building blocks for isoprenoid compounds. Beginning with of function by docking to homology models. Blind predictions, fi DMAPP, a series of polyprenyl diphosphates with C10 (geranyl later veri ed biochemically, were accurate to within one isoprene diphosphate, GPP), C15 (farnesyl diphosphate, FPP), C20 (ger- unit for all but a few of the 74 enzymes studied, an extraordinarily anylgeranyl diphosphate, GGPP), C25 (farnesylgeranyl diphosphate, high level of prediction given that the enzymes often give FGPP), and higher molecular weight isoprenoid chains are syn- products whose chain lengths vary by one isoprene unit. thesized by polyprenyl transferases (PTSs). With only a few exceptions, PTSs provide substrates for all but a few branch point Author contributions: F.H.W., J.-J.P., G.R., D.E.A., P.C.B., S.C.A., M.P.J., and C.D.P. designed research; F.H.W., J.-J.P., G.R., D.E.A., B.S.H., R.S., and Y.P. performed research; F.H.W., enzymes for the biosynthesis of the more than 55,000 known J.-J.P., G.R., D.E.A., B.S.H., R.S., Y.P., M.P.J., and C.D.P. analyzed data; and F.H.W., J.-J.P., isoprenoid metabolites, including monoterpenes, sesquiterpenes, G.R., D.E.A., M.P.J., and C.D.P. wrote the paper. diterpenes, sterols, carotenoids, ubiquinones, and prenylated pro- Conflict of interest statement: M.J.P. is a consultant for Schrodinger LLC, a company teins and peptides fulfilling essential roles in cells (Fig. S1)(1,2). whose software was used for a part of the study. There are two distinct classes of PTSs, E-PTS forming trans Freely available online through the PNAS open access option. bonds and Z-PTS forming cis bonds throughout chain elongation. Data deposition: Crystallography, atomic coordinates, and structure factors reported in The carbon skeletons of the great majority of isoprenoid metabo- this paper have been deposited in the Protein Data Bank, www.pdb.org (PDB ID codes E 3LOM, 3LVS, 3MZV, 3NF2, 3OYR, 3PDE; 3PKO; 3P8L, 3P8R, 3P41, 3QQV, 3Q1O, 3Q2Q, 3RMG, lites are derived from products of -PTSs, which share a common 3TS7, 3UCA, 4FP4, 4F62, 4DHD). protein fold and two functionally important Asp-rich (DDXXD) 1To whom correspondence may be addressed. E-mail: [email protected] or motifs (1, 3, 4). E-PTSs synthesize linear allylic diphosphates [email protected]. ranging from C10 to C50, where functional assignment of these This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. enzymes is linked to the chain length of their respective pre- 1073/pnas.1300632110/-/DCSupplemental. E1196–E1202 | PNAS | Published online March 14, 2013 www.pnas.org/cgi/doi/10.1073/pnas.1300632110 Downloaded by guest on September 27, 2021 PNAS PLUS Fig. 1. Crystal structure of GGPP synthase (PDB ID 1RQI). (A) Active site S1 with DMAPP, Mg2+ ions, and Asp-rich motifs and active site S2 with IPP are highlighted. The electrophilic attack of the C1 atom of DMAPP against the double bond of IPP after cleavage of diphosphate is indicated by the black arrow. (B and C) Side view (B) and top view (C) of the bioactive dimer with the active site and elongation cavity displayed. Helices D–H are identified by capital letters. ligand being transferred to the final product (in this case, we determined 27 crystal structures and used these structures (in geranyl transferase). addition to available structures in the PDB) to create comparative Considerable effort has been made to investigate how chain models for 61 PTS enzymes for which structural information is elongation is terminated, establishing that steric hindrance of the lacking. Structures and models then were used to make predictions growing chain in the elongation cavity is the main factor (1, 3, 9, of chain-length specificity, using a ligand-docking method that 10). However, because the number of predicted sequences in the evaluates the steric complementarity between various polyprenyl superfamily is so large, structural and enzymatic studies can be products and the elongation cavity. This structure-based approach performed on only a small fraction of the sequences with likely to predicting function is validated through blind predictions on E-PTS activity. We identified 5,839 such sequences at the initi- 74 PTS enzymes as well as on a subsequently obtained crystal ation of this study in May 2011 (see below). At that time, only 46 structure in complex with a long-chain polyprenyl product. individual sequences from 35 UniProt entries were functionally annotated based on published biochemical essays and thus avail- Results able in the Gene Ontology Annotation (UniProt-GOA) database Sequence Analysis and Clustering. In May 2011 we identified a group (11), and 61 had been structurally characterized with crystal of 5,839 sequences from the National Center for Biotechnology structures available in the Protein Data Bank (PDB) database Information (NCBI) protein database that, based on sequence (12). Moreover, the rate of genome sequencing continues to and structural information, may have E-PTS activity. These BIOCHEMISTRY increase exponentially, and even with on-going advances in high- sequences, spanning all domains of life, were stored in the throughput structural biology and in vitro screening methods, the Structure-Function Linkage Database (SFLD) (15) as the poly- gap between the number of known sequences and the number prenyl transferase-like subgroup.