Simultaneous Prediction of Protein Folding and Docking at High Resolution
Total Page:16
File Type:pdf, Size:1020Kb
Simultaneous prediction of protein folding and docking at high resolution Rhiju Dasa,1,2, Ingemar Andre´ a,1, Yang Shenb, Yibing Wuc, Alexander Lemakd, Sonal Bansale, Cheryl H. Arrowsmithd, Thomas Szyperskic, and David Bakera,3 aDepartment of Biochemistry, University of Washington, Seattle WA 98195; bLaboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892; cDepartments of Chemistry and Structural Biology and Northeast Structural Genomics Consortium, State University of New York, Buffalo, NY 14260; dOntario Cancer Institute, Department of Medical Biophysics, and Northeast Structural Genomics Consortium, University of Toronto, Toronto, ON, Canada M5G IL5; and eDepartment of Biochemistry and Molecular Biophysics, Washington University School of Medicine, Saint Louis, MO 63108 Edited by Ken A. Dill, University of California, San Francisco, CA, and approved August 7, 2009 (received for review April 21, 2009) Interleaved dimers and higher order symmetric oligomers are ubiq- number of degrees of freedom that need to be sampled. The new uitous in biology but present a challenge to de novo structure approach is based on the Rosetta framework for molecular mod- prediction methodology: The structure adopted by a monomer can be eling (10). stabilized largely by interactions with other monomers and hence not the lowest energy state of a single chain. Building on the Rosetta Results and Discussion framework, we present a general method to simultaneously model Overview of Rosetta ‘‘Fold-and-Dock’’ Protocol. For each molecular the folding and docking of multiple-chain interleaved homo-oli- complex, we start from fully extended monomer chains in randomly gomers. For more than a third of the cases in a benchmark set of generated symmetric rigid body arrangements (Fig. 1). The first of interleaved homo-oligomers, the method generates near-native mod- two stages of automated de novo modeling alternates sets of els of large ␣-helical bundles, interlocking  sandwiches, and inter- fragment insertion moves that perturb the backbone conformation leaved ␣/ motifs with an accuracy high enough for molecular of protein monomers (11) with moves that perturb the symmetric replacement based phasing. With the incorporation of NMR chemical docking arrangement of the monomers (12). The conformational shift information, accurate models can be obtained consistently for energy is determined by the low-resolution energy function previ- symmetric complexes with as many as 192 total amino acids; a blind ously developed for monomer protein structure prediction (6, 11) with coarse-grained terms for backbone hydrogen bonds and prediction was within 1 Å rmsd of the traditionally determined NMR hydrophobic interactions. Cyclic or dihedral symmetry is main- structure, and fit independently collected RDC data equally well. tained by cloning the moves to separate monomers, and each move Together, these results show that the Rosetta ‘‘fold-and-dock’’ pro- is accepted or rejected based on the standard Metropolis criterion tocol can produce models of homo-oligomeric complexes with near- (12). In the second, high-resolution refinement stage, side chains atomic-level accuracy and should be useful for crystallographic phas- are built into the backbone conformations in all-atom detail, and ing and the rapid determination of the structures of multimers with small, symmetric perturbations are tested in the context of a limited NMR information. reasonably accurate, high-resolution energy function that includes van der Waals interactions, the costs of desolvating atoms, and homo-oligomers ͉ molecular replacement ͉ NMR structure inference ͉ hydrogen bonds (Fig. 1) (10, 13). The lowest energy models are then protein structure prediction ͉ symmetry clustered, and the five most populous clusters are compared with the experimentally determined structures. he majority of expressed proteins function within symmetrical We first tested the fold-and-dock protocol on a benchmark set of Thomomeric complexes (1–3). Although a boon for evolving 27 protein complexes with structures previously determined by functional diversity (4), this ubiquity of oligomeric structures poses high-resolution crystallography. Most of these molecules form numerous challenges for modern structural biology. The phasing of intertwined structures in which the number of intermolecular crystallographic data by molecular replacement and NMR struc- contacts approaches one third of the number of intramolecular tural inference are complicated by the increasing number of degrees contacts (Table 1). We have assumed that the stoichiometry of the of freedom and spectral degeneracy, respectively, in multimeric crystallized complex is known (as can be rapidly ascertained systems. The ability to predict de novo the structures of multiple experimentally by, for example, analytical ultracentrifugation), but tested all possible symmetries available to a fixed number of interacting molecular chains would potentially alleviate these prob- monomers. For example, tetramer complexes were modeled with lems, allowing unphased diffraction measurements or ambiguously both D2 and C4 symmetries. assigned NMR spectra to be resurrected as constraints or as independent validation for in silico structural inference. Benchmark of Homo-Oligomeric Proteins Solved by X-Ray Crystallog- Accurate modeling of previously unseen homomeric structures raphy. Initial tests were carried out on coiled coil structures with has not been demonstrated at high resolution. Significant progress simple cyclic geometries as well as more complex dihedral symme- has occurred in modeling the folds of individual monomeric soluble proteins (5–7) and the docking arrangements of predefined mono- mers (8, 9). However, as proteins interact with other proteins, Author contributions: R.D., I.A., and D.B. designed research; R.D., I.A., Y.S., Y.W., A.L., and nucleic acids, or smaller molecules, their lowest free-energy back- S.B. performed research; R.D., I.A., Y.S., Y.W., A.L., S.B., C.H.A., T.S., and D.B. analyzed data; bone conformations typically shift in response to their partners. and R.D., I.A., and D.B. wrote the paper. These often dramatic structural changes have been a persistent and The authors declare no conflict of interest. unsolved issue in blind prediction trials of recent years (8). This article is a PNAS Direct Submission. In this report, we show how two different methods developed for Freely available online through the PNAS open access option. de novo conformational sampling (for folding and for docking) can 1R.D. and I.A. contributed equally to this work. be melded into a more general procedure that permits the blind 2Present address: Departments of Biochemistry and Physics, Stanford University, Stanford, CA prediction of intertwined complexes of proteins at near-atomic 94035. resolution. Some of the test systems involve up to several hundred 3To whom correspondence should be addressed. E-mail: [email protected]. residues, much larger than previous targets of high-resolution de This article contains supporting information online at www.pnas.org/cgi/content/full/ novo modeling; symmetry plays a crucial role in reducing the 0904407106/DCSupplemental. 18978–18983 ͉ PNAS ͉ November 10, 2009 ͉ vol. 106 ͉ no. 45 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0904407106 Downloaded by guest on September 25, 2021 Fig. 1. Overview of the Rosetta fold-and-dock proto- col. Starting with extended protein chains, the protein complex is assembled through alternating cycles of frag- ment insertions, which change the internal coordinates of the monomer, and rigid body perturbations while using a low-resolution representation of the protein. The assembled complexes are then subjected to symmetric all-atom refinement. tries. These molecules include biologically important complexes the sampled models is beyond the 2- to 3-Å radius of convergence such the trimeric coiled coil region of coronin-1 (C3; Fig. 2B) and required to robustly discriminate native-like conformations (8 out engineered helical proteins with dihedral symmetry (D2; Fig. 2K). of 16 do not sample a single structure better than 3 Å) (Table 1; Fig. In each of these cases, at least one of the five most populous S1) (5, 10). Systems with the largest number of degrees of freedom clusters produced by the symmetric folding and docking of multiple were the most difficult; our protocol failed in all eight cases in which chains achieves a backbone accuracy approaching2Åorbetter (C␣ the number of monomer residues exceeded 60. As with the separate rmsd; Table 1). Also, in each of these cases, the fine details of problems of monomer structure modeling and docking of prestruc- hydrophobic side-chain packing are reproduced with near-atomic tured monomers, conformational sampling appears to be the major accuracy (Fig. 2 A–D). As a further, blind test of the method, we bottleneck in the simultaneous modeling of the folding and docking predicted the structure of a Golgi-associated coiled coil region (14) of symmetric complexes. whose atomic coordinates were released after our modeling trials. The crystal structure and the single Rosetta dimer prediction agree Estimating Phases for Molecular Replacement from High-Resolution withaC␣ rmsd of 0.94 Å (Fig. 3A). These results demonstrate that Models. The benchmark tests described above suggest that predict- the basic principles underlying Crick’s ‘‘knobs-into-holes’’-based ing the structure of multimeric